What's Changed
- ci: fix the update_whl_index script to recognize version number with "post" and add torch2.5 by @yzh119 in #694
- bugfix: casting int array to int32 for rope input arguments by @yzh119 in #697
- bugfix: only use sm90 group gemm when torch cuda >= 12.3 by @yzh119 in #699
- misc: remove release-please workflow by @yzh119 in #705
- Customizable SM90 prefill kernels. by @hyhieu in #704
- hotfix: revert torch.library register by @yzh119 in #709
- Improve compatibility with pytorch 2.5 by @zifeitong in #711
- misc: add bibtex reference by @yzh119 in #712
- sampling: simplify min-p sampling by @yzh119 in #713 (see the sketch after this list)
- perf: fix the iteration bound of SWA in FA2 prefill template by @yzh119 in #714
- bugfix: fix min-p AOT compilation in #713 by @yzh119 in #717
- Triton implementation of `silu_and_mul` by @nandor in #716 (see the sketch after this list)
- bugfix: FusedAddRMSNorm kernels might require more than 48KB shared memory when d is large. by @bobboli in #718
- bugfix: Choose sm90 kernels only for Hopper GPUs. by @bobboli in #719
- Finer-grained control over fp16/fp8 builds by @nandor in #722
- Align KV chunk size binary search with actual KV chunk splitting. by @timzsu in #728
- ci: rename python package name to `flashinfer-python` by @yzh119 in #729
- Add a note about int32/int64 datatypes to the `kv_layout` tutorial by @fergusfinn in #737
- fix return type of cuBLAS by @zhyncs in #749
- [Refactor] Unify JIT/Customization/AOT mode by @yzh119 in #748
- Move allocations out of torch ops by @nandor in #740
- [Lint] Fix some linting issues and provide automatic format check script by @LeiWang1999 in #743
- Filter out unsupported head dim for sm90 by @abcdabcd987 in #751
- bugfix: various AOT issues by @abcdabcd987 in #752
- [bugfix] Fix cpp tests/benchmarks by @yzh119 in #753
- fix pin memory device by @youkaichao in #755
- Add dev container for easier development by @ByronHsu in #680
- hotfix: bugfix to #756 by @yzh119 in #757
- Change `apply_rope_with_cos_sin_cache` to accept `cos_sin_cache` by @ByronHsu in #754
- fix: match statement not supported in Python 3.8 by @xslingcn in #759
- bugfix: use actual sm count for num_sm90_ctas by @LLLLKKKK in #762
- bugfix: Fix block-sparse attention API by @yzh119 in #767
- Version bump: v0.2.0.post2 by @yzh119 in #768
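
For the min-p sampling change in #713 above, the following is a minimal PyTorch sketch of min-p sampling semantics, not flashinfer's fused kernel; the function name is illustrative. Tokens whose probability falls below `min_p` times the largest probability are masked out, the rest is renormalized, and one token is drawn from the surviving set.

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float) -> torch.Tensor:
    # Probabilities over the vocabulary for each row of logits.
    probs = torch.softmax(logits, dim=-1)
    # Keep only tokens with probability >= min_p * (largest probability in the row).
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    # Renormalize the surviving mass and sample one token id per row.
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```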
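
For the Triton `silu_and_mul` change in #716 above, here is a minimal sketch of the operation being fused: the first half of each input row is passed through SiLU and multiplied elementwise by the second half (the SwiGLU gating pattern). The kernel and wrapper names are illustrative assumptions, not flashinfer's actual API.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def _silu_and_mul_kernel(out_ptr, in_ptr, d, BLOCK: tl.constexpr):
    row = tl.program_id(0)
    offs = tl.arange(0, BLOCK)
    mask = offs < d
    # Each input row is laid out as [gate (d) | up (d)]; compute silu(gate) * up.
    gate = tl.load(in_ptr + row * 2 * d + offs, mask=mask, other=0.0).to(tl.float32)
    up = tl.load(in_ptr + row * 2 * d + d + offs, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * d + offs, gate * tl.sigmoid(gate) * up, mask=mask)

def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # x: [num_tokens, 2 * d] -> out: [num_tokens, d], one program per row.
    x = x.contiguous()
    num_tokens, two_d = x.shape
    d = two_d // 2
    out = torch.empty(num_tokens, d, device=x.device, dtype=x.dtype)
    _silu_and_mul_kernel[(num_tokens,)](out, x, d, BLOCK=triton.next_power_of_2(d))
    return out
```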
New Contributors
- @hyhieu made their first contribution in #704
- @zifeitong made their first contribution in #711
- @bobboli made their first contribution in #718
- @timzsu made their first contribution in #728
- @fergusfinn made their first contribution in #737
- @LeiWang1999 made their first contribution in #743
- @youkaichao made their first contribution in #755
- @LLLLKKKK made their first contribution in #762
Full Changelog: v0.2.0.post1...v0.2.0.post2