What's Changed
- ci: fix the update_whl_index script to recognize version number with "post" and add torch2.5 by @yzh119 in #694
- bugfix: casting int array to int32 for rope input arguments by @yzh119 in #697
- bugfix: only use sm90 group gemm when torch cuda >= 12.3 by @yzh119 in #699
- misc: remove release-please workflow by @yzh119 in #705
- Customizable SM90 prefill kernels. by @hyhieu in #704
- hotfix: revert torch.library register by @yzh119 in #709
- Improve compatibility with pytorch 2.5 by @zifeitong in #711
- misc: add bibtex reference by @yzh119 in #712
- sampling: simplify min-p sampling by @yzh119 in #713 (see the sketch after this list)
- perf: fix the iteration bound of SWA in FA2 prefill template by @yzh119 in #714
- bugfix: fix min-p AOT compilation in #713 by @yzh119 in #717
- Triton implementation of `silu_and_mul` by @nandor in #716 (see the sketch after this list)
- bugfix: FusedAddRMSNorm kernels might require more than 48KB shared memory when d is large. by @bobboli in #718
- bugfix: Choose sm90 kernels only for Hopper GPUs. by @bobboli in #719
- Finer-grained control over fp16/fp8 builds by @nandor in #722
- Align KV chunk size binary search with actual KV chunk splitting. by @timzsu in #728
- ci: rename python package name to `flashinfer-python` by @yzh119 in #729
- Add a note about int32/int64 datatypes to the `kv_layout` tutorial by @fergusfinn in #737
- fix return type of cuBLAS by @zhyncs in #749
- [Refactor] Unify JIT/Customization/AOT mode by @yzh119 in #748
- Move allocations out of torch ops by @nandor in #740
- [Lint] Fix some linting issues and provide automatic format check script by @LeiWang1999 in #743
- Filter out unsupported head dim for sm90 by @abcdabcd987 in #751
- bugfix: various AOT issues by @abcdabcd987 in #752
- [bugfix] Fix cpp tests/benchmarks by @yzh119 in #753
- fix pin memory device by @youkaichao in #755
- Add dev container for easier development by @ByronHsu in #680
- hotfix: bugfix to #756 by @yzh119 in #757
- Change `apply_rope_with_cos_sin_cache` to accept `cos_sin_cache` by @ByronHsu in #754
- fix: match statement not supported in Python 3.8 by @xslingcn in #759
- bugfix: use actual sm count for num_sm90_ctas by @LLLLKKKK in #762
- bugfix: Fix block-sparse attention API by @yzh119 in #767
- Version bump: v0.2.0.post2 by @yzh119 in #768
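
For the min-p sampling change in #713 above, the following is a minimal PyTorch sketch of min-p sampling semantics, not flashinfer's fused kernel; the function name is illustrative. Tokens whose probability falls below `min_p` times the largest probability are masked out, the rest is renormalized, and one token is drawn from the surviving set.

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float) -> torch.Tensor:
    # Probabilities over the vocabulary for each row of logits.
    probs = torch.softmax(logits, dim=-1)
    # Keep only tokens with probability >= min_p * (largest probability in the row).
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    # Renormalize the surviving mass and sample one token id per row.
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```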
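
For the Triton `silu_and_mul` change in #716 above, here is a minimal sketch of the operation being fused: the first half of each input row is passed through SiLU and multiplied elementwise by the second half (the SwiGLU gating pattern). The kernel and wrapper names are illustrative assumptions, not flashinfer's actual API.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def _silu_and_mul_kernel(out_ptr, in_ptr, d, BLOCK: tl.constexpr):
    row = tl.program_id(0)
    offs = tl.arange(0, BLOCK)
    mask = offs < d
    # Each input row is laid out as [gate (d) | up (d)]; compute silu(gate) * up.
    gate = tl.load(in_ptr + row * 2 * d + offs, mask=mask, other=0.0).to(tl.float32)
    up = tl.load(in_ptr + row * 2 * d + d + offs, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * d + offs, gate * tl.sigmoid(gate) * up, mask=mask)

def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # x: [num_tokens, 2 * d] -> out: [num_tokens, d], one program per row.
    x = x.contiguous()
    num_tokens, two_d = x.shape
    d = two_d // 2
    out = torch.empty(num_tokens, d, device=x.device, dtype=x.dtype)
    _silu_and_mul_kernel[(num_tokens,)](out, x, d, BLOCK=triton.next_power_of_2(d))
    return out
```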
New Contributors
- @hyhieu made their first contribution in #704
- @zifeitong made their first contribution in #711
- @bobboli made their first contribution in #718
- @timzsu made their first contribution in #728
- @fergusfinn made their first contribution in #737
- @LeiWang1999 made their first contribution in #743
- @youkaichao made their first contribution in #755
- @LLLLKKKK made their first contribution in #762
Full Changelog: v0.2.0.post1...v0.2.0.post2