
v0.2.0.post2

@yzh119 released this 31 Jan 19:49
· 6 commits to main since this release
200e954

What's Changed

  • ci: fix the update_whl_index script to recognize version numbers with "post" and add torch2.5 by @yzh119 in #694
  • bugfix: casting int array to int32 for rope input arguments by @yzh119 in #697
  • bugfix: only use sm90 group gemm when torch cuda >= 12.3 by @yzh119 in #699
  • misc: remove release-please workflow by @yzh119 in #705
  • Customizable SM90 prefill kernels. by @hyhieu in #704
  • hotfix: revert torch.library register by @yzh119 in #709
  • Improve compatibility with pytorch 2.5 by @zifeitong in #711
  • misc: add bibtex reference by @yzh119 in #712
  • sampling: simplify min-p sampling by @yzh119 in #713
  • perf: fix the iteration bound of SWA in FA2 prefill template by @yzh119 in #714
  • bugfix: fix min-p AOT compilation in #713 by @yzh119 in #717
  • Triton implementation of silu_and_mul by @nandor in #716
  • bugfix: FusedAddRMSNorm kernels might require more than 48KB shared memory when d is large. by @bobboli in #718
  • bugfix: Choose sm90 kernels only for Hopper GPUs. by @bobboli in #719
  • Finer-grained control over fp16/fp8 builds by @nandor in #722
  • Align KV chunk size binary search with actual KV chunk splitting. by @timzsu in #728
  • ci: rename python package name to flashinfer-python by @yzh119 in #729
  • Add a note about int32/int64 datatypes to the kv_layout tutorial by @fergusfinn in #737
  • fix return type of cuBLAS by @zhyncs in #749
  • [Refactor] Unify JIT/Customization/AOT mode by @yzh119 in #748
  • Move allocations out of torch ops by @nandor in #740
  • [Lint] Fix some linting issues and provide automatic format check script by @LeiWang1999 in #743
  • Filter out unsupported head dim for sm90 by @abcdabcd987 in #751
  • bugfix: various AOT issues by @abcdabcd987 in #752
  • [bugfix] Fix cpp tests/benchmarks by @yzh119 in #753
  • fix pin memory device by @youkaichao in #755
  • Add dev container for easier development by @ByronHsu in #680
  • hotfix: bugfix to #756 by @yzh119 in #757
  • Change apply_rope_with_cos_sin_cache to accept cos_sin_cache by @ByronHsu in #754
  • fix: match statement not supported in Python 3.8 by @xslingcn in #759
  • bugfix: use actual sm count for num_sm90_ctas by @LLLLKKKK in #762
  • bugfix: Fix block-sparse attention API by @yzh119 in #767
  • Version bump: v0.2.0.post2 by @yzh119 in #768
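For context on the min-p sampling change in #713/#717: min-p filtering keeps only tokens whose probability is at least `p` times the largest probability, then renormalizes. The sketch below is an illustrative NumPy reference of that semantics, not flashinfer's CUDA kernel or its API.

```python
import numpy as np

def min_p_filter(probs, p):
    """Illustrative min-p filtering: drop tokens whose probability is
    below p * max(probs), then renormalize the survivors.
    (Hypothetical helper; not flashinfer's sampling kernel.)"""
    threshold = p * probs.max(axis=-1, keepdims=True)
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum(axis=-1, keepdims=True)

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(min_p_filter(probs, p=0.2))  # the 0.05 token falls below 0.2 * 0.5 and is zeroed
```

In practice the kernel samples from the filtered distribution directly rather than materializing it, but the filtering rule is the same.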
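For context on the Triton `silu_and_mul` added in #716: this fused gated-activation op splits the last dimension in half, applies SiLU to the first half, and multiplies elementwise by the second half. A minimal NumPy sketch of the reference semantics (not the Triton kernel itself):

```python
import numpy as np

def silu_and_mul_reference(x):
    """Reference semantics of a fused silu_and_mul op:
    split the last dim in half, SiLU-gate the first half by the second.
    (Illustrative sketch only, not flashinfer's Triton implementation.)"""
    a, b = np.split(x, 2, axis=-1)
    return (a / (1.0 + np.exp(-a))) * b

x = np.array([[1.0, 2.0, 3.0, 4.0]])  # last dim 4 -> output last dim 2
print(silu_and_mul_reference(x))
```

Fusing the activation and multiply into one kernel avoids materializing the intermediate SiLU output, which is why it is a common target for Triton/CUDA implementations.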

New Contributors

Full Changelog: v0.2.0.post1...v0.2.0.post2