diff --git a/README.md b/README.md
index 5a55f70..c89a9df 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,19 @@ FlagAttention now offers two operators.
 
 When further customization is required, FlagAttention serves as an example.
 
+## Changelog
+
+### v0.1
+
+Add the piecewise_attention and flash_attention operators.
+
+### v0.2
+
+Optimize the operators:
+1. Apply the mask only when needed.
+2. Use a separate kernel to compute the gradient of q, to avoid atomic RMW (read-modify-write) to global memory.
+
+
 ## Requirements
 
 FlagAttention requires PyTorch and Triton. To use the new features of Triton, a nightly release is recommended.
@@ -235,5 +248,5 @@ The performance of piecewise_attention has improved compared to that in v0.1. In
 
 ## More
 
-For more about the open source system for large models from BAAI, please with [BAAI?FlagOpen](https://flagopen.baai.ac.cn/).
+For more about the open source system for large models from BAAI, please visit [BAAI/FlagOpen](https://flagopen.baai.ac.cn/).
 [](https://flagopen.baai.ac.cn/)
diff --git a/README_cn.md b/README_cn.md
index 9cc2f7d..3075b53 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -18,6 +18,18 @@ FlagAttention currently provides two operators.
 
 When further customization is needed, the operator implementations in FlagAttention can also serve as a reference.
 
+## Changelog
+
+### v0.1
+
+Add the piecewise_attention and flash_attention operators.
+
+### v0.2
+
+Optimize operator performance:
+1. Apply masking only when necessary.
+2. Use a separate kernel to compute the gradient of q, to avoid RMW operations to global memory.
+
 ## Requirements
 
 FlagAttention depends on Torch and Triton. To use the new features of Triton, a nightly release is recommended.
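
The v0.2 note about computing the gradient of q in its own kernel can be made concrete. Below is a minimal CPU-side sketch in plain PyTorch, not the actual FlagAttention Triton kernels, of a two-pass blocked backward for a simplified attention-like op `o = (q @ k.T) @ v` (softmax, scaling, and masking omitted; the block sizes `BQ`/`BK` and the function name are illustrative). It shows why fusing dq into the key-block pass would force atomic read-modify-write into a shared dq buffer, while a separate query-block pass lets each program accumulate its dq tile locally and store it once.

```python
import torch

def blocked_backward(q, k, v, do, BQ=64, BK=64):
    """Toy backward for o = (q @ k.T) @ v (softmax omitted), blocked the way
    a two-pass attention backward is: one pass for dk/dv, one for dq."""
    Lq, Lk = q.shape[0], k.shape[0]
    dq, dk, dv = torch.zeros_like(q), torch.zeros_like(k), torch.zeros_like(v)

    # Pass 1 -- parallel over key blocks in a real kernel. Each key block owns
    # its own dk/dv slice, so programs never write to the same memory.
    for j in range(0, Lk, BK):
        dkj, dvj = torch.zeros_like(k[j:j+BK]), torch.zeros_like(v[j:j+BK])
        for i in range(0, Lq, BQ):
            s_ij = q[i:i+BQ] @ k[j:j+BK].T    # forward scores for this tile
            ds_ij = do[i:i+BQ] @ v[j:j+BK].T  # gradient of the scores
            dkj += ds_ij.T @ q[i:i+BQ]
            dvj += s_ij.T @ do[i:i+BQ]
            # Accumulating dq[i:i+BQ] here as well would make every key-block
            # program add into the same global dq buffer -> atomic RMW.
        dk[j:j+BK], dv[j:j+BK] = dkj, dvj

    # Pass 2 -- parallel over query blocks. Each query block accumulates its
    # dq tile locally and performs a single non-atomic store at the end.
    for i in range(0, Lq, BQ):
        dqi = torch.zeros_like(q[i:i+BQ])
        for j in range(0, Lk, BK):
            ds_ij = do[i:i+BQ] @ v[j:j+BK].T
            dqi += ds_ij @ k[j:j+BK]
        dq[i:i+BQ] = dqi

    return dq, dk, dv
```

The trade-off mirrors the one in FlashAttention-style backward passes: the second pass recomputes the score gradients for its tiles, paying some extra FLOPs in exchange for contention-free, non-atomic stores of dq.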