update README and fix typos #19

Merged: 1 commit, Feb 5, 2024
README.md: 15 changes (14 additions, 1 deletion)
@@ -18,6 +18,19 @@ FlagAttention now offers two operators.

When further customization is required, FlagAttention serves as an example.

## Changelog

### v0.1

Add the piecewise_attention and flash_attention operators.

### v0.2

Operator optimizations:
1. Apply the mask only when needed.
2. Use a separate kernel to compute the gradient of q, avoiding atomic read-modify-write (RMW) operations on global memory (see the sketch below).
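
As an illustration of point 2, here is a minimal PyTorch sketch of the math behind a dq-only backward pass. It is not the repository's Triton kernel; the function name, dense recomputation, and tensor shapes are assumptions made for clarity.

```python
import torch

def attention_bwd_dq(q, k, v, do, scale):
    """Toy dq computation for scaled dot-product attention (illustrative only)."""
    # Recompute the forward quantities needed by the backward pass; flash-style
    # kernels recompute them block by block instead of storing the full matrix.
    s = (q @ k.transpose(-1, -2)) * scale        # attention scores
    p = s.softmax(dim=-1)                        # attention probabilities
    o = p @ v                                    # forward output
    dp = do @ v.transpose(-1, -2)                # dL/dP
    delta = (do * o).sum(dim=-1, keepdim=True)   # softmax correction: rowsum(dO * O)
    ds = p * (dp - delta)                        # dL/dS
    # dq for a query block depends only on that block's rows, so a kernel
    # parallelized over query blocks can write dq directly, without the atomic
    # read-modify-write accumulation a key-parallel backward would need.
    return (ds @ k) * scale
```

When the backward pass is parallelized over key/value blocks (the natural layout for dk and dv), each program produces partial dq contributions for many query blocks, which must be accumulated atomically in global memory; moving dq into its own query-parallel kernel removes that contention.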


## Requirements

FlagAttention requires PyTorch and Triton. To use the new features of Triton, a nightly release is recommended.
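
As a quick environment check (not part of the repository's documentation), the following assumes both packages are installed and simply confirms that they import and reports their versions:

```python
import torch
import triton

# Print the installed versions to confirm both dependencies are available.
print("torch:", torch.__version__)
print("triton:", triton.__version__)
```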
@@ -235,5 +248,5 @@ The performance of piecewise_attention has improved compared to that in v0.1. In

## More

- For more about the open source system for large models from BAAI, please visit [BAAI?FlagOpen](https://flagopen.baai.ac.cn/).
+ For more about the open source system for large models from BAAI, please visit [BAAI/FlagOpen](https://flagopen.baai.ac.cn/).
[<img src="./assets/logo/baai-flagopen.jpeg">](https://flagopen.baai.ac.cn/)
README_cn.md: 12 changes (12 additions, 0 deletions)
@@ -18,6 +18,18 @@ FlagAttention currently provides two operators.

When further customization is required, the operator implementations in FlagAttention can also serve as a reference.

## Changelog

### v0.1

Add the piecewise_attention and flash_attention operators.

### v0.2

Optimize operator performance.
1. Use masking only when necessary.
2. Use a separate kernel to compute the gradient of q, avoiding read-modify-write (RMW) operations on global memory.

## Requirements

FlagAttention depends on PyTorch and Triton. To use the new features of Triton, a nightly release is recommended.