Commit

update README and fix typos (#19)
iclementine authored Feb 5, 2024
1 parent 1641d0c commit c9d9589
Showing 2 changed files with 26 additions and 1 deletion.
15 changes: 14 additions & 1 deletion README.md
@@ -18,6 +18,19 @@ FlagAttention now offers two operators.

When further customization is required, FlagAttention serves as an example.

## Changelog

### v0.1

Added piecewise_attention and flash_attention.

### v0.2

Optimized the operators:
1. Apply masking only when needed (see the first sketch below).
2. Use a separate kernel to compute the gradient of q, avoiding atomic RMW operations on global memory (see the second sketch below).
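
The first point is a common Triton pattern: make the boundary check a compile-time flag, so masking code is generated only when the problem size does not divide evenly into blocks. Below is a minimal standalone sketch of the idea, not the FlagAttention source; the kernel, its names, and the toy computation are illustrative.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def scale_kernel(X, Y, N, BLOCK: tl.constexpr, EVEN_N: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    if EVEN_N:
        # N is a multiple of BLOCK: no mask is compiled into the kernel at all
        x = tl.load(X + offs)
        tl.store(Y + offs, x * 2.0)
    else:
        # ragged case: guard loads and stores past the end of the tensor
        mask = offs < N
        x = tl.load(X + offs, mask=mask, other=0.0)
        tl.store(Y + offs, x * 2.0, mask=mask)


def scale(x: torch.Tensor) -> torch.Tensor:
    n, block = x.numel(), 128
    y = torch.empty_like(x)
    # EVEN_N is a tl.constexpr, so the branch is resolved at compile time and
    # the per-element comparisons disappear entirely from the common case
    scale_kernel[(triton.cdiv(n, block),)](x, y, n, BLOCK=block, EVEN_N=n % block == 0)
    return y
```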

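For the second point, here is a hypothetical, heavily simplified sketch (again not the FlagAttention source; a single head and contiguous row-major layouts are assumed) of why a dedicated dq kernel avoids atomics: each program owns one block of q rows, loops over all key blocks, and accumulates dq in registers, so the result is written once with a plain `tl.store`.

```python
@triton.jit
def bwd_dq_kernel(DS, K, DQ, M, N,
                  D: tl.constexpr, BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # one program per block of q rows: no other program touches these dq rows
    offs_m = tl.program_id(0) * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_d = tl.arange(0, D)
    acc = tl.zeros((BLOCK_M, D), dtype=tl.float32)
    for start_n in range(0, N, BLOCK_N):
        offs_n = start_n + tl.arange(0, BLOCK_N)
        # ds: (BLOCK_M, BLOCK_N) tile of the softmax-gradient matrix
        ds = tl.load(DS + offs_m[:, None] * N + offs_n[None, :],
                     mask=(offs_m[:, None] < M) & (offs_n[None, :] < N),
                     other=0.0)
        # k: (BLOCK_N, D) tile of the key matrix
        k = tl.load(K + offs_n[:, None] * D + offs_d[None, :],
                    mask=offs_n[:, None] < N, other=0.0)
        acc += tl.dot(ds, k)  # dq = ds @ k, accumulated in registers
    # one non-atomic store at the end; a kernel fused with dk/dv would instead
    # need tl.atomic_add here, since many programs would update the same rows
    tl.store(DQ + offs_m[:, None] * D + offs_d[None, :], acc,
             mask=offs_m[:, None] < M)
```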

## Requirements

FlagAttention requires PyTorch and Triton. To use the new features of Triton, a nightly release is recommended.
@@ -235,5 +248,5 @@ The performance of piecewise_attention has improved compared to that in v0.1. In

## More

- For more about the open source system for large models from BAAI, please with [BAAI?FlagOpen](https://flagopen.baai.ac.cn/).
+ For more about the open source system for large models from BAAI, please visit [BAAI/FlagOpen](https://flagopen.baai.ac.cn/).
[<img src="./assets/logo/baai-flagopen.jpeg">](https://flagopen.baai.ac.cn/)
12 changes: 12 additions & 0 deletions README_cn.md
@@ -18,6 +18,18 @@ FlagAttention currently provides two operators.

If further customization is needed, the operator implementations in FlagAttention can also serve as a reference.

## Changelog

### v0.1

Added the piecewise_attention and flash_attention operators.

### v0.2

Optimized operator performance:
1. Apply masking only when necessary.
2. Use a separate kernel to compute the gradient of q, avoiding atomic RMW operations on global memory.

## Requirements

FlagAttention depends on PyTorch and Triton. To use Triton's new features, a nightly release is recommended.
