update README and fix typos #19

Merged: 1 commit, Feb 5, 2024
README.md: 15 changes (14 additions, 1 deletion)
@@ -18,6 +18,19 @@ FlagAttention now offers two operators.

When further customization is required, FlagAttention serves as an example.

## Changelog

### v0.1

Add the piecewise_attention and flash_attention operators.

### v0.2

Operator optimizations:
1. Apply the mask only when needed.
2. Use a separate kernel to compute the gradient of q, avoiding atomic read-modify-write (RMW) operations on global memory (see the sketch below).
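
As an illustration of point 2, here is a minimal PyTorch sketch of the math behind a dq-only backward pass. It is not the repository's Triton kernel; the function name, dense recomputation, and tensor shapes are assumptions made for clarity.

```python
import torch

def attention_bwd_dq(q, k, v, do, scale):
    """Toy dq computation for scaled dot-product attention (illustrative only)."""
    # Recompute the forward quantities needed by the backward pass; flash-style
    # kernels recompute them block by block instead of storing the full matrix.
    s = (q @ k.transpose(-1, -2)) * scale        # attention scores
    p = s.softmax(dim=-1)                        # attention probabilities
    o = p @ v                                    # forward output
    dp = do @ v.transpose(-1, -2)                # dL/dP
    delta = (do * o).sum(dim=-1, keepdim=True)   # softmax correction: rowsum(dO * O)
    ds = p * (dp - delta)                        # dL/dS
    # dq for a query block depends only on that block's rows, so a kernel
    # parallelized over query blocks can write dq directly, without the atomic
    # read-modify-write accumulation a key-parallel backward would need.
    return (ds @ k) * scale
```

When the backward pass is parallelized over key/value blocks (the natural layout for dk and dv), each program produces partial dq contributions for many query blocks, which must be accumulated atomically in global memory; moving dq into its own query-parallel kernel removes that contention.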


## Requirements

FlagAttention requires PyTorch and Triton. To use the new features of Triton, a nightly release is recommended.
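
As a quick environment check (not part of the repository's documentation), the following assumes both packages are installed and simply confirms that they import and reports their versions:

```python
import torch
import triton

# Print the installed versions to confirm both dependencies are available.
print("torch:", torch.__version__)
print("triton:", triton.__version__)
```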
@@ -235,5 +248,5 @@ The performance of piecewise_attention has improved compared to that in v0.1. In

## More

- For more about the open source system for large models from BAAI, please visit [BAAI?FlagOpen](https://flagopen.baai.ac.cn/).
+ For more about the open source system for large models from BAAI, please visit [BAAI/FlagOpen](https://flagopen.baai.ac.cn/).
[<img src="./assets/logo/baai-flagopen.jpeg">](https://flagopen.baai.ac.cn/)
README_cn.md: 12 changes (12 additions, 0 deletions)
@@ -18,6 +18,18 @@ FlagAttention currently provides two operators.

When further customization is required, the operator implementations in FlagAttention can also serve as a reference.

## Changelog

### v0.1

Add the piecewise_attention and flash_attention operators.

### v0.2

Optimize operator performance.
1. Use masking only when necessary.
2. Use a separate kernel to compute the gradient of q, avoiding read-modify-write (RMW) operations on global memory.

## Requirements

FlagAttention depends on PyTorch and Triton. To use the new features of Triton, a nightly release is recommended.