
Feature request: Sliding Window Attention #22

Open · tjtanaa opened this issue Nov 29, 2023 · 6 comments

@tjtanaa commented Nov 29, 2023

It would be wonderful if there were support for this feature, which would make this repo equivalent to flash-attention v2.3. It would also enable the Mistral-7B model, one of the best open-source 7B model architectures.

May I know whether there is a plan to bump flash-attention for ROCm from v2.0.4 to v2.3?
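
For context, here is a minimal sketch of what the requested API looks like in upstream flash-attn v2.3+ (CUDA build), where sliding window attention is controlled by the `window_size` argument; whether a ROCm port would expose the same signature is an assumption on my part:

```python
# Sketch against the upstream flash-attn >= 2.3 API; the same window_size
# signature in a ROCm build is an assumption, not an existing feature.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 8192, 32, 128
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)

# Mistral-7B style: causal attention limited to the previous 4096 tokens.
# window_size = (left, right); the default (-1, -1) disables the window.
out = flash_attn_func(q, k, v, causal=True, window_size=(4096, 0))
```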

@jayz0123

Hi @tjtanaa, I am not sure when the sliding window feature will be implemented in this repo, because that depends on the feature first being implemented in the CK (Composable Kernel) backend. A new CK attention kernel with better performance is under development, and this flash-attention for ROCm will then be refactored on top of it. Until then it will stay at v2.0.4. You may also want to check the Flash-Attention built into PyTorch for ROCm to see whether their implementation will support that feature soon.

@jamestwhedbee

Hey, just wanted to check in to see if there are any updates on this?

@fe1ixxu commented Feb 28, 2024

+1

@ehartford

Please, this is impacting customers.

The warning we get is:

> The current flash attention version does not support sliding window attention, for a more memory efficient implementation make sure to upgrade flash-attn library.
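
For anyone hitting this warning, here is a quick sketch (assuming the standard upstream `flash_attn` package layout) for checking whether the installed build already exposes the `window_size` argument the warning refers to:

```python
# Check whether the installed flash-attn build exposes the window_size
# argument that sliding window attention needs (added upstream in v2.3.0).
import inspect
import flash_attn
from flash_attn import flash_attn_func

print("flash-attn version:", flash_attn.__version__)
has_swa = "window_size" in inspect.signature(flash_attn_func).parameters
print("sliding window attention supported:", has_swa)
```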

@linchen111

+1, please~

@Bellk17 commented Apr 21, 2024

+1
