Enable relative positional embedding in flash attention #7997

Open
KumoLiu opened this issue Aug 6, 2024 · 1 comment

Comments

@KumoLiu
Contributor

KumoLiu commented Aug 6, 2024

From reading this thread:
pytorch/pytorch#96099 (comment)
It seems to me that the relative positional embedding can be integrated via scaled_dot_product_attention's attn_mask argument. However, it can be slow because it does not take the "fast path".

Do you think we can keep this option open for users who want to use flash_attention and rel_pos_embedding?

Originally posted by @mingxin-zheng in #7977 (comment)
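For context, a minimal sketch of that integration, assuming the relative positional embedding has already been materialized as a per-head bias tensor (the shapes and the rel_pos_bias tensor below are illustrative, not MONAI's actual implementation). F.scaled_dot_product_attention accepts a float attn_mask that is added to the attention scores before softmax, but passing one disables the flash-attention fast path:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes; in practice these come from the attention block's input.
B, H, L, D = 2, 4, 16, 32
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Hypothetical relative positional bias of shape (H, L, L), e.g. built from
# decomposed relative position tables.
rel_pos_bias = torch.randn(H, L, L)

# A float attn_mask is added to q @ k^T / sqrt(D) before softmax.
# SDPA falls back to the non-flash kernel when attn_mask is supplied.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=rel_pos_bias.unsqueeze(0))
```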

@vadimkantorov

I would think that Dao-AILab/flash-attention#617 needs to be completed for FAv2 to support an arbitrary attention bias. Then, depending on the exact relative encoding formula that is needed, maybe Dao-AILab/flash-attention#956 could be pushed forward.

Another way forward is trying PyTorch's flex_attention, which can fuse the modification of the attention matrix into the attention kernel.
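A rough sketch of the flex_attention route, assuming a recent PyTorch that ships torch.nn.attention.flex_attention; the rel_bias table below is a hypothetical stand-in for the actual relative positional embedding:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, L, D = 2, 4, 16, 32
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Hypothetical learned bias indexed by relative distance, one row per head.
rel_bias = torch.randn(H, 2 * L - 1)

def rel_pos_score_mod(score, b, h, q_idx, kv_idx):
    # Add a relative positional bias to each attention score;
    # flex_attention can fuse this modification into the kernel.
    return score + rel_bias[h, q_idx - kv_idx + L - 1]

out = flex_attention(q, k, v, score_mod=rel_pos_score_mod)
```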
