Enable relative positional embedding in flash attention #7997

Open
KumoLiu opened this issue Aug 6, 2024 · 1 comment

Comments

@KumoLiu
Contributor

KumoLiu commented Aug 6, 2024

From reading this thread:
pytorch/pytorch#96099 (comment)
It seems to me that the relative positional embedding can be integrated via scaled_dot_product_attention's attn_mask argument. However, it can be slow because it does not take the "fast path".

Do you think we can keep this option open for users who want to use flash_attention and rel_pos_embedding?

Originally posted by @mingxin-zheng in #7977 (comment)
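For context, a minimal sketch of that integration, assuming the relative positional embedding has already been materialized as a per-head bias tensor (the shapes and the rel_pos_bias tensor below are illustrative, not MONAI's actual implementation). F.scaled_dot_product_attention accepts a float attn_mask that is added to the attention scores before softmax, but passing one disables the flash-attention fast path:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes; in practice these come from the attention block's input.
B, H, L, D = 2, 4, 16, 32
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Hypothetical relative positional bias of shape (H, L, L), e.g. built from
# decomposed relative position tables.
rel_pos_bias = torch.randn(H, L, L)

# A float attn_mask is added to q @ k^T / sqrt(D) before softmax.
# SDPA falls back to the non-flash kernel when attn_mask is supplied.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=rel_pos_bias.unsqueeze(0))
```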

@vadimkantorov

I would think that Dao-AILab/flash-attention#617 needs to be completed for FAv2 to support an arbitrary attention bias. Then, depending on the exact relative encoding formula that is needed, maybe Dao-AILab/flash-attention#956 could be pushed forward.

Another way forward is trying PyTorch's flex_attention, which can fuse the modification of the attention matrix into the attention kernel.
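A rough sketch of the flex_attention route, assuming a recent PyTorch that ships torch.nn.attention.flex_attention; the rel_bias table below is a hypothetical stand-in for the actual relative positional embedding:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, L, D = 2, 4, 16, 32
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Hypothetical learned bias indexed by relative distance, one row per head.
rel_bias = torch.randn(H, 2 * L - 1)

def rel_pos_score_mod(score, b, h, q_idx, kv_idx):
    # Add a relative positional bias to each attention score;
    # flex_attention can fuse this modification into the kernel.
    return score + rel_bias[h, q_idx - kv_idx + L - 1]

out = flex_attention(q, k, v, score_mod=rel_pos_score_mod)
```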
