[Compiler/Runtime/External Libs] Add flash attention external library & pass through mechanism #99

Merged: 11 commits into main from zhekunz/flash_attn, Jan 25, 2024

zhekunz2
Collaborator

@zhekunz2 zhekunz2 commented Jan 22, 2024

This PR moves the runtime flash attention kernels to external_libs/, adds flash attention kvcache support, and invokes the kernels in a pass-through fashion by declaring a Byre::CustomOp. Byre::CustomOp has three main components:

  1. StrAttr:$lib_path, which specifies the path of the library file.
  2. StrAttr:$api_name, which specifies the name of the symbol in that library that implements this custom op.
  3. ArrayAttr:$extra_args, which specifies the additional arguments that need to be passed to the API call.
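
For illustration only, the pass-through idea behind these three attributes can be sketched in Python with ctypes. This is not the Byre runtime code (which is C++); `CustomOpAttrs`, `call_custom_op`, the all-doubles calling convention, and the use of libm's `pow` as a stand-in kernel are assumptions made for this example, and it assumes a POSIX system.

```python
import ctypes
import ctypes.util
from dataclasses import dataclass, field


@dataclass
class CustomOpAttrs:
    """Hypothetical mirror of the three Byre::CustomOp attributes."""
    lib_path: object   # path of the shared library (None: symbols already loaded in this process)
    api_name: str      # symbol to look up in that library
    extra_args: list = field(default_factory=list)  # appended after the regular operands


def call_custom_op(attrs, operands):
    """Load the library, resolve the symbol, and forward operands + extra_args."""
    lib = ctypes.CDLL(attrs.lib_path)
    fn = getattr(lib, attrs.api_name)
    args = list(operands) + list(attrs.extra_args)
    # Assumption for this sketch: every argument and the result are C doubles.
    fn.argtypes = [ctypes.c_double] * len(args)
    fn.restype = ctypes.c_double
    return fn(*args)


# Stand-in for a real kernel: libm's pow(base, exponent).
attrs = CustomOpAttrs(
    lib_path=ctypes.util.find_library("m"),
    api_name="pow",
    extra_args=[10.0],
)
result = call_custom_op(attrs, [2.0])
print(result)  # 1024.0
```

The point of the sketch is that the caller knows nothing about the callee beyond a library path, a symbol name, and a list of trailing arguments, which is what lets a new external kernel be wired in without adding a dedicated runtime op.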

The following changes are made to support pass-through on the flash attention op:

  • Remove the runtime flash-attention kernels; the updated kernels now live in external_libs/ (with flash attention kvcache support added).
  • Add runtime Byre::CustomOp support: runtime/lib/backends/cuda/providers/default/custom/custom.cc
  • Add the compiler HloToByreCustomPass conversion pass: compiler/lib/Conversion/HloToByreTensor/HloToByreCustom.cpp

@zhekunz2 zhekunz2 changed the title from "squash" to "[Compiler/Runtime/External Libs] Add flash attention external library & pass through mechanism" Jan 22, 2024
@liwenchangbdbz liwenchangbdbz added the enhancement New feature or request label Jan 22, 2024
@zhekunz2 zhekunz2 marked this pull request as ready for review January 24, 2024 02:10
@zhekunz2 zhekunz2 merged commit b60d7a6 into main Jan 25, 2024
10 checks passed
@zhekunz2 zhekunz2 deleted the zhekunz/flash_attn branch January 25, 2024 05:45
3 participants