build: install additional fms-acceleration plugins #350
Conversation
Thanks for making a pull request! 😃
@@ -671,6 +672,16 @@ Notes:
- works only for *multi-gpu*.
- currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.

Note: To pass the above flags via a JSON config, each flag expects its value to be a mixed-type list, so every value must be wrapped in a list. For example:
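A hedged illustration of that JSON shape, using the `padding_free`, `multipack`, and `fast_kernels` flag names from the fms-acceleration documentation (treat the exact keys and values as placeholders for your own config):

```json
{
    "padding_free": ["huggingface"],
    "multipack": [16],
    "fast_kernels": [true, true, true]
}
```

Note that every value is wrapped in a list even when it is a single setting, which is what the mixed-type-list requirement above refers to.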
In line 653, `- attention_and_distributed_packing (experimental)`: we mention it as experimental, but we are talking about releasing it to product with OpenShift 2.14. Is it still experimental, or ready for release? @fabianlim @anhuong
Based on an earlier conversation with Fabian, I believe I can mark these as ready (no longer experimental) in this PR as well. Will wait on @fabianlim to review.
Padding free is already upstreamed to HF main. InstructLab uses multipack, and it has been tested with datasets of up to about 500K samples. Beyond that, I am not aware of multipack's speed performance, since it runs through the lengths of every example before the start of each epoch.
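To make that per-epoch cost concrete, here is a minimal Python sketch (not the actual multipack implementation); the `dataset` structure and `input_ids` field are illustrative assumptions:

```python
# Minimal sketch: multipack-style bucketing needs the token length of every
# sample before each epoch so it can pack examples into balanced batches.
# The scan is linear in dataset size, which is why speed beyond ~500K
# samples is untested.

def scan_lengths(dataset):
    """Return the token length of every example (assumes an `input_ids` field)."""
    return [len(sample["input_ids"]) for sample in dataset]

num_epochs = 3  # illustrative
dataset = [{"input_ids": list(range(n))} for n in (5, 12, 7)]  # toy data

for epoch in range(num_epochs):
    lengths = scan_lengths(dataset)  # recomputed at the start of each epoch
    # ...pack batches using `lengths`, then train...
```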
Is there any issue with including these new plugins in the product if the fused-ops-and-kernels plugin uses the Apache 2.0 license but contains code extracted from unsloth?
@anhuong yes, that is a good point, thanks for bringing this up.
- unsloth is Apache 2.0, but we were disturbed by those "comments" peppered in the code.
- we only extracted part of the unsloth code, and we did the extraction on a version that existed before those "comments" appeared (as far as we could tell).
- all extracted portions contain the relevant license notice headers crediting the owners of unsloth.

Beyond what we have done, I am not knowledgeable enough to say what is and is not permissible; that requires review by someone versed in these matters.

The peft plugin also contains a Triton-only extraction of the ModelCloud fork of AutoGPTQ (https://github.com/foundation-model-stack/fms-acceleration/tree/main/plugins/accelerated-peft#gptq-loras-autogptq---current-implementation-vs-legacy-implementation). The fork is released as Apache 2.0.
@anhuong The code scan should pass with no issues regarding the inclusion of the new plugins, and as noted by @fabianlim, unsloth is Apache 2.0.
LGTM. ILAB training uses multipack, so it is to some extent quite ready, but see my comment.
Description of the change
Users of the image will be able to use padding free, multipack, and fast kernels via the preinstalled fms-acceleration plugins.
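As a sketch of what enabling them could look like (flag names follow the fms-acceleration documentation; the entry point, `<model>`, and `<data>` are placeholders, not verified arguments):

```sh
# Plugins ship in the image but remain inactive until the flags are passed.
python -m tuning.sft_trainer \
    --model_name_or_path <model> \
    --training_data_path <data> \
    --padding_free huggingface \
    --multipack 16 \
    --fast_kernels True True True
```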
Related issue number
NA
How to verify the PR
Tested the installation and running tuning with and without the flags. Just because the plugins are installed does not mean they are enabled; the user must still pass the necessary flags/configs.
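One hedged way to check the installation inside the image (assuming the plugin distribution names `fms-acceleration-peft`, `fms-acceleration-foak`, and `fms-acceleration-aadp`; adjust to whatever names the Dockerfile actually pins):

```sh
# Confirm the plugins are present; pip show exits non-zero if a package is missing.
pip show fms-acceleration fms-acceleration-peft fms-acceleration-foak fms-acceleration-aadp
```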
Was the PR tested