Liger kernel integration #96
Comments
A quick comment: I think we should be careful about adding features that require patches and additional package dependencies. Their optimizations may not apply to all transformers models, and they would increase the maintenance workload.
cc @ByronHsu
There is a big wave of using RL for LLMs after DeepSeek R1. Without Liger Kernel integration, veRL will face huge GPU memory requirements and training costs. You can either catch this wave or play it safe.
Cannot agree more. We'll support it soon.
…)` to load model (#133)

## Summary
This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance` to init a fsdp worker model.

## Main Changes
1. Added an option to use `liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model from pretrained, instead of the default `transformers.AutoModelForCausalLM`
2. Added a test case using configuration file `tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`

## Related Issue
#96

## TODO
#97 optimize the memory usage when computing entropy & log_probs
https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

---------

Signed-off-by: Hongpeng Guo <[email protected]>
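For readers following the thread, the loader switch described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the `use_liger` flag and the helper names `select_loader` and `load_causal_lm` are hypothetical, and only the two class paths are taken from the commit message. Imports are deferred so that `liger_kernel` is only required when the flag is set.

```python
import importlib

# Class paths as named in the commit message; the dict layout is illustrative.
_LOADERS = {
    False: ("transformers", "AutoModelForCausalLM"),
    True: ("liger_kernel.transformers", "AutoLigerKernelForCausalLM"),
}


def select_loader(use_liger: bool) -> tuple[str, str]:
    """Return (module name, class name) for the requested loader."""
    return _LOADERS[bool(use_liger)]


def load_causal_lm(model_path: str, use_liger: bool = False, **kwargs):
    """Load a causal LM, optionally through Liger Kernel's auto class.

    `use_liger` is a hypothetical flag; the real option name in the
    project's configuration may differ.
    """
    module_name, class_name = select_loader(use_liger)
    # Lazy import: liger_kernel stays an optional dependency.
    module = importlib.import_module(module_name)
    loader_cls = getattr(module, class_name)
    return loader_cls.from_pretrained(model_path, **kwargs)
```

Keeping the default on `transformers.AutoModelForCausalLM` addresses the concern raised earlier in the thread: models that Liger Kernel does not cover are unaffected unless the flag is turned on.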
Already implemented in PPOTrainer and SFTTrainer
Integrate Liger Kernel for mainstream models such as Qwen and Llama.
Help wanted