
Liger kernel integration #96

Closed
eric-haibin-lin opened this issue Jan 12, 2025 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@eric-haibin-lin
Collaborator

Integrate Liger Kernel for mainstream models such as Qwen and Llama.
Help wanted.
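For context, Liger Kernel exposes per-architecture patch functions that monkey-patch the Hugging Face modeling code in place (fused RMSNorm, SwiGLU, cross-entropy, etc.). A minimal sketch of how such a patch could be wired in behind a model-type lookup; the `apply_liger_patch` helper and its fallback behavior are illustrative, not part of verl:

```python
def apply_liger_patch(model_type: str) -> bool:
    """Patch the Hugging Face modeling code for `model_type` in place.

    Returns True if a Liger patch was applied, False otherwise
    (either the package is missing or the architecture is unsupported).
    """
    try:
        # These patch functions are part of liger_kernel's public API.
        from liger_kernel.transformers import (
            apply_liger_kernel_to_llama,
            apply_liger_kernel_to_qwen2,
        )
    except ImportError:
        return False  # liger-kernel not installed: run unpatched

    patchers = {
        "llama": apply_liger_kernel_to_llama,
        "qwen2": apply_liger_kernel_to_qwen2,
    }
    patcher = patchers.get(model_type)
    if patcher is None:
        return False  # no Liger patch registered for this architecture
    patcher()  # monkey-patches the transformers modeling module
    return True
```

Guarding the import this way keeps the training code runnable when `liger-kernel` is absent, which speaks to the dependency concern raised below.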

@eric-haibin-lin eric-haibin-lin added the help wanted Extra attention is needed label Jan 12, 2025
@PeterSH6
Collaborator

A quick comment: I think we should be really careful about adding features that require patches and additional package dependencies.

Their optimizations may not apply to all transformers models, and they would increase the maintenance workload.

@hongpeng-guo
Contributor

cc @ByronHsu

@deter3

deter3 commented Jan 25, 2025

> A quick comment: I think we should be really careful about adding features that require patches and more package dependencies.
>
> As their optimizations may not be applicable for all transformers models and would increase maintaining workload.

There is a big wave of using RL for LLMs after DeepSeek R1. Without Liger Kernel integration, veRL will face huge GPU memory requirements and training costs. You can either catch this wave or play it safe.

@PeterSH6
Collaborator

>> A quick comment: I think we should be really careful about adding features that require patches and more package dependencies.
>>
>> As their optimizations may not be applicable for all transformers models and would increase maintaining workload.
>
> There is big wave of using RL for LLMs after deepseek R1. If there is no Liger kernel integration, veRL will be facing huge GPU memory requirements and training cost. You can either catch this wave or play safe.

Cannot agree more. We'll support it soon.

vermouth1992 pushed a commit that referenced this issue Jan 30, 2025
…)` to load model (#133)

## Summary

This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance`
to initialize an FSDP worker model.

## Main Changes

1. Added an option to use
`liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model
from pretrained weights, instead of the default
`transformers.AutoModelForCausalLM`.
2. Added a test case using the configuration file
`tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`.
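The load-path choice in item 1 boils down to a small toggle. A minimal sketch, assuming a `use_liger` config flag (the actual option name in the PR may differ); it returns the dotted class path rather than importing it, so the sketch stays runnable without either package installed:

```python
import importlib.util

def liger_installed() -> bool:
    """Check whether the liger-kernel package is importable."""
    return importlib.util.find_spec("liger_kernel") is not None

def model_loader_path(use_liger: bool, available: bool) -> str:
    """Pick which Auto class should load the pretrained causal LM."""
    if use_liger and available:
        # Drop-in replacement that applies Liger patches at load time.
        return "liger_kernel.transformers.AutoLigerKernelForCausalLM"
    # Default Hugging Face loader.
    return "transformers.AutoModelForCausalLM"
```

In the worker itself, the returned class would be imported and its `.from_pretrained(...)` called; falling back to `transformers.AutoModelForCausalLM` when `liger-kernel` is unavailable keeps the option safe to leave enabled.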

## Related Issue

#96 

## TODO

#97 optimize the memory usage when computing entropy & log_probs

https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

---------

Signed-off-by: Hongpeng Guo <[email protected]>
Chendong98 pushed a commit to Chendong98/verl that referenced this issue Feb 4, 2025
@vermouth1992
Collaborator

Already implemented in PPOTrainer and SFTTrainer.
