
Liger kernel integration #96

Closed
eric-haibin-lin opened this issue Jan 12, 2025 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@eric-haibin-lin
Collaborator

Integrate Liger Kernel for mainstream models such as Qwen and Llama.
Help wanted.
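For context, Liger Kernel exposes per-architecture patch functions that monkey-patch the Hugging Face modeling code in place (fused RMSNorm, SwiGLU, cross-entropy, etc.). A minimal sketch of how such a patch could be wired in behind a model-type lookup; the `apply_liger_patch` helper and its fallback behavior are illustrative, not part of verl:

```python
def apply_liger_patch(model_type: str) -> bool:
    """Patch the Hugging Face modeling code for `model_type` in place.

    Returns True if a Liger patch was applied, False otherwise
    (either the package is missing or the architecture is unsupported).
    """
    try:
        # These patch functions are part of liger_kernel's public API.
        from liger_kernel.transformers import (
            apply_liger_kernel_to_llama,
            apply_liger_kernel_to_qwen2,
        )
    except ImportError:
        return False  # liger-kernel not installed: run unpatched

    patchers = {
        "llama": apply_liger_kernel_to_llama,
        "qwen2": apply_liger_kernel_to_qwen2,
    }
    patcher = patchers.get(model_type)
    if patcher is None:
        return False  # no Liger patch registered for this architecture
    patcher()  # monkey-patches the transformers modeling module
    return True
```

Guarding the import this way keeps the training code runnable when `liger-kernel` is absent, which speaks to the dependency concern raised below.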

@eric-haibin-lin eric-haibin-lin added the help wanted Extra attention is needed label Jan 12, 2025
@PeterSH6
Collaborator

A quick comment: I think we should be really careful about adding features that require patches and additional package dependencies.

Their optimizations may not apply to all transformers models, and they would increase the maintenance workload.

@hongpeng-guo
Contributor

cc @ByronHsu

@deter3

deter3 commented Jan 25, 2025

> A quick comment: I think we should be really careful about adding features that require patches and more package dependencies.
>
> As their optimizations may not be applicable for all transformers models and would increase maintaining workload.

There is a big wave of using RL for LLMs after DeepSeek R1. Without Liger Kernel integration, veRL will face huge GPU memory requirements and training costs. You can either catch this wave or play it safe.

@PeterSH6
Collaborator

>> A quick comment: I think we should be really careful about adding features that require patches and more package dependencies.
>>
>> As their optimizations may not be applicable for all transformers models and would increase maintaining workload.
>
> There is big wave of using RL for LLMs after deepseek R1. If there is no Liger kernel integration, veRL will be facing huge GPU memory requirements and training cost. You can either catch this wave or play safe.

Cannot agree more. We'll support it soon.

vermouth1992 pushed a commit that referenced this issue Jan 30, 2025
…)` to load model (#133)

## Summary

This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance`
to initialize an FSDP worker model.

## Main Changes

1. Added an option to use
`liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model
from pretrained weights, instead of the default
`transformers.AutoModelForCausalLM`.
2. Added a test case using the configuration file
`tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`.
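The load-path choice in item 1 boils down to a small toggle. A minimal sketch, assuming a `use_liger` config flag (the actual option name in the PR may differ); it returns the dotted class path rather than importing it, so the sketch stays runnable without either package installed:

```python
import importlib.util

def liger_installed() -> bool:
    """Check whether the liger-kernel package is importable."""
    return importlib.util.find_spec("liger_kernel") is not None

def model_loader_path(use_liger: bool, available: bool) -> str:
    """Pick which Auto class should load the pretrained causal LM."""
    if use_liger and available:
        # Drop-in replacement that applies Liger patches at load time.
        return "liger_kernel.transformers.AutoLigerKernelForCausalLM"
    # Default Hugging Face loader.
    return "transformers.AutoModelForCausalLM"
```

In the worker itself, the returned class would be imported and its `.from_pretrained(...)` called; falling back to `transformers.AutoModelForCausalLM` when `liger-kernel` is unavailable keeps the option safe to leave enabled.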

## Related Issue

#96 

## TODO

#97 optimize the memory usage when computing entropy & log_probs

https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

---------

Signed-off-by: Hongpeng Guo <[email protected]>
Chendong98 pushed a commit to Chendong98/verl that referenced this issue Feb 4, 2025
@vermouth1992
Collaborator

Already implemented in PPOTrainer and SFTTrainer.
