Fused CE loss integration #97

Open
eric-haibin-lin opened this issue Jan 12, 2025 · 1 comment
Labels
call for contribution, help wanted

Comments

@eric-haibin-lin
Collaborator

Integrate https://github.com/apple/ml-cross-entropy with mainstream models so that models with a large vocabulary use much less memory.
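
To illustrate why fusing the output projection with the loss saves memory, here is a minimal pure-Python sketch. It is not the kernel from the linked repository; the function name, the `chunk_size` parameter, and the list-based "tensors" are all assumptions made for illustration. The idea shown is that the logits are computed one vocabulary chunk at a time and folded into a running log-sum-exp, so the full vocabulary-sized logit vector never exists at once.

```python
import math
from typing import Sequence


def chunked_linear_cross_entropy(
    hidden: Sequence[float],
    weight: Sequence[Sequence[float]],  # one row per vocabulary entry
    target: int,
    chunk_size: int = 4,  # hypothetical knob: vocabulary rows per chunk
) -> float:
    """Cross-entropy of `target`, computed without holding all logits at once."""
    running_max = -math.inf  # largest logit seen so far
    running_sum = 0.0        # sum of exp(logit - running_max) seen so far
    target_logit = None

    for start in range(0, len(weight), chunk_size):
        rows = weight[start:start + chunk_size]
        # Only this chunk's logits are alive at any point in the loop.
        logits = [sum(h * w for h, w in zip(hidden, row)) for row in rows]

        if start <= target < start + len(rows):
            target_logit = logits[target - start]

        new_max = max(running_max, max(logits))
        # Rescale the old sum to the new reference maximum, then add the chunk.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(z - new_max) for z in logits
        )
        running_max = new_max

    if target_logit is None:
        raise IndexError("target is outside the vocabulary")

    log_partition = running_max + math.log(running_sum)
    return log_partition - target_logit
```

A real fused kernel applies the same rescaling trick on the GPU and also handles the backward pass, which this sketch leaves out.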

eric-haibin-lin added the help wanted label Jan 12, 2025
@hongpeng-guo
Contributor

hongpeng-guo commented Jan 19, 2025

I can take a look at this one if it's still available.

As posted in an adjacent issue, Liger Kernel supports this optimization. It might be worth considering whether we want to use that wheel directly.

vermouth1992 pushed a commit that referenced this issue Jan 30, 2025
…)` to load model (#133)

## Summary

This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance`
to initialize an FSDP worker model.

## Main Changes

1. Added an option to load a model from pretrained with
`liger_kernel.transformers.AutoLigerKernelForCausalLM` instead of the default
`transformers.AutoModelForCausalLM`.
2. Added a test case using the configuration file
`tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`.
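
A rough sketch of what such a loader switch could look like is given below. It is not the code from this PR: the `use_liger` flag and the helper names `select_loader_path` and `load_causal_lm` are hypothetical, and only the two class paths come from the description above. The classes are imported lazily, so `liger_kernel` is needed only when the option is turned on.

```python
import importlib

# Dotted class paths taken from the PR description.
DEFAULT_LOADER = "transformers.AutoModelForCausalLM"
LIGER_LOADER = "liger_kernel.transformers.AutoLigerKernelForCausalLM"


def select_loader_path(use_liger: bool) -> str:
    """Return the dotted path of the class used to load the model."""
    return LIGER_LOADER if use_liger else DEFAULT_LOADER


def load_causal_lm(model_path: str, use_liger: bool = False, **kwargs):
    """Load a causal LM with either the default or the Liger loader class.

    `use_liger` is a hypothetical flag name; the real option may differ.
    """
    module_name, class_name = select_loader_path(use_liger).rsplit(".", 1)
    # Import here so the optional dependency is only required when selected.
    loader_cls = getattr(importlib.import_module(module_name), class_name)
    return loader_cls.from_pretrained(model_path, **kwargs)
```

Calling `load_causal_lm(path, use_liger=True)` would then go through `AutoLigerKernelForCausalLM.from_pretrained`, while the default call keeps the existing `transformers` behaviour.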

## Related Issue

#96 

## TODO

#97 optimize the memory usage when computing entropy & log_probs

https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106
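
One possible direction for this TODO, sketched in plain Python under stated assumptions: both the entropy and the log-probability of the target token can be accumulated in a single pass over vocabulary chunks, using the identity H = log Z - sum_i p_i z_i. This is not the code at the link above and not Liger Kernel's implementation; the function name and the chunked-input interface are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def streaming_entropy_and_log_prob(
    logit_chunks: Iterable[Sequence[float]], target: int
) -> Tuple[float, float]:
    """Return (entropy, log_prob of `target`) from logits supplied in chunks."""
    running_max = -math.inf
    exp_sum = 0.0       # sum of exp(z - running_max)
    weighted_sum = 0.0  # sum of z * exp(z - running_max)
    target_logit = None
    offset = 0          # vocabulary index of the first logit in the chunk

    for chunk in logit_chunks:
        if not chunk:
            continue
        if offset <= target < offset + len(chunk):
            target_logit = chunk[target - offset]

        new_max = max(running_max, max(chunk))
        # Rescale both accumulators to the new reference maximum.
        scale = math.exp(running_max - new_max)
        exp_sum = exp_sum * scale + sum(math.exp(z - new_max) for z in chunk)
        weighted_sum = weighted_sum * scale + sum(
            z * math.exp(z - new_max) for z in chunk
        )
        running_max = new_max
        offset += len(chunk)

    if target_logit is None:
        raise IndexError("target is outside the vocabulary")

    log_z = running_max + math.log(exp_sum)
    entropy = log_z - weighted_sum / exp_sum  # H = log Z - E_p[z]
    log_prob = target_logit - log_z
    return entropy, log_prob
```

Because only two scalars and a running maximum are carried between chunks, the peak memory depends on the chunk size and not on the vocabulary size.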

---------

Signed-off-by: Hongpeng Guo <[email protected]>
Chendong98 pushed a commit to Chendong98/verl that referenced this issue Feb 4, 2025
…)` to load model (volcengine#133)