Fused CE loss integration #97

eric-haibin-lin · 2025-01-12T03:40:39Z

Integrate https://github.com/apple/ml-cross-entropy with mainstream models so that models with a large vocabulary size use much less memory.
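For a rough sense of the memory at stake, the sketch below estimates the size of a fully materialized logits tensor. The token count, vocabulary size, and dtype width are illustrative assumptions, not measurements from verl or from any particular model.

```python
def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_element: int) -> int:
    """Bytes needed to hold a dense (num_tokens, vocab_size) logits tensor."""
    return num_tokens * vocab_size * bytes_per_element


# Hypothetical workload: 8192 tokens in a micro-batch, 128,000-entry vocabulary,
# logits kept in fp32 (4 bytes per element).
total_bytes = logits_bytes(num_tokens=8192, vocab_size=128_000, bytes_per_element=4)
total_gib = total_bytes / 2**30

print(f"{total_bytes} bytes = {total_gib} GiB")
```

This counts the forward logits only. The gradient with respect to the logits has the same shape, so a non-fused loss can need roughly this much again during the backward pass.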

hongpeng-guo · 2025-01-19T09:51:20Z

I can take a look at this one if it's still available.

As posted in an adjacent issue, Liger Kernel supports this optimization. It might be worth considering whether we want to use that wheel directly.

…)` to load model (#133)

## Summary

This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance` to initialize an FSDP worker model.

## Main Changes

1. Added an option to use `liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model from pretrained, instead of the default `transformers.AutoModelForCausalLM`.
2. Added a test case using the configuration file `tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`.

## Related Issue

#96

## TODO

#97: optimize the memory usage when computing entropy & log_probs
https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

---------

Signed-off-by: Hongpeng Guo <[email protected]>
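The TODO above is about avoiding a full vocabulary-sized logits buffer when computing entropy and log_probs. As a minimal sketch of the general idea (not the code in `dp_actor.py`, and not the implementation in apple/ml-cross-entropy or Liger Kernel), the pure-Python example below walks the vocabulary in chunks and keeps only a running max and two running sums. All function names, the chunk size, and the toy data are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_logprob_and_entropy(hidden, weight_rows, target):
    """Reference: materialize every logit, then take a log-softmax."""
    logits = [dot(hidden, row) for row in weight_rows]
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - lse for z in logits]
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return log_probs[target], entropy


def streaming_logprob_and_entropy(hidden, weight_rows, target, chunk_size=4):
    """Same result, but only one chunk of logits is alive at a time."""
    running_max = float("-inf")
    sum_exp = 0.0    # sum of exp(z - running_max) over logits seen so far
    sum_exp_z = 0.0  # sum of exp(z - running_max) * z over logits seen so far
    target_logit = None
    for start in range(0, len(weight_rows), chunk_size):
        chunk = weight_rows[start:start + chunk_size]
        logits = [dot(hidden, row) for row in chunk]
        new_max = max(running_max, max(logits))
        # Rescale the accumulators to the new max (the factor is 0.0 on the first chunk).
        scale = math.exp(running_max - new_max)
        sum_exp *= scale
        sum_exp_z *= scale
        for z in logits:
            e = math.exp(z - new_max)
            sum_exp += e
            sum_exp_z += e * z
        running_max = new_max
        if start <= target < start + len(chunk):
            target_logit = logits[target - start]
    lse = running_max + math.log(sum_exp)
    log_prob = target_logit - lse
    entropy = lse - sum_exp_z / sum_exp  # H = logsumexp(z) - E_p[z]
    return log_prob, entropy


# Toy, deterministic data: hidden size 8, vocabulary of 37 entries.
hidden = [math.sin(i + 1) for i in range(8)]
weight_rows = [[math.cos(0.37 * v + 1.3 * j) for j in range(8)] for v in range(37)]

streamed = streaming_logprob_and_entropy(hidden, weight_rows, target=5, chunk_size=4)
reference = naive_logprob_and_entropy(hidden, weight_rows, target=5)
print(streamed, reference)
```

A fused GPU kernel would apply the same rescaling trick per block of the classifier matrix and would also need a matching backward pass, which this sketch leaves out.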

eric-haibin-lin added the help wanted Extra attention is needed label Jan 12, 2025

hongpeng-guo mentioned this issue Jan 26, 2025

[Liger-kernel] Add an option to use _apply_liger_kernel_to_instance() to load model #133

Merged

PeterSH6 added the call for contribution label Feb 9, 2025
