
[BUG]: how to fine tune DeepSeek-R1-Distill-Qwen-7B with lora #6232

Open

AI-HR opened this issue Feb 28, 2025 · 1 comment
Labels
bug Something isn't working

AI-HR commented Feb 28, 2025

Is there an existing issue for this bug?

  • I have searched the existing issues

The bug has not been fixed in the latest main branch

  • I have checked the latest main branch

Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)

Yes, I will share a minimal reproducible script.

🐛 Describe the bug

```
[extension] Time taken to load fused_optim_cuda op: 0.057875633239746094 seconds
[rank2]: Traceback (most recent call last):
[rank2]:   File "/data/lzj/ColossalAI/applications/ColossalChat/lora_finetune.py", line 455, in <module>
[rank2]:     train(args)
[rank2]:   File "/data/lzj/ColossalAI/applications/ColossalChat/lora_finetune.py", line 252, in train
[rank2]:     model, optimizer, _, dataloader, lr_scheduler = booster.boost(
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/colossalai/booster/booster.py", line 154, in boost
[rank2]:     model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/colossalai/booster/plugin/moe_hybrid_parallel_plugin.py", line 457, in configure
[rank2]:     model = HybridParallelModule(
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/colossalai/booster/plugin/hybrid_parallel_plugin.py", line 86, in __init__
[rank2]:     module, self.shared_params = shardformer.optimize(module, policy=custom_policy)
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/colossalai/shardformer/shard/shardformer.py", line 55, in optimize
[rank2]:     shared_params = sharder.shard()
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 41, in shard
[rank2]:     shared_params = self.policy.get_shared_params()
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/colossalai/shardformer/policies/qwen2.py", line 482, in get_shared_params
[rank2]:     id(qwen2_model.embed_tokens.weight) == id(self.model.lm_head.weight)
[rank2]:   File "/data/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
[rank2]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank2]: AttributeError: 'Qwen2ForCausalLM' object has no attribute 'embed_tokens'
```
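
For context on the failing check: in the standard transformers layout, `Qwen2ForCausalLM` keeps the `Qwen2Model` backbone under `.model`, so `embed_tokens` is reachable as `model.model.embed_tokens` (or via `get_input_embeddings()`), not directly on the CausalLM wrapper that the policy is apparently handed here. A minimal sketch of that layout (the model name and the tie check below are illustrative assumptions, not taken from the issue):

```python
# Minimal sketch, assuming the standard Hugging Face transformers layout for Qwen2.
# Qwen2ForCausalLM wraps the Qwen2Model backbone as `.model`, so the embedding
# table does not live directly on the CausalLM wrapper.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

print(type(model).__name__)                  # Qwen2ForCausalLM
print(hasattr(model, "embed_tokens"))        # False -> the AttributeError above
print(hasattr(model.model, "embed_tokens"))  # True

# The shared-params check compares the input embedding with lm_head; going
# through the wrapper (or the public API) instead of `embed_tokens` directly:
tied = model.get_input_embeddings().weight is model.lm_head.weight
print("weights tied:", tied)
```

Whether the mismatch comes from the LoRA wrapping changing the module hierarchy or from the policy expecting the bare backbone, the check at qwen2.py line 482 can only succeed if the object it receives exposes `embed_tokens` at the top level.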

Environment

No response

AI-HR added the bug label on Feb 28, 2025
TongLi3701 (Member) commented:

For the distilled models, you can use the SFT script directly.

lora_finetune.py is intended for fine-tuning the R1 model.
