
When enabling naive model parallelism using device_map, the liger-kernel does not work. #593

Open
Songjw133 opened this issue Mar 3, 2025 · 1 comment

@Songjw133

🐛 Describe the bug

When the model is split across multiple GPUs using device_map="auto", liger-kernel raises a ValueError.

Reproduce

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
from transformers import AutoModelForCausalLM, set_seed
from transformers.loss.loss_utils import ForCausalLMLoss
set_seed(0)

from liger_kernel.transformers import apply_liger_kernel_to_qwen2
import torch
apply_liger_kernel_to_qwen2(
    rope=True,
    swiglu=True,
    cross_entropy=False,
    fused_linear_cross_entropy=False,
    rms_norm=True
)
model = AutoModelForCausalLM.from_pretrained("./Qwen2.5-3B-Instruct",
                                             torch_dtype=torch.bfloat16,
                                             device_map="auto")
model.train()
inputs = {
    'input_ids': torch.tensor([1]).unsqueeze(0),
    'attention_mask': torch.tensor([1]).unsqueeze(0),
    'labels': torch.tensor([2]).unsqueeze(0),
}
loss = model(**inputs).loss

Output:

ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
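
A possible workaround sketch (my assumption, not a confirmed fix for the model-parallel split itself): the error suggests a CPU tensor reached a Triton kernel, so explicitly moving the inputs onto the GPU that holds the input embedding layer may avoid it.

# Hypothetical workaround, not verified against this issue: place the inputs on
# the same device as the input embedding layer so the Triton kernels receive
# CUDA tensors rather than CPU tensors.
first_device = model.get_input_embeddings().weight.device
inputs = {k: v.to(first_device) for k, v in inputs.items()}
loss = model(**inputs).loss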

Versions

Python version: 3.11.11
Liger Kernel version: 0.5.4
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
HIP(ROCm) version: Not available
Triton version: 3.1.0
Transformers version: 4.49.0
XPU version: XPU Not Available

@vulkomilev

I will try to work on it
