

Fix dtype mismatch in fused_linear_cross_entropy_forward #307

Open · wants to merge 3 commits into base: main
Conversation


@kostum123 commented Oct 12, 2024

Fixes #305

Fix dtype mismatch in fused_linear_cross_entropy_forward function.

  • Cast logits_chunk to the data type of _input_chunk before performing operations on it.

I tested this in Colab after the change and it solved the problem.

```json
{
  "epoch": 1.0,
  "eval_loss": 1.885668396949768,
  "eval_runtime": 0.1708,
  "eval_samples_per_second": 5.856,
  "eval_steps_per_second": 5.856,
  "total_flos": 1766475165597696.0,
  "train_loss": 1.9928909236309575,
  "train_runtime": 115.5799,
  "train_samples_per_second": 0.441,
  "train_steps_per_second": 0.441
}
```

For more details, open the Copilot Workspace session.

@yundai424 (Collaborator) left a comment


The logit chunk has to stay at fp32 while computing the forward & backward w.r.t. the CE loss, to ensure numerical stability and consistency with the HF model code (https://github.com/huggingface/transformers/blob/v4.45.2/src/transformers/models/qwen2/modeling_qwen2.py#L1187). I think the issue here is that after the CE computation is done and the logit has been cast back to the torch autocast dtype (line 102 in https://github.com/linkedin/Liger-Kernel/blob/v0.3.0/src/liger_kernel/ops/fused_linear_cross_entropy.py#L122), there is somehow a mismatch between `_input.dtype` and the inferred autocast dtype 🤔 we might need a different solution here. Will think about it

Successfully merging this pull request may close these issues.

RuntimeError due to dtype mismatch in fused_linear_cross_entropy_forward
3 participants