🐛 Bug
We need to determine whether Thunder has real accuracy problems when computing HF's Qwen 2 model.
The test added in #1406 might fail because the loss computed by the Thunder-generated function is slightly different from the loss computed by HF's implementation. Here's the snippet to reproduce the problem:
import torch
from thunder.dynamo import ThunderCompiler
from transformers import Qwen2Config, Qwen2ForCausalLM

torch.manual_seed(0)

# https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/config.json
configuration = Qwen2Config(
    # Qwen2.5-7B-Instruct uses Grouped-Query Attention, while the default
    # config uses Multi-Head Attention
    num_attention_heads=28,
    num_key_value_heads=4,
    # Scaled down for testing
    hidden_size=56,
    vocab_size=2,
    max_position_embeddings=32,
)
configuration.num_hidden_layers = 1

with torch.device("cuda"):
    model = Qwen2ForCausalLM(configuration).to(torch.bfloat16)

# thunder.jit doesn't work with Qwen2, so we use torch.compile
# https://github.com/Lightning-AI/lightning-thunder/issues/1405
backend = ThunderCompiler()
compiled_model = torch.compile(model, backend=backend, fullgraph=True)

input_ids = torch.randint(0, configuration.vocab_size, (1, configuration.max_position_embeddings), device="cuda")
# input_ids = torch.ones_like(input_ids) * 0

ref_output = model(input_ids=input_ids, labels=input_ids)
ref_loss = ref_output.loss

compiled_output = compiled_model(input_ids=input_ids, labels=input_ids)
compiled_loss = compiled_output.loss

torch.testing.assert_close(compiled_loss, ref_loss)
Thunder may return a different result because it upcasts and downcasts around bf16 differently than HF's implementation does. However, we need to confirm that Thunder's result is indeed more accurate by comparing both losses against an fp64 reference; if it is, the tolerances in the test may simply need to be tweaked.
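One way to answer the accuracy question is to rerun the same bf16 weights in float64 and measure how far each bf16 loss lands from that reference. The following is only a sketch, assuming the repro snippet above has already run so that model, input_ids, ref_loss, and compiled_loss are in scope; building the fp64 reference by deep-copying and upcasting the bf16 model is an assumption, not something the test in #1406 currently does.

import copy

# Sketch (assumption): build an fp64 reference that reuses the same
# bf16-rounded weights, so only the arithmetic precision differs.
fp64_model = copy.deepcopy(model).to(torch.float64)
fp64_loss = fp64_model(input_ids=input_ids, labels=input_ids).loss

# Distance of each bf16 loss from the fp64 reference.
ref_err = (ref_loss.double() - fp64_loss).abs().item()
compiled_err = (compiled_loss.double() - fp64_loss).abs().item()
print(f"HF eager bf16 loss, |error| vs fp64:         {ref_err:.3e}")
print(f"Thunder-compiled bf16 loss, |error| vs fp64: {compiled_err:.3e}")

If the Thunder-compiled loss is as close to (or closer to) the fp64 reference than the eager bf16 loss, the mismatch is a tolerance issue rather than an accuracy bug, and loosening the rtol/atol passed to torch.testing.assert_close in the test would be the appropriate fix.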
cc @apaz-cli