
Questions about inconsistent evaluation results #392

Open
coorful opened this issue Jul 24, 2023 · 0 comments
coorful commented Jul 24, 2023

Hi, I have used the DeepSpeed framework to train a GPT-117M model.
When I evaluate the model's performance on WikiText-103, there is a large gap in perplexity (PPL) between the two evaluation paths: running tasks/eval_harness/evaluate.py directly vs. first converting the checkpoint to Megatron format and then using tasks/main.py.
May I ask what causes this discrepancy? @mayank31398
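One possible contributor to such a gap (an assumption on my part, not confirmed from either script) is that the two paths may normalize perplexity differently on WikiText-103, e.g. dividing the total negative log-likelihood by the number of sub-word tokens in one case and by the number of words in the other. A minimal sketch of how much that alone can move the reported number, using made-up per-token losses:

```python
import math

# Hypothetical per-token negative log-likelihoods (in nats) for a short
# evaluation text; real values would come from the model's forward pass.
token_nlls = [2.1, 3.4, 1.8, 2.9, 3.0, 2.2]
num_tokens = len(token_nlls)  # sub-word tokens seen by the model
num_words = 4                 # whitespace-separated words in the same text

total_nll = sum(token_nlls)

# Token-level PPL: normalize the total NLL by the number of model tokens.
ppl_token = math.exp(total_nll / num_tokens)

# Word-level PPL: normalize the same total NLL by the number of words,
# the convention some WikiText-103 evaluations use.
ppl_word = math.exp(total_nll / num_words)

print(f"token-level PPL: {ppl_token:.2f}")  # ≈ 13.02
print(f"word-level PPL:  {ppl_word:.2f}")   # ≈ 46.99
```

The same model loss yields very different PPL figures depending on the denominator, so it is worth checking which normalization each script applies before comparing the results.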
