Hi, I trained a GPT-117M model with the DeepSpeed framework.
When I evaluate the model on WikiText-103, the perplexity reported by tasks/eval_harness/evaluate.py differs substantially from what I get after converting the checkpoint to Megatron format and running tasks/main.py.
What could be causing this gap? @mayank31398
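One thing I wondered about: could the two scripts be normalizing perplexity over different counts, e.g. one over BPE tokens and the other over the original WikiText word count (they may also detokenize the raw text differently)? Below is a minimal sketch with made-up numbers (total_nll, num_tokens, num_words are all hypothetical) showing how the normalization base alone can move the reported PPL by a large margin even when the summed negative log-likelihood is identical:

```python
import math

# Hypothetical numbers for illustration only: the same total negative
# log-likelihood yields very different perplexities depending on whether
# it is averaged over BPE tokens or over the original words of the text.
total_nll = 700_000.0   # assumed summed NLL over the test set, in nats
num_tokens = 280_000    # assumed BPE token count after tokenization
num_words = 245_000     # assumed word count of the raw (detokenized) text

token_ppl = math.exp(total_nll / num_tokens)  # token-level perplexity
word_ppl = math.exp(total_nll / num_words)    # word-level ("adjusted") perplexity

print(f"token-level PPL: {token_ppl:.2f}")  # ~12.18
print(f"word-level  PPL: {word_ppl:.2f}")   # ~17.41
```

If one evaluation path reports token-level PPL and the other word-level PPL, a gap like the one I'm seeing would be expected, but I'm not sure that's actually what is happening here.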