Greetings,

Would it be possible to get a confirmation of the `emb_dim` value used to train the BERT model in the original XLM paper? I am trying to measure its effect on accuracy, GPU memory, and training time, but with the value of 2048 suggested in the README, training stops improving after a few epochs (with 512 and 1024 it keeps improving without issue).

For reference, section 5.1 (Training details) of the paper says "we use a Transformer architecture with 1024 hidden units", whereas both the README and issue #112 suggest 2048.
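For context, this is roughly the command I am running, adapted from the MLM training example in the README. The paths and the hyper-parameters other than `--emb_dim` are placeholders for my own setup, not a claim about the exact configuration used in the paper; only `--emb_dim` is changed between runs (512 / 1024 / 2048):

```bash
# Sketch of my MLM training run, adapted from the README's example.
# Paths and most hyper-parameters are placeholders for my setup;
# only --emb_dim is varied between runs (512 / 1024 / 2048).
python train.py \
  --exp_name mlm_en \
  --dump_path ./dumped \
  --data_path ./data/processed/en \
  --lgs 'en' \
  --clm_steps '' \
  --mlm_steps 'en' \
  --emb_dim 2048 \
  --n_layers 12 \
  --n_heads 16 \
  --dropout 0.1 \
  --attention_dropout 0.1 \
  --gelu_activation true \
  --batch_size 32 \
  --bptt 256 \
  --optimizer adam,lr=0.0001 \
  --epoch_size 300000 \
  --validation_metrics _valid_en_mlm_ppl \
  --stopping_criterion _valid_en_mlm_ppl,25
```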
Thanks,
Alfredo