This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Clarification regarding emb_dim parameter value used in the paper #328

Open
asolano opened this issue Feb 12, 2021 · 1 comment

Comments


asolano commented Feb 12, 2021

Greetings,

Would it be possible to get confirmation of the emb_dim parameter value used for training the BERT model in the original XLM paper? I am trying to measure its effect on accuracy, GPU memory, and training time, but with the 2048 value suggested in the README, accuracy stops improving after a few epochs (512 and 1024 keep improving without issue).

For reference, Section 5.1 (Training details) of the paper says "we use a Transformer architecture with 1024 hidden units", but both the README and issue #112 suggest using 2048.
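For context, the kind of sweep described above could be run roughly as follows. This is a hedged sketch: the `--emb_dim`, `--n_layers`, and `--n_heads` flag names follow the repo's README training commands, but the other values here are placeholders, not the paper's exact configuration.

```shell
#!/bin/sh
# Sketch of an emb_dim sweep for XLM's train.py (flag names per the README;
# all other hyperparameters below are illustrative placeholders).
for DIM in 512 1024 2048; do
  python train.py \
    --exp_name "mlm_embdim_${DIM}" \
    --emb_dim "$DIM" \
    --n_layers 12 \
    --n_heads 16
done
```

Comparing the resulting training curves across the three runs is what the question above is trying to do; the report is that only the 2048 run plateaus early.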

Thanks,

Alfredo

@snowood1

Same question here. Confused.
