Do you need to tokenize your data when using a BERT/RoBERTa model? #111

zolastro · 2022-04-25T10:07:20Z

Considering that these models come with their own tokenizers and BPE models, what format should the input files be in when training a QE model with any of these LMs? Should you apply any tokenization or casing model to the data beforehand?

Thanks in advance for your help!
