This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Getting Assertion error: How to use XLM for Unsupervised NMT of language pairs other than English-French, English-German and English-Romanian #339

Open
rashikumar01 opened this issue Jul 27, 2021 · 1 comment

rashikumar01 commented Jul 27, 2021

How can XLM be pretrained on monolingual datasets for other languages and then used for unsupervised NMT?
I preprocessed the data and then ran this command:

!python train.py --exp_name test_sahi_mlm --dump_path ./dumped/ --data_path ./data/processed/sa-hi/ --lgs 'sa-hi' --clm_steps '' --mlm_steps 'sa,hi' --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --batch_size 32 --bptt 256 --optimizer adam,lr=0.0001 --epoch_size 200000 --validation_metrics _valid_mlm_ppl --stopping_criterion _valid_mlm_ppl,10 --fp16 true

I get the following error:
File "/content/drive/MyDrive/XLM/xlm/data/loader.py", line 26, in process_binarized
(data['sentences'].dtype == np.int32) and (1 << 16 <= len(dico) < 1 << 31))
AssertionError

@saikoneru
Can you preprocess the data again and retry? Delete the already processed data or use a new folder. I think something went wrong during preprocessing.
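To see why a processed file might trip that assertion, here is a minimal sketch of the consistency rule it appears to enforce: the integer type of the stored sentences has to match the vocabulary size. Only the `int32` branch is visible in the traceback above; the `uint16` branch for vocabularies below `1 << 16` is an assumption, and the helper names `expected_dtype` and `check_binarized` are hypothetical, not part of XLM.

```python
# Sketch of the dtype / vocabulary-size consistency rule behind the AssertionError.
# The int32 branch mirrors the line shown in the traceback.
# The uint16 branch is an ASSUMPTION about the part of the assert that is not shown.

UINT16_LIMIT = 1 << 16  # 65536
INT32_LIMIT = 1 << 31   # 2147483648


def expected_dtype(vocab_size: int) -> str:
    """Return the dtype name the check would accept for this vocabulary size."""
    if vocab_size < 0:
        raise ValueError("vocabulary size cannot be negative")
    if vocab_size < UINT16_LIMIT:
        return "uint16"  # assumed small-vocabulary branch
    if vocab_size < INT32_LIMIT:
        return "int32"   # branch visible in the traceback
    raise ValueError("vocabulary too large for either branch")


def check_binarized(dtype_name: str, vocab_size: int) -> bool:
    """True if the stored dtype name is consistent with the vocabulary size."""
    return dtype_name == expected_dtype(vocab_size)


if __name__ == "__main__":
    # Hypothetical values: an int32 file paired with a small vocabulary would fail.
    print(check_binarized("int32", 30000))
    print(check_binarized("int32", 70000))
```

On a real file, something like `check_binarized(str(data['sentences'].dtype), len(data['dico']))` should give the same answer as the assert, assuming `data` is the loaded dictionary and the vocabulary is stored under `'dico'` (that key name is an assumption; the traceback only shows the local variable `dico`). If it returns `False`, the processed files and the vocabulary were probably produced by different preprocessing runs, which is why starting from a clean folder may help.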
