
OverflowError: out of range integral type conversion attempted #69

Open
1-sf opened this issue Dec 31, 2023 · 3 comments

Comments


1-sf commented Dec 31, 2023

I get the following error:

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
100% 1061/1061 [2:04:03<00:00, 3.45s/it]
Traceback (most recent call last):
  File "/content/mm-cot/main.py", line 395, in <module>
    T5Trainer(
  File "/content/mm-cot/main.py", line 284, in T5Trainer
    metrics = trainer.evaluate(eval_dataset = test_set, max_length=args.output_len)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer_seq2seq.py", line 159, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3043, in evaluate
    output = eval_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3343, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/content/mm-cot/main.py", line 215, in compute_metrics_rougel
    preds = tokenizer.batch_decode(preds, skip_special_tokens=True, clean_up_tokenization_spaces=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3469, in batch_decode
    return [
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3470, in <listcomp>
    self.decode(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3509, in decode
    return self._decode(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 546, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted

This happens when I run inference for rationale generation:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
  --data_root data/ScienceQA/data \
  --caption_file data/instruct_captions.json \
  --model declare-lab/flan-alpaca-large \
  --user_msg rationale --img_type vit \
  --bs 2 --eval_bs 4  --epoch 50 --lr 5e-5 --output_len 512 \
  --use_caption --use_generate --prompt_format QCM-E \
  --output_dir experiments \
  --evaluate_dir models/mm-cot-large-rationale

This happens after the 1061 iterations are completed. As a consequence, it doesn't generate experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_eval.json, which the answer-inference phase expects as input.

@SanghyeokSon

I have the same problem.

I suspect the error occurs because the tokenizer cannot decode preds, since preds contains -100 values (the ignore index used for loss masking).
It is related to this issue: huggingface/transformers#22634
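If that diagnosis is right, the usual workaround (a sketch of the common fix for this class of error, not necessarily the exact patch from the linked issue) is to replace the -100 padding with the tokenizer's pad token id before decoding. The helper name `sanitize_for_decode` is hypothetical:

```python
import numpy as np

def sanitize_for_decode(token_ids, pad_token_id):
    """Replace the -100 ignore-index used for loss masking with a real
    token id, so the fast (Rust) tokenizer can decode without the
    'out of range integral type conversion' OverflowError."""
    token_ids = np.asarray(token_ids)
    return np.where(token_ids != -100, token_ids, pad_token_id)

# Hypothetical usage inside compute_metrics_rougel, before batch_decode:
# preds = sanitize_for_decode(preds, tokenizer.pad_token_id)
# preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
```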


1-sf commented Jan 14, 2024

I tried the fix from huggingface/transformers#24433 (comment) and it seems to have worked.


cooelf commented May 19, 2024

This issue may be due to an update of the transformers library. The solution above seems to be effective.
