
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 10], but got 4-dimensional input of size [1, 1, 72, 1011] instead #2

Open

laleye opened this issue Aug 28, 2022 · 4 comments

laleye commented Aug 28, 2022

I'm trying to reuse your interesting code for speech translation on my own data.
I get the following size error with the lna_ed configuration:

Traceback (most recent call last):                                                                    
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq_cli/hydra_train.py", line 45, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq_cli/train.py", line 169, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq_cli/train.py", line 279, in train
    log_output = trainer.train_step(samples)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/trainer.py", line 694, in train_step
    raise e
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/trainer.py", line 662, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/tasks/fairseq_task.py", line 475, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/criterions/label_smoothed_cross_entropy.py", line 79, in forward
    net_output = model(**sample["net_input"])
  File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/frejus/Projects/tafsiri-st/iwslt-2021/fairseq_modules/models/wav2vec_s2t.py", line 150, in forward
    encoder_out = self.encoder(
  File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/frejus/Projects/tafsiri-st/iwslt-2021/fairseq_modules/models/wav2vec_s2t.py", line 218, in forward
    encoder_out = super().forward(
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/models/wav2vec/wav2vec2_asr.py", line 372, in forward
    x, padding_mask = self.w2v_model.extract_features(**w2v_args)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/models/wav2vec/wav2vec2.py", line 631, in extract_features
    res = self.forward(source, padding_mask, mask=mask, features_only=True)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/models/wav2vec/wav2vec2.py", line 486, in forward
    features = self.feature_extractor(source)
  File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lib/python3.8/site-packages/fairseq-1.0.0a0+88dba0a-py3.8-linux-x86_64.egg/fairseq/models/wav2vec/wav2vec2.py", line 741, in forward
    x = conv(x)
  File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lib/python3.8/site-packages/torch/nn/modules/conv.py", line 301, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "lib/python3.8/site-packages/torch/nn/modules/conv.py", line 297, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 10], but got 4-dimensional input of size [1, 1, 72, 1011] instead

Do you know what I'm doing wrong?
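
For reference, the 3-dimensional weight [512, 1, 10] in the error is the first Conv1d layer of the wav2vec 2.0 feature extractor (out_channels=512, in_channels=1, kernel_size=10), which expects a raw waveform batch of shape (B, T). A minimal sketch of the mismatch, assuming only torch is installed, with shapes taken from the traceback above:

import torch
import torch.nn.functional as F

# First conv layer of the wav2vec 2.0 feature extractor:
# out_channels=512, in_channels=1, kernel_size=10 -> weight [512, 1, 10]
weight = torch.randn(512, 1, 10)

# Expected: a raw waveform batch (B, T), unsqueezed to (B, 1, T)
# inside the extractor before entering the conv stack.
waveform = torch.randn(1, 16000)                 # 1 second at 16 kHz
out = F.conv1d(waveform.unsqueeze(1), weight, stride=5)
print(out.shape)                                 # torch.Size([1, 512, 3199])

# Observed: a 4-D input [1, 1, 72, 1011], i.e. something shaped like a
# 2-D feature matrix (72 x 1011) per utterance instead of a 1-D waveform.
features = torch.randn(1, 1, 72, 1011)
F.conv1d(features, weight)                       # raises the same RuntimeError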

johntsi (Collaborator) commented Aug 31, 2022

Hi, maybe your data are not in the correct format?

The input to the model has to be single-channel audio sampled at 16 kHz. You can convert your files with the following command:

ls ${path_to_wavs}/*.* | parallel -j 4 ffmpeg -i {} -ac 1 -ar 16000 -hide_banner -loglevel error {.}.wav
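
If it helps, here is a quick sanity check that the converted files really are mono and 16 kHz (a sketch assuming the soundfile package is installed; the directory path is a placeholder):

import soundfile as sf
from pathlib import Path

# Flag any wav that is not single-channel or not sampled at 16 kHz.
# "path_to_wavs" is a placeholder for your data directory.
for wav in sorted(Path("path_to_wavs").glob("*.wav")):
    info = sf.info(str(wav))
    if info.channels != 1 or info.samplerate != 16000:
        print(f"{wav}: channels={info.channels}, rate={info.samplerate}")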

laleye (Author) commented Sep 1, 2022

Thanks for your reply.
All the data was already in this format, but I converted it again anyway, without success.
I still get the same error.

johntsi (Collaborator) commented Sep 2, 2022

Could you maybe try with a standard dataset like MuST-C, to see whether the problem is in the data?
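
In the meantime, one way to pin down where the extra dimension comes from is to log the shape right before the feature extractor (a generic PyTorch forward-pre-hook sketch; the attribute chain encoder.w2v_model.feature_extractor is inferred from the traceback and may differ in your setup):

# Print the shape the wav2vec 2.0 feature extractor actually receives.
def log_input_shape(module, inputs):
    print("feature_extractor input shape:", inputs[0].shape)

model.encoder.w2v_model.feature_extractor.register_forward_pre_hook(log_input_shape)

If that prints something like torch.Size([1, 72, 1011]) rather than torch.Size([1, T]), the dataset is feeding 2-D feature matrices instead of raw waveforms, and the extractor's internal unsqueeze then produces the 4-D input in the traceback.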

laleye (Author) commented Sep 3, 2022

@johntsi I will try it and let you know.
