[BUG] RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size #76

Open
hgftrdw45ud67is8o89 opened this issue Jul 12, 2024 · 2 comments · May be fixed by #89

Comments

@hgftrdw45ud67is8o89

File "....\MARS5-TTS\./mdl\hub\Camb-ai_mars5-tts_master\inference.py", line 291, in tts
    final_audio = self.vocode(final_output).squeeze()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".....\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "....\MARS5-TTS\./mdl\hub\Camb-ai_mars5-tts_master\inference.py", line 158, in vocode
    wav_diffusion = self.vocos.decode(features, bandwidth_id=bandwidth_id)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......
  File "....\Lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size

I'm not sure what is wrong. I fed it a 5-second WAV file and a transcript, but it throws this error.

@RF5
Collaborator

RF5 commented Jul 19, 2024

This seems to happen when Mars 5 fails to generate any output audio and predicts an end-of-sequence token as the first output. Can you double-check that your prompt/reference transcript is accurate for deep clone?
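
The numbers in the error message are consistent with this explanation, and a small arithmetic sketch makes that visible. A 1-D convolution rejects its input when the padded length is smaller than the kernel size, and the padded length is the frame count plus the padding on both sides. The kernel size of 7 and padded size of 6 come from the traceback; the padding of 3 per side is an assumption (it is not shown in the traceback), and the helper names are hypothetical. Under that assumption, a padded size of 6 corresponds to an input with zero frames.

```python
def padded_length(num_frames: int, padding: int) -> int:
    """Length the convolution sees after padding both ends of the time axis."""
    return num_frames + 2 * padding


def conv1d_accepts(num_frames: int, kernel_size: int, padding: int) -> bool:
    """True if the padded input is at least as long as the kernel (dilation 1)."""
    return padded_length(num_frames, padding) >= kernel_size


KERNEL_SIZE = 7  # taken from the error message
PADDING = 3      # ASSUMPTION: padding per side is not shown in the traceback

for frames in (0, 1, 50):
    print(frames, padded_length(frames, PADDING),
          conv1d_accepts(frames, KERNEL_SIZE, PADDING))
```

If the assumed padding is right, any input with at least one frame would pass this check, so the error would point to an empty feature sequence rather than to a merely short one.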

@hgftrdw45ud67is8o89
Author

Maybe because it doesn't support JP or CN?
I tried with a new EN audio source, and I don't think the output is... human speech. Does MARS5-TTS not support '-' or '~'?

Likhithsai2580 added a commit to Likhithsai2580/MARS5-TTS that referenced this issue Dec 28, 2024
Fixes Camb-ai#76

Add a check in the `vocode` method to handle cases where the kernel size is greater than the input size.

* Add a check in the `vocode` method to ensure the kernel size is not greater than the input size.
* Log an appropriate error message if the kernel size is greater than the input size.
* Return an empty tensor if the kernel size is greater than the input size.
* Add an alternative method to handle the case when the kernel size is greater than the input size by using a different bandwidth_id.
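
The kind of guard described in the list above can be sketched as follows. This is not the code from the linked commit: `guarded_decode`, `FakeFeatures`, and `min_frames` are hypothetical names, the time axis is assumed to be the last dimension of the features, and `None` stands in for the empty tensor so the sketch runs without PyTorch.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger(__name__)


@dataclass
class FakeFeatures:
    """Hypothetical stand-in for a tensor; only exposes .shape."""
    shape: Tuple[int, ...]


def guarded_decode(features: Any,
                   decode_fn: Callable[[Any], Any],
                   min_frames: int = 1) -> Optional[Any]:
    """Call decode_fn only if the features have enough frames.

    ASSUMPTION: the last dimension of features.shape is the time axis.
    Returns None (a stand-in for an empty tensor) when the input is too short.
    """
    num_frames = features.shape[-1]
    if num_frames < min_frames:
        logger.error(
            "Refusing to vocode: got %d frame(s), need at least %d. "
            "The model may have produced no audio tokens.",
            num_frames, min_frames,
        )
        return None
    return decode_fn(features)


# Usage with a dummy decoder standing in for the real vocoder call.
empty = FakeFeatures(shape=(1, 128, 0))
normal = FakeFeatures(shape=(1, 128, 50))
print(guarded_decode(empty, lambda f: "audio"))
print(guarded_decode(normal, lambda f: "audio"))
```

A guard like this only replaces the low-level convolution error with a readable message; it does not address why no audio was generated in the first place.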
@Likhithsai2580 Likhithsai2580 linked a pull request Dec 28, 2024 that will close this issue