time dimension doesn't match #11
Can you give some details about how you installed MFA? Since mfa installed from
I am facing the same problem here.
Thanks @MingjieChen and @mmgn123 for your report. The MFA part of the instructions should definitely be updated, since MFA itself was updated recently. Even so, you should have no problem if you generated the TextGrid files with your conda version of MFA and then preprocessed the dataset with them. Is this the process you followed? If so, I don't think the issue should occur. If not, please let me know what your process was.
Exactly, I installed MFA and then generated the TextGrid files.
I'm also still facing the same tensor mismatch problem. Thanks in advance for your help.
I am actually using the mls_german dataset, which can be found here: http://www.openslr.org/94/.
Thanks for the info. Can you print out the shapes of duration, pitch, energy, and mel just before this line during running
That's correct, the lengths of duration, pitch and energy are the same: 122 in my case.
And is the sum of duration equal to the length of the mel?
Exactly, the sum of duration is 987 and the shape of the mel is (80, 987).
Also, does phone in here have the same length as duration, pitch and energy? If so, I think there was no issue with MFA during preprocessing.
Yes, the lengths are the same!
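For anyone who wants to repeat the checks above on their own data, here is a minimal sketch in plain Python. The function name `check_alignment` and the toy inputs are hypothetical and not part of the repository; the toy numbers only mirror the values reported in this thread (122 phoneme-level entries, 987 mel frames).

```python
def check_alignment(phones, duration, pitch, energy, n_mel_frames):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    lengths = {
        "phones": len(phones),
        "duration": len(duration),
        "pitch": len(pitch),
        "energy": len(energy),
    }
    # All phoneme-level features must have one entry per phone.
    if len(set(lengths.values())) != 1:
        problems.append(f"phoneme-level lengths differ: {lengths}")
    # Durations are counted in mel frames, so they must add up to the mel length.
    if sum(duration) != n_mel_frames:
        problems.append(
            f"sum(duration)={sum(duration)} but mel has {n_mel_frames} frames"
        )
    return problems


# Toy data shaped like the numbers reported in this thread (not real features).
duration = [8] * 121 + [19]  # 122 entries that sum to 987
pitch = [0.0] * 122
energy = [0.0] * 122
phones = ["sp"] * 122

ok = check_alignment(phones, duration, pitch, energy, 987)
bad = check_alignment(phones[:-2], duration, pitch, energy, 987)
print(ok)
print(bad)
```

If the first list is empty but training still fails, the mismatch is more likely introduced after preprocessing, for example when the text is converted to a symbol sequence.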
Thanks for checking! OK, then we can confirm that the data is processed correctly. Now, we can think of these:
1- Here, duration, pitch and energy do have the same length, except quary_duration, which has a different length.
1- And does phone in here also have the same length as pitch (and duration and energy)?
phone here has a different length.
Gotcha. I should have mentioned this first: you have to modify
I checked the output of the text_to_sequence function and found that some parts of the sentence were not converted to phonemes correctly, as in this example: wie schon die während der letzten krise mehrfach vorgekommenen versuche bedrängter italischer parteichefs daselbst sich festzusetzen hinreichend bewiesen. Could this be a reason for the tensor mismatch?
Exactly. The missing phonemes must also be missing here, which is the part you must modify for your language. Again, you need to make sure that the output of
That's it! I didn't change the valid_symbols set!
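To illustrate the failure mode discussed above, here is a small sketch in plain Python. It assumes an encoder that silently skips any phone not in the valid symbol set; that behaviour, the helper names `find_unknown_phones` and `encode_dropping_unknown`, and the toy phone lists are all assumptions for illustration, not the repository's actual code or real MFA output.

```python
from collections import Counter


def find_unknown_phones(phones, valid_symbols):
    """Count the phones that are not present in valid_symbols."""
    return Counter(p for p in phones if p not in valid_symbols)


def encode_dropping_unknown(phones, valid_symbols):
    """Toy encoder that silently skips unknown phones (assumed behaviour)."""
    symbol_to_id = {s: i for i, s in enumerate(sorted(valid_symbols))}
    return [symbol_to_id[p] for p in phones if p in symbol_to_id]


# Toy symbol set and toy phones; neither is taken from the repository.
valid_symbols = {"AA1", "B", "K", "S", "T"}
phones = ["v", "i:", "S", "o:", "n"]

unknown = find_unknown_phones(phones, valid_symbols)
encoded = encode_dropping_unknown(phones, valid_symbols)

print(unknown)                    # phones that would be dropped
print(len(phones), len(encoded))  # the encoded text is shorter than the phone list
```

Under this assumption, every dropped phone makes the encoded text shorter than the duration, pitch and energy arrays produced from the TextGrid, which is the kind of length mismatch reported in this issue. Running a check like `find_unknown_phones` over the whole dataset before training shows which symbols still need to be added to the valid set.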
Great! @MingjieChen @Yaccoub, I hope this helps you too.
FYI, I updated the MFA description in README.md.
Hello, I am using LibriTTS, but I am not sure whether it is also because of the missing phoneme problem. I will take a look.
Training: 0%| | 0/200000 [00:00<?, ?it/s]
Epoch 1: 0%| | 0/454 [00:00<?, ?it/s]
Prepare training ...
Number of StyleSpeech Parameters: 28197333
Removing weight norm...
Traceback (most recent call last):
File "train.py", line 224, in <module>
main(args, configs)
File "train.py", line 98, in main
output = (None, None, model((batch[2:-5])))
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
return self.module(*inputs[0], **kwargs[0])
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 144, in forward
d_control,
File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 88, in G
d_control,
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/modules.py", line 417, in forward
x = x + pitch_embedding
RuntimeError: The size of tensor a (132) must match the size of tensor b (130) at non-singleton dimension 1
Training: 0%| | 1/200000 [00:02<166:02:12, 2.99s/it]
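The RuntimeError in the traceback above is a broadcasting failure: in `x = x + pitch_embedding`, the two tensors have sizes 132 and 130 on dimension 1, and neither is 1. The sketch below mimics that shape rule in plain Python so it can be run without PyTorch. The helper name `broadcast_conflict`, the batch size 16 and the hidden size 256 are hypothetical; only 132 and 130 come from the log.

```python
def broadcast_conflict(shape_a, shape_b):
    """Return the index of the rightmost dimension that cannot broadcast, or None."""
    ndim = max(len(shape_a), len(shape_b))
    # Align shapes on the right by padding the shorter one with leading 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for dim in range(ndim - 1, -1, -1):
        # Sizes are compatible when they are equal or when one of them is 1.
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            return dim
    return None


# Batch size and hidden size are made up; 132 and 130 are from the traceback.
x_shape = (16, 132, 256)
pitch_embedding_shape = (16, 130, 256)

conflict = broadcast_conflict(x_shape, pitch_embedding_shape)
print(conflict)
```

A conflict on dimension 1 means the text-side sequence and the pitch sequence disagree on the number of positions, so the thing to compare is the length of the encoded text against the length of the pitch array for the same utterance.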
I think it might be because of the MFA version I used.
As mentioned in https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html, I installed MFA through conda.
Then I used
mfa align raw_data/LibriTTS lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS
instead of the way you showed.
But I can't find a way to run it the way you showed, because I installed MFA through conda.