time dimension doesn't match #11

MingjieChen · 2022-01-24T13:11:39Z

Training: 0%| | 0/200000 [00:00<?, ?it/s]
Epoch 1: 0%| | 0/454 [00:00<?, ?it/s]
Prepare training ...
Number of StyleSpeech Parameters: 28197333
Removing weight norm...
Traceback (most recent call last):
File "train.py", line 224, in
main(args, configs)
File "train.py", line 98, in main
output = (None, None, model((batch[2:-5])))
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
return self.module(*inputs[0], **kwargs[0])
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 144, in forward
d_control,
File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 88, in G
d_control,
File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/modules.py", line 417, in forward
x = x + pitch_embedding
RuntimeError: The size of tensor a (132) must match the size of tensor b (130) at non-singleton dimension 1
Training: 0%| | 1/200000 [00:02<166:02:12, 2.99s/it]

I think it might be because of the MFA version I used.
As mentioned in https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html, I installed MFA through conda.

Then I ran
mfa align raw_data/LibriTTS lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS
instead of the command you showed.
I can't find a way to run it the way you showed, because I installed MFA through conda.


MingjieChen · 2022-01-24T13:41:41Z

Can you give some details about how you installed MFA? MFA installed with conda install mfa -c conda-forge doesn't support the mfa_align command line.

mmgn123 · 2022-02-09T07:00:57Z

I am facing the same problem here
result = self.forward(*input, **kwargs)
File "~/StyleSpeech/model/modules.py", line 420, in forward
x = x + pitch_embedding
RuntimeError: The size of tensor a (124) must match the size of tensor b (184) at non-singleton dimension 1
Any solution?

keonlee9420 · 2022-02-09T09:20:50Z

Thanks @MingjieChen and @mmgn123 for your report. The MFA part should definitely be updated as MFA has been updated recently, but even with that, you should have no problem if you got the TextGrid from your conda version of MFA and then preprocessed the dataset with it.

So is this the process that you took?
conda version MFA installation -> align with dataset (and get TextGrid files) -> preprocess with the TextGrid -> running train.py

If so, I think the issue mentioned should not occur. If not, please let me know what your process was.

mmgn123 · 2022-02-09T09:35:50Z

Exactly. I installed MFA and then got the TextGrid files.
After running preprocess with the TextGrid files, I got the energy, pitch, mel, and duration folders as well as the train.txt and val.txt files (but no train_filtered.txt file).
But when I then run train.py, I get this error.
When I printed the shapes of x and pitch_embedding, I got [16,125,256] and [16,198,256] respectively.

Yaccoub · 2022-02-09T09:40:55Z

I'm also still facing the same tensor mismatching problem. Thanks in advance for your help

keonlee9420 · 2022-02-09T12:17:40Z

@mmgn123 @Yaccoub what dataset are you using?

mmgn123 · 2022-02-09T12:38:55Z

I am actually using the mls_german dataset, which can be found here: http://www.openslr.org/94/.
I brought it into the same format as LibriTTS train-clean-100 and then ran prepare_align.py; it worked and I got the raw_data.
After that I ran "mfa train raw_data/mls/ german-lexicon.txt german_acoustic_model.zip" to get german_acoustic_model.zip,
then "mfa align raw_data/mls/ german-lexicon.txt german_acoustic_model.zip preprocessed_data/mls", which led to the TextGrid files.

keonlee9420 · 2022-02-09T12:59:22Z

Thanks for the info. Can you print out the shapes of duration, pitch, energy, and mel just before this line while running preprocessor.py? If you set "phoneme_level" for both pitch and energy in preprocess.yaml, the lengths of duration, pitch, and energy should be the same.

mmgn123 · 2022-02-09T13:15:59Z

That's correct, the lengths of duration, pitch and energy are the same! 122 in my case.

keonlee9420 · 2022-02-09T13:22:14Z

And is the sum of the durations equal to the length of the mel?

mmgn123 · 2022-02-09T13:33:21Z

exactly, the sum of duration is 987 and the shape of the mel is (80,987)

keonlee9420 · 2022-02-09T13:38:44Z

Also, does phone in here have the same length as duration, pitch and energy? If so, I think there was no issue with MFA during preprocessing.
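The three checks asked for so far (equal per-phoneme lengths, durations summing to the mel length, and a matching phone sequence) can be collected into one helper. This is only a minimal sketch, not code from the repository: the function name `check_utterance` and its arguments are hypothetical, and it assumes phoneme-level pitch and energy as set in preprocess.yaml.

```python
from typing import List, Sequence


def check_utterance(
    phones: Sequence[str],
    duration: Sequence[int],
    pitch: Sequence[float],
    energy: Sequence[float],
    n_mel_frames: int,
) -> List[str]:
    """Return a list of problems found; an empty list means the features agree."""
    problems = []
    lengths = {
        "phones": len(phones),
        "duration": len(duration),
        "pitch": len(pitch),
        "energy": len(energy),
    }
    # With phoneme-level pitch and energy, every sequence has one entry per phoneme.
    if len(set(lengths.values())) != 1:
        problems.append(f"per-phoneme lengths differ: {lengths}")
    # Each duration is a frame count, so the durations must add up to the mel length.
    if sum(duration) != n_mel_frames:
        problems.append(
            f"sum(duration)={sum(duration)} but mel has {n_mel_frames} frames"
        )
    return problems


if __name__ == "__main__":
    # Toy values shaped like the numbers reported above: 122 phonemes, 987 frames.
    toy_duration = [8] * 121 + [19]
    print(check_utterance(["a"] * 122, toy_duration, [0.0] * 122, [0.0] * 122, 987))
```

Running this over every preprocessed utterance, rather than a single one, would show whether only some files are affected.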

mmgn123 · 2022-02-09T13:45:49Z

yes, the lengths are the same!

keonlee9420 · 2022-02-09T14:02:29Z

thanks for checking! Ok, then we can confirm that the data is processed correctly. Now, we can think of these:

1- During data loading, can you check that every element of the input comes from the same filename (such as in here) and that they all have the same length? I think a large mismatch such as 124 vs. 184 in your log can come from the elements being loaded from different source files. In @MingjieChen's case, where the two tensors have sizes 130 and 132, I think the discrepancy can come from a version mismatch of the pitch extractor (the padding rule might differ and hence produce a slightly different output length). In that case, upgrading or downgrading the module could resolve the issue.
2- Did you change some parts of the model architecture? The line that raised the error is 420 in your log, but it is actually line 422 as here. So I guess there are some modifications in the model code.
3- Other than that, I cannot think of any other reason without seeing your code, sorry ;(
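One way to run the first check is to compare lengths per sample before the batch is padded. The sketch below is not the repository's loader: it assumes each sample is a dict with hypothetical keys "id", "text" and "pitch", and it assumes the collate step pads every field to the longest item in the batch. Under that assumption the two sizes in the error message are batch maxima, so they do not point to the offending file by themselves.

```python
from typing import Dict, List, Sequence, Tuple


def padded_length(seqs: Sequence[Sequence]) -> int:
    """Length a field would have after padding to the longest item in the batch."""
    return max(len(s) for s in seqs)


def find_mismatched(batch: List[Dict]) -> List[Tuple[str, int, int]]:
    """Return (id, text_len, pitch_len) for every sample whose lengths differ."""
    return [
        (s["id"], len(s["text"]), len(s["pitch"]))
        for s in batch
        if len(s["text"]) != len(s["pitch"])
    ]


if __name__ == "__main__":
    # Toy batch: sample "b" has more pitch values than text symbols.
    batch = [
        {"id": "a", "text": [1, 2, 3, 4], "pitch": [0.1, 0.2, 0.3, 0.4]},
        {"id": "b", "text": [5, 6, 7], "pitch": [0.1, 0.2, 0.3, 0.4, 0.5]},
    ]
    text_len = padded_length([s["text"] for s in batch])
    pitch_len = padded_length([s["pitch"] for s in batch])
    print("padded text length:", text_len, "padded pitch length:", pitch_len)
    print("offending samples:", find_mismatched(batch))
```

A single inconsistent sample is enough to make the padded lengths disagree, which is why listing the offending ids is more useful than the shapes alone.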

mmgn123 · 2022-02-09T14:23:02Z

1- Here the duration, pitch and energy do have the same length, except quary_duration, which has a different length.
2- These are two print lines.
Do you mean upgrading/downgrading MFA?

keonlee9420 · 2022-02-09T14:32:50Z

1- And does phone in here also have the same length as pitch (and duration and energy)?
2- I see. So no modification at all.
The MFA version might be an issue, but if it already matches, you can ignore it.

mmgn123 · 2022-02-09T14:42:13Z

phone here has a different length

keonlee9420 · 2022-02-09T14:47:14Z

Gotcha. I should have mentioned this first: you have to modify /text in your case, since the target language is not English. In the current code, the output of the text_to_sequence function differs from the MFA output based on 'raw_data/mls/ german-lexicon.txt'. To resolve this, you have to make the two outputs match. This is also important at inference time, where the same function in /text is used.

mmgn123 · 2022-02-09T15:15:56Z

I checked the output of the text_to_sequence function and found that some parts of the sentence were not converted to phonemes correctly, as in this example:

wie schon die während der letzten krise mehrfach vorgekommenen versuche bedrängter italischer parteichefs daselbst sich festzusetzen hinreichend bewiesen
{V IIH SH OOH N D IIH V EHH RR AX N T D EH EX L EH TS T AX N K RR IIH Z AX M EEH EX F AH X spn F EH EX Z UUH X AX spn spn spn D AAH Z EH L P S T Z IH CC spn HH IH N RR AY CC AX N T B AX V IIH Z AX N}
[143, 132, 119, 90, 143, 119, 133, 90, 92, 117, 92, 133, 119, 116, 146, 118, 104, 72, 358, 104, 92, 146, 358, 358, 358, 90, 146, 92, 117, 129, 131, 133, 146, 107, 358, 106, 107, 119, 84, 119, 133, 88, 143, 146, 119].

Can this be the reason for the tensor mismatch?
Thank you very much for your help!

keonlee9420 · 2022-02-09T15:21:41Z

Exactly. The missing phonemes must also be missing here, which is the part you must adapt to your language. Again, you need to make sure that the output of the text_to_sequence function always matches the TextGrid's phoneme sequence (MFA lexicons).
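To find which phonemes are not covered, the aligner's phone sequence can be compared against the symbol set. This is a rough sketch under stated assumptions: the helpers `split_phones` and `unknown_phones` are hypothetical, the symbol set below is a toy one, and it assumes (as the shorter id sequence above suggests) that phones outside the set are dropped during conversion.

```python
from typing import Iterable, List, Set


def split_phones(braced: str) -> List[str]:
    """Turn '{V IIH SH}' into ['V', 'IIH', 'SH']."""
    return braced.strip().strip("{}").split()


def unknown_phones(phones: Iterable[str], valid_symbols: Set[str]) -> List[str]:
    """Phones not covered by the symbol set, in first-seen order, without repeats."""
    seen: Set[str] = set()
    missing: List[str] = []
    for p in phones:
        if p not in valid_symbols and p not in seen:
            seen.add(p)
            missing.append(p)
    return missing


if __name__ == "__main__":
    phones = split_phones("{V IIH SH OOH N}")
    toy_valid_symbols = {"V", "SH", "N"}  # toy set, not the repository's symbols
    kept = [p for p in phones if p in toy_valid_symbols]
    # If unknown phones are dropped, the sequence becomes shorter than the
    # per-phoneme pitch, energy and duration arrays built from the TextGrid.
    print(len(phones), "phones from the aligner,", len(kept), "kept")
    print("not covered:", unknown_phones(phones, toy_valid_symbols))
```

Every phone reported as not covered would need an entry in the symbol set before the two sequences can have the same length.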

mmgn123 · 2022-02-09T15:55:39Z

That's it! I didn't change the valid_symbols set!
Thank you very much for your timely reply, help, and support!

keonlee9420 · 2022-02-09T16:06:26Z

great! @MingjieChen @Yaccoub , I hope this can help you too.

keonlee9420 · 2022-02-10T02:02:02Z

FYI, I updated MFA description in README.md

MingjieChen · 2022-02-10T09:09:00Z

Hello, I am using LibriTTS, but I am not sure whether it is also because of the missing phoneme problem. I will take a look.
