
How to add unsupported language? (nob) #18

Open
thomasht86 opened this issue Mar 2, 2024 · 6 comments

Comments

@thomasht86

Strangely enough, I can see from MMS coverage
that nob (Norwegian) is not supported for TTS.

What must be done in order to support it?

@ylacombe
Owner

ylacombe commented Mar 5, 2024

Hi, unfortunately I don't think the authors are planning to release other MMS models.
On our side, to release such a model we'd need a good Norwegian TTS dataset; do you have such a dataset in mind?
Would you also be interested in training such a model from scratch?
If so, let me know, and I can give you some pointers and some help.
Best

@thomasht86
Author

Hi!
Thanks for the reply!
There are at least two large datasets available in Norwegian:

  1. https://huggingface.co/datasets/NbAiLab/NPSC
  2. https://huggingface.co/datasets/NbAiLab/NST

I have tried running your scripts and converting a checkpoint from https://huggingface.co/facebook/mms-tts-swe, on the premise that Swedish and Norwegian are quite similar.
I had to modify vocab.json manually, adding two characters [æ,å] mapped to the same token_id as their Swedish "counterparts".
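For reference, that kind of vocab.json edit can be sketched like this; the vocab fragment and the character pairing below are illustrative assumptions, not the actual facebook/mms-tts-swe vocabulary:

```python
import json

# Illustrative fragment of a Swedish vocab.json (assumed values,
# not the real MMS vocabulary).
vocab = {"a": 0, "ä": 1, "ö": 2, "å": 3}

# Assumed pairing of Norwegian characters with Swedish counterparts;
# adjust to whatever mapping makes sense for your data.
counterparts = {"æ": "ä", "ø": "ö"}

for nob_char, swe_char in counterparts.items():
    vocab[nob_char] = vocab[swe_char]  # reuse the Swedish token_id

print(json.dumps(vocab, ensure_ascii=False, indent=2))
```

Note that sharing token ids this way makes the new characters true homographs of their counterparts at the model level, which is exactly the point of initializing from a related language.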

I have played around with different learning rates and parameters, but I consistently get infinity for the KL loss, and NaN loss after 100 steps or so.


If you could give pointers for training a model from scratch, I could give it a shot. 😊

@ylacombe
Owner

ylacombe commented Mar 5, 2024

How did you initialize the model? This might play an important role.
[EDIT:] Looking at this model, it seems okay; did you initialize it from scratch?

Also, which hyper-parameters did you use? I'd recommend using the defaults from the original VITS training.

@thomasht86
Author

thomasht86 commented Mar 5, 2024

No, I generated that one from the Swedish model with convert_original_discriminator_checkpoint.py.
But starting from a model initialized on another language would probably require some tricks to finetune..?

I used the hyperparameters you provided in https://github.com/ylacombe/finetune-hf-vits/tree/main/training_config_examples as a basis, but did a "random manual search" from there.

Where can I find the defaults from the original training?

@ylacombe
Owner

ylacombe commented Mar 5, 2024

In that case, here is a snippet that you can modify to initialize from scratch:

```python
from utils.configuration_vits import VitsConfig
from utils.modeling_vits_training import VitsModelForPreTraining
from utils.feature_extraction_vits import VitsFeatureExtractor
from transformers import AutoTokenizer

NEW_REPO_ID = ...

# Reuse the existing config, but push a freshly initialized
# (untrained) model to the new repo.
config = VitsConfig.from_pretrained("thomasht86/mms-tts-nob")
VitsModelForPreTraining(config).push_to_hub(NEW_REPO_ID)

# The feature extractor and tokenizer can be reused as-is.
VitsFeatureExtractor.from_pretrained("thomasht86/mms-tts-nob").push_to_hub(NEW_REPO_ID)
AutoTokenizer.from_pretrained("thomasht86/mms-tts-nob").push_to_hub(NEW_REPO_ID)
```

In terms of training, I'd advise:

  • focusing on a single speaker per model, as it will facilitate training and yield better quality
  • following the original hyper-parameters (learning rate and loss weights): here
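For reference, the "original hyper-parameters" come from the upstream VITS recipe (configs/ljs_base.json in the original VITS repository). The values below are quoted from memory as a sketch and should be double-checked against that file before training:

```python
# Approximate optimizer and loss-weight settings from the original
# VITS recipe (configs/ljs_base.json); verify against the upstream
# config before relying on them.
VITS_TRAIN_DEFAULTS = {
    "learning_rate": 2e-4,
    "betas": [0.8, 0.99],    # AdamW momentum terms
    "eps": 1e-9,
    "lr_decay": 0.999875,    # exponential LR decay applied per epoch
    "batch_size": 64,
    "segment_size": 8192,    # waveform slice length fed to the discriminator
    "c_mel": 45,             # mel-spectrogram reconstruction loss weight
    "c_kl": 1.0,             # KL divergence loss weight
}

print(VITS_TRAIN_DEFAULTS["learning_rate"])
```

The large c_mel weight relative to c_kl is deliberate in the original recipe; deviating far from that ratio is one common way to destabilize the KL term.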

@JackismyShephard

I am in sort of the same situation, but looking to finetune MMS for Danish (which is very similar to Norwegian).

I am having trouble understanding where the above code snippet fits into the training pipeline. Should it be executed after converting a checkpoint with the convert_original_discriminator_checkpoint.py script?
