DiffSinger requires Python 3.8 or later. We strongly recommend creating a virtual environment via Conda or venv before installing dependencies.
- Install the latest PyTorch following the official instructions for your OS and hardware.
- Install other dependencies via the following command:

  ```bash
  pip install -r requirements.txt
  ```
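Putting the steps above together, a minimal setup might look like the following sketch. The environment name is illustrative, and the PyTorch line should be replaced with the exact command from the official instructions for your platform:

```bash
# Minimal setup sketch; names here are illustrative, not prescribed by this repo.
python -m venv .venv            # or a Conda environment, e.g. conda create -n diffsinger python=3.10
source .venv/bin/activate
pip install torch               # replace with the exact command from pytorch.org
pip install -r requirements.txt
```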
Some essential materials and assets are needed before continuing with this repository. See *materials for training and using models* for detailed instructions.
Every model needs a configuration file to run preprocessing, training, inference and deployment. Templates of configuration files are in `configs/templates`. Please copy the templates to your own data directory before you edit them.
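For example, assuming a hypothetical template file name and data directory (check `configs/templates` for the actual templates):

```bash
# Hypothetical names: pick the actual template file from configs/templates.
mkdir -p data/my_voice
cp configs/templates/config_acoustic.yaml data/my_voice/my_config.yaml
```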
Before you continue, it is highly recommended to read through Best Practices, which is a more detailed tutorial on how to configure your experiments.
For more details about configurable parameters, see Configuration Schemas.
Tip: to see which parameters are required or recommended to be edited, you can search for *customizability* in the configuration schemas.
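For instance, if the schemas are kept as a document in the repository (the path below is an assumption), such a search could look like:

```bash
# Hypothetical path to the configuration schemas document.
grep -n "customizability" docs/ConfigurationSchemas.md
```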
Raw data pieces and transcriptions should be binarized into dataset files before training. Before doing this step, please ensure all required configurations like `raw_data_dir` and `binary_data_dir` are set properly, and all your desired functionalities and features are enabled and configured.
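As a quick sanity check before binarizing, you can confirm that these keys are set in your configuration file:

```bash
# Verify the required path keys named above are present in the config.
grep -E '^(raw_data_dir|binary_data_dir)' my_config.yaml
```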
Assume that you have a configuration file called `my_config.yaml`. Run:

```bash
python scripts/binarize.py --config my_config.yaml
```
Preprocessing can be accelerated through multiprocessing. See `binarization_args.num_workers` for more details.
Assume that you have a configuration file called `my_config.yaml` and the name of your model is `my_experiment`. Run:

```bash
python scripts/train.py --config my_config.yaml --exp_name my_experiment --reset
```
Checkpoints will be saved to the `checkpoints/my_experiment/` directory. If training is interrupted, running the above command again resumes it automatically from the latest checkpoint.
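For example, after an interruption:

```bash
ls checkpoints/my_experiment/    # inspect the saved checkpoints
# Re-running the identical command resumes from the latest checkpoint.
python scripts/train.py --config my_config.yaml --exp_name my_experiment --reset
```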
For more suggestions related to training performance, see performance tuning.
Run the following command to start TensorBoard:

```bash
tensorboard --logdir checkpoints/
```
NOTICE

If you are training a model with multiple GPUs (DDP), please add the `--reload_multifile=true` option when launching TensorBoard; otherwise it may not update properly.
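Putting the notice into practice, a multi-GPU launch looks like:

```bash
# Required when training with DDP, per the notice above.
tensorboard --logdir checkpoints/ --reload_multifile=true
```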
Inference of DiffSinger is based on DS files. Assume that you have a DS file named `my_song.ds` and your model is named `my_experiment`.
If your model is a variance model, run:

```bash
python scripts/infer.py variance my_song.ds --exp my_experiment
```

or run

```bash
python scripts/infer.py variance --help
```

for more configurable options.
If your model is an acoustic model, run:

```bash
python scripts/infer.py acoustic my_song.ds --exp my_experiment
```

or run

```bash
python scripts/infer.py acoustic --help
```

for more configurable options.
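As a hedged end-to-end sketch, a common workflow runs a variance model first and an acoustic model second on the same song. The experiment names below are hypothetical, and the exact inputs and outputs depend on your configuration:

```bash
# Hypothetical experiment names; see each subcommand's --help for options.
python scripts/infer.py variance my_song.ds --exp my_variance_experiment
python scripts/infer.py acoustic my_song.ds --exp my_acoustic_experiment
```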
DiffSinger uses ONNX as the deployment format. Due to TorchScript issues, exporting to ONNX currently requires PyTorch 1.13. Please ensure the correct dependencies via the following steps:
- Create a new, separate environment for exporting to ONNX.
- Install PyTorch 1.13 following the official instructions. A CPU-only version is enough.
- Install other dependencies via the following command:

  ```bash
  pip install -r requirements-onnx.txt
  ```
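Combined, setting up the export environment might look like this sketch. The environment name is illustrative, and the official PyTorch instructions should be checked for the exact 1.13 CPU-only command:

```bash
# Separate export environment; names are illustrative.
python -m venv .venv-onnx
source .venv-onnx/bin/activate
pip install torch==1.13.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements-onnx.txt
```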
Assume that you have a model named `my_experiment`.
If your model is a variance model, run:

```bash
python scripts/export.py variance --exp my_experiment
```

or run

```bash
python scripts/export.py variance --help
```

for more configurable options.
If your model is an acoustic model, run:

```bash
python scripts/export.py acoustic --exp my_experiment
```

or run

```bash
python scripts/export.py acoustic --help
```

for more configurable options.
To export an NSF-HiFiGAN vocoder checkpoint, run:

```bash
python scripts/export.py nsf-hifigan --config CONFIG --ckpt CKPT
```
where `CONFIG` is a configuration file with the same mel parameters as the vocoder (`configs/acoustic.yaml` works in most cases) and `CKPT` is the path of the checkpoint to be exported.
For more configurable options, run:

```bash
python scripts/export.py nsf-hifigan --help
```
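For example, with hypothetical paths substituted for the placeholders:

```bash
# CONFIG -> configs/acoustic.yaml (as suggested above); the CKPT path is hypothetical.
python scripts/export.py nsf-hifigan --config configs/acoustic.yaml --ckpt checkpoints/nsf_hifigan/model.ckpt
```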
There are other useful CLI tools in the `scripts/` directory not mentioned above:

- `drop_spk.py` - deletes speaker embeddings from checkpoints (useful for data security when distributing models)
- `vocoder.py` - bypasses the acoustic model and runs only the vocoder on given mel-spectrograms
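Assuming these scripts follow the same command-line conventions as the other tools in this repository (an assumption, not confirmed by this document), their options can be inspected with `--help`:

```bash
# Assumption: both scripts expose a --help flag like the other CLI tools here.
python scripts/drop_spk.py --help
python scripts/vocoder.py --help
```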