DiffSinger requires Python 3.8 or later. We strongly recommend creating a virtual environment via Conda or venv before installing dependencies.
- Install the latest PyTorch following the official instructions for your OS and hardware.
- Install other dependencies via the following command:

  ```bash
  pip install -r requirements.txt
  ```
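Putting the steps above together, a minimal setup might look like the following sketch. The environment name is illustrative, and the PyTorch line should be replaced with the exact command from the official instructions for your platform:

```bash
# Minimal setup sketch; names here are illustrative, not prescribed by this repo.
python -m venv .venv            # or a Conda environment, e.g. conda create -n diffsinger python=3.10
source .venv/bin/activate
pip install torch               # replace with the exact command from pytorch.org
pip install -r requirements.txt
```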
Some essential materials and assets are needed before continuing with this repository. See *materials for training and using models* for detailed instructions.
Every model needs a configuration file to run preprocessing, training, inference and deployment. Templates of configuration files are in `configs/templates`. Please copy the templates to your own data directory before you edit them.
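For example, assuming a hypothetical template file name and data directory (check `configs/templates` for the actual templates):

```bash
# Hypothetical names: pick the actual template file from configs/templates.
mkdir -p data/my_voice
cp configs/templates/config_acoustic.yaml data/my_voice/my_config.yaml
```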
Before you continue, it is highly recommended to read through Best Practices, which is a more detailed tutorial on how to configure your experiments.
For more details about configurable parameters, see Configuration Schemas.
Tip: to see which parameters are required or recommended to be edited, you can search for *customizability* in the configuration schemas.
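For instance, if the schemas are kept as a document in the repository (the path below is an assumption), such a search could look like:

```bash
# Hypothetical path to the configuration schemas document.
grep -n "customizability" docs/ConfigurationSchemas.md
```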
Raw data pieces and transcriptions should be binarized into dataset files before training. Before doing this step, please ensure all required configurations like `raw_data_dir` and `binary_data_dir` are set properly, and all your desired functionalities and features are enabled and configured.
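As a quick sanity check before binarizing, you can confirm that these keys are set in your configuration file:

```bash
# Verify the required path keys named above are present in the config.
grep -E '^(raw_data_dir|binary_data_dir)' my_config.yaml
```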
Assume that you have a configuration file called `my_config.yaml`. Run:

```bash
python scripts/binarize.py --config my_config.yaml
```
Preprocessing can be accelerated through multiprocessing. See `binarization_args.num_workers` for more details.
Assume that you have a configuration file called `my_config.yaml` and the name of your model is `my_experiment`. Run:

```bash
python scripts/train.py --config my_config.yaml --exp_name my_experiment --reset
```
Checkpoints will be saved to the `checkpoints/my_experiment/` directory. If training is interrupted, running the above command again resumes it automatically from the latest checkpoint.
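For example, after an interruption:

```bash
ls checkpoints/my_experiment/    # inspect the saved checkpoints
# Re-running the identical command resumes from the latest checkpoint.
python scripts/train.py --config my_config.yaml --exp_name my_experiment --reset
```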
For more suggestions related to training performance, see performance tuning.
Run the following command to start TensorBoard:

```bash
tensorboard --logdir checkpoints/
```
NOTICE

If you are training a model with multiple GPUs (DDP), please add the `--reload_multifile=true` option when launching TensorBoard; otherwise it may not update properly.
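Putting the notice into practice, a multi-GPU launch looks like:

```bash
# Required when training with DDP, per the notice above.
tensorboard --logdir checkpoints/ --reload_multifile=true
```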
Inference of DiffSinger is based on DS files. Assume that you have a DS file named `my_song.ds` and your model is named `my_experiment`.
If your model is a variance model, run:

```bash
python scripts/infer.py variance my_song.ds --exp my_experiment
```

or run

```bash
python scripts/infer.py variance --help
```

for more configurable options.
If your model is an acoustic model, run:

```bash
python scripts/infer.py acoustic my_song.ds --exp my_experiment
```

or run

```bash
python scripts/infer.py acoustic --help
```

for more configurable options.
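As a hedged end-to-end sketch, a common workflow runs a variance model first and an acoustic model second on the same song. The experiment names below are hypothetical, and the exact inputs and outputs depend on your configuration:

```bash
# Hypothetical experiment names; see each subcommand's --help for options.
python scripts/infer.py variance my_song.ds --exp my_variance_experiment
python scripts/infer.py acoustic my_song.ds --exp my_acoustic_experiment
```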
DiffSinger uses ONNX as the deployment format. Due to TorchScript issues, exporting to ONNX currently requires PyTorch 1.13. Please ensure the correct dependencies via the following steps:
- Create a new, separate environment for exporting to ONNX.
- Install PyTorch 1.13 following the official instructions. A CPU-only version is enough.
- Install other dependencies via the following command:

  ```bash
  pip install -r requirements-onnx.txt
  ```
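Combined, setting up the export environment might look like this sketch. The environment name is illustrative, and the official PyTorch instructions should be checked for the exact 1.13 CPU-only command:

```bash
# Separate export environment; names are illustrative.
python -m venv .venv-onnx
source .venv-onnx/bin/activate
pip install torch==1.13.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements-onnx.txt
```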
Assume that you have a model named `my_experiment`.
If your model is a variance model, run:

```bash
python scripts/export.py variance --exp my_experiment
```

or run

```bash
python scripts/export.py variance --help
```

for more configurable options.
If your model is an acoustic model, run:

```bash
python scripts/export.py acoustic --exp my_experiment
```

or run

```bash
python scripts/export.py acoustic --help
```

for more configurable options.
To export an NSF-HiFiGAN vocoder checkpoint, run:

```bash
python scripts/export.py nsf-hifigan --config CONFIG --ckpt CKPT
```
where `CONFIG` is a configuration file with the same mel parameters as the vocoder (`configs/acoustic.yaml` works in most cases) and `CKPT` is the path of the checkpoint to be exported.
For more configurable options, run:

```bash
python scripts/export.py nsf-hifigan --help
```
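For example, with hypothetical paths substituted for the placeholders:

```bash
# CONFIG -> configs/acoustic.yaml (as suggested above); the CKPT path is hypothetical.
python scripts/export.py nsf-hifigan --config configs/acoustic.yaml --ckpt checkpoints/nsf_hifigan/model.ckpt
```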
There are other useful CLI tools in the `scripts/` directory not mentioned above:

- `drop_spk.py` - deletes speaker embeddings from checkpoints (useful for data security when distributing models)
- `vocoder.py` - bypasses the acoustic model and runs only the vocoder on given mel-spectrograms
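Assuming these scripts follow the same command-line conventions as the other tools in this repository (an assumption, not confirmed by this document), their options can be inspected with `--help`:

```bash
# Assumption: both scripts expose a --help flag like the other CLI tools here.
python scripts/drop_spk.py --help
python scripts/vocoder.py --help
```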