diff --git a/dnn/torch/fargan/README.md b/dnn/torch/fargan/README.md
index a92c914dc..b986e387c 100644
--- a/dnn/torch/fargan/README.md
+++ b/dnn/torch/fargan/README.md
@@ -1,6 +1,9 @@
 # Framewise Auto-Regressive GAN (FARGAN)
 
-Implementation of FARGAN, a low-complexity neural vocoder.
+Implementation of FARGAN, a low-complexity neural vocoder. Pre-trained models
+are provided as C code in the dnn/ directory with the corresponding model in
+the dnn/models/ directory (name starts with fargan_). If you don't want to train
+a new FARGAN model, you can skip straight to the Inference section.
 
 ## Data preparation
 
diff --git a/dnn/torch/rdovae/README.md b/dnn/torch/rdovae/README.md
index 14359d82d..d1d3f4981 100644
--- a/dnn/torch/rdovae/README.md
+++ b/dnn/torch/rdovae/README.md
@@ -1,24 +1,48 @@
-# Rate-Distortion-Optimized Variational Auto-Encoder
+# Deep REDundancy (DRED) with RDO-VAE
 
-## Setup
-The python code requires python >= 3.6 and has been tested with python 3.6 and python 3.10. To install requirements run
+This is a rate-distortion-optimized variational autoencoder (RDO-VAE) designed
+to code redundancy information. Pre-trained models are provided as C code
+in the dnn/ directory with the corresponding model in the dnn/models/ directory
+(name starts with rdovae_). If you don't want to train a new DRED model, you can
+skip straight to the Inference section.
+
+## Data preparation
+
+For data preparation you need to build Opus as detailed in the top-level README.
+You will need to use the --enable-dred configure option.
+The build will produce an executable named "dump_data".
+To prepare the training data, run:
 ```
-python -m pip install -r requirements.txt
+./dump_data -train in_speech.pcm out_features.f32 out_speech.pcm
 ```
+Here, in_speech.pcm is a raw 16-bit PCM speech file sampled at 16 kHz.
+The speech data used for training the model can be found at:
+https://media.xiph.org/lpcnet/speech/tts_speech_negative_16k.sw
+The out_speech.pcm file isn't needed for DRED, but it is needed to train
+the FARGAN vocoder (see dnn/torch/fargan/ for details).
 
 ## Training
-To generate training data use dump date from the main LPCNet repo
+
+To perform training, run the following command:
 ```
-./dump_data -train 16khz_speech_input.s16 features.f32 data.s16
+python ./train_rdovae.py --cuda-visible-devices 0 --sequence-length 400 --split-mode random_split --state-dim 80 --batch-size 512 --epochs 400 --lambda-max 0.04 --lr 0.003 --lr-decay-factor 0.0001 out_features.f32 output_dir
 ```
+The final model will be in output_dir/checkpoints/checkpoint_400.pth.
 
-To train the model, simply run
+The model can be converted to C using:
 ```
-python train_rdovae.py features.f32 output_folder
+python export_rdovae_weights.py output_dir/checkpoints/checkpoint_400.pth dred_c_dir
 ```
+which will create a number of C source and header files in the dred_c_dir directory.
+Copy these files to the opus/dnn/ directory (replacing the existing ones) and recompile Opus.
 
-To train on CUDA device add `--cuda-visible-devices idx`.
+## Inference
 
-
-## ToDo
-- Upload checkpoints and add URLs
+DRED is integrated within the Opus codec and can be evaluated using the opus_demo
+executable. For example:
+```
+./opus_demo voip 16000 1 64000 -loss 50 -dred 100 -sim_loss 50 input.pcm output.pcm
+```
+This tells the encoder to encode a 16 kHz raw audio file at 64 kb/s using up to 1 second
+of redundancy (the -dred value is in units of 10 ms) and then simulates 50% loss. Refer to
+`opus_demo --help` for more details.