Voice Conversion by CycleGAN (voice cloning / voice conversion): CycleGAN-VC2

jackaduma/CycleGAN-VC2

Latest commit

55f842e · Mar 22, 2023

CycleGAN-VC2-PyTorch


Chinese README | English


This repository is a PyTorch implementation of the paper CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion, a notable work on voice conversion and voice cloning.


Update

2020.11.17: fixed an issue by re-implementing the second-step adversarial loss.

2020.08.27: added the second-step adversarial loss, contributed by Jeffery-zhang-nfls.

CycleGAN-VC2

To advance the research on non-parallel VC, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (Patch GAN).
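As a rough illustration of the "two-step adversarial losses" idea, the sketch below applies a second discriminator to features that have gone through a full cycle (A to B and back to A), in addition to the usual one-step adversarial loss. It is a minimal sketch and not the code in this repository: the least-squares form of the loss is an assumption, and the names `second_step_d_loss`, `second_step_g_loss`, `g_a2b`, `g_b2a`, and `d2_a` are hypothetical. Plain Python floats stand in for tensors.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Feature = Sequence[float]  # stand-in for one acoustic feature sequence


def second_step_d_loss(real: List[Feature], cycled: List[Feature],
                       d2: Callable[[Feature], float]) -> float:
    """Second discriminator loss (assumed least-squares form).

    Real samples should score close to 1, cycled samples close to 0.
    """
    real_term = fmean([(1.0 - d2(x)) ** 2 for x in real])
    fake_term = fmean([d2(x) ** 2 for x in cycled])
    return real_term + fake_term


def second_step_g_loss(cycled: List[Feature],
                       d2: Callable[[Feature], float]) -> float:
    """Generator side: cycled samples should fool the second discriminator."""
    return fmean([(1.0 - d2(x)) ** 2 for x in cycled])


# Toy stand-ins for the two generators and the second discriminator.
g_a2b = lambda x: [v + 1.0 for v in x]  # hypothetical A -> B generator
g_b2a = lambda y: [v - 1.0 for v in y]  # hypothetical B -> A generator
d2_a = lambda x: 1.0                    # toy discriminator: always says "real"

real_a = [[0.0, 0.5], [1.0, 1.5]]
# Second step: the discriminator sees A -> B -> A reconstructions.
cycled_a = [g_b2a(g_a2b(x)) for x in real_a]

d_loss = second_step_d_loss(real_a, cycled_a, d2_a)
g_loss = second_step_g_loss(cycled_a, d2_a)
```

In a real training loop these terms would be added to the one-step adversarial, cycle-consistency, and identity losses, with the generators and discriminators updated in alternation.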

Figure: CycleGAN-VC2 network architecture.


This repository contains:

  1. Model code implementing the paper.
  2. An audio preprocessing script that creates the cache for the training data.
  3. Training scripts to train the model.
  4. Examples of voice conversion: converted results after training.

Requirement

pip install -r requirements.txt

Usage

preprocess

python preprocess_training.py

With no arguments the script falls back to its default values, so the command above is short for

python preprocess_training.py --train_A_dir ./data/S0913/ --train_B_dir ./data/gaoxiaosong/ --cache_folder ./cache/
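The equivalence between the short and long commands follows from argument defaults. The sketch below shows one way such defaults can be declared with `argparse`; the flag names and default paths are taken from the command above, while `build_parser` and the help strings are illustrative assumptions rather than the actual contents of preprocess_training.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser: flags and defaults mirror the long command."""
    parser = argparse.ArgumentParser(
        description="Prepare the training cache (illustrative sketch).")
    parser.add_argument("--train_A_dir", type=str, default="./data/S0913/",
                        help="directory with speaker A recordings")
    parser.add_argument("--train_B_dir", type=str, default="./data/gaoxiaosong/",
                        help="directory with speaker B recordings")
    parser.add_argument("--cache_folder", type=str, default="./cache/",
                        help="where preprocessed features are stored")
    return parser


# Parsing an empty argument list yields the same values as the long form.
short_form = build_parser().parse_args([])
long_form = build_parser().parse_args([
    "--train_A_dir", "./data/S0913/",
    "--train_B_dir", "./data/gaoxiaosong/",
    "--cache_folder", "./cache/",
])
same_settings = vars(short_form) == vars(long_form)
```

To preprocess a different speaker pair, pass only the flags that differ from the defaults, for example `--train_A_dir` and `--train_B_dir` pointing at your own recordings.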

train

python train.py

With no arguments the script falls back to its default values, so the command above is short for

python train.py --logf0s_normalization ./cache/logf0s_normalization.npz --mcep_normalization ./cache/mcep_normalization.npz --coded_sps_A_norm ./cache/coded_sps_A_norm.pickle --coded_sps_B_norm ./cache/coded_sps_B_norm.pickle --model_checkpoint ./model_checkpoint/ --resume_training_at ./model_checkpoint/_CycleGAN_CheckPoint --validation_A_dir ./data/S0913/ --output_A_dir ./converted_sound/S0913 --validation_B_dir ./data/gaoxiaosong/ --output_B_dir ./converted_sound/gaoxiaosong/
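The long form shows that training reads four files from ./cache/, which are presumably the ones written by the preprocessing step. A small check like the one below can confirm they exist before starting a long training run. This is a sketch under that assumption: the file names come from the command above, and `missing_cache_files` is a hypothetical helper that is not part of this repository.

```python
from pathlib import Path
from typing import List

# File names taken from the default train.py arguments shown above.
EXPECTED_CACHE_FILES = (
    "logf0s_normalization.npz",
    "mcep_normalization.npz",
    "coded_sps_A_norm.pickle",
    "coded_sps_B_norm.pickle",
)


def missing_cache_files(cache_folder: str = "./cache/") -> List[str]:
    """Return the expected cache files that are absent from cache_folder."""
    folder = Path(cache_folder)
    return [name for name in EXPECTED_CACHE_FILES
            if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_cache_files()
    if missing:
        print("Run preprocess_training.py first; missing:", ", ".join(missing))
    else:
        print("Cache looks complete; train.py can be started.")
```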

Pretrained

A pretrained model that converts between speakers S0913 and GaoXiaoSong.

Download from Google Drive (735 MB)


Demo

Samples:

Reference speaker A: S0913 (./data/S0913/BAC009S0913W0351.wav)

Reference speaker B: GaoXiaoSong (./data/gaoxiaosong/gaoxiaosong_1.wav)

Speaker A's speech converted to speaker B's voice, from S0913 to GaoXiaoSong (./converted_sound/S0913/BAC009S0913W0351.wav)



Reference

  1. CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. Paper, Project
  2. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. Paper, Project
  3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Paper, Project, Code
  4. Image-to-Image Translation with Conditional Adversarial Nets. Paper, Project, Code

Donation

If this project saves you development time, you can buy me a cup of coffee :)

AliPay

[AliPay QR code]

WeChat Pay

[WeChat Pay QR code]

PayPal


License

MIT © Kun