This is a TensorFlow implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions" (Tacotron 2).
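For reference, the Tacotron 2 paper predicts 80-channel log-mel spectrograms. These are computed from an STFT with 50 ms frames, a 12.5 ms hop and a Hann window, then passed through a mel filterbank spanning 125 Hz to 7.6 kHz, with magnitudes clipped to a minimum of 0.01 before the log. The NumPy sketch below computes such a target under stated assumptions. It is not this repository's preprocessing code. The 22.05 kHz default matches LJ Speech, while the HTK mel formula, the power-of-two FFT size and all function names are hypothetical choices.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale; this particular formula is an assumption.
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=80, fmin=125.0, fmax=7600.0):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1), equally spaced in mel."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=22050, frame_ms=50.0, hop_ms=12.5, n_mels=80):
    """Returns shape (n_frames, n_mels); assumes len(wav) covers at least one frame."""
    win_length = int(sr * frame_ms / 1000.0)  # 1102 samples at 22.05 kHz
    hop_length = int(sr * hop_ms / 1000.0)    # 275 samples at 22.05 kHz
    n_fft = 1 << (win_length - 1).bit_length()  # next power of two (2048 here)
    n_frames = 1 + (len(wav) - win_length) // hop_length
    # Index matrix selecting overlapping frames, one row per frame (no padding).
    idx = np.arange(win_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = np.asarray(wav, dtype=np.float64)[idx] * np.hanning(win_length)
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T
    # Clip to 0.01 before the log, as the Tacotron 2 paper describes.
    return np.log(np.maximum(mel, 0.01))
```

For a one-second 22.05 kHz clip this yields 77 frames of 80 mel bands. Frames are not padded, so audio shorter than a single 50 ms frame is not handled by this sketch.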
Initially, I will reuse existing components from Tacotron and other open-source implementations.
The attention mechanism will also differ from the one described in the paper, at least initially.
Dataset: LJ Speech Dataset (https://keithito.com/LJ-Speech-Dataset)
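LJ Speech 1.1 ships a pipe-separated metadata.csv (clip ID, raw transcription, normalized transcription) next to a wavs/ directory holding one WAV file per clip ID. A minimal loader, sketched under that assumption with hypothetical names (Utterance, parse_ljspeech_metadata, load_ljspeech) rather than this repository's own input pipeline, could look like this:

```python
import csv
import os
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    wav_path: str
    text: str  # normalized transcription (third field)


def parse_ljspeech_metadata(lines: Iterable[str], root: str) -> List[Utterance]:
    # QUOTE_NONE keeps literal double quotes in transcripts instead of
    # treating them as CSV quoting characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    items = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        clip_id, normalized_text = row[0], row[2]
        wav_path = os.path.join(root, "wavs", clip_id + ".wav")
        items.append(Utterance(wav_path, normalized_text))
    return items


def load_ljspeech(root: str) -> List[Utterance]:
    # Hypothetical helper: reads <root>/metadata.csv as laid out in LJ Speech 1.1.
    with open(os.path.join(root, "metadata.csv"), encoding="utf-8", newline="") as f:
        return parse_ljspeech_metadata(f, root)
```

Using csv.QUOTE_NONE matters here because some transcripts contain double quotes that the default CSV dialect would otherwise try to interpret.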
Much of the base work is taken from Kyubyong Park's implementation of Deep Voice 3 (https://www.github.com/kyubyong/deepvoice3).