A PyTorch implementation of GAN-TTS: High Fidelity Speech Synthesis with Adversarial Networks (https://arxiv.org/pdf/1909.11646.pdf)
- Download a dataset for training. Any collection of WAV files with a 24000 Hz sample rate will work (see the resampling sketch after this list if your audio uses a different rate).
- Edit the configuration in utils/audio.py (hop_length must remain unchanged).
- Process the data: python process.py --wav_dir="wavs" --output="data"
- Train: python train.py --input="data/train"
- Monitor training: tensorboard --logdir logdir
- Generate audio: python generate.py --input="data/test"
- You can find the results in the samples directory.
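The preprocessing step expects 24000 Hz audio. If your source recordings use a different sample rate, here is a minimal resampling sketch you could run first; the raw_wavs directory name and the use of librosa/soundfile are my assumptions, not part of this repo:

```python
# Hypothetical pre-step: resample a directory of wav files to 24000 Hz
# so they match the sample rate expected by process.py.
# Assumes librosa and soundfile are installed (pip install librosa soundfile).
import os
import librosa
import soundfile as sf

TARGET_SR = 24000  # must match the sample rate configured in utils/audio.py

def resample_dir(in_dir="raw_wavs", out_dir="wavs"):
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(in_dir):
        if not name.endswith(".wav"):
            continue
        # librosa.load resamples to the requested rate and returns float32 audio
        audio, _ = librosa.load(os.path.join(in_dir, name), sr=TARGET_SR)
        sf.write(os.path.join(out_dir, name), audio, TARGET_SR)

if __name__ == "__main__":
    resample_dir()
```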
- I did not use the loss function from the paper; I modified it, borrowing from Parallel WaveGAN (https://arxiv.org/pdf/1910.11480.pdf). See the sketch after this list.
- I did not use linguistic features; I condition on mel spectrograms instead, so the model can be considered a vocoder.
- This is not an official implementation, and some details may not match the paper.
- To accelerate convergence, I modified some of the network structures and loss functions.
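For reference, the auxiliary loss that Parallel WaveGAN adds to the adversarial objective is a multi-resolution STFT loss (spectral convergence plus log STFT magnitude). Below is a minimal PyTorch sketch of that idea; the FFT/hop/window sizes are the ones from the Parallel WaveGAN paper, and this repo's exact variant may differ:

```python
# Sketch of the multi-resolution STFT auxiliary loss from Parallel WaveGAN.
# Not necessarily the exact loss used in this repo; resolutions follow the paper.
import torch
import torch.nn.functional as F

def stft_magnitude(x, fft_size, hop_length, win_length):
    window = torch.hann_window(win_length, device=x.device)
    spec = torch.stft(x, fft_size, hop_length=hop_length, win_length=win_length,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)  # clamp avoids log(0)

def single_stft_loss(pred, target, fft_size, hop_length, win_length):
    p = stft_magnitude(pred, fft_size, hop_length, win_length)
    t = stft_magnitude(target, fft_size, hop_length, win_length)
    sc_loss = torch.norm(t - p, p="fro") / torch.norm(t, p="fro")  # spectral convergence
    mag_loss = F.l1_loss(torch.log(p), torch.log(t))               # log STFT magnitude
    return sc_loss + mag_loss

def multi_resolution_stft_loss(pred, target,
                               resolutions=((1024, 120, 600),
                                            (2048, 240, 1200),
                                            (512, 50, 240))):
    # Average the STFT loss over several (fft_size, hop, window) resolutions.
    return sum(single_stft_loss(pred, target, *r) for r in resolutions) / len(resolutions)
```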
- kan-bayashi/ParallelWaveGAN (https://github.com/kan-bayashi/ParallelWaveGAN)
- Parallel WaveGAN (https://arxiv.org/pdf/1910.11480.pdf)
- GAN-TTS: High Fidelity Speech Synthesis with Adversarial Networks (https://arxiv.org/pdf/1909.11646.pdf)