
# Expressive Tacotron (PyTorch implementation)

## Introduction

This repository provides a multi-mode, multi-speaker expressive speech synthesis framework, including multi-attentive Tacotron, DurIAN, and Non-attentive Tacotron.

The framework also includes several deep learning architectures for building the prosody encoder: Global Style Tokens (GST), the Variational Autoencoder (VAE), the Gaussian Mixture Variational Autoencoder (GMVAE), and x-vectors.
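As a rough illustration of the GST-style prosody encoder mentioned above, the sketch below shows a minimal token layer: a bank of learnable style tokens is attended over by a reference embedding to produce a single style embedding. The class name, dimensions, and single-head attention are illustrative assumptions, not this repository's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GSTLayer(nn.Module):
    """Minimal Global Style Token sketch (hypothetical names/sizes):
    learnable tokens + soft attention from a reference embedding."""

    def __init__(self, num_tokens=10, token_dim=256, ref_dim=128):
        super().__init__()
        # Bank of learnable style tokens, shared across all utterances.
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.query_proj = nn.Linear(ref_dim, token_dim)

    def forward(self, ref_embedding):
        # ref_embedding: (batch, ref_dim), a summary of the reference audio.
        query = self.query_proj(ref_embedding)          # (batch, token_dim)
        keys = torch.tanh(self.tokens)                  # (num_tokens, token_dim)
        scores = query @ keys.T                         # (batch, num_tokens)
        weights = F.softmax(scores, dim=-1)
        style = weights @ keys                          # (batch, token_dim)
        return style, weights
```

The returned style embedding is typically broadcast along the time axis and concatenated with the text-encoder outputs to condition the decoder.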

  • Only the kernel model files are provided; data preparation, training, and synthesis scripts are not included
  • See ExpressiveTacotron for training scripts

## Available recipes

  • Expressive Mode
  • Framework Mode

## Differences

  • Non-attentive Tacotron: in the duration predictor, the outputs of the stacked convolution layers are concatenated with the encoder outputs
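The concatenation described in the bullet above can be sketched as follows. This is a hedged illustration of the idea, not the repository's actual module: class name, layer counts, and dimensions are all assumptions.

```python
import torch
import torch.nn as nn


class DurationPredictor(nn.Module):
    """Hypothetical sketch: stacked 1-D convolutions over the encoder
    outputs, whose features are concatenated back with the encoder
    outputs before projecting to a per-token duration."""

    def __init__(self, enc_dim=256, conv_dim=256, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.convs = nn.Sequential(
            nn.Conv1d(enc_dim, conv_dim, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(conv_dim, conv_dim, kernel_size, padding=pad),
            nn.ReLU(),
        )
        # Concatenation makes the projection see both feature sets.
        self.proj = nn.Linear(enc_dim + conv_dim, 1)

    def forward(self, enc_out):
        # enc_out: (batch, time, enc_dim) from the text encoder.
        conv_out = self.convs(enc_out.transpose(1, 2)).transpose(1, 2)
        combined = torch.cat([enc_out, conv_out], dim=-1)
        return self.proj(combined).squeeze(-1)  # (batch, time) durations
```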

## Acknowledgements

This implementation uses code from the following repositories: NVIDIA, ESPNet, ERISHA, ForwardAttention.