
# Expressive Tacotron (PyTorch implementation)

## Introduction

This repository provides a multi-mode, multi-speaker expressive speech synthesis framework, including multi-attentive Tacotron, DurIAN, and Non-attentive Tacotron.

The framework also includes several deep learning architectures for building the prosody encoder: Global Style Tokens (GST), the Variational Autoencoder (VAE), the Gaussian Mixture Variational Autoencoder (GMVAE), and x-vectors.
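As a rough illustration of the GST-style prosody encoder mentioned above, the sketch below shows a minimal token layer: a bank of learnable style tokens is attended over by a reference embedding to produce a single style embedding. The class name, dimensions, and single-head attention are illustrative assumptions, not this repository's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GSTLayer(nn.Module):
    """Minimal Global Style Token sketch (hypothetical names/sizes):
    learnable tokens + soft attention from a reference embedding."""

    def __init__(self, num_tokens=10, token_dim=256, ref_dim=128):
        super().__init__()
        # Bank of learnable style tokens, shared across all utterances.
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.query_proj = nn.Linear(ref_dim, token_dim)

    def forward(self, ref_embedding):
        # ref_embedding: (batch, ref_dim), a summary of the reference audio.
        query = self.query_proj(ref_embedding)          # (batch, token_dim)
        keys = torch.tanh(self.tokens)                  # (num_tokens, token_dim)
        scores = query @ keys.T                         # (batch, num_tokens)
        weights = F.softmax(scores, dim=-1)
        style = weights @ keys                          # (batch, token_dim)
        return style, weights
```

The returned style embedding is typically broadcast along the time axis and concatenated with the text-encoder outputs to condition the decoder.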

  • Only the kernel model files are provided; data preparation, training, and synthesis scripts are not included
  • See ExpressiveTacotron for training scripts

## Available recipes

  • Expressive Mode
  • Framework Mode

## Differences

  • Non-attentive Tacotron: in the duration predictor, the outputs of the stacked convolution layers are concatenated with the encoder outputs
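The concatenation described in the bullet above can be sketched as follows. This is a hedged illustration of the idea, not the repository's actual module: class name, layer counts, and dimensions are all assumptions.

```python
import torch
import torch.nn as nn


class DurationPredictor(nn.Module):
    """Hypothetical sketch: stacked 1-D convolutions over the encoder
    outputs, whose features are concatenated back with the encoder
    outputs before projecting to a per-token duration."""

    def __init__(self, enc_dim=256, conv_dim=256, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.convs = nn.Sequential(
            nn.Conv1d(enc_dim, conv_dim, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(conv_dim, conv_dim, kernel_size, padding=pad),
            nn.ReLU(),
        )
        # Concatenation makes the projection see both feature sets.
        self.proj = nn.Linear(enc_dim + conv_dim, 1)

    def forward(self, enc_out):
        # enc_out: (batch, time, enc_dim) from the text encoder.
        conv_out = self.convs(enc_out.transpose(1, 2)).transpose(1, 2)
        combined = torch.cat([enc_out, conv_out], dim=-1)
        return self.proj(combined).squeeze(-1)  # (batch, time) durations
```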

## Acknowledgements

This implementation uses code from the following repositories: NVIDIA, ESPNet, ERISHA, ForwardAttention.