Parrot-TTS is a text-to-speech (TTS) system that uses a Transformer-based sequence-to-sequence model to map character tokens to HuBERT quantized units, followed by a modified HiFi-GAN vocoder for speech synthesis. This repository is the official implementation of our EACL 2024 paper, available at https://aclanthology.org/2024.findings-eacl.6/. It provides instructions for installation, running the demo, and training the TTS model on your own data. A few sample files generated with our model (trained without transliteration of non-English characters) are available at https://drive.google.com/file/d/1b4uoeRv106J-4NvzVnotfBiAuFz049_q/view?usp=sharing
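At a high level, synthesis runs in two stages: the TTE module maps character tokens to discrete HuBERT units, and the vocoder turns those units into a waveform. A minimal sketch of that data flow (function names here are illustrative, not the repository's actual API):

```python
# Illustrative sketch of the two-stage Parrot-TTS pipeline.
# The names below are hypothetical; the repository's real entry points
# are the scripts described in the steps of this README.

def text_to_units(text, symbol_map, tte_model):
    """Stage 1: characters -> integer tokens -> HuBERT unit IDs (TTE module)."""
    tokens = [symbol_map[ch] for ch in text]  # character-to-token dictionary
    return tte_model(tokens)                  # seq2seq Transformer predicts units

def units_to_speech(units, vocoder):
    """Stage 2: modified HiFi-GAN vocoder maps discrete units to a waveform."""
    return vocoder(units)
```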
- Create and activate a new Conda environment:

```bash
conda create --name parrottts python=3.8.19
conda activate parrottts
```
- Install the required libraries:

```bash
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu125
```
Run the demo using the provided Jupyter notebook, `demo.ipynb`. The checkpoints were trained on the training data available from https://sites.google.com/view/limmits25/home?authuser=0
- The notebook will automatically download the following files from Google Drive and store them at the following locations:
  - `runs/aligner/symbol.pkl`: a dictionary that maps characters to tokens.
  - `runs/TTE/ckpt`: a model that converts character text tokens to HuBERT units.
  - `runs/vocoder/checkpoints`: a model that predicts speech from HuBERT units.
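The symbol dictionary can be inspected directly; a sketch of doing so, assuming `symbol.pkl` is a plain pickled `dict` mapping characters to integer IDs (the actual serialized format may differ):

```python
import pickle

def load_symbol_map(path="runs/aligner/symbol.pkl"):
    """Load the character-to-token dictionary saved by the aligner step.
    Assumes a plain pickled dict, which may not match the repo's exact format."""
    with open(path, "rb") as f:
        return pickle.load(f)

def encode(text, symbol_map):
    """Convert a string to integer tokens, skipping characters not in the map."""
    return [symbol_map[ch] for ch in text if ch in symbol_map]
```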
To train Parrot-TTS on your dataset, follow these steps (1-10):
- Update the `dataset_dir` path in `utils/aligner/aligner_preprocessor_config.yaml`. The `dataset_dir` contains one folder per speaker, and each speaker folder contains their `wavs` and `txt` files. The code cleans the text files per speaker, stores them separately, and computes the set of unique characters across all speakers. For non-English speakers, make sure to check the `do_transliteration` flag in `utils/aligner/aligner_preprocessor_config.yaml`.

```bash
python utils/aligner/preprocessor.py utils/aligner/aligner_preprocessor_config.yaml
```
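Before running the preprocessor, it can help to verify that `dataset_dir` follows the expected layout (one folder per speaker, each with `wavs` and `txt` subfolders). A small check based on the description above (illustrative, not part of the repository):

```python
import os

def check_dataset_layout(dataset_dir):
    """Return speaker folders that are missing a 'wavs' or 'txt' subfolder.
    Layout assumed: dataset_dir/<speaker>/{wavs,txt}/ per this README."""
    bad = []
    for speaker in sorted(os.listdir(dataset_dir)):
        sdir = os.path.join(dataset_dir, speaker)
        if not os.path.isdir(sdir):
            continue  # skip stray files at the top level
        if not (os.path.isdir(os.path.join(sdir, "wavs"))
                and os.path.isdir(os.path.join(sdir, "txt"))):
            bad.append(speaker)
    return bad
```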
- Update `base_dataset_dir` in `train.sh`. `base_dataset_dir` is the same as the `dataset_dir` used in Step 1.

```bash
bash utils/aligner/train.sh
```
- Download the HuBERT checkpoint and quantizer from this link and store them in `utils/hubert_extraction`. Note: you may need to clone and install fairseq to run this step. Then run the following command to extract HuBERT units:

```bash
python utils/hubert_extraction/extractor.py utils/hubert_extraction/hubert_config.yaml
```

- Note: HuBERT units have already been extracted for the corpus and are available at this Google Drive link. Download and save them at `runs/hubert_extraction`.
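Conceptually, the quantizer assigns each HuBERT frame feature to its nearest k-means centroid, producing one discrete unit per frame. A nearest-centroid sketch of that idea (illustrative only; the actual extraction is done by the fairseq-based extractor script above):

```python
def quantize(features, centroids):
    """Assign each feature vector to the index of its nearest centroid
    (squared Euclidean distance), yielding one discrete unit per frame.
    Pure-Python sketch of k-means assignment, not the fairseq code."""
    units = []
    for frame in features:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
                 for centroid in centroids]
        units.append(dists.index(min(dists)))
    return units
```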
- Prepare the necessary files for training the TTE module:

```bash
python utils/TTE/preprocessor.py utils/TTE/TTE_config.yaml
```
- Train the TTE module:

```bash
python train.py --config utils/TTE/TTE_config.yaml --num_gpus 1
```
- Run inference to predict HuBERT units with the trained TTE module:

```bash
python inference.py --config utils/TTE/TTE_config.yaml --checkpoint_pth runs/TTE/ckpt/parrot_model-step=50000-val_total_loss_step=0.00.ckpt --device cuda:2
```
- Generate the training and validation files for the vocoder:

```bash
python utils/vocoder/preprocessor.py --input_file runs/hubert_extraction/hubert.txt --root_path runs/vocoder
```
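Conceptually, this step partitions the utterance list in `hubert.txt` into training and validation subsets. A sketch of such a split (the repository's actual split logic, ratio, and file format may differ):

```python
import random

def split_units_file(lines, val_fraction=0.05, seed=0):
    """Shuffle utterance lines and split them into (train, val) lists.
    Illustrative only; ratio and shuffling are assumptions, not the
    repository's documented behavior."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    lines = list(lines)
    rng.shuffle(lines)
    n_val = max(1, int(len(lines) * val_fraction))
    return lines[n_val:], lines[:n_val]
```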
- Set the number of GPUs in the `nproc_per_node` variable and run:

```bash
CUDA_VISIBLE_DEVICES=1,2,3 python -m torch.distributed.run --nproc_per_node=3 utils/vocoder/train.py --checkpoint_path runs/vocoder/checkpoints --config utils/vocoder/config.json
```
- Run vocoder inference on the validation file:

```bash
python utils/vocoder/inference.py --checkpoint_file runs/vocoder/checkpoints -n 100 --vc --input_code_file runs/vocoder/val.txt --output_dir runs/vocoder/generations_vocoder
```
- Run vocoder inference on the predictions from the TTE module:

```bash
python utils/vocoder/inference.py --checkpoint_file runs/vocoder/checkpoints -n 100 --vc --input_code_file runs/TTE/predictions.txt --output_dir runs/vocoder/generations_tte
```
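Steps 6 and 10 together give end-to-end synthesis: the TTE module predicts units, then the vocoder renders them to audio. A small wrapper that assembles those two invocations (a sketch; the checkpoint path and `--device` value are placeholders you should adjust to your setup):

```python
import subprocess

def build_commands(tte_ckpt, device="cuda:0", out_dir="runs/vocoder/generations_tte"):
    """Return the two CLI invocations (TTE inference, then vocoder inference)
    as argv lists, mirroring the commands shown in the steps above."""
    tte = ["python", "inference.py",
           "--config", "utils/TTE/TTE_config.yaml",
           "--checkpoint_pth", tte_ckpt,   # placeholder: your trained checkpoint
           "--device", device]             # placeholder: pick your GPU
    voc = ["python", "utils/vocoder/inference.py",
           "--checkpoint_file", "runs/vocoder/checkpoints",
           "-n", "100", "--vc",
           "--input_code_file", "runs/TTE/predictions.txt",
           "--output_dir", out_dir]
    return tte, voc

def synthesize(tte_ckpt):
    """Run TTE inference, then vocode its predictions (requires the trained models)."""
    for cmd in build_commands(tte_ckpt):
        subprocess.run(cmd, check=True)
```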
This repository is developed using insights from: