<div align="center">

# Matcha-TTS: A fast TTS architecture with conditional flow matching

### [Shivam Mehta](https://www.kth.se/profile/smehta), [Ruibo Tu](https://www.kth.se/profile/ruibo), [Jonas Beskow](https://www.kth.se/profile/beskow), [Éva Székely](https://www.kth.se/profile/szekely), and [Gustav Eje Henter](https://people.kth.se/~ghe/)

[pre-commit](https://github.com/pre-commit/pre-commit) · [PyTorch](https://pytorch.org/get-started/locally/) · [Lightning](https://pytorchlightning.ai/) · [Hydra](https://hydra.cc/) · [Black](https://black.readthedocs.io/en/stable/) · [isort](https://pycqa.github.io/isort/)

</div>

<p style="text-align: center;">
  <img src="https://shivammehta25.github.io/Matcha-TTS/images/logo.png" height="128"/>
</p>

> This is the official code implementation of 🍵 Matcha-TTS.

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:

- Is probabilistic
- Has a compact memory footprint
- Sounds highly natural
- Is very fast to synthesise from

Check out our [demo page](https://shivammehta25.github.io/Matcha-TTS) and read our [arXiv preprint](https://arxiv.org/abs/2309.03199) for more details.

<br>

## Installation

1. Create an environment (suggested but optional)

```bash
conda create -n matcha_tts python=3.10 -y
conda activate matcha_tts
```

2. Install Matcha-TTS from source using pip (we plan to add it to PyPI in the future)

```bash
pip install git+https://github.com/shivammehta25/Matcha-TTS.git
```

3. Run the CLI, the Gradio app, or the Jupyter notebook

```bash
# This will download the required models and list the available arguments
matcha_tts --help
```

or

```bash
matcha_tts_app
```

or open `synthesis.ipynb` in a Jupyter notebook.
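
One minimal way to open the notebook, assuming you have cloned this repository and installed Jupyter, is:

```bash
# Run from the root of a cloned Matcha-TTS checkout (requires Jupyter, e.g. `pip install notebook`)
jupyter notebook synthesis.ipynb
```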

### CLI Arguments

- To synthesise from given text, run:

```bash
matcha_tts --text "<INPUT TEXT>"
```

- To synthesise from a file, run (one way to prepare such a file is sketched after this list):

```bash
matcha_tts --file <PATH TO FILE>
```

- To batch synthesise from a file, run:

```bash
matcha_tts --file <PATH TO FILE> --batched
```
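
A minimal sketch of preparing an input file for `--file` synthesis, assuming the file is plain text; the filename `sentences.txt` and its contents are only illustrative:

```bash
# Write a couple of test utterances to a plain-text file (illustrative content)
cat > sentences.txt << 'EOF'
Matcha-TTS is a fast TTS architecture with conditional flow matching.
It uses an ODE-based decoder to synthesise speech quickly.
EOF

# Synthesise from the file, optionally in batched mode
matcha_tts --file sentences.txt --batched
```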

Additional arguments:

- Speaking rate

```bash
matcha_tts --text "<INPUT TEXT>" --speaking_rate 1.0
```

- Sampling temperature

```bash
matcha_tts --text "<INPUT TEXT>" --temperature 0.667
```

- Euler ODE solver steps

```bash
matcha_tts --text "<INPUT TEXT>" --steps 10
```
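
These flags can also be combined in a single call. A sketch using only the options listed above (the values are placeholders, not recommendations):

```bash
# Slightly slower speech, the default sampling temperature, and more ODE solver steps
matcha_tts --text "<INPUT TEXT>" --speaking_rate 0.9 --temperature 0.667 --steps 20
```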

## Citation information

If you find this work useful, please cite our paper:

```text
@article{mehta2023matcha,
  title={Matcha-TTS: A fast TTS architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2309.03199},
  year={2023}
}
```

## Train with your own dataset

Let's assume we are training with LJ Speech.

1. Download the dataset from [here](https://keithito.com/LJ-Speech-Dataset/), extract it to `data/LJSpeech-1.1`, and prepare the filelists to point to the extracted data, following the [5th point of the setup in the Tacotron 2 repository](https://github.com/NVIDIA/tacotron2#setup).

2. Clone and enter this repository

```bash
git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
```

3. Install the package from source

```bash
pip install -e .
```

4. Go to `configs/data/ljspeech.yaml` and change

```yaml
train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt
```

to the paths of your train and validation filelists.
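
For reference, a Tacotron 2-style filelist typically pairs an audio path with its transcript on each line; the layout and command below are a sanity-check sketch under that assumption, not part of this repository's tooling:

```bash
# Peek at the first lines of the train filelist referenced in ljspeech.yaml.
# Each line is assumed to look like: data/LJSpeech-1.1/wavs/LJ001-0001.wav|<transcript>
head -n 2 data/filelists/ljs_audio_text_train_filelist.txt
```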

5. Run the training script

```bash
make train-ljspeech
```

or

```bash
python matcha/train.py experiment=ljspeech
```

For multi-GPU training, run

```bash
python matcha/train.py experiment=ljspeech trainer.devices=[0,1]
```

## Acknowledgements

Since this code is built on the [Lightning-Hydra-Template](https://github.com/ashleve/lightning-hydra-template), you get all of its features out of the box.

Other codebases I would like to acknowledge:

- [Coqui-TTS](https://github.com/coqui-ai/TTS/tree/dev)
- [Grad-TTS](https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS)
- [torchdyn](https://github.com/DiffEqML/torchdyn)