From 068d135e20a9df992747f73dd45e1503b0f91d7b Mon Sep 17 00:00:00 2001
From: Shivam Mehta
Date: Mon, 27 May 2024 13:57:10 +0200
Subject: [PATCH] Adding alignment information to readme

---
 README.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/README.md b/README.md
index e33084d..1318df4 100644
--- a/README.md
+++ b/README.md
@@ -252,6 +252,43 @@ python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --vo
 This will write `.wav` audio files to the output directory.
 
+## Extract phoneme alignments from Matcha-TTS
+
+If the dataset is structured as
+
+```bash
+data/
+└── LJSpeech-1.1
+    ├── metadata.csv
+    ├── README
+    ├── test.txt
+    ├── train.txt
+    ├── val.txt
+    └── wavs
+```
+then you can extract the phoneme-level alignments from a trained Matcha-TTS model using:
+```bash
+python matcha/utils/get_durations_from_trained_model.py -i dataset_yaml -c
+```
+Example:
+```bash
+python matcha/utils/get_durations_from_trained_model.py -i ljspeech.yaml -c matcha_ljspeech.ckpt
+```
+or simply:
+```bash
+matcha-tts-get-durations -i ljspeech.yaml -c matcha_ljspeech.ckpt
+```
+---
+## Train using extracted alignments
+
+In the dataset config, turn on `load_durations`.
+Example: `ljspeech.yaml`
+```
+load_durations: True
+```
+or see an example in `configs/experiment/ljspeech_from_durations.yaml`.
+
+
 ## Citation information
 
 If you use our code or otherwise find this work useful, please cite our paper: