deepgeek/Matcha-TTS

mirror of https://github.com/shivammehta25/Matcha-TTS.git synced 2026-02-04 17:59:19 +08:00


Matcha-TTS: A fast TTS architecture with conditional flow matching


Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:

- Is probabilistic
- Has a compact memory footprint
- Sounds highly natural
- Is very fast to synthesise from

See below for audio examples, or read our ICASSP 2024 paper for more details. Code is available in our GitHub repository, along with pre-trained models.

You can also try 🍵 Matcha-TTS in your browser on Hugging Face 🤗 Spaces.
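For readers unfamiliar with conditional flow matching, the following is a sketch of the optimal-transport variant as it is commonly formulated in the flow matching literature. The notation here ($x_0$ for a noise sample, $x_1$ for a data sample, $\mu$ for the text-derived conditioning, $\sigma_{\min}$ for a small constant, and $v_t$ for the learned vector field) is an assumption chosen for illustration and does not appear on this page; the paper gives the exact formulation used by Matcha-TTS.

$$
\phi_t(x_0) = \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x_0 + t\,x_1,
\qquad x_0 \sim \mathcal{N}(0, I),\quad x_1 \sim q(x_1),\quad t \sim \mathcal{U}[0, 1]
$$

$$
\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}
\Bigl\| \bigl(x_1 - (1 - \sigma_{\min})\,x_0\bigr) - v_t\bigl(\phi_t(x_0) \mid \mu;\, \theta\bigr) \Bigr\|^2
$$

$$
\frac{\mathrm{d}x}{\mathrm{d}t} = v_t(x \mid \mu;\, \theta), \qquad x(0) = x_0, \qquad t \in [0, 1]
$$

The regression target in the second line is the time derivative of the path in the first line, so the network learns a vector field whose conditional paths are straight lines from noise to data. Synthesis then amounts to numerically solving the ODE in the third line. The intuition is that straighter paths should need fewer solver steps, which is the quantity varied in the audio examples on this page.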

<style type="text/css">
  .tg { border-collapse: collapse; border-color: #9ABAD9; border-spacing: 0; }
  .tg td { background-color: #EBF5FF; border-color: #9ABAD9; border-style: solid; border-width: 1px; color: #444; font-family: Arial, sans-serif; font-size: 14px; overflow: hidden; padding: 0px 20px; word-break: normal; font-weight: bold; vertical-align: middle; text-align: center; white-space: nowrap; }
  .tg th { background-color: #409cff; border-color: #9ABAD9; border-style: solid; border-width: 1px; color: #fff; font-family: Arial, sans-serif; font-size: 14px; font-weight: bold; overflow: hidden; padding: 0px 20px; word-break: normal; vertical-align: middle; text-align: center; white-space: nowrap; margin: auto; }
  .tg th a { background-color: #409cff; color: #fff; text-decoration: none; font-family: Arial, sans-serif; font-size: 14px; font-weight: bold; overflow: hidden; padding: 0px 20px; word-break: normal; vertical-align: middle; text-align: center; white-space: nowrap; margin: auto; }
  .tg .tg-0pky { border-color: inherit; text-align: center; vertical-align: top; }
  td img { position: relative; margin: 0 auto; max-width: 650px; padding: 5px; border: 0px; }
  .tg .tg-fymr { border-color: inherit; font-weight: bold; text-align: center; vertical-align: top; }
  .slider { -webkit-appearance: none; width: 75%; height: 15px; border-radius: 5px; background: #d3d3d3; outline: none; opacity: 0.7; -webkit-transition: .2s; transition: opacity .2s; }
  .slider::-webkit-slider-thumb { -webkit-appearance: none; appearance: none; width: 25px; height: 25px; border-radius: 50%; background: #409cff; cursor: pointer; }
  .slider::-moz-range-thumb { width: 25px; height: 25px; border-radius: 50%; background: #409cff; cursor: pointer; }
  /* audio { width: 240px; } */
  .button-12 { display: flex; flex-direction: column; align-items: center; padding: 10px 54px; font-family: -apple-system, BlinkMacSystemFont, 'Roboto', sans-serif; font-weight: bold; border-radius: 6px; border: none; background: #6E6D70; box-shadow: 0px 0.5px 1px rgba(0, 0, 0, 0.1), inset 0px 0.5px 0.5px rgba(255, 255, 255, 0.5), 0px 0px 0px 0.5px rgba(0, 0, 0, 0.12); color: #DFDEDF; user-select: none; -webkit-user-select: none; touch-action: manipulation; }
  .button-12:focus { box-shadow: inset 0px 0.8px 0px -0.25px rgba(255, 255, 255, 0.2), 0px 0.5px 1px rgba(0, 0, 0, 0.1), 0px 0px 0px 3.5px rgba(58, 108, 217, 0.5); outline: 0; }
  audio { margin: 0.5em; }
</style>
<script src="transcripts.js"></script>
<script>
  // Transcripts of the six listening-test sentences, keyed by sentence number.
  var transcript_listening_test = {
    1: "It had established periodic regular review of the status of four hundred individuals;", // 4
    2: "The narrative of these events is based largely on the recollections of the participants,", // 3
    3: "The jury did not believe him, and the verdict was for the defendants.", // 7
    4: "One by one the huge uprights of black timber were fitted together,", // 19
    5: "The position of this palmprint on the carton was parallel with the long axis of the box, and at right angles with the short axis;", // 23
    6: "The boy declared he saw no one, and accordingly passed through without paying the toll of a penny." // 38
  };

  // Load a stimulus into the shared audio player, update the displayed
  // condition name and transcription, and start playback.
  function play_audio(filename, audio_id, condition_name, transcription){
    const audio = document.getElementById(audio_id);
    const audio_source = document.getElementById(audio_id + "-src");
    const block_quote = document.getElementById(audio_id + "-transcript");
    const stimulus_span = document.getElementById(audio_id + "-span");
    audio.pause();
    audio_source.src = filename;
    block_quote.innerHTML = transcription;
    stimulus_span.innerHTML = condition_name;
    audio.load();
    audio.play();
  }
</script>

Stimuli from the listening test

Click the buttons in the table to load and play the different stimuli.

Currently loaded stimulus: MAT-10 : Sentence 1

Audio player:

Transcription:

It had established periodic regular review of the status of four hundred individuals;

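| System | Condition | Sentence 1 | Sentence 2 | Sentence 3 | Sentence 4 | Sentence 5 | Sentence 6 |
|---|---|---|---|---|---|---|---|
| Vocoded speech | VOC | | | | | | |
| Matcha-TTS | MAT-10 | | | | | | |
| Matcha-TTS | MAT-4 | | | | | | |
| Matcha-TTS | MAT-2 | | | | | | |
| Grad-TTS | GRAD-10 | | | | | | |
| Grad-TTS | GRAD-4 | | | | | | |
| Grad-TTS+CFM | GCFM-4 | | | | | | |
| FastSpeech 2 | FS2 | | | | | | |
| VITS | VITS | | | | | | |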

Effect of the number of ODE solver steps

Steps: adjustable with the slider from 1 to 500.

<script>
  var itr_slider = document.getElementById("itr_slider");
  var itr_vals = document.getElementsByClassName("itr_val");

  // Functions to update values
  // Slider position (1 to 12) -> number of ODE solver steps
  var iterations = {
    1: 1,
    2: 2,
    3: 3,
    4: 4,
    5: 5,
    6: 10,
    7: 15,
    8: 20,
    9: 25,
    10: 50,
    11: 100,
    12: 500,
  };

  function updateVals(classes, value){
    for(var i=0; i < classes.length; i++) {
      classes[i].innerHTML= iterations[parseInt(value)];
    }
  }

  let systems = [
    "MAT",
    "GRAD",
    "GCFM"
  ]

  // Show the initial step count (slider position 6, i.e. 10 steps)
  updateVals(itr_vals, 6);

  itr_slider.oninput = function() {
    updateVals(itr_vals, this.value);
    let iteration = iterations[parseInt(this.value)];
    // Update sources
    for (let sent=1; sent <= 3; sent++){
      for (let system_idx = 0; system_idx < systems.length; system_idx++){
        let audio = document.getElementById(systems[system_idx] + "_sent_" + sent);
        let audio_src = document.getElementById( systems[system_idx] + "_sent_src_" + sent);
        audio_src.src = "stimuli/number_of_ode_solver/" + systems[system_idx] + "-" + iteration + "_" + sent + ".wav";
        audio.load();
      }
    }
  }
</script>
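The slider script above maps a slider position to a step count and then to an audio file name. As a minimal sketch of that mapping, separated from the DOM so it can be run on its own (for example in Node.js), the helpers below rebuild the same paths. The names `stimulusPath` and `allStimulusPaths` are hypothetical and not part of the page; the lookup table, system codes, and path pattern are copied from the script.

```javascript
// Slider positions 1..12 and the ODE solver step counts they stand for
// (copied from the page script).
const iterations = {
  1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 10,
  7: 15, 8: 20, 9: 25, 10: 50, 11: 100, 12: 500,
};

// System codes used in the file names (copied from the page script).
const systems = ["MAT", "GRAD", "GCFM"];

// Hypothetical helper: path of the stimulus for one system, slider
// position, and sentence number, following the page's naming pattern.
function stimulusPath(system, sliderValue, sentence) {
  const steps = iterations[parseInt(sliderValue, 10)];
  if (steps === undefined) {
    throw new RangeError("slider value out of range: " + sliderValue);
  }
  return `stimuli/number_of_ode_solver/${system}-${steps}_${sentence}.wav`;
}

// Hypothetical helper: all nine paths (3 systems x 3 sentences) that the
// page loads for a given slider position.
function allStimulusPaths(sliderValue) {
  const paths = [];
  for (let sent = 1; sent <= 3; sent++) {
    for (const system of systems) {
      paths.push(stimulusPath(system, sliderValue, sent));
    }
  }
  return paths;
}

console.log(stimulusPath("MAT", 6, 1));
```

Slider values arrive from the DOM as strings, which is why the helper parses them before the lookup, as the page script does.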

| System | Sentence 1 | Sentence 2 | Sentence 3 |
|---|---|---|---|
| Matcha-TTS | | | |
| Grad-TTS | | | |
| Grad-TTS + CFM | | | |
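To give a feel for what the number of solver steps controls, here is a toy sketch of fixed-step forward Euler integration from t = 0 to t = 1. It assumes a fixed-step Euler solver and uses a scalar stand-in vector field, dx/dt = x, chosen only because its exact solution is known; it is not the Matcha-TTS decoder, and the function names are hypothetical. The step counts are the ones offered by the slider on this page.

```javascript
// Forward Euler integration of dx/dt = f(x, t) from t = 0 to t = 1
// using nSteps equal steps.
function eulerSolve(f, x0, nSteps) {
  const h = 1 / nSteps;
  let x = x0;
  for (let i = 0; i < nSteps; i++) {
    const t = i * h;
    x += h * f(x, t);
  }
  return x;
}

// Toy vector field (an assumption for illustration): dx/dt = x.
// With x(0) = 1 the exact solution at t = 1 is e.
const toyField = (x, t) => x;

// Step counts offered by the slider on this page.
const stepCounts = [1, 2, 3, 4, 5, 10, 15, 20, 25, 50, 100, 500];

// Absolute error of the Euler estimate at t = 1 for each step count.
const errors = stepCounts.map(
  (n) => Math.abs(eulerSolve(toyField, 1, n) - Math.E)
);

for (let i = 0; i < stepCounts.length; i++) {
  console.log(`${stepCounts[i]} steps: error ${errors[i].toExponential(2)}`);
}
```

In this toy setting the error shrinks steadily as the step count grows, while the cost grows linearly with the number of vector-field evaluations. The audio examples above let you hear how each system handles the same trade-off.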

Citation information

@inproceedings{mehta2024matcha,
  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2024}
}