# Matcha-TTS: A fast TTS architecture with conditional flow matching
##### [Shivam Mehta][shivam_profile], [Ruibo Tu][ruibo_profile], [Jonas Beskow][jonas_profile], [Éva Székely][eva_profile], and [Gustav Eje Henter][gustav_profile]

We propose Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching to speed up ODE-based speech synthesis. Our method:

- Is probabilistic
- Has a compact memory footprint
- Sounds highly natural
- Is very fast to synthesise from

Please check out the audio examples below and read our arXiv preprint for more details. Code and pre-trained models will be made available shortly after the ICASSP deadline.

[shivam_profile]: https://www.kth.se/profile/smehta
[ruibo_profile]: https://www.kth.se/profile/ruibo
[jonas_profile]: https://www.kth.se/profile/beskow
[eva_profile]: https://www.kth.se/profile/szekely
[gustav_profile]: https://people.kth.se/~ghe/
[this_page]: https://shivammehta25.github.io/Matcha-TTS

## Architecture
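As a rough illustration of why conditional flow matching can reduce the number of ODE solver steps, the sketch below integrates a toy conditional vector field with a fixed-step Euler solver. It is not the authors' implementation: the function names (`ot_conditional_field`, `euler_solve`), the `sigma_min` value, and the choice of the straight-line (optimal-transport) conditional path from the flow matching literature are all assumptions made for illustration. In a real TTS system the vector field would be a learned neural network conditioned on text, and a few Euler steps would only approximate the ODE solution.

```python
from typing import Callable, List

Vector = List[float]


def ot_conditional_field(x: Vector, t: float, x1: Vector,
                         sigma_min: float = 1e-4) -> Vector:
    """Toy conditional vector field for a straight-line path from noise to x1.

    Assumption: the optimal-transport conditional path
        phi_t(x0) = (1 - (1 - sigma_min) * t) * x0 + t * x1,
    whose velocity at position x and time t is
        (x1 - (1 - sigma_min) * x) / (1 - (1 - sigma_min) * t).
    """
    scale = 1.0 - sigma_min
    denom = 1.0 - scale * t
    return [(b - scale * a) / denom for a, b in zip(x, x1)]


def euler_solve(field: Callable[[Vector, float], Vector],
                x0: Vector, n_steps: int) -> Vector:
    """Integrate dx/dt = field(x, t) from t = 0 to t = 1 with n_steps Euler steps."""
    x = list(x0)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        v = field(x, t)  # one vector-field evaluation per step
        x = [a + h * b for a, b in zip(x, v)]
    return x


if __name__ == "__main__":
    x0 = [0.5, -1.2, 0.3]   # hypothetical starting noise sample
    x1 = [1.0, 2.0, -0.5]   # hypothetical target (e.g. a few acoustic features)
    for n in (2, 4, 10):
        out = euler_solve(lambda x, t: ot_conditional_field(x, t, x1), x0, n)
        print(n, out)
```

Because this toy field moves each sample along a straight line at constant velocity, the Euler solver reaches the same endpoint with 2, 4, or 10 steps (up to floating-point error). A learned field is generally not exactly straight, so the step count trades synthesis speed against accuracy.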
## Stimuli from the listening test
> Click the buttons in the table to load and play the different stimuli.
Transcription of Sentence 1: "It had established periodic regular review of the status of four hundred individuals;"
| System | Condition | Sentence 1 | Sentence 2 | Sentence 3 | Sentence 4 | Sentence 5 | Sentence 6 |
|---|---|---|---|---|---|---|---|
| Vocoded speech | VOC | | | | | | |
| Matcha-TTS | MAT-10 | | | | | | |
| Matcha-TTS | MAT-4 | | | | | | |
| Matcha-TTS | MAT-2 | | | | | | |
| Grad-TTS | GRAD-10 | | | | | | |
| Grad-TTS | GRAD-4 | | | | | | |
| Grad-TTS+CFM | GCFM-4 | | | | | | |
| FastSpeech 2 | FS2 | | | | | | |
| VITS | VITS | | | | | | |

| System | Sentence 1 | Sentence 2 | Sentence 3 |
|---|---|---|---|
| Matcha-TTS | | | |
| Grad-TTS | | | |
| Grad-TTS + CFM | | | |