
# Matcha-TTS: A fast TTS architecture with conditional flow matching

##### [Shivam Mehta][shivam_profile], [Ruibo Tu][ruibo_profile], [Jonas Beskow][jonas_profile], [Éva Székely][eva_profile], and [Gustav Eje Henter][gustav_profile]

We propose Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching to speed up ODE-based speech synthesis. Our method:

- Is probabilistic
- Has a compact memory footprint
- Sounds highly natural
- Is very fast to synthesise from

Please check out the audio examples below and read our arXiv preprint for more details. Code and pre-trained models will be made available shortly after the ICASSP deadline.

[shivam_profile]: https://www.kth.se/profile/smehta
[ruibo_profile]: https://www.kth.se/profile/ruibo
[jonas_profile]: https://www.kth.se/profile/beskow
[eva_profile]: https://www.kth.se/profile/szekely
[gustav_profile]: https://people.kth.se/~ghe/
[this_page]: https://shivammehta25.github.io/Matcha-TTS

## Architecture

Figure: Architecture of Matcha-TTS.
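To make "conditional flow matching" more concrete, here is a minimal sketch of a flow-matching training target. It assumes the optimal-transport conditional flow matching (OT-CFM) formulation of Lipman et al.; this page does not spell out the exact variant or the value of `sigma_min`, so `SIGMA_MIN`, `ot_cfm_path`, `cfm_loss` and `zero_field` are hypothetical, illustrative names rather than the Matcha-TTS code. The network regresses a velocity that carries a Gaussian noise sample towards a data sample along a straight line.

```python
import numpy as np

# Assumed value for the small noise floor sigma_min of OT-CFM (illustrative only).
SIGMA_MIN = 1e-4


def ot_cfm_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return the point x_t on the conditional OT path and its target velocity u_t.

    x0: noise sample, x1: data sample (e.g. mel-spectrogram frames), t in [0, 1].
    """
    # Interpolate from noise (t = 0) towards data (t = 1) along a straight line.
    xt = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # Time derivative of xt; it does not depend on t, so the path is straight.
    ut = x1 - (1.0 - sigma_min) * x0
    return xt, ut


def cfm_loss(vector_field, x1, rng):
    """Single-sample Monte Carlo estimate of the CFM regression loss for data x1."""
    x0 = rng.standard_normal(x1.shape)  # noise from a standard Gaussian prior
    t = rng.uniform(0.0, 1.0)           # random time in [0, 1]
    xt, ut = ot_cfm_path(x0, x1, t)
    # The network is trained to predict u_t from (x_t, t) with a plain MSE.
    pred = vector_field(xt, t)
    return float(np.mean((pred - ut) ** 2))


# Hypothetical stand-in for the neural vector field. A real TTS decoder would
# presumably also be conditioned on text-derived features.
def zero_field(x, t):
    return np.zeros_like(x)


rng = np.random.default_rng(0)
x1 = rng.standard_normal((80, 10))  # toy "mel-spectrogram": 80 bins x 10 frames
print(cfm_loss(zero_field, x1, rng))
```

Because the target velocity `ut` is constant along each conditional path, a model that learns it well yields trajectories that are close to straight, which is what makes few-step ODE synthesis plausible.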

## Stimuli from the listening test

> Click the buttons in the table to load and play the different stimuli.


Transcription of Sentence 1:

It had established periodic regular review of the status of four hundred individuals;

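| System | Condition | Sentence 1 | Sentence 2 | Sentence 3 | Sentence 4 | Sentence 5 | Sentence 6 |
|---|---|---|---|---|---|---|---|
| Vocoded speech | VOC | | | | | | |
| Matcha-TTS | MAT-10 | | | | | | |
| Matcha-TTS | MAT-4 | | | | | | |
| Matcha-TTS | MAT-2 | | | | | | |
| Grad-TTS | GRAD-10 | | | | | | |
| Grad-TTS | GRAD-4 | | | | | | |
| Grad-TTS+CFM | GCFM-4 | | | | | | |
| FastSpeech 2 | FS2 | | | | | | |
| VITS | VITS | | | | | | |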

## Effect of the number of ODE solver steps

Steps: 1 to 500

| System | Sentence 1 | Sentence 2 | Sentence 3 |
|---|---|---|---|
| Matcha-TTS | | | |
| Grad-TTS | | | |
| Grad-TTS+CFM | | | |
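The trade-off explored in this section can be illustrated with a toy example. The sketch below assumes a fixed-step Euler solver (the page does not name the solver) and uses two hypothetical one-line vector fields in place of a trained network. With a curved field, the error shrinks only as the number of steps grows. A field whose trajectories are straight lines, by contrast, is integrated exactly by Euler with any number of steps. This is, roughly, why straighter flows can get away with fewer solver steps; it is not a reproduction of the systems compared above.

```python
import math

import numpy as np


def euler_solve(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t = 0 to t = 1 with fixed-step Euler.

    Each step makes one call to vector_field, i.e. one network evaluation in a
    real synthesiser, so synthesis cost grows linearly with n_steps.
    """
    x = np.asarray(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h  # always < 1, so the straight field below never divides by zero
        x = x + h * vector_field(x, t)
    return x


def curved_field(x, t):
    # dx/dt = -x has curved (exponential) trajectories; exact x(1) = x0 * exp(-1).
    return -x


def make_straight_field(target):
    # Trajectories of this field are straight lines that reach `target` at t = 1.
    def field(x, t):
        return (target - x) / (1.0 - t)
    return field


x0 = np.array([1.0, -2.0])
target = np.array([0.5, 3.0])
for n in (1, 2, 4, 10):
    curved_err = np.max(np.abs(euler_solve(curved_field, x0, n) - x0 * math.exp(-1)))
    straight_err = np.max(np.abs(euler_solve(make_straight_field(target), x0, n) - target))
    print(f"steps={n:2d}  curved error={curved_err:.4f}  straight error={straight_err:.1e}")
```

In a real model the learned field is only approximately straight, so some error remains at low step counts; listening to the examples above is the way to judge how much that matters in practice.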
