23 Commits
0.0.4 ... docs

Author SHA1 Message Date
Gustav Eje Henter
5a7f220662 Update demo page
Tweak the text, mention ICASSP acceptance, and update citation information.
2024-01-14 15:43:13 +01:00
Shivam Mehta
188cdb6844 Removing architecture from the docs 2024-01-10 11:06:48 +00:00
Shivam Mehta
30ae66af39 Update README.md 2023-09-28 10:15:09 +02:00
Shivam Mehta
158d95e278 Updating docs 2023-09-17 15:54:16 +00:00
Shivam Mehta
47b2fc9244 Update README.md 2023-09-14 15:07:04 +02:00
Shivam Mehta
00bf704393 centering the logo 2023-09-07 09:48:38 +00:00
Shivam Mehta
be07e9aed7 Trying to add logo 2023-09-07 09:38:41 +00:00
Shivam Mehta
87dc55646e Adding arXiv link 2023-09-07 08:17:54 +00:00
Shivam Mehta
3e297d9b25 Adding arXiv link 2023-09-07 08:15:38 +00:00
Shivam Mehta
5d6519b004 Truning on the hit counter 2023-09-06 18:23:02 +00:00
Shivam Mehta
7bd56fcdae Updating links 2023-09-06 17:45:04 +00:00
Shivam Mehta
859a6fe45c minor change to retrigger deployment 2023-09-06 16:19:57 +00:00
Shivam Mehta
a29515877b updating number of ode solver audios 2023-09-06 16:14:40 +00:00
Shivam Mehta
af5a398021 Changing the number of ode solver steps 2023-09-06 16:03:59 +00:00
Shivam Mehta
54bb4aca58 Adding stimuli for different number of ode solver steps 2023-09-06 15:51:06 +00:00
Shivam Mehta
94c39f85b2 Minor formatting fixes 2023-09-06 12:35:18 +00:00
Shivam Mehta
6b5936b616 minor changes 2023-09-06 12:28:57 +00:00
Shivam Mehta
82e33b4240 minor changes 2023-09-06 12:26:14 +00:00
Shivam Mehta
82a799f56e minor changes 2023-09-06 11:10:47 +00:00
Shivam Mehta
0e35178fbe adding hit counter 2023-09-06 02:19:18 +00:00
Shivam Mehta
683fa463bb adding hit counter 2023-09-06 02:15:42 +00:00
Shivam Mehta
d3208d86e3 Aligning image of play 2023-09-06 02:12:12 +00:00
Shivam Mehta
af03022f71 Adding first table 2023-09-06 02:04:14 +00:00
170 changed files with 525 additions and 21 deletions

21
LICENSE

@@ -1,21 +0,0 @@
MIT License
Copyright (c) 2023 Shivam Mehta
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

521
README.md Normal file

@@ -0,0 +1,521 @@
# Matcha-TTS: A fast TTS architecture with conditional flow matching
<head>
<link rel="icon" type="image/x-icon" href="favicon.ico">
<meta name="msapplication-TileColor" content="#da532c">
<meta charset="UTF-8">
<meta name="theme-color" content="#ffffff">
<meta property="og:title" content="Matcha-TTS: A fast TTS architecture with conditional flow matching" />
<meta property="og:description" content="We propose Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching to speed up ODE-based speech synthesis. Our method is probabilistic, has a compact memory footprint, sounds highly natural, and is very fast to synthesise from.">
<meta property="og:image" content="https://shivammehta25.github.io/Matcha-TTS/images/architecture.png" />
<meta property="twitter:image" content="https://shivammehta25.github.io/Matcha-TTS/images/architecture.png" />
<meta property="og:type" content="website" />
<meta property="og:site_name" content="Matcha-TTS" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="keywords" content="tts, text to speech, probabilistic machine learning, diffusion models, conditional flow matching, generative modelling, machine learning, deep learning, speech synthesis, research, phd">
<meta name="description" content="We propose Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching to speed up ODE-based speech synthesis. Our method is probabilistic, has a compact memory footprint, sounds highly natural, and is very fast to synthesise from." />
</head>
##### [Shivam Mehta][shivam_profile], [Ruibo Tu][ruibo_profile], [Jonas Beskow][jonas_profile], [Éva Székely][eva_profile], and [Gustav Eje Henter][gustav_profile]
<p style="text-align: center;">
<img src="images/logo.png" height="128"/>
</p>
We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses [conditional flow matching](https://arxiv.org/abs/2210.02747) (similar to [rectified flows](https://arxiv.org/abs/2209.03003)) to speed up ODE-based speech synthesis. Our method:
- Is probabilistic
- Has a compact memory footprint
- Sounds highly natural
- Is very fast to synthesise from
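
As a rough illustration of what conditional flow matching trains, the sketch below computes the regression target for the optimal-transport conditional path described in the linked flow-matching paper: a point x_t between a noise sample x0 and a data sample x1, and the constant velocity x1 - (1 - sigmaMin) * x0 that a network would be trained to predict at x_t. This is a plain-JavaScript toy on short arrays, not code from the Matcha-TTS repository; the function names and the sigmaMin values used are illustrative assumptions, and a real model works on mel-spectrogram tensors with a neural network predicting the velocity.

```javascript
// Toy sketch of the optimal-transport conditional flow matching (OT-CFM) target.
// Vectors are plain arrays; names and sigmaMin values are illustrative assumptions.

// Point on the conditional path at time t in [0, 1]:
//   x_t = (1 - (1 - sigmaMin) * t) * x0 + t * x1
function conditionalPathPoint(x0, x1, t, sigmaMin) {
  const a = 1 - (1 - sigmaMin) * t;
  return x0.map((x0i, i) => a * x0i + t * x1[i]);
}

// Velocity of that path (its time derivative), which does not depend on t:
//   u = x1 - (1 - sigmaMin) * x0
function conditionalTargetVelocity(x0, x1, sigmaMin) {
  return x0.map((x0i, i) => x1[i] - (1 - sigmaMin) * x0i);
}

// Mean squared error between a predicted velocity and the target velocity.
function cfmLoss(predicted, target) {
  let sum = 0;
  for (let i = 0; i < target.length; i++) {
    const d = predicted[i] - target[i];
    sum += d * d;
  }
  return sum / target.length;
}

// Usage with toy numbers; sigmaMin = 0 gives a straight line from x0 to x1.
const x0 = [0.5, -1.0]; // "noise" sample
const x1 = [2.0, 3.0];  // "data" sample
const xt = conditionalPathPoint(x0, x1, 0.5, 0); // values [1.25, 1]
const u = conditionalTargetVelocity(x0, x1, 0);  // values [1.5, 4]
console.log(xt, u, cfmLoss(u, u));               // loss is 0 for a perfect prediction
```

During training, t would be drawn at random for each example and the network's prediction at x_t compared with u through the loss; at synthesis time the learned velocity is integrated by an ODE solver.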
See below for audio examples, or read [our ICASSP 2024 paper][arxiv_link] for more details.
Code is available in our [GitHub repository][github_link], along with pre-trained models.
You can also [try 🍵 Matcha-TTS in your browser on HuggingFace 🤗 spaces][hf_space].
[shivam_profile]: https://www.kth.se/profile/smehta
[ruibo_profile]: https://www.kth.se/profile/ruibo
[jonas_profile]: https://www.kth.se/profile/beskow
[eva_profile]: https://www.kth.se/profile/szekely
[gustav_profile]: https://people.kth.se/~ghe/
[this_page]: https://shivammehta25.github.io/Matcha-TTS
[arxiv_link]: https://arxiv.org/abs/2309.03199
[grad_tts_paper]: https://arxiv.org/abs/2105.06337
[vits_paper]: https://arxiv.org/abs/2106.06103
[fastspeech2_paper]: https://arxiv.org/abs/2006.04558
[github_link]: https://github.com/shivammehta25/Matcha-TTS
[hf_space]: https://huggingface.co/spaces/shivammehta25/Matcha-TTS
<style type="text/css">
.tg {
border-collapse: collapse;
border-color: #9ABAD9;
border-spacing: 0;
}
.tg td {
background-color: #EBF5FF;
border-color: #9ABAD9;
border-style: solid;
border-width: 1px;
color: #444;
font-family: Arial, sans-serif;
font-size: 14px;
overflow: hidden;
padding: 0px 20px;
word-break: normal;
font-weight: bold;
vertical-align: middle;
text-align: center;
white-space: nowrap;
}
.tg th {
background-color: #409cff;
border-color: #9ABAD9;
border-style: solid;
border-width: 1px;
color: #fff;
font-family: Arial, sans-serif;
font-size: 14px;
font-weight: bold;
overflow: hidden;
padding: 0px 20px;
word-break: normal;
vertical-align: middle;
text-align: center;
white-space: nowrap;
margin: auto;
}
.tg th a {
background-color: #409cff;
color: #fff;
text-decoration: none;
font-family: Arial, sans-serif;
font-size: 14px;
font-weight: bold;
overflow: hidden;
padding: 0px 20px;
word-break: normal;
vertical-align: middle;
text-align: center;
white-space: nowrap;
margin: auto;
}
.tg .tg-0pky {
border-color: inherit;
text-align: center;
vertical-align: top;
}
td img {
position: relative;
margin: 0 auto;
max-width: 650px;
padding: 5px;
border: 0px;
}
.tg .tg-fymr {
border-color: inherit;
font-weight: bold;
text-align: center;
vertical-align: top
}
.slider {
-webkit-appearance: none;
width: 75%;
height: 15px;
border-radius: 5px;
background: #d3d3d3;
outline: none;
opacity: 0.7;
-webkit-transition: .2s;
transition: opacity .2s;
}
.slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 25px;
height: 25px;
border-radius: 50%;
background: #409cff;
cursor: pointer;
}
.slider::-moz-range-thumb {
width: 25px;
height: 25px;
border-radius: 50%;
background: #409cff;
cursor: pointer;
}
/* audio {
width: 240px;
} */
/* CSS */
.button-12 {
display: flex;
flex-direction: column;
align-items: center;
padding: 10px 54px;
font-family: -apple-system, BlinkMacSystemFont, 'Roboto', sans-serif;
font-weight: bold;
border-radius: 6px;
border: none;
background: #6E6D70;
box-shadow: 0px 0.5px 1px rgba(0, 0, 0, 0.1), inset 0px 0.5px 0.5px rgba(255, 255, 255, 0.5), 0px 0px 0px 0.5px rgba(0, 0, 0, 0.12);
color: #DFDEDF;
user-select: none;
-webkit-user-select: none;
touch-action: manipulation;
}
.button-12:focus {
box-shadow: inset 0px 0.8px 0px -0.25px rgba(255, 255, 255, 0.2), 0px 0.5px 1px rgba(0, 0, 0, 0.1), 0px 0px 0px 3.5px rgba(58, 108, 217, 0.5);
outline: 0;
}
audio {
margin: 0.5em;
}
</style>
<script src="transcripts.js"></script>
<!-- ## Architecture
<img src="images/architecture.png" alt="Architecture of Matcha-TTS" width="750"/> -->
<script>
var transcript_listening_test = {
1: "It had established periodic regular review of the status of four hundred individuals;", //4
2: "The narrative of these events is based largely on the recollections of the participants,", // 3
3: "The jury did not believe him, and the verdict was for the defendants.", // 7
4: "One by one the huge uprights of black timber were fitted together,", // 19
5: "The position of this palmprint on the carton was parallel with the long axis of the box, and at right angles with the short axis;", // 23
6: "The boy declared he saw no one, and accordingly passed through without paying the toll of a penny." // 38
}
function play_audio(filename, audio_id, condition_name, transcription){
// Look up the shared player and its companion elements by id prefix
const audio = document.getElementById(audio_id);
const audio_source = document.getElementById(audio_id + "-src");
const block_quote = document.getElementById(audio_id + "-transcript");
const stimulus_span = document.getElementById(audio_id + "-span");
// Swap in the new file, update the label and transcript, then reload and play
audio.pause();
audio_source.src = filename;
block_quote.textContent = transcription;
stimulus_span.textContent = condition_name;
audio.load();
audio.play();
}
</script>
## Stimuli from the listening test
> Click the buttons in the table to load and play the different stimuli.
Currently loaded stimulus: <span id="stimuli-from-listening-test-span" style="font-weight: bold;"> MAT-10 , Sentence 1</span>
<p>Audio player: </p>
<audio id="stimuli-from-listening-test" controls>
<source id="stimuli-from-listening-test-src" src="stimuli/sample_from_test/MAT-10_1.wav" type="audio/wav">
</audio>
<p> Transcription: </p>
<blockquote style="height: 60px">
<p id="stimuli-from-listening-test-transcript">
It had established periodic regular review of the status of four hundred individuals;
</p>
</blockquote>
<table class="tg">
<thead>
<tr>
<th class="tg-0pky">System</th>
<th class="tg-0pky">Condition</th>
<th class="tg-0pky">Sentence 1</th>
<th class="tg-0pky">Sentence 2</th>
<th class="tg-0pky">Sentence 3</th>
<th class="tg-0pky">Sentence 4</th>
<th class="tg-0pky">Sentence 5</th>
<th class="tg-0pky">Sentence 6</th>
</tr>
</thead>
<tbody>
<tr>
<th class="tg-0pky">Vocoded <br> speech</th>
<th class="tg-0pky">VOC</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VOC_1.wav', 'stimuli-from-listening-test', 'VOC , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VOC_2.wav', 'stimuli-from-listening-test', 'VOC , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VOC_3.wav', 'stimuli-from-listening-test', 'VOC , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VOC_4.wav', 'stimuli-from-listening-test', 'VOC , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VOC_5.wav', 'stimuli-from-listening-test', 'VOC , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VOC_6.wav', 'stimuli-from-listening-test', 'VOC , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky" rowspan="3"><a href="https://shivammehta25.github.io/Matcha-TTS"> Matcha-TTS</a></th>
<th class="tg-0pky">MAT-10</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-10_1.wav', 'stimuli-from-listening-test', 'MAT-10 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-10_2.wav', 'stimuli-from-listening-test', 'MAT-10 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-10_3.wav', 'stimuli-from-listening-test', 'MAT-10 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-10_4.wav', 'stimuli-from-listening-test', 'MAT-10 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-10_5.wav', 'stimuli-from-listening-test', 'MAT-10 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-10_6.wav', 'stimuli-from-listening-test', 'MAT-10 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky">MAT-4</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-4_1.wav', 'stimuli-from-listening-test', 'MAT-4 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-4_2.wav', 'stimuli-from-listening-test', 'MAT-4 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-4_3.wav', 'stimuli-from-listening-test', 'MAT-4 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-4_4.wav', 'stimuli-from-listening-test', 'MAT-4 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-4_5.wav', 'stimuli-from-listening-test', 'MAT-4 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-4_6.wav', 'stimuli-from-listening-test', 'MAT-4 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky">MAT-2</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-2_1.wav', 'stimuli-from-listening-test', 'MAT-2 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-2_2.wav', 'stimuli-from-listening-test', 'MAT-2 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-2_3.wav', 'stimuli-from-listening-test', 'MAT-2 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-2_4.wav', 'stimuli-from-listening-test', 'MAT-2 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-2_5.wav', 'stimuli-from-listening-test', 'MAT-2 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/MAT-2_6.wav', 'stimuli-from-listening-test', 'MAT-2 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky" rowspan="2"><a href="https://arxiv.org/abs/2105.06337">Grad-TTS</a></th>
<th class="tg-0pky">GRAD-10</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-10_1.wav', 'stimuli-from-listening-test', 'GRAD-10 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-10_2.wav', 'stimuli-from-listening-test', 'GRAD-10 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-10_3.wav', 'stimuli-from-listening-test', 'GRAD-10 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-10_4.wav', 'stimuli-from-listening-test', 'GRAD-10 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-10_5.wav', 'stimuli-from-listening-test', 'GRAD-10 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-10_6.wav', 'stimuli-from-listening-test', 'GRAD-10 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky">GRAD-4</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-4_1.wav', 'stimuli-from-listening-test', 'GRAD-4 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-4_2.wav', 'stimuli-from-listening-test', 'GRAD-4 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-4_3.wav', 'stimuli-from-listening-test', 'GRAD-4 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-4_4.wav', 'stimuli-from-listening-test', 'GRAD-4 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-4_5.wav', 'stimuli-from-listening-test', 'GRAD-4 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GRAD-4_6.wav', 'stimuli-from-listening-test', 'GRAD-4 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky">Grad-TTS+CFM</th>
<th class="tg-0pky">GCFM-4</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GCFM-4_1.wav', 'stimuli-from-listening-test', 'GCFM-4 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GCFM-4_2.wav', 'stimuli-from-listening-test', 'GCFM-4 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GCFM-4_3.wav', 'stimuli-from-listening-test', 'GCFM-4 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GCFM-4_4.wav', 'stimuli-from-listening-test', 'GCFM-4 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GCFM-4_5.wav', 'stimuli-from-listening-test', 'GCFM-4 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/GCFM-4_6.wav', 'stimuli-from-listening-test', 'GCFM-4 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky"><a href="https://arxiv.org/abs/2006.04558">FastSpeech 2</a></th>
<th class="tg-0pky">FS2</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/FS2_1.wav', 'stimuli-from-listening-test', 'FS2 , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/FS2_2.wav', 'stimuli-from-listening-test', 'FS2 , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/FS2_3.wav', 'stimuli-from-listening-test', 'FS2 , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/FS2_4.wav', 'stimuli-from-listening-test', 'FS2 , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/FS2_5.wav', 'stimuli-from-listening-test', 'FS2 , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/FS2_6.wav', 'stimuli-from-listening-test', 'FS2 , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
<tr>
<th class="tg-0pky"><a href="https://arxiv.org/abs/2106.06103">VITS</a></th>
<th class="tg-0pky">VITS</th>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VITS_1.wav', 'stimuli-from-listening-test', 'VITS , Sentence 1', transcript_listening_test[1])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VITS_2.wav', 'stimuli-from-listening-test', 'VITS , Sentence 2', transcript_listening_test[2])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VITS_3.wav', 'stimuli-from-listening-test', 'VITS , Sentence 3', transcript_listening_test[3])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VITS_4.wav', 'stimuli-from-listening-test', 'VITS , Sentence 4', transcript_listening_test[4])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VITS_5.wav', 'stimuli-from-listening-test', 'VITS , Sentence 5', transcript_listening_test[5])"/> </td>
<td> <img src="images/play_button.png" height=40 style="cursor: pointer;" onclick="play_audio('stimuli/sample_from_test/VITS_6.wav', 'stimuli-from-listening-test', 'VITS , Sentence 6', transcript_listening_test[6])"/> </td>
</tr>
</tbody>
</table>
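
Condition labels in the table above pair a system code with, for the ODE-based systems, the number of ODE solver steps used at synthesis time, so MAT-10 is Matcha-TTS sampled with 10 steps. The slider script further down builds its file names from the same pattern. The helpers below are hypothetical (they are not part of this page's scripts) and only sketch that naming scheme by splitting a label and rebuilding a stimulus path.

```javascript
// Hypothetical helpers, not part of this page's scripts: they only illustrate
// the naming scheme used for the stimulus files.

// "MAT-10" -> { system: "MAT", steps: 10 }; labels without a step count
// (e.g. "VOC", "FS2", "VITS") -> { system: label, steps: null }.
function parseCondition(label) {
  const match = /^([A-Z0-9]+)-(\d+)$/.exec(label);
  if (match === null) {
    return { system: label, steps: null };
  }
  return { system: match[1], steps: parseInt(match[2], 10) };
}

// Same path pattern as the step-count slider script on this page.
function stimulusPath(system, steps, sentence) {
  return "stimuli/number_of_ode_solver/" + system + "-" + steps + "_" + sentence + ".wav";
}

console.log(parseCondition("GRAD-4"));   // { system: "GRAD", steps: 4 }
console.log(stimulusPath("MAT", 10, 1)); // "stimuli/number_of_ode_solver/MAT-10_1.wav"
```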
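## Effect of the number of ODE solver steps
> Move the slider to change the number of ODE solver steps; every player in the table below then loads the stimuli synthesised with that number of steps.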
<div class="slidecontainer">
<label for="itr_slider"><span style="font-weight:bold"> 1 </span></label>
<input type="range" min="1" max="12" value="6" class="slider" id="itr_slider">
<label for="itr_slider"><span style="font-weight:bold"> 500 </span> </label>
<p><span style="font-weight:bold">Steps:</span> <span class="itr_val"></span>
</p>
</div>
<script>
var itr_slider = document.getElementById("itr_slider");
var itr_vals = document.getElementsByClassName("itr_val");
// Map each slider position (1-12) to the number of ODE solver steps it loads
var iterations = {
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 10,
7: 15,
8: 20,
9: 25,
10: 50,
11: 100,
12: 500,
};
// Show the step count for the given slider position in every element passed in
function updateVals(elements, value){
for (var i = 0; i < elements.length; i++) {
elements[i].textContent = iterations[parseInt(value, 10)];
}
}
// System prefixes used in the stimulus file names
let systems = [
"MAT",
"GRAD",
"GCFM"
];
// Initialise the label from the slider's starting position
updateVals(itr_vals, itr_slider.value);
itr_slider.oninput = function() {
updateVals(itr_vals, this.value);
let iteration = iterations[parseInt(this.value, 10)];
// Point every player in the table at the recordings for the selected step count
for (let sent = 1; sent <= 3; sent++){
for (let system_idx = 0; system_idx < systems.length; system_idx++){
let audio = document.getElementById(systems[system_idx] + "_sent_" + sent);
let audio_src = document.getElementById(systems[system_idx] + "_sent_src_" + sent);
audio_src.src = "stimuli/number_of_ode_solver/" + systems[system_idx] + "-" + iteration + "_" + sent + ".wav";
audio.load();
}
}
}
</script>
<table class="tg">
<thead>
<tr>
<th class="tg-0pky">System</th>
<th class="tg-0pky">Sentence 1</th>
<th class="tg-0pky">Sentence 2</th>
<th class="tg-0pky">Sentence 3</th>
</tr>
</thead>
<tbody>
<tr>
<th class="tg-0pky"><a href="https://shivammehta25.github.io/Matcha-TTS">Matcha-TTS</a></th>
<td>
<audio id="MAT_sent_1" controls>
<source id="MAT_sent_src_1" src="stimuli/number_of_ode_solver/MAT-10_1.wav" type="audio/wav">
</audio>
</td>
<td>
<audio id="MAT_sent_2" controls>
<source id="MAT_sent_src_2" src="stimuli/number_of_ode_solver/MAT-10_2.wav" type="audio/wav">
</audio>
</td>
<td>
<audio id="MAT_sent_3" controls>
<source id="MAT_sent_src_3" src="stimuli/number_of_ode_solver/MAT-10_3.wav" type="audio/wav">
</audio>
</td>
</tr>
<tr>
<th class="tg-0pky"><a href="https://arxiv.org/abs/2105.06337">Grad-TTS</a></th>
<td>
<audio id="GRAD_sent_1" controls>
<source id="GRAD_sent_src_1" src="stimuli/number_of_ode_solver/GRAD-10_1.wav" type="audio/wav">
</audio>
</td>
<td>
<audio id="GRAD_sent_2" controls>
<source id="GRAD_sent_src_2" src="stimuli/number_of_ode_solver/GRAD-10_2.wav" type="audio/wav">
</audio>
</td>
<td>
<audio id="GRAD_sent_3" controls>
<source id="GRAD_sent_src_3" src="stimuli/number_of_ode_solver/GRAD-10_3.wav" type="audio/wav">
</audio>
</td>
</tr>
<tr>
<th class="tg-0pky">Grad-TTS + CFM</th>
<td>
<audio id="GCFM_sent_1" controls>
<source id="GCFM_sent_src_1" src="stimuli/number_of_ode_solver/GCFM-10_1.wav" type="audio/wav">
</audio>
</td>
<td>
<audio id="GCFM_sent_2" controls>
<source id="GCFM_sent_src_2" src="stimuli/number_of_ode_solver/GCFM-10_2.wav" type="audio/wav">
</audio>
</td>
<td>
<audio id="GCFM_sent_3" controls>
<source id="GCFM_sent_src_3" src="stimuli/number_of_ode_solver/GCFM-10_3.wav" type="audio/wav">
</audio>
</td>
</tr>
</tbody>
</table>
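
To make "steps" concrete: ODE-based samplers start from noise at t = 0 and integrate a learned velocity field up to t = 1, and each solver step costs one network evaluation. The sketch below uses a generic fixed-step Euler solver, chosen as an assumption for illustration (the solvers behind the systems above may differ), with toy velocity functions standing in for trained networks. A constant (straight-line) field is integrated exactly in a single step, which is the intuition behind straight flows needing few steps, while a curved field needs many more steps to approach its exact solution.

```javascript
// Sketch only: a generic fixed-step Euler solver, assumed for illustration.
// It integrates dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (output).
function eulerSolve(velocity, x0, steps) {
  const h = 1 / steps;        // step size
  let x = x0.slice();         // copy so the caller's array is untouched
  for (let n = 0; n < steps; n++) {
    const t = n * h;
    const v = velocity(x, t); // one "network" evaluation per step
    x = x.map((xi, i) => xi + h * v[i]);
  }
  return x;
}

// A constant velocity (a straight path) is integrated exactly in one step.
const straight = (x, t) => [1.5, 4];
console.log(eulerSolve(straight, [0.5, -1], 1)); // logs the array [2, 3]

// dx/dt = x is curved; its exact value at t = 1 is x0 * e (about 2.718 for x0 = 1).
const curved = (x, t) => x;
for (const steps of [1, 10, 500]) {
  console.log(steps, eulerSolve(curved, [1], steps)[0]); // approaches Math.E as steps grows
}
```

A trained model's velocity field is generally not perfectly straight, so output quality can still depend on the step count, which the stimuli above let you compare by ear.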
## Citation information
```bibtex
@inproceedings{mehta2024matcha,
title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
booktitle={Proc. ICASSP},
year={2024}
}
```
[![MatchaTTS](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https://shivammehta25.github.io/Matcha-TTS&count_bg=%23409CFF&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Matcha-TTS&edge_flat=false)][this_page]

4
_config.yaml Normal file

@@ -0,0 +1,4 @@
title: Matcha-TTS
theme: jekyll-theme-dinky
description: A fast TTS architecture with conditional flow matching
show_downloads: False

BIN
favicon.ico Normal file

Binary file not shown.

Size: 15 KiB

BIN
images/architecture.png Normal file

Binary file not shown.

Size: 928 KiB

BIN
images/logo.png Normal file

Binary file not shown.

Size: 352 KiB

BIN
images/play_button.png Normal file

Binary file not shown.

Size: 8.8 KiB

BIN
stimuli/number_of_ode_solver/.DS_Store vendored Normal file

Binary file not shown.


Some files were not shown because too many files have changed in this diff.