Update README.md

Author: itechmusic
Date: 2024-04-02 19:37:48 +08:00
Committed by: czk32611
Parent: 4bb0398b53
Commit: 4c9c634fc4
6 changed files with 52 additions and 19 deletions

--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ Chao Zhan,
 Wenjiang Zhou
 (<sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Author, benbinwu@tencent.com)
-**[github](https://github.com/TMElyralab/MuseTalk)** **[huggingface](https://huggingface.co/TMElyralab/MuseTalk)** **Project(comming soon)** **Technical report (comming soon)**
+**[github](https://github.com/TMElyralab/MuseTalk)** **[huggingface](https://huggingface.co/TMElyralab/MuseTalk)** **Project (coming soon)** **Technical report (coming soon)**
 We introduce `MuseTalk`, a **real-time, high-quality** lip-syncing model (30fps+ on an NVIDIA Tesla V100). MuseTalk can be applied to input videos, e.g., generated by [MuseV](https://github.com/TMElyralab/MuseV), as a complete virtual human solution.
@@ -37,18 +37,51 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <table class="center">
 <tr style="font-weight: bolder;text-align:center;">
 <td width="33%">Image</td>
-<td width="33%">MuseV </td>
-<td width="33%"> +MuseTalk</td>
+<td width="33%">MuseV</td>
+<td width="33%">+MuseTalk</td>
 </tr>
+<tr>
+<td>
+<img src=assets/demo/musk/musk.png width="95%">
+</td>
+<td >
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/4a4bb2d1-9d14-4ca9-85c8-7f19c39f712e controls preload></video>
+</td>
+<td >
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/b2a879c2-e23a-4d39-911d-51f0343218e4 controls preload></video>
+</td>
+</tr>
 <tr>
 <td>
 <img src=assets/demo/yongen/yongen.jpeg width="95%">
 </td>
 <td >
-<video src=assets/demo/yongen/yongen_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/57ef9dee-a9fd-4dc8-839b-3fbbbf0ff3f4 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/yongen/yongen_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/94d8dcba-1bcd-4b54-9d1d-8b6fc53228f0 controls preload></video>
 </td>
 </tr>
+<tr>
+<td>
+<img src=assets/demo/man/man.png width="95%">
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+</tr>
+<tr>
+<td>
+<img src=assets/demo/sit/sit.jpeg width="95%">
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+</tr>
 <tr>
@@ -56,10 +89,10 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <img src=assets/demo/monalisa/monalisa.png width="95%">
 </td>
 <td >
-<video src=assets/demo/monalisa/monalisa_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/1568f604-a34f-4526-a13a-7d282aa2e773 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/monalisa/monalisa_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/a40784fc-a885-4c1f-9b7e-8f87b7caf4e0 controls preload></video>
 </td>
 </tr>
 <tr>
@@ -67,10 +100,10 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <img src=assets/demo/sun1/sun.png width="95%">
 </td>
 <td >
-<video src=assets/demo/sun1/sun_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/37a3a666-7b90-4244-8d3a-058cb0e44107 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/sun1/sun_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/172f4ff1-d432-45bd-a5a7-a07dec33a26b controls preload></video>
 </td>
 </tr>
 <tr>
@@ -78,10 +111,10 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <img src=assets/demo/sun2/sun.png width="95%">
 </td>
 <td >
-<video src=assets/demo/sun2/sun_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/37a3a666-7b90-4244-8d3a-058cb0e44107 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/sun2/sun_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/85a6873d-a028-4cce-af2b-6c59a1f2971d controls preload></video>
 </td>
 </tr>
 </table >
@@ -96,7 +129,7 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 </tr>
 <tr>
 <td>
-<video src=assets/demo/video_dubbing/Let_the_Bullets_Fly.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/4d7c5fa1-3550-4d52-8ed2-52f158150f24 controls preload></video>
 </td>
 <td>
 <a href="//www.bilibili.com/video/BV1wT411b7HU">Link</a>
@@ -204,7 +237,7 @@ python -m scripts.inference --inference_config configs/inference/test.yaml --bbo
 #### Combining MuseV and MuseTalk
-You are suggested to first apply [MuseV](https://github.com/TMElyralab/MuseV) to generate a video by referring [this](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Then, you can use `MuseTalk` by referring [this]().
+As a complete virtual-human generation solution, we suggest first applying [MuseV](https://github.com/TMElyralab/MuseV) to generate a video (text-to-video, image-to-video, or pose-to-video) by referring to [this](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Then, you can use `MuseTalk` to generate a lip-synced video by referring to [this](https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#inference).
 # Note
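
To make the MuseV → MuseTalk hand-off in the last hunk concrete, here is a minimal sketch of the two-step pipeline. Only the `python -m scripts.inference --inference_config ...` command comes from this README; the config file name, its `video_path`/`audio_path` keys, and all file paths are illustrative assumptions — check `configs/inference/test.yaml` in the repository for the actual schema.

```bash
# Step 1: generate a (silent) talking-subject video with MuseV; see the MuseV
# README linked above for its own setup and commands. Assume it writes
# results/musev_output.mp4 (an assumed path for this sketch).

# Step 2: point a MuseTalk inference config at the MuseV output and a driving
# audio track. The keys below mirror the style of configs/inference/test.yaml
# but are assumptions, not the verified schema.
cat > configs/inference/musev_demo.yaml <<'EOF'
task_0:
  video_path: "results/musev_output.mp4"  # video generated by MuseV (assumed)
  audio_path: "data/audio/speech.wav"     # speech to lip-sync (assumed)
EOF

# Step 3: run MuseTalk inference (this command appears in this README).
python -m scripts.inference --inference_config configs/inference/musev_demo.yaml
```

The result keeps MuseV's motion and identity while MuseTalk re-renders the mouth region to follow the audio, which is the "complete virtual human solution" described at the top of the README.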