Update README.md

Author: itechmusic
Date: 2024-04-02 19:37:48 +08:00
Committed by: czk32611
Parent: 4bb0398b53
Commit: 4c9c634fc4
6 changed files with 52 additions and 19 deletions

--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ Chao Zhan,
 Wenjiang Zhou
 (<sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Author, benbinwu@tencent.com)
-**[github](https://github.com/TMElyralab/MuseTalk)** **[huggingface](https://huggingface.co/TMElyralab/MuseTalk)** **Project(comming soon)** **Technical report (comming soon)**
+**[github](https://github.com/TMElyralab/MuseTalk)** **[huggingface](https://huggingface.co/TMElyralab/MuseTalk)** **Project (coming soon)** **Technical report (coming soon)**
 We introduce `MuseTalk`, a **real-time, high-quality** lip-syncing model (30fps+ on an NVIDIA Tesla V100). MuseTalk can be applied to input videos, e.g., generated by [MuseV](https://github.com/TMElyralab/MuseV), as a complete virtual human solution.
@@ -37,18 +37,51 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <table class="center">
 <tr style="font-weight: bolder;text-align:center;">
 <td width="33%">Image</td>
-<td width="33%">MuseV </td>
-<td width="33%"> +MuseTalk</td>
+<td width="33%">MuseV</td>
+<td width="33%">+MuseTalk</td>
 </tr>
+<tr>
+<td>
+<img src=assets/demo/musk/musk.png width="95%">
+</td>
+<td >
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/4a4bb2d1-9d14-4ca9-85c8-7f19c39f712e controls preload></video>
+</td>
+<td >
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/b2a879c2-e23a-4d39-911d-51f0343218e4 controls preload></video>
+</td>
+</tr>
 <tr>
 <td>
 <img src=assets/demo/yongen/yongen.jpeg width="95%">
 </td>
 <td >
-<video src=assets/demo/yongen/yongen_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/57ef9dee-a9fd-4dc8-839b-3fbbbf0ff3f4 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/yongen/yongen_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/94d8dcba-1bcd-4b54-9d1d-8b6fc53228f0 controls preload></video>
 </td>
 </tr>
+<tr>
+<td>
+<img src=assets/demo/man/man.png width="95%">
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+</tr>
+<tr>
+<td>
+<img src=assets/demo/sit/sit.jpeg width="95%">
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+<td >
+<video src= controls preload></video>
+</td>
+</tr>
 <tr>
@@ -56,10 +89,10 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <img src=assets/demo/monalisa/monalisa.png width="95%">
 </td>
 <td >
-<video src=assets/demo/monalisa/monalisa_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/1568f604-a34f-4526-a13a-7d282aa2e773 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/monalisa/monalisa_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/a40784fc-a885-4c1f-9b7e-8f87b7caf4e0 controls preload></video>
 </td>
 </tr>
 <tr>
@@ -67,10 +100,10 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <img src=assets/demo/sun1/sun.png width="95%">
 </td>
 <td >
-<video src=assets/demo/sun1/sun_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/37a3a666-7b90-4244-8d3a-058cb0e44107 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/sun1/sun_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/172f4ff1-d432-45bd-a5a7-a07dec33a26b controls preload></video>
 </td>
 </tr>
 <tr>
@@ -78,10 +111,10 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 <img src=assets/demo/sun2/sun.png width="95%">
 </td>
 <td >
-<video src=assets/demo/sun2/sun_musev.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/37a3a666-7b90-4244-8d3a-058cb0e44107 controls preload></video>
 </td>
 <td >
-<video src=assets/demo/sun2/sun_musetalk.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/85a6873d-a028-4cce-af2b-6c59a1f2971d controls preload></video>
 </td>
 </tr>
 </table >
@@ -96,7 +129,7 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
 </tr>
 <tr>
 <td>
-<video src=assets/demo/video_dubbing/Let_the_Bullets_Fly.mp4 controls preload></video>
+<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/4d7c5fa1-3550-4d52-8ed2-52f158150f24 controls preload></video>
 </td>
 <td>
 <a href="//www.bilibili.com/video/BV1wT411b7HU">Link</a>
@@ -204,7 +237,7 @@ python -m scripts.inference --inference_config configs/inference/test.yaml --bbo
 #### Combining MuseV and MuseTalk
-You are suggested to first apply [MuseV](https://github.com/TMElyralab/MuseV) to generate a video by referring [this](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Then, you can use `MuseTalk` by referring [this]().
+As a complete virtual-human generation solution, we suggest first applying [MuseV](https://github.com/TMElyralab/MuseV) to generate a video (text-to-video, image-to-video, or pose-to-video) by referring to [this](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Then, you can use `MuseTalk` to generate a lip-synced video by referring to [this](https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#inference).
 # Note
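
To make the MuseV → MuseTalk hand-off in the last hunk concrete, here is a minimal sketch of the two-step pipeline. Only the `python -m scripts.inference --inference_config ...` command comes from this README; the config file name, its `video_path`/`audio_path` keys, and all file paths are illustrative assumptions — check `configs/inference/test.yaml` in the repository for the actual schema.

```bash
# Step 1: generate a (silent) talking-subject video with MuseV; see the MuseV
# README linked above for its own setup and commands. Assume it writes
# results/musev_output.mp4 (an assumed path for this sketch).

# Step 2: point a MuseTalk inference config at the MuseV output and a driving
# audio track. The keys below mirror the style of configs/inference/test.yaml
# but are assumptions, not the verified schema.
cat > configs/inference/musev_demo.yaml <<'EOF'
task_0:
  video_path: "results/musev_output.mp4"  # video generated by MuseV (assumed)
  audio_path: "data/audio/speech.wav"     # speech to lip-sync (assumed)
EOF

# Step 3: run MuseTalk inference (this command appears in this README).
python -m scripts.inference --inference_config configs/inference/musev_demo.yaml
```

The result keeps MuseV's motion and identity while MuseTalk re-renders the mouth region to follow the audio, which is the "complete virtual human solution" described at the top of the README.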