Update train_codes/README.md

2026-02-04 17:39:20 +08:00 · 2024-04-30 15:06:50 +08:00
parent d73daf1808
commit 30dcd5237f
2 changed files with 11 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -29,6 +29,7 @@ We introduce `MuseTalk`, a **real-time high quality** lip-syncing model (30fps+
 - [04/02/2024] Release MuseTalk project and pretrained models.
 - [04/16/2024] Release Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk) on HuggingFace Spaces (thanks to HF team for their community grant)
 - [04/17/2024] :mega: We release a pipeline that utilizes MuseTalk for real-time inference.
+- [04/30/2024] We release an initial version of training codes in `train_codes`.
 ## Model
 ![Model Structure](assets/figs/musetalk_arc.jpg)
@@ -165,7 +166,7 @@ Note that although we use a very similar architecture as Stable Diffusion, MuseT
 - [x] Huggingface Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk).
 - [x] codes for real-time inference.
 - [ ] technical report.
-- [ ] training codes.
+- [x] training codes.
 - [ ] a better model (may take longer).
--- a/train_codes/README.md
+++ b/train_codes/README.md
@@ -2,6 +2,10 @@
 We provide the draft training codes here. Unfortunately, the data preprocessing code is still being reorganized.
+## Setup
+We trained our model on an NVIDIA A100 with `batch size=8, gradient_accumulation_steps=4` for 200,000+ steps. Using multiple GPUs should accelerate training.
 ## Data preprocessing
 You can refer to the inference codes, which [crop the face images](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L79) and [extract audio features](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L69).
@@ -33,3 +37,7 @@ Simply run after preparing the preprocessed data
 ```
 sh train.sh
 ```
+## TODO
+- [ ] release data preprocessing codes
+- [ ] release some novel designs in training (after technical report)
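The Setup note in `train_codes/README.md` pairs `batch size=8` with `gradient_accumulation_steps=4`. As a rough illustration of what that combination implies on a single GPU, the sketch below shows that averaging the gradients of 4 micro-batches of 8 samples matches the gradient of one batch of 32. The toy 1-D model and the helper names `grad_mse` and `accumulated_grad` are hypothetical and are not taken from the MuseTalk training code; the README also does not say whether the reported step count refers to micro-batches or optimizer updates.

```python
# Hypothetical illustration; not taken from the MuseTalk training code.
BATCH_SIZE = 8    # per-step batch size from the Setup note
ACCUM_STEPS = 4   # gradient_accumulation_steps from the Setup note


def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, batch_size, accum_steps):
    """Average the gradients of `accum_steps` micro-batches before one update."""
    total = 0.0
    for i in range(accum_steps):
        lo, hi = i * batch_size, (i + 1) * batch_size
        # Dividing by accum_steps mirrors scaling each micro-batch loss
        # before it is back-propagated.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


effective_batch = BATCH_SIZE * ACCUM_STEPS  # samples per optimizer update
xs = [0.1 * i for i in range(effective_batch)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

g_accum = accumulated_grad(w, xs, ys, BATCH_SIZE, ACCUM_STEPS)
g_full = grad_mse(w, xs, ys)
print(effective_batch, abs(g_accum - g_full) < 1e-9)
```

Under these assumptions, each optimizer update sees 32 samples, so accumulation trades extra forward/backward passes for lower peak memory than a true batch of 32 would need.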