Update train_codes/README.md

czk32611
2024-04-30 15:06:50 +08:00
parent d73daf1808
commit 30dcd5237f
2 changed files with 11 additions and 2 deletions


@@ -29,6 +29,7 @@ We introduce `MuseTalk`, a **real-time high quality** lip-syncing model (30fps+
- [04/02/2024] Release MuseTalk project and pretrained models.
- [04/16/2024] Release Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk) on HuggingFace Spaces (thanks to the HF team for their community grant)
- [04/17/2024] :mega: We release a pipeline that utilizes MuseTalk for real-time inference.
- [04/30/2024] We release an initial version of the training code in `train_codes`.
## Model
![Model Structure](assets/figs/musetalk_arc.jpg)
@@ -165,7 +166,7 @@ Note that although we use a very similar architecture as Stable Diffusion, MuseT
- [x] Huggingface Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk).
- [x] code for real-time inference.
- [ ] technical report.
- [x] training code.
- [ ] a better model (may take longer).


@@ -2,6 +2,10 @@
We provide the draft training code here. Unfortunately, the data preprocessing code is still being reorganized.
## Setup
We trained our model on a single NVIDIA A100 with `batch size=8, gradient_accumulation_steps=4` (an effective batch size of 32) for more than 200k steps. Using multiple GPUs should speed up training.
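As a rough illustration of the setup above, the sketch below shows why a per-step batch of 8 with 4 accumulation steps behaves like a batch of 32. If you average the gradients of 4 equal-sized micro-batches, each scaled by 1/4, you get the same gradient as one pass over all 32 samples. This is a toy, dependency-free example on a hypothetical 1-D least-squares problem, not MuseTalk's training loop. The function names are illustrative, and the `num_gpus` factor assumes standard data-parallel training.

```python
# Toy sketch of gradient accumulation (NOT the MuseTalk training loop).
# Assumptions: scalar weight w, mean-squared-error loss, synthetic data.


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, batch_size, accum_steps):
    """Sum micro-batch gradients, each scaled by 1 / accum_steps,
    as is done before a single optimizer step."""
    total = 0.0
    for step in range(accum_steps):
        start = step * batch_size
        end = start + batch_size
        total += mse_grad(w, xs[start:end], ys[start:end]) / accum_steps
    return total


def effective_batch_size(batch_size, accum_steps, num_gpus=1):
    """Samples contributing to one optimizer step (assumes data parallelism)."""
    return batch_size * accum_steps * num_gpus


if __name__ == "__main__":
    batch_size, accum_steps = 8, 4
    n = effective_batch_size(batch_size, accum_steps)  # 8 * 4 * 1 = 32
    xs = [i / n for i in range(n)]
    ys = [3.0 * x + 0.5 for x in xs]

    w, lr = 0.0, 0.1
    g_accum = accumulated_grad(w, xs, ys, batch_size, accum_steps)
    g_full = mse_grad(w, xs, ys)
    print(f"accumulated={g_accum:.6f} full_batch={g_full:.6f}")
    w -= lr * g_accum  # one optimizer step per accum_steps micro-batches
```

With equal-sized micro-batches, the two printed gradients agree up to floating-point rounding. The trade-off is that accumulation needs only the activation memory of a batch of 8, but it takes 4 forward/backward passes per optimizer step.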
## Data preprocessing
You can refer to the inference code, which [crops the face images](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L79) and [extracts audio features](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L69).
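The two preprocessing steps can be sketched roughly as follows. This is a hypothetical, dependency-free illustration, not the repository's code. It assumes the following:

- A face bounding box `(x1, y1, x2, y2)` has already been detected.
- A 256×256 output size and a nearest-neighbour resize stand in for whatever the linked inference script actually uses.
- Audio features arrive at 50 frames per second (as a Whisper-style encoder produces), while video runs at 25 fps.

The helper names and the window half-width of 2 are illustrative choices.

```python
# Hypothetical preprocessing helpers (a sketch, not MuseTalk's implementation).
# Assumptions: a frame is a row-major nested list (H rows of W pixels), the face
# bbox (x1, y1, x2, y2) is known, and audio features are one item per audio frame.


def crop_and_resize(frame, bbox, size=256):
    """Crop rows y1:y2 and columns x1:x2, then resize to size x size
    using nearest-neighbour sampling."""
    x1, y1, x2, y2 = bbox
    crop = [row[x1:x2] for row in frame[y1:y2]]
    h, w = len(crop), len(crop[0])
    # Map each output pixel back to the nearest source pixel.
    return [
        [crop[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]


def audio_window_for_frame(features, frame_idx, video_fps=25, feat_fps=50, half_width=2):
    """Return the audio features centred on a video frame, clamped at the edges."""
    center = frame_idx * feat_fps // video_fps
    last = len(features) - 1
    return [
        features[min(max(k, 0), last)]
        for k in range(center - half_width, center + half_width + 1)
    ]


if __name__ == "__main__":
    frame = [[(r, c) for c in range(10)] for r in range(8)]  # 8x10 dummy image
    face = crop_and_resize(frame, bbox=(2, 1, 6, 5), size=4)
    print(face[0][0], face[3][3])  # (1, 2) (4, 5)

    features = list(range(100))  # 2 s of dummy 50 fps features
    print(audio_window_for_frame(features, 10))  # [18, 19, 20, 21, 22]
```

Clamping repeats the first or last feature near sequence boundaries. Padding with zeros would be an equally plausible choice.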
@@ -32,4 +36,8 @@ Finally, the data should be organized as follows:
After preparing the preprocessed data, simply run:
```
sh train.sh
```
## TODO
- [ ] release the data preprocessing code
- [ ] release some novel designs in training (after the technical report)