diff --git a/README.md b/README.md
index e824b50..bb9819c 100644
--- a/README.md
+++ b/README.md
@@ -29,6 +29,7 @@ We introduce `MuseTalk`, a **real-time high quality** lip-syncing model (30fps+
 - [04/02/2024] Release MuseTalk project and pretrained models.
 - [04/16/2024] Release Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk) on HuggingFace Spaces (thanks to HF team for their community grant)
 - [04/17/2024] :mega: We release a pipeline that utilizes MuseTalk for real-time inference.
+- [04/30/2024] We release an initial version of the training codes in `train_codes`.
 
 ## Model
 ![Model Structure](assets/figs/musetalk_arc.jpg)
@@ -165,7 +166,7 @@ Note that although we use a very similar architecture as Stable Diffusion, MuseT
 - [x] Huggingface Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk).
 - [x] codes for real-time inference.
 - [ ] technical report.
-- [ ] training codes.
+- [x] training codes.
 - [ ] a better model (may take longer).
 
diff --git a/train_codes/README.md b/train_codes/README.md
index fdd56ee..db9848c 100644
--- a/train_codes/README.md
+++ b/train_codes/README.md
@@ -2,6 +2,10 @@ We provde the draft training codes here.
 
 Unfortunately, data preprocessing code is still being reorganized.
 
+## Setup
+
+We trained our model on an NVIDIA A100 with `batch size=8, gradient_accumulation_steps=4` for 200k+ steps. Using multiple GPUs should accelerate training.
+
 ## Data preprocessing
 You could refer the inference codes which [crop the face images](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L79) and [extract audio features](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L69).
 
@@ -32,4 +36,8 @@ Finally, the data should be organized as follows:
 Simply run after preparing the preprocessed data
 ```
 sh train.sh
-```
\ No newline at end of file
+```
+
+## TODO
+- [ ] release data preprocessing codes
+- [ ] release some novel designs in training (after the technical report)
\ No newline at end of file
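
For orientation, here is a minimal sketch of what the training setup added above amounts to: a per-GPU batch size of 8 with 4 gradient-accumulation steps, i.e. an effective batch of 32 per update. It is written with Hugging Face Accelerate purely for illustration; the model, data, loss, and learning rate are placeholders, not the actual MuseTalk training code in `train_codes`.

```python
# Hypothetical sketch of "batch size=8, gradient_accumulation_steps=4".
# Placeholders only -- not the MuseTalk training script.
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)

model = torch.nn.Linear(256, 256)  # placeholder for the actual network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataset = torch.utils.data.TensorDataset(torch.randn(1024, 256), torch.randn(1024, 256))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # Inside accumulate(), backward() accumulates gradients every micro-batch,
    # while the optimizer update is actually applied only on every 4th one.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.l1_loss(model(inputs), targets)  # placeholder loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```

Launching such a script via `accelerate launch --multi_gpu` would shard the dataloader across devices, which is the kind of multi-GPU speedup the new Setup section refers to.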