Mirror of https://github.com/TMElyralab/MuseTalk.git (synced 2026-02-04 17:39:20 +08:00)
Update train_codes/README.md
@@ -29,6 +29,7 @@ We introduce `MuseTalk`, a **real-time high quality** lip-syncing model (30fps+
 - [04/02/2024] Release MuseTalk project and pretrained models.
 - [04/16/2024] Release Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk) on HuggingFace Spaces (thanks to HF team for their community grant)
 - [04/17/2024] :mega: We release a pipeline that utilizes MuseTalk for real-time inference.
+- [04/30/2024] We release an initial version of training codes in `train_codes`.
 
 ## Model
 
@@ -165,7 +166,7 @@ Note that although we use a very similar architecture as Stable Diffusion, MuseT
 - [x] Huggingface Gradio [demo](https://huggingface.co/spaces/TMElyralab/MuseTalk).
 - [x] codes for real-time inference.
 - [ ] technical report.
-- [ ] training codes.
+- [x] training codes.
 - [ ] a better model (may take longer).
 
 
@@ -2,6 +2,10 @@
 
 We provde the draft training codes here. Unfortunately, data preprocessing code is still being reorganized.
 
+## Setup
+
+We trained our model on an NVIDIA A100 with `batch size=8, gradient_accumulation_steps=4` for 20w+ steps. Using multiple GPUs should accelerate the training.
+
 ## Data preprocessing
 You could refer the inference codes which [crop the face images](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L79) and [extract audio features](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L69).
 
@@ -33,3 +37,7 @@ Simply run after preparing the preprocessed data
 ```
 sh train.sh
 ```
+
+## TODO
+- [ ] release data preprocessing codes
+- [ ] release some novel designs in training (after technical report)
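
The added "Setup" paragraph quotes `batch size=8` and `gradient_accumulation_steps=4` on a single NVIDIA A100. As a rough illustration of what those two numbers imply, the sketch below computes the number of samples that contribute to one optimizer update. It is not taken from the repository: the helper name `effective_batch_size` is hypothetical, the multi-GPU case assumes plain data parallelism, and the step count assumes that "20w" is the common shorthand for 20 × 10,000 and that each "step" is one optimizer update, neither of which the README states.

```python
# Values quoted from the "Setup" paragraph added in this commit.
batch_size = 8
gradient_accumulation_steps = 4

# Assumption: one GPU, matching "an NVIDIA A100" in the README.
num_gpus = 1

# Assumption: "20w+" read as 20 * 10,000 = 200,000 steps (lower bound),
# with each step counted as one optimizer update.
assumed_steps = 20 * 10_000


def effective_batch_size(per_device: int, accum_steps: int, gpus: int = 1) -> int:
    """Samples contributing to a single optimizer update.

    Hypothetical helper for illustration; assumes data-parallel GPUs
    that each process `per_device` samples per forward/backward pass.
    """
    return per_device * accum_steps * gpus


effective = effective_batch_size(batch_size, gradient_accumulation_steps, num_gpus)
total_samples = effective * assumed_steps

print(f"effective batch size: {effective}")
print(f"samples seen under the stated assumptions: {total_samples}")
```

Under the same assumptions, passing `gpus=2` doubles the effective batch size for unchanged per-device settings, which is one reason learning-rate or accumulation settings are often revisited when moving to multiple GPUs.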