fix: windows infer

Author: zzzweakman
Date: 2025-04-10 13:54:16 +08:00
Parent: 36163fccbd
Commit: 0702078902

10 changed files with 281 additions and 68 deletions

.gitignore (vendored, 4 changed lines)

@@ -5,11 +5,13 @@
 *.pyc
 .ipynb_checkpoints
 results/
-./models
+models/
 **/__pycache__/
 *.py[cod]
 *$py.class
 dataset/
 ffmpeg*
+ffmprobe*
+ffplay*
 debug
 exp_out

README.md (148 changed lines)

@@ -146,15 +146,36 @@ We also hope you note that we have not verified, maintained, or updated third-party
 ## Installation
 To prepare the Python environment and install additional packages such as opencv, diffusers, mmcv, etc., please follow the steps below:
 ### Build environment
-We recommend a python version >=3.10 and cuda version =11.7. Then build environment as follows:
+We recommend Python 3.10 and CUDA 11.7. Set up your environment as follows:
+```shell
+conda create -n MuseTalk python==3.10
+conda activate MuseTalk
+```
+### Install PyTorch 2.0.1
+Choose one of the following installation methods:
+```shell
+# Option 1: Using pip
+pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
+# Option 2: Using conda
+conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
+```
+### Install Dependencies
+Install the remaining required packages:
 ```shell
 pip install -r requirements.txt
 ```
-### mmlab packages
+### Install MMLab Packages
+Install the MMLab ecosystem packages:
 ```bash
 pip install --no-cache-dir -U openmim
 mim install mmengine
@@ -163,33 +184,52 @@ mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0" mim install "mmpose>=1.1.0"
``` ```
### Download ffmpeg-static ### Setup FFmpeg
Download the ffmpeg-static and 1. [Download](https://github.com/BtbN/FFmpeg-Builds/releases) the ffmpeg-static package
```
2. Configure FFmpeg based on your operating system:
For Linux:
```bash
export FFMPEG_PATH=/path/to/ffmpeg export FFMPEG_PATH=/path/to/ffmpeg
``` # Example:
for example:
```
export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
``` ```
### Download weights
You can download weights manually as follows:
1. Download our trained [weights](https://huggingface.co/TMElyralab/MuseTalk). For Windows:
Add the `ffmpeg-xxx\bin` directory to your system's PATH environment variable. Verify the installation by running `ffmpeg -version` in the command prompt - it should display the ffmpeg version information.
### Download weights
You can download weights in two ways:
#### Option 1: Using Download Scripts
We provide two scripts for automatic downloading:
For Linux:
```bash ```bash
# !pip install -U "huggingface_hub[cli]" # Make the script executable
export HF_ENDPOINT=https://hf-mirror.com chmod +x download_weights.sh
huggingface-cli download TMElyralab/MuseTalk --local-dir models/ # Run the script
./download_weights.sh
``` ```
For Windows:
```batch
# Run the script
download_weights.bat
```
#### Option 2: Manual Download
You can also download the weights manually from the following links:
1. Download our trained [weights](https://huggingface.co/TMElyralab/MuseTalk/tree/main)
2. Download the weights of other components: 2. Download the weights of other components:
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse) - [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main)
- [whisper](https://huggingface.co/openai/whisper-tiny/tree/main) - [whisper](https://huggingface.co/openai/whisper-tiny/tree/main)
- [dwpose](https://huggingface.co/yzd-v/DWPose/tree/main) - [dwpose](https://huggingface.co/yzd-v/DWPose/tree/main)
- [syncnet](https://huggingface.co/ByteDance/LatentSync/tree/main)
- [face-parse-bisent](https://github.com/zllrunning/face-parsing.PyTorch) - [face-parse-bisent](https://github.com/zllrunning/face-parsing.PyTorch)
- [resnet18](https://download.pytorch.org/models/resnet18-5c106cde.pth) - [resnet18](https://download.pytorch.org/models/resnet18-5c106cde.pth)
- [syncnet](https://huggingface.co/ByteDance/LatentSync/tree/main)
Finally, these weights should be organized in `models` as follows: Finally, these weights should be organized in `models` as follows:
``` ```
@@ -207,7 +247,7 @@ Finally, these weights should be organized in `models` as follows:
 ├── face-parse-bisent
 │   ├── 79999_iter.pth
 │   └── resnet18-5c106cde.pth
-├── sd-vae-ft-mse
+├── sd-vae
 │   ├── config.json
 │   └── diffusion_pytorch_model.bin
 └── whisper
@@ -221,21 +261,60 @@ Finally, these weights should be organized in `models` as follows:
 ### Inference
 We provide inference scripts for both versions of MuseTalk:
-#### MuseTalk 1.5 (Recommended)
+#### Prerequisites
+Before running inference, please ensure ffmpeg is installed and accessible:
 ```bash
-# Run MuseTalk 1.5 inference
-sh inference.sh v1.5 normal
+# Check ffmpeg installation
+ffmpeg -version
 ```
+If ffmpeg is not found, please install it first:
+- Windows: Download from [ffmpeg-static](https://github.com/BtbN/FFmpeg-Builds/releases) and add to PATH
+- Linux: `sudo apt-get install ffmpeg`
-#### MuseTalk 1.0
+#### Normal Inference
+##### Linux Environment
 ```bash
-# Run MuseTalk 1.0 inference
+# MuseTalk 1.5 (Recommended)
+sh inference.sh v1.5 normal
+# MuseTalk 1.0
 sh inference.sh v1.0 normal
 ```
-The inference script supports both MuseTalk 1.5 and 1.0 models:
-- For MuseTalk 1.5: Use the command above with the V1.5 model path
-- For MuseTalk 1.0: Use the same script but point to the V1.0 model path
+##### Windows Environment
+Please ensure that you set the `ffmpeg_path` to match the actual location of your FFmpeg installation.
+```bash
+# MuseTalk 1.5 (Recommended)
+python -m scripts.inference --inference_config configs\inference\test.yaml --result_dir results\test --unet_model_path models\musetalkV15\unet.pth --unet_config models\musetalkV15\musetalk.json --version v15 --ffmpeg_path ffmpeg-master-latest-win64-gpl-shared\bin
+# For MuseTalk 1.0, change:
+# - models\musetalkV15 -> models\musetalk
+# - unet.pth -> pytorch_model.bin
+# - --version v15 -> --version v1
+```
+#### Real-time Inference
+##### Linux Environment
+```bash
+# MuseTalk 1.5 (Recommended)
+sh inference.sh v1.5 realtime
+# MuseTalk 1.0
+sh inference.sh v1.0 realtime
+```
+##### Windows Environment
+```bash
+# MuseTalk 1.5 (Recommended)
+python -m scripts.realtime_inference --inference_config configs\inference\realtime.yaml --result_dir results\realtime --unet_model_path models\musetalkV15\unet.pth --unet_config models\musetalkV15\musetalk.json --version v15 --fps 25
+# For MuseTalk 1.0, change:
+# - models\musetalkV15 -> models\musetalk
+# - unet.pth -> pytorch_model.bin
+# - --version v15 -> --version v1
+```
 The configuration file `configs/inference/test.yaml` contains the inference settings, including:
 - `video_path`: Path to the input video, image file, or directory of images
@@ -243,21 +322,6 @@ The configuration file `configs/inference/test.yaml` contains the inference settings, including:
 Note: For optimal results, we recommend using input videos with 25fps, which is the same fps used during model training. If your video has a lower frame rate, you can use frame interpolation or convert it to 25fps using ffmpeg.
-#### Real-time Inference
-For real-time inference, use the following command:
-```bash
-# Run real-time inference
-sh inference.sh v1.5 realtime # For MuseTalk 1.5
-# or
-sh inference.sh v1.0 realtime # For MuseTalk 1.0
-```
-The real-time inference configuration is in `configs/inference/realtime.yaml`, which includes:
-- `preparation`: Set to `True` for new avatar preparation
-- `video_path`: Path to the input video
-- `bbox_shift`: Adjustable parameter for mouth region control
-- `audio_clips`: List of audio clips for generation
 Important notes for real-time inference:
 1. Set `preparation` to `True` when processing a new avatar
 2. After preparation, the avatar will generate videos using audio clips from `audio_clips`
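The note above recommends converting lower-frame-rate inputs to 25 fps with ffmpeg before inference; a minimal sketch of that conversion (file names are illustrative):

```shell
# Re-encode an input video to 25 fps, the frame rate used during training
ffmpeg -i input.mp4 -r 25 input_25fps.mp4
```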

download_weights.bat (new file, 45 lines)

@@ -0,0 +1,45 @@
@echo off
setlocal
:: Set the checkpoints directory
set CheckpointsDir=models
:: Create necessary directories
mkdir %CheckpointsDir%\musetalk
mkdir %CheckpointsDir%\musetalkV15
mkdir %CheckpointsDir%\syncnet
mkdir %CheckpointsDir%\dwpose
mkdir %CheckpointsDir%\face-parse-bisent
mkdir %CheckpointsDir%\sd-vae
mkdir %CheckpointsDir%\whisper
:: Install required packages
pip install -U "huggingface_hub[cli]"
pip install gdown
:: Set HuggingFace endpoint
set HF_ENDPOINT=https://hf-mirror.com
:: Download MuseTalk weights
huggingface-cli download TMElyralab/MuseTalk --local-dir %CheckpointsDir%
:: Download SD VAE weights
huggingface-cli download stabilityai/sd-vae-ft-mse --local-dir %CheckpointsDir%\sd-vae --include "config.json" "diffusion_pytorch_model.bin"
:: Download Whisper weights
huggingface-cli download openai/whisper-tiny --local-dir %CheckpointsDir%\whisper --include "config.json" "pytorch_model.bin" "preprocessor_config.json"
:: Download DWPose weights
huggingface-cli download yzd-v/DWPose --local-dir %CheckpointsDir%\dwpose --include "dw-ll_ucoco_384.pth"
:: Download SyncNet weights
huggingface-cli download ByteDance/LatentSync --local-dir %CheckpointsDir%\syncnet --include "latentsync_syncnet.pt"
:: Download Face Parse Bisent weights (using gdown)
gdown --id 154JgKpzCPW82qINcVieuPH3fZ2e0P812 -O %CheckpointsDir%\face-parse-bisent\79999_iter.pth
:: Download ResNet weights
curl -L https://download.pytorch.org/models/resnet18-5c106cde.pth -o %CheckpointsDir%\face-parse-bisent\resnet18-5c106cde.pth
echo All weights have been downloaded successfully!
endlocal

download_weights.sh (new file, 37 lines)

@@ -0,0 +1,37 @@
#!/bin/bash
# Set the checkpoints directory
CheckpointsDir="models"
# Create necessary directories
mkdir -p $CheckpointsDir/{musetalk,musetalkV15,syncnet,dwpose,face-parse-bisent,sd-vae,whisper}
# Install required packages
pip install -U "huggingface_hub[cli]"
pip install gdown
# Set HuggingFace endpoint
export HF_ENDPOINT=https://hf-mirror.com
# Download MuseTalk weights
huggingface-cli download TMElyralab/MuseTalk --local-dir $CheckpointsDir
# Download SD VAE weights
huggingface-cli download stabilityai/sd-vae-ft-mse --local-dir $CheckpointsDir/sd-vae --include "config.json" "diffusion_pytorch_model.bin"
# Download Whisper weights
huggingface-cli download openai/whisper-tiny --local-dir $CheckpointsDir/whisper --include "config.json" "pytorch_model.bin" "preprocessor_config.json"
# Download DWPose weights
huggingface-cli download yzd-v/DWPose --local-dir $CheckpointsDir/dwpose --include "dw-ll_ucoco_384.pth"
# Download SyncNet weights
huggingface-cli download ByteDance/LatentSync --local-dir $CheckpointsDir/syncnet --include "latentsync_syncnet.pt"
# Download Face Parse Bisent weights (using gdown)
gdown --id 154JgKpzCPW82qINcVieuPH3fZ2e0P812 -O $CheckpointsDir/face-parse-bisent/79999_iter.pth
# Download ResNet weights
curl -L https://download.pytorch.org/models/resnet18-5c106cde.pth -o $CheckpointsDir/face-parse-bisent/resnet18-5c106cde.pth
echo "All weights have been downloaded successfully!"

musetalk/utils/utils.py

@@ -8,26 +8,18 @@ from einops import rearrange
 import shutil
 import os.path as osp
-ffmpeg_path = os.getenv('FFMPEG_PATH')
-if ffmpeg_path is None:
-    print("please download ffmpeg-static and export to FFMPEG_PATH. \nFor example: export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static")
-elif ffmpeg_path not in os.getenv('PATH'):
-    print("add ffmpeg to path")
-    os.environ["PATH"] = f"{ffmpeg_path}:{os.environ['PATH']}"
 from musetalk.models.vae import VAE
 from musetalk.models.unet import UNet,PositionalEncoding
 def load_all_model(
-    unet_model_path="./models/musetalk/pytorch_model.bin",
-    vae_type="sd-vae-ft-mse",
-    unet_config="./models/musetalk/musetalk.json",
+    unet_model_path=os.path.join("models", "musetalk", "pytorch_model.bin"),
+    vae_type="sd-vae",
+    unet_config=os.path.join("models", "musetalk", "musetalk.json"),
     device=None,
 ):
     vae = VAE(
-        model_path = f"./models/{vae_type}/",
+        model_path = os.path.join("models", vae_type),
     )
     print(f"load unet model from {unet_model_path}")
     unet = UNet(
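The block removed above prepended the ffmpeg directory to PATH with a hard-coded `:` separator, which does not work on Windows; the replacement pattern used elsewhere in this commit picks `;` on Windows and `:` otherwise. The same idea can be written with `os.pathsep`; a minimal sketch, not part of the commit (the function name is illustrative):

```python
import os
import subprocess

def prepend_ffmpeg_to_path(ffmpeg_dir: str) -> bool:
    """Prepend ffmpeg_dir to PATH using the OS-specific separator, then verify ffmpeg runs."""
    os.environ["PATH"] = ffmpeg_dir + os.pathsep + os.environ.get("PATH", "")
    try:
        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
        return True
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
```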

requirements.txt

@@ -1,8 +1,4 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-torch==2.0.1
-torchvision==0.15.2
-torchaudio==2.0.2
-diffusers==0.27.2
+diffusers==0.30.2
 accelerate==0.28.0
 tensorflow==2.12.0
 tensorboard==2.12.0
@@ -10,13 +6,15 @@ opencv-python==4.9.0.80
 soundfile==0.12.1
 transformers==4.39.2
 huggingface_hub==0.25.0
+librosa==0.11.0
+numpy==1.24.4
+einops==0.8.1
 gdown
 requests
 imageio[ffmpeg]
-gradio
 omegaconf
 ffmpeg-python
+gradio
+spaces
 moviepy

scripts/inference.py

@@ -8,9 +8,11 @@ import shutil
 import pickle
 import argparse
 import numpy as np
+import subprocess
 from tqdm import tqdm
 from omegaconf import OmegaConf
 from transformers import WhisperModel
+import sys
 from musetalk.utils.blending import get_image
 from musetalk.utils.face_parsing import FaceParsing
@@ -18,16 +20,26 @@ from musetalk.utils.audio_processor import AudioProcessor
 from musetalk.utils.utils import get_file_type, get_video_fps, datagen, load_all_model
 from musetalk.utils.preprocessing import get_landmark_and_bbox, read_imgs, coord_placeholder
+def fast_check_ffmpeg():
+    try:
+        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
+        return True
+    except:
+        return False
 @torch.no_grad()
 def main(args):
     # Configure ffmpeg path
-    if args.ffmpeg_path not in os.getenv('PATH'):
+    if not fast_check_ffmpeg():
         print("Adding ffmpeg to PATH")
-        os.environ["PATH"] = f"{args.ffmpeg_path}:{os.environ['PATH']}"
+        # Choose path separator based on operating system
+        path_separator = ';' if sys.platform == 'win32' else ':'
+        os.environ["PATH"] = f"{args.ffmpeg_path}{path_separator}{os.environ['PATH']}"
+        if not fast_check_ffmpeg():
+            print("Warning: Unable to find ffmpeg, please ensure ffmpeg is properly installed")
     # Set computing device
     device = torch.device(f"cuda:{args.gpu_id}" if torch.cuda.is_available() else "cpu")
     # Load model weights
     vae, unet, pe = load_all_model(
         unet_model_path=args.unet_model_path,

View File

@@ -12,11 +12,23 @@ from mmpose.structures import merge_data_samples
 import torch
 import numpy as np
 from tqdm import tqdm
+import sys
+def fast_check_ffmpeg():
+    try:
+        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
+        return True
+    except:
+        return False
 ffmpeg_path = "./ffmpeg-4.4-amd64-static/"
-if ffmpeg_path not in os.getenv('PATH'):
-    print("add ffmpeg to path")
-    os.environ["PATH"] = f"{ffmpeg_path}:{os.environ['PATH']}"
+if not fast_check_ffmpeg():
+    print("Adding ffmpeg to PATH")
+    # Choose path separator based on operating system
+    path_separator = ';' if sys.platform == 'win32' else ':'
+    os.environ["PATH"] = f"{ffmpeg_path}{path_separator}{os.environ['PATH']}"
+    if not fast_check_ffmpeg():
+        print("Warning: Unable to find ffmpeg, please ensure ffmpeg is properly installed")
 class AnalyzeFace:
     def __init__(self, device: Union[str, torch.device], config_file: str, checkpoint_file: str):

scripts/realtime_inference.py

@@ -23,6 +23,15 @@ import shutil
 import threading
 import queue
 import time
+import subprocess
+def fast_check_ffmpeg():
+    try:
+        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
+        return True
+    except:
+        return False
 def video2imgs(vid_path, save_path, ext='.png', cut_frame=10000000):
@@ -332,6 +341,15 @@ if __name__ == "__main__":
     args = parser.parse_args()
+    # Configure ffmpeg path
+    if not fast_check_ffmpeg():
+        print("Adding ffmpeg to PATH")
+        # Choose path separator based on operating system
+        path_separator = ';' if sys.platform == 'win32' else ':'
+        os.environ["PATH"] = f"{args.ffmpeg_path}{path_separator}{os.environ['PATH']}"
+        if not fast_check_ffmpeg():
+            print("Warning: Unable to find ffmpeg, please ensure ffmpeg is properly installed")
     # Set computing device
     device = torch.device(f"cuda:{args.gpu_id}" if torch.cuda.is_available() else "cpu")

test_ffmpeg.py (new file, 33 lines)

@@ -0,0 +1,33 @@
import os
import subprocess
import sys

def test_ffmpeg(ffmpeg_path):
    print(f"Testing ffmpeg path: {ffmpeg_path}")
    # Choose path separator based on operating system
    path_separator = ';' if sys.platform == 'win32' else ':'
    # Add ffmpeg path to environment variable
    os.environ["PATH"] = f"{ffmpeg_path}{path_separator}{os.environ['PATH']}"
    try:
        # Try to run ffmpeg
        result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
        print("FFmpeg test successful!")
        print("FFmpeg version information:")
        print(result.stdout)
        return True
    except Exception as e:
        print("FFmpeg test failed!")
        print(f"Error message: {str(e)}")
        return False

if __name__ == "__main__":
    # Default ffmpeg path, can be modified as needed
    default_path = r"ffmpeg-master-latest-win64-gpl-shared\bin"
    # Use command line argument if provided, otherwise use default path
    ffmpeg_path = sys.argv[1] if len(sys.argv) > 1 else default_path
    test_ffmpeg(ffmpeg_path)
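A usage sketch for this helper: run it from the repository root and pass the directory that contains the ffmpeg executable (it falls back to the default path above if no argument is given):

```shell
python test_ffmpeg.py "ffmpeg-master-latest-win64-gpl-shared\bin"
```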