diff --git a/README.md b/README.md
index 3b0bc71..f4373c5 100644
--- a/README.md
+++ b/README.md
@@ -53,34 +53,34 @@
 
 ## Install
 
-**Clone and install**
+### Clone and install
 
 - Clone the repo
-``` sh
-git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
-# If you failed to clone submodule due to network failures, please run following command until success
-cd CosyVoice
-git submodule update --init --recursive
-```
+    ``` sh
+    git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
+    # If you failed to clone submodule due to network failures, please run following command until success
+    cd CosyVoice
+    git submodule update --init --recursive
+    ```
 
 - Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
 - Create Conda env:
 
-``` sh
-conda create -n cosyvoice -y python=3.10
-conda activate cosyvoice
-# pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platform.
-conda install -y -c conda-forge pynini==2.1.5
-pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
+    ``` sh
+    conda create -n cosyvoice -y python=3.10
+    conda activate cosyvoice
+    # pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platform.
+    conda install -y -c conda-forge pynini==2.1.5
+    pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
+
+    # If you encounter sox compatibility issues
+    # ubuntu
+    sudo apt-get install sox libsox-dev
+    # centos
+    sudo yum install sox sox-devel
+    ```
 
-# If you encounter sox compatibility issues
-# ubuntu
-sudo apt-get install sox libsox-dev
-# centos
-sudo yum install sox sox-devel
-```
-
-**Model download**
+### Model download
 
 We strongly recommend that you download our pretrained `CosyVoice2-0.5B` `CosyVoice-300M` `CosyVoice-300M-SFT` `CosyVoice-300M-Instruct` model and `CosyVoice-ttsfrd` resource.
 
@@ -115,7 +115,7 @@ pip install ttsfrd_dependency-0.1-py3-none-any.whl
 pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
 ```
 
-**Basic Usage**
+### Basic Usage
 
 We strongly recommend using `CosyVoice2-0.5B` for better performance.
 Follow code below for detailed usage of each model.
@@ -128,7 +128,7 @@ from cosyvoice.utils.file_utils import load_wav
 import torchaudio
 ```
 
-**CosyVoice2 Usage**
+#### CosyVoice2 Usage
 ```python
 cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)
 
@@ -214,7 +214,7 @@ for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展
     torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
 ```
 
-**Start web demo**
+#### Start web demo
 
 You can use our web demo page to get familiar with CosyVoice quickly.
 
@@ -225,11 +225,11 @@ Please see the demo website for details.
 python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
 ```
 
-**Advanced Usage**
+#### Advanced Usage
 
 For advanced user, we have provided train and inference scripts in `examples/libritts/cosyvoice/run.sh`.
 
-**Build for deployment**
+#### Build for deployment
 
 Optionally, if you want service deployment, you can run following steps.
 
diff --git a/requirements.txt b/requirements.txt
index 781a8fa..fe48b20 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,6 +3,8 @@
 conformer==0.3.2
 deepspeed==0.15.1; sys_platform == 'linux'
 diffusers==0.29.0
+fastapi==0.115.6
+fastapi-cli==0.0.4
 gdown==5.1.0
 gradio==5.4.0
 grpcio==1.57.0
@@ -34,7 +36,5 @@ torch==2.3.1
 torchaudio==2.3.1
 transformers==4.40.1
 uvicorn==0.30.0
-wget==3.2
-fastapi==0.115.6
-fastapi-cli==0.0.4
 WeTextProcessing==1.0.3
+wget==3.2