From 47283856a363cab672db3c6f330d2e77a12520c5 Mon Sep 17 00:00:00 2001
From: Alphi <52458637+HwwwwwwwH@users.noreply.github.com>
Date: Sat, 8 Feb 2025 17:52:37 +0800
Subject: [PATCH] Update vllm example in ReadMe (#819)

* Update README.md

* Update README_zh.md
---
 README.md    | 104 ++++---------------------------------------------
 README_zh.md | 108 ++++++---------------------------------------------
 2 files changed, 19 insertions(+), 193 deletions(-)

diff --git a/README.md b/README.md
index 337d2f3..a2891d8 100644
--- a/README.md
+++ b/README.md
@@ -2516,103 +2516,15 @@ See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/exa
 vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0. And you can use our fork to run MiniCPM-o 2.6 for now. Click to see.

-1. For MiniCPM-o 2.6
-   1. Clone our fork of vLLM:
-   ```shell
-   git clone https://github.com/OpenBMB/vllm.git
-   cd vllm
-   git checkout minicpmo
-   ```
-   2. Install vLLM from source:
-   ```shell
-   VLLM_USE_PRECOMPILED=1 pip install --editable .
-   ```
-   3. Run MiniCPM-o 2.6 in the same way as the previous models (shown in the following example).
+1. Install vLLM (>= 0.7.1):
+```shell
+pip install vllm
+```

-2. For previous MiniCPM-V models
-   1. Install vLLM(>=0.5.4):
-   ```shell
-   pip install vllm
-   ```
-   2. Install timm: (optional, MiniCPM-V 2.0 need timm)
-   ```shell
-   pip install timm==0.9.10
-   ```
-   3. Run the example(for image):
-   ```python
-   from transformers import AutoTokenizer
-   from PIL import Image
-   from vllm import LLM, SamplingParams
-
-   MODEL_NAME = "openbmb/MiniCPM-V-2_6"
-   # MODEL_NAME = "openbmb/MiniCPM-o-2_6"
-   # Also available for previous models
-   # MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
-   # MODEL_NAME = "HwwwH/MiniCPM-V-2"
-
-   image = Image.open("xxx.png").convert("RGB")
-   tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-   llm = LLM(
-       model=MODEL_NAME,
-       trust_remote_code=True,
-       gpu_memory_utilization=1,
-       max_model_len=2048
-   )
-
-   messages = [{
-       "role":
-       "user",
-       "content":
-       # Number of images
-       "(<image>./</image>)" + \
-       "\nWhat is the content of this image?"
-   }]
-   prompt = tokenizer.apply_chat_template(
-       messages,
-       tokenize=False,
-       add_generation_prompt=True
-   )
-
-   # Single Inference
-   inputs = {
-       "prompt": prompt,
-       "multi_modal_data": {
-           "image": image
-           # Multi images, the number of images should be equal to that of `(<image>./</image>)`
-           # "image": [image, image]
-       },
-   }
-   # Batch Inference
-   # inputs = [{
-   #     "prompt": prompt,
-   #     "multi_modal_data": {
-   #         "image": image
-   #     },
-   # } for _ in 2]
-
-
-   # 2.6
-   stop_tokens = ['<|im_end|>', '<|endoftext|>']
-   stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
-   # 2.0
-   # stop_token_ids = [tokenizer.eos_id]
-   # 2.5
-   # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
-
-   sampling_params = SamplingParams(
-       stop_token_ids=stop_token_ids,
-       use_beam_search=True,
-       temperature=0,
-       best_of=3,
-       max_tokens=1024
-   )
-
-   outputs = llm.generate(inputs, sampling_params=sampling_params)
-
-   print(outputs[0].outputs[0].text)
-   ```
-   4. click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video*, or get more details about `vLLM`.
-
+2. Run the examples:
+* [Vision Language](https://docs.vllm.ai/en/latest/getting_started/examples/vision_language.html)
+* [Audio Language](https://docs.vllm.ai/en/latest/getting_started/examples/audio_language.html)
+

 ## Fine-tuning
diff --git a/README_zh.md b/README_zh.md
index f0f4834..058d9c6 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -2396,103 +2396,17 @@ llama.cpp 用法请参考[我们的fork llama.cpp](https://github.com/OpenBMB/ll
 ollama 用法请参考[我们的fork ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md), 在iPad上可以支持 16~18 token/s 的流畅推理(测试环境:iPad Pro + M4)。
-点击查看, vLLM 现已官方支持MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0,MiniCPM-o 2.6 模型也可以临时用我们的 fork 仓库运行。
-1. MiniCPM-o 2.6
-   1. 克隆我们的 vLLM fork 仓库:
-   ```shell
-   git clone https://github.com/OpenBMB/vllm.git
-   cd vllm
-   git checkout minicpmo
-   ```
-   2. 从源码进行安装:
-   ```shell
-   VLLM_USE_PRECOMPILED=1 pip install --editable .
-   ```
-   3. 用和之前同样的方式运行(下有样例).
-
-2. 之前版本的 MiniCPM-V
-   1. 安装 vLLM(>=0.5.4):
-   ```shell
-   pip install vllm
-   ```
-   3. 安装 timm 库: (可选,MiniCPM-V 2.0需安装)
-   ```shell
-   pip install timm=0.9.10
-   ```
-   4. 运行示例代码:(注意:如果使用本地路径的模型,请确保模型代码已更新到Hugging Face上的最新版)
-   ```python
-   from transformers import AutoTokenizer
-   from PIL import Image
-   from vllm import LLM, SamplingParams
-
-   MODEL_NAME = "openbmb/MiniCPM-V-2_6"
-   # MODEL_NAME = "openbmb/MiniCPM-o-2_6"
-   # Also available for previous models
-   # MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
-   # MODEL_NAME = "HwwwH/MiniCPM-V-2"
-
-   image = Image.open("xxx.png").convert("RGB")
-   tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-   llm = LLM(
-       model=MODEL_NAME,
-       trust_remote_code=True,
-       gpu_memory_utilization=1,
-       max_model_len=2048
-   )
-
-   messages = [{
-       "role":
-       "user",
-       "content":
-       # Number of images
-       "(<image>./</image>)" + \
-       "\nWhat is the content of this image?"
-   }]
-   prompt = tokenizer.apply_chat_template(
-       messages,
-       tokenize=False,
-       add_generation_prompt=True
-   )
-
-   # Single Inference
-   inputs = {
-       "prompt": prompt,
-       "multi_modal_data": {
-           "image": image
-           # Multi images, the number of images should be equal to that of `(<image>./</image>)`
-           # "image": [image, image]
-       },
-   }
-   # Batch Inference
-   # inputs = [{
-   #     "prompt": prompt,
-   #     "multi_modal_data": {
-   #         "image": image
-   #     },
-   # } for _ in 2]
-
-
-   # 2.6
-   stop_tokens = ['<|im_end|>', '<|endoftext|>']
-   stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
-   # 2.0
-   # stop_token_ids = [tokenizer.eos_id]
-   # 2.5
-   # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
-
-   sampling_params = SamplingParams(
-       stop_token_ids=stop_token_ids,
-       use_beam_search=True,
-       temperature=0,
-       best_of=3,
-       max_tokens=1024
-   )
-
-   outputs = llm.generate(inputs, sampling_params=sampling_params)
-
-   print(outputs[0].outputs[0].text)
-   ```
-   4. [点击此处](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink)查看带视频推理和其他有关 `vLLM` 的信息。
+点击查看, vLLM 现已官方支持MiniCPM-o 2.6、MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0。
+1. 安装 vLLM(>=0.7.1):
+
+```shell
+pip install vllm
+```
+
+2. 运行示例代码:(注意:如果使用本地路径的模型,请确保模型代码已更新到Hugging Face上的最新版)
+
+   * [图文示例](https://docs.vllm.ai/en/latest/getting_started/examples/vision_language.html)
+   * [音频示例](https://docs.vllm.ai/en/latest/getting_started/examples/audio_language.html)
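
As a rough illustration of what the linked vLLM examples do for these models, here is a minimal offline-inference sketch for an image prompt on vLLM >= 0.7.1. It reuses the model name, the `(<image>./</image>)` placeholder, the stop tokens, and `max_model_len` from the snippet this patch removes; the image path `example.png` and the use of plain greedy decoding are illustrative assumptions, not part of the patch or of the official vLLM docs, so treat the linked examples as the authoritative reference.

```python
# Illustrative sketch (not from the patch): offline image inference with vLLM >= 0.7.1.
# "example.png" is a stand-in for your own image; model name, placeholder and stop
# tokens are carried over from the example removed above.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_NAME = "openbmb/MiniCPM-V-2_6"  # for MiniCPM-o 2.6: "openbmb/MiniCPM-o-2_6"

image = Image.open("example.png").convert("RGB")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llm = LLM(model=MODEL_NAME, trust_remote_code=True, max_model_len=2048)

# One "(<image>./</image>)" placeholder per image passed in multi_modal_data.
messages = [{
    "role": "user",
    "content": "(<image>./</image>)\nWhat is the content of this image?",
}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

stop_tokens = ["<|im_end|>", "<|endoftext|>"]
stop_token_ids = [tokenizer.convert_tokens_to_ids(t) for t in stop_tokens]

# Plain greedy decoding; recent vLLM releases dropped `use_beam_search` from
# SamplingParams, so the beam-search settings of the removed snippet are not kept.
sampling_params = SamplingParams(
    temperature=0,
    max_tokens=1024,
    stop_token_ids=stop_token_ids,
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```

The audio-language example linked above follows roughly the same pattern, passing the clip under the `"audio"` key of `multi_modal_data` together with the model's audio placeholder in the prompt.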