From 47283856a363cab672db3c6f330d2e77a12520c5 Mon Sep 17 00:00:00 2001
From: Alphi <52458637+HwwwwwwwH@users.noreply.github.com>
Date: Sat, 8 Feb 2025 17:52:37 +0800
Subject: [PATCH] Update vllm example in ReadMe (#819)

* Update README.md

* Update README_zh.md
---
 README.md    | 104 ++++---------------------------------------------
 README_zh.md | 108 ++++++---------------------------------------------
 2 files changed, 19 insertions(+), 193 deletions(-)

diff --git a/README.md b/README.md
index 337d2f3..a2891d8 100644
--- a/README.md
+++ b/README.md
@@ -2516,103 +2516,15 @@ See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/exa
 vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0. And you can use our fork to run MiniCPM-o 2.6 for now. Click to see.

-1. For MiniCPM-o 2.6
-   1. Clone our fork of vLLM:
-   ```shell
-   git clone https://github.com/OpenBMB/vllm.git
-   cd vllm
-   git checkout minicpmo
-   ```
-   2. Install vLLM from source:
-   ```shell
-   VLLM_USE_PRECOMPILED=1 pip install --editable .
-   ```
-   3. Run MiniCPM-o 2.6 in the same way as the previous models (shown in the following example).
+1. Install vLLM (>= 0.7.1):
+```shell
+pip install vllm
+```

-2. For previous MiniCPM-V models
-   1. Install vLLM(>=0.5.4):
-   ```shell
-   pip install vllm
-   ```
-   2. Install timm: (optional, MiniCPM-V 2.0 need timm)
-   ```shell
-   pip install timm==0.9.10
-   ```
-   3. Run the example(for image):
-   ```python
-   from transformers import AutoTokenizer
-   from PIL import Image
-   from vllm import LLM, SamplingParams
-
-   MODEL_NAME = "openbmb/MiniCPM-V-2_6"
-   # MODEL_NAME = "openbmb/MiniCPM-o-2_6"
-   # Also available for previous models
-   # MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
-   # MODEL_NAME = "HwwwH/MiniCPM-V-2"
-
-   image = Image.open("xxx.png").convert("RGB")
-   tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-   llm = LLM(
-       model=MODEL_NAME,
-       trust_remote_code=True,
-       gpu_memory_utilization=1,
-       max_model_len=2048
-   )
-
-   messages = [{
-       "role":
-       "user",
-       "content":
-       # Number of images
-       "(<image>./</image>)" + \
-       "\nWhat is the content of this image?"
-   }]
-   prompt = tokenizer.apply_chat_template(
-       messages,
-       tokenize=False,
-       add_generation_prompt=True
-   )
-
-   # Single Inference
-   inputs = {
-       "prompt": prompt,
-       "multi_modal_data": {
-           "image": image
-           # Multi images, the number of images should be equal to that of `(<image>./</image>)`
-           # "image": [image, image]
-       },
-   }
-   # Batch Inference
-   # inputs = [{
-   #     "prompt": prompt,
-   #     "multi_modal_data": {
-   #         "image": image
-   #     },
-   # } for _ in 2]
-
-
-   # 2.6
-   stop_tokens = ['<|im_end|>', '<|endoftext|>']
-   stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
-   # 2.0
-   # stop_token_ids = [tokenizer.eos_id]
-   # 2.5
-   # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
-
-   sampling_params = SamplingParams(
-       stop_token_ids=stop_token_ids,
-       use_beam_search=True,
-       temperature=0,
-       best_of=3,
-       max_tokens=1024
-   )
-
-   outputs = llm.generate(inputs, sampling_params=sampling_params)
-
-   print(outputs[0].outputs[0].text)
-   ```
-   4. click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video*, or get more details about `vLLM`.
-
+2. Run the examples:
+* [Vision Language](https://docs.vllm.ai/en/latest/getting_started/examples/vision_language.html)
+* [Audio Language](https://docs.vllm.ai/en/latest/getting_started/examples/audio_language.html)
+

 ## Fine-tuning
diff --git a/README_zh.md b/README_zh.md
index f0f4834..058d9c6 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -2396,103 +2396,17 @@ llama.cpp 用法请参考[我们的fork llama.cpp](https://github.com/OpenBMB/ll
 ollama 用法请参考[我们的fork ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md), 在iPad上可以支持 16~18 token/s 的流畅推理(测试环境:iPad Pro + M4)。
-点击查看, vLLM 现已官方支持MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0,MiniCPM-o 2.6 模型也可以临时用我们的 fork 仓库运行。
-1. MiniCPM-o 2.6
-   1. 克隆我们的 vLLM fork 仓库:
-   ```shell
-   git clone https://github.com/OpenBMB/vllm.git
-   cd vllm
-   git checkout minicpmo
-   ```
-   2. 从源码进行安装:
-   ```shell
-   VLLM_USE_PRECOMPILED=1 pip install --editable .
-   ```
-   3. 用和之前同样的方式运行(下有样例).
-
-2. 之前版本的 MiniCPM-V
-   1. 安装 vLLM(>=0.5.4):
-   ```shell
-   pip install vllm
-   ```
-   3. 安装 timm 库: (可选,MiniCPM-V 2.0需安装)
-   ```shell
-   pip install timm=0.9.10
-   ```
-   4. 运行示例代码:(注意:如果使用本地路径的模型,请确保模型代码已更新到Hugging Face上的最新版)
-   ```python
-   from transformers import AutoTokenizer
-   from PIL import Image
-   from vllm import LLM, SamplingParams
-
-   MODEL_NAME = "openbmb/MiniCPM-V-2_6"
-   # MODEL_NAME = "openbmb/MiniCPM-o-2_6"
-   # Also available for previous models
-   # MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
-   # MODEL_NAME = "HwwwH/MiniCPM-V-2"
-
-   image = Image.open("xxx.png").convert("RGB")
-   tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-   llm = LLM(
-       model=MODEL_NAME,
-       trust_remote_code=True,
-       gpu_memory_utilization=1,
-       max_model_len=2048
-   )
-
-   messages = [{
-       "role":
-       "user",
-       "content":
-       # Number of images
-       "(<image>./</image>)" + \
-       "\nWhat is the content of this image?"
-   }]
-   prompt = tokenizer.apply_chat_template(
-       messages,
-       tokenize=False,
-       add_generation_prompt=True
-   )
-
-   # Single Inference
-   inputs = {
-       "prompt": prompt,
-       "multi_modal_data": {
-           "image": image
-           # Multi images, the number of images should be equal to that of `(<image>./</image>)`
-           # "image": [image, image]
-       },
-   }
-   # Batch Inference
-   # inputs = [{
-   #     "prompt": prompt,
-   #     "multi_modal_data": {
-   #         "image": image
-   #     },
-   # } for _ in 2]
-
-
-   # 2.6
-   stop_tokens = ['<|im_end|>', '<|endoftext|>']
-   stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
-   # 2.0
-   # stop_token_ids = [tokenizer.eos_id]
-   # 2.5
-   # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
-
-   sampling_params = SamplingParams(
-       stop_token_ids=stop_token_ids,
-       use_beam_search=True,
-       temperature=0,
-       best_of=3,
-       max_tokens=1024
-   )
-
-   outputs = llm.generate(inputs, sampling_params=sampling_params)
-
-   print(outputs[0].outputs[0].text)
-   ```
-   4. [点击此处](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink)查看带视频推理和其他有关 `vLLM` 的信息。
+点击查看, vLLM 现已官方支持MiniCPM-o 2.6、MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0。
+1. 安装 vLLM(>=0.7.1):
+
+```shell
+pip install vllm
+```
+
+2. 运行示例代码:(注意:如果使用本地路径的模型,请确保模型代码已更新到Hugging Face上的最新版)
+
+   * [图文示例](https://docs.vllm.ai/en/latest/getting_started/examples/vision_language.html)
+   * [音频示例](https://docs.vllm.ai/en/latest/getting_started/examples/audio_language.html)
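
As a rough illustration of what the linked vLLM examples do for these models, here is a minimal offline-inference sketch for an image prompt on vLLM >= 0.7.1. It reuses the model name, the `(<image>./</image>)` placeholder, the stop tokens, and `max_model_len` from the snippet this patch removes; the image path `example.png` and the use of plain greedy decoding are illustrative assumptions, not part of the patch or of the official vLLM docs, so treat the linked examples as the authoritative reference.

```python
# Illustrative sketch (not from the patch): offline image inference with vLLM >= 0.7.1.
# "example.png" is a stand-in for your own image; model name, placeholder and stop
# tokens are carried over from the example removed above.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_NAME = "openbmb/MiniCPM-V-2_6"  # for MiniCPM-o 2.6: "openbmb/MiniCPM-o-2_6"

image = Image.open("example.png").convert("RGB")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llm = LLM(model=MODEL_NAME, trust_remote_code=True, max_model_len=2048)

# One "(<image>./</image>)" placeholder per image passed in multi_modal_data.
messages = [{
    "role": "user",
    "content": "(<image>./</image>)\nWhat is the content of this image?",
}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

stop_tokens = ["<|im_end|>", "<|endoftext|>"]
stop_token_ids = [tokenizer.convert_tokens_to_ids(t) for t in stop_tokens]

# Plain greedy decoding; recent vLLM releases dropped `use_beam_search` from
# SamplingParams, so the beam-search settings of the removed snippet are not kept.
sampling_params = SamplingParams(
    temperature=0,
    max_tokens=1024,
    stop_token_ids=stop_token_ids,
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```

The audio-language example linked above follows roughly the same pattern, passing the clip under the `"audio"` key of `multi_modal_data` together with the model's audio placeholder in the prompt.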