update readme; add model path arg to demo model server

Hongji Zhu
2025-01-15 17:07:49 +08:00
parent b178622f73
commit 40a54bb0e3
3 changed files with 20 additions and 11 deletions

README.md

@@ -131,7 +131,7 @@ Advancing popular visual capabilities from MiniCPM-V series, MiniCPM-o 2.6 can pr
In addition to its friendly size, MiniCPM-o 2.6 also shows **state-of-the-art token density** (i.e., number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models**. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-o 2.6 can efficiently support **multimodal live streaming** on end-side devices such as iPad.
- 💫 **Easy Usage.**
-MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [LLaMA-Factory](./docs/llamafactory_train.md), (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) online web demo on [server](https://minicpm-omni-webdemo-us.modelbest.cn/).
+MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [LLaMA-Factory](./docs/llamafactory_train.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), and (6) online web demo on [server](https://minicpm-omni-webdemo-us.modelbest.cn/).
**Model Architecture.**
@@ -1811,7 +1811,7 @@ Click here to try out the online demo of [MiniCPM-o 2.6](https://minicpm-omni-we
### Local WebUI Demo <!-- omit in toc -->
-You can easily build your own local WebUI demo using the following commands.
+You can easily build your own local WebUI demo using the following commands and experience real-time streaming voice/video calls.
1. launch model server:
```shell

README_zh.md

@@ -1797,7 +1797,7 @@ MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://git
### Local WebUI Demo <!-- omit in toc -->
-You can easily build your own local WebUI demo using the following commands.
+You can easily build your own local WebUI demo using the following commands and experience real-time streaming video/voice calls.
1. Launch the model server:
```shell
@@ -2358,16 +2358,24 @@ MiniCPM-V 2.0 can run on Android phones. Click [MiniCPM-V 2.0](https://githu
### Local WebUI Demo Deployment
<details>
-<summary>Click to see how to deploy the local WebUI demo on different devices such as NVIDIA GPUs and Mac</summary>
+<summary>Click to see how to deploy the local WebUI demo and experience real-time streaming video/voice calls</summary>
1. Launch the model server:
```shell
-pip install -r requirements.txt
+pip install -r requirements_o2.6.txt
+python web_demos/minicpm-o_2.6/model_server.py
```
+2. Launch the web server:
+```shell
+# Make sure Node and PNPM are installed.
+cd web_demos/minicpm-o_2.6/web_server
+pnpm install # install requirements
+pnpm run dev # start server
+```
-```shell
-# For NVIDIA GPUs, run:
-python web_demo_2.6.py --device cuda
-```
</details>
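The model server launched in step 1 above now accepts a `--model` flag (a Hugging Face model name or a local path, defaulting to `openbmb/MiniCPM-o-2_6`, per the model_server.py change later in this commit) alongside the existing `--port` flag (default 32550). A minimal usage sketch; the local weights directory below is illustrative:

```shell
# Default behaviour: fetch openbmb/MiniCPM-o-2_6 from Hugging Face, serve on port 32550
python web_demos/minicpm-o_2.6/model_server.py

# Load the weights from a local directory instead (the path is illustrative);
# --port can likewise be overridden if 32550 is already in use
python web_demos/minicpm-o_2.6/model_server.py --model /data/models/MiniCPM-o-2_6
```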
### Efficient Inference with llama.cpp, ollama, vLLM

web_demos/minicpm-o_2.6/model_server.py

@@ -55,6 +55,7 @@ logger = setup_logger()
ap = argparse.ArgumentParser()
ap.add_argument('--port', type=int , default=32550)
+ap.add_argument('--model', type=str , default="openbmb/MiniCPM-o-2_6", help="huggingface model name or local path")
args = ap.parse_args()
@@ -89,7 +90,7 @@ class StreamManager:
self.target_dtype = torch.bfloat16
self.device='cuda:0'
-self.minicpmo_model_path = "openbmb/MiniCPM-o-2_6"
+self.minicpmo_model_path = args.model #"openbmb/MiniCPM-o-2_6"
self.model_version = "2.6"
with torch.no_grad():
self.minicpmo_model = AutoModel.from_pretrained(self.minicpmo_model_path, trust_remote_code=True, torch_dtype=self.target_dtype, attn_implementation='sdpa')
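Because `args.model` is passed straight to `AutoModel.from_pretrained`, it can be any value `from_pretrained` accepts, including a pre-downloaded snapshot. A hedged sketch of that workflow using `huggingface-cli` (available in recent `huggingface_hub` releases; the checkpoint directory name is illustrative):

```shell
# Download the weights once into a local directory (directory name is illustrative)
huggingface-cli download openbmb/MiniCPM-o-2_6 --local-dir ./checkpoints/MiniCPM-o-2_6

# Start the demo model server against that local copy via the new flag
python web_demos/minicpm-o_2.6/model_server.py --model ./checkpoints/MiniCPM-o-2_6
```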