update readme; add model path arg to demo model server

Hongji Zhu
2025-01-15 17:07:49 +08:00
parent b178622f73
commit 40a54bb0e3
3 changed files with 20 additions and 11 deletions

README.md

@@ -131,7 +131,7 @@ Advancing popular visual capabilities from MiniCPM-V series, MiniCPM-o 2.6 can pr
In addition to its friendly size, MiniCPM-o 2.6 also shows **state-of-the-art token density** (i.e., number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models**. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-o 2.6 can efficiently support **multimodal live streaming** on end-side devices such as iPad.
- 💫 **Easy Usage.**
-MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [LLaMA-Factory](./docs/llamafactory_train.md), (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) online web demo on [server](https://minicpm-omni-webdemo-us.modelbest.cn/).
+MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [LLaMA-Factory](./docs/llamafactory_train.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), and (6) online web demo on [server](https://minicpm-omni-webdemo-us.modelbest.cn/).
**Model Architecture.**
@@ -1811,7 +1811,7 @@ Click here to try out the online demo of [MiniCPM-o 2.6](https://minicpm-omni-we
### Local WebUI Demo <!-- omit in toc -->
-You can easily build your own local WebUI demo using the following commands.
+You can easily build your own local WebUI demo using the following commands and experience real-time streaming voice/video calls.
1. launch model server:
```shell

README_zh.md

@@ -1797,7 +1797,7 @@ MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://git
### Local WebUI Demo <!-- omit in toc -->
-You can easily build your own local WebUI demo using the following commands.
+You can easily build your own local WebUI demo using the following commands and experience real-time streaming video/voice calls.
1. Launch the model server:
```shell
@@ -2358,16 +2358,24 @@ MiniCPM-V 2.0 can run on Android phones. Click [MiniCPM-V 2.0](https://githu
### Local WebUI Demo Deployment
<details>
-<summary>Click to see how to deploy the local WebUI demo on different devices such as NVIDIA GPUs and Mac</summary>
+<summary>Click to see how to deploy the local WebUI demo and experience real-time streaming video/voice calls</summary>
1. Launch the model server:
```shell
-pip install -r requirements.txt
+pip install -r requirements_o2.6.txt
+python web_demos/minicpm-o_2.6/model_server.py
```
+2. Launch the web server:
+```shell
+# Make sure Node and PNPM are installed.
+cd web_demos/minicpm-o_2.6/web_server
+pnpm install # install requirements
+pnpm run dev # start server
+```
-```shell
-# For NVIDIA GPUs, run:
-python web_demo_2.6.py --device cuda
-```
</details>
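The model server launched in step 1 above now accepts a `--model` flag (a Hugging Face model name or a local path, defaulting to `openbmb/MiniCPM-o-2_6`, per the model_server.py change later in this commit) alongside the existing `--port` flag (default 32550). A minimal usage sketch; the local weights directory below is illustrative:

```shell
# Default behaviour: fetch openbmb/MiniCPM-o-2_6 from Hugging Face, serve on port 32550
python web_demos/minicpm-o_2.6/model_server.py

# Load the weights from a local directory instead (the path is illustrative);
# --port can likewise be overridden if 32550 is already in use
python web_demos/minicpm-o_2.6/model_server.py --model /data/models/MiniCPM-o-2_6
```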
### Efficient Inference with llama.cpp, ollama, vLLM

web_demos/minicpm-o_2.6/model_server.py

@@ -55,6 +55,7 @@ logger = setup_logger()
ap = argparse.ArgumentParser()
ap.add_argument('--port', type=int , default=32550)
+ap.add_argument('--model', type=str , default="openbmb/MiniCPM-o-2_6", help="huggingface model name or local path")
args = ap.parse_args()
@@ -89,7 +90,7 @@ class StreamManager:
self.target_dtype = torch.bfloat16
self.device='cuda:0'
-self.minicpmo_model_path = "openbmb/MiniCPM-o-2_6"
+self.minicpmo_model_path = args.model #"openbmb/MiniCPM-o-2_6"
self.model_version = "2.6"
with torch.no_grad():
self.minicpmo_model = AutoModel.from_pretrained(self.minicpmo_model_path, trust_remote_code=True, torch_dtype=self.target_dtype, attn_implementation='sdpa')
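Because `args.model` is passed straight to `AutoModel.from_pretrained`, it can be any value `from_pretrained` accepts, including a pre-downloaded snapshot. A hedged sketch of that workflow using `huggingface-cli` (available in recent `huggingface_hub` releases; the checkpoint directory name is illustrative):

```shell
# Download the weights once into a local directory (directory name is illustrative)
huggingface-cli download openbmb/MiniCPM-o-2_6 --local-dir ./checkpoints/MiniCPM-o-2_6

# Start the demo model server against that local copy via the new flag
python web_demos/minicpm-o_2.6/model_server.py --model ./checkpoints/MiniCPM-o-2_6
```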