@@ -851,35 +850,35 @@ We deploy MiniCPM-V 2.6 on end devices. The demo video is the raw screen recordi
## MiniCPM-Llama3-V 2.5
-Click to view more details of MiniCPM-Llama3-V 2.5
+Click to view more details of MiniCPM-Llama3-V 2.5
-**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:
+**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. Built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters, it delivers a substantial performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:
-- 🔥 **Leading Performance.**
- MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs.
+- 🔥 **Leading performance.**
+ MiniCPM-Llama3-V 2.5 achieves an average score of 65.1 on OpenCompass, a comprehensive evaluation of 11 popular multimodal benchmarks. **At a size of only 8B parameters, it surpasses widely used proprietary models such as GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max**, and greatly outperforms other Llama 3-based MLLMs.
-- 💪 **Strong OCR Capabilities.**
- MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.
+- 💪 **Strong OCR capability.**
+ MiniCPM-Llama3-V 2.5 accepts images with any aspect ratio at up to 1.8 million pixels, **scoring 725 on OCRBench and surpassing proprietary models such as GPT-4o, GPT-4V, Gemini Pro and Qwen-VL-Max**, the best level to date. Based on recent user feedback, MiniCPM-Llama3-V 2.5 strengthens high-utility capabilities such as full-text OCR extraction and table-to-markdown conversion, and further improves instruction following and complex reasoning for a better multimodal interaction experience.
-- 🏆 **Trustworthy Behavior.**
- Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), achieving the best-level performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset).
+- 🏆 **Trustworthy behavior.**
+ Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) alignment technique (the newest in the [RLHF-V](https://github.com/RLHF-V/) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy multimodal behavior, reducing the hallucination rate on Object HalBench to **10.3%**, well below GPT-4V-1106 (13.6%) and the best level in the open-source community. [The dataset has been released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset).
-- 🌏 **Multilingual Support.**
- Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages including German, French, Spanish, Italian, Korean etc.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md).
+- 🌏 **Multilingual support.**
+ Thanks to the strong multilingual capability of Llama 3 and the cross-lingual generalization technique of VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capability to **more than 30 languages, including German, French, Spanish, Italian and Korean**, using instruction tuning on only a small amount of translated multimodal data, and shows good multilingual multimodal conversation performance. [View all supported languages](./assets/minicpm-llama-v-2-5_languages.md).
-- 🚀 **Efficient Deployment.**
- MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150x acceleration in end-side MLLM image encoding** and a **3x speedup in language decoding**.
+- 🚀 **Efficient deployment.**
+ MiniCPM-Llama3-V 2.5 systematically applies **model quantization, CPU optimization, NPU optimization and compilation optimization** to achieve efficient deployment on end-side devices. For phones with Qualcomm chips, we integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 achieves a **3x speedup in end-side language decoding** and a **150x speedup in image encoding**.
-- 💫 **Easy Usage.**
-MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).
+- 💫 **Easy to use.**
+ MiniCPM-Llama3-V 2.5 can be used easily in many ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support efficient CPU inference on local devices; (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes; (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs; (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), sketched below; (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py); (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).
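+
+For the streaming output in (4), the released `chat` interface returns a generator when `stream=True`. The following is a minimal sketch condensed from the Hugging Face model card usage; the image path is a placeholder:
+
+```python
+import torch
+from PIL import Image
+from transformers import AutoModel, AutoTokenizer
+
+model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True,
+                                  torch_dtype=torch.float16).to('cuda').eval()
+tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
+
+image = Image.open('example.jpg').convert('RGB')  # placeholder path
+msgs = [{'role': 'user', 'content': 'What is in the image?'}]
+
+# stream=True yields text chunks as they are decoded
+res = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
+                 sampling=True, temperature=0.7, stream=True)
+for new_text in res:
+    print(new_text, end='', flush=True)
+```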
-### Evaluation
+### Evaluation
-

+
-Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench.
+Click to view detailed evaluation results on TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg Score, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA and Object HalBench.
@@ -901,7 +900,7 @@ MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https:/
| Object HalBench |
-
+
| Proprietary |
@@ -1073,13 +1072,13 @@ MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https:/
8.4B |
- |
- |
- 78.2 |
+ - |
- |
1971.5 |
- |
- |
41.7 |
- 37.5 |
+ - |
80.1 |
60.0 |
- |
@@ -1151,58 +1150,60 @@ MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https:/
-
-* We evaluate the officially released checkpoint by ourselves.
-
+* Evaluation results of the officially released model weights.
-

+
- Evaluation results of multilingual LLaVA Bench
+ Evaluation results on multilingual LLaVA Bench
-### Examples
-
-
-
-
+### Examples
+
+
+
+
-
+
## MiniCPM-V 2.0
-Click to view more details of MiniCPM-V 2.0
+Click to view more details of MiniCPM-V 2.0
+
+**MiniCPM-V 2.0** can be deployed efficiently on end-side devices. The model is built on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Its notable features include:
+
+- 🔥 **Outstanding performance.**
+
+ MiniCPM-V 2.0 achieves the **best performance among models under 7B parameters** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc.). **On OpenCompass, a comprehensive evaluation of 11 popular multimodal benchmarks, it surpasses models with larger parameter counts such as Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B and Yi-VL 34B.** MiniCPM-V 2.0 also shows **leading OCR capability**, performing **close to Gemini Pro** in scene-text recognition and ranking **first among open-source models on OCRBench**.
+
+
+- 🏆 **Trustworthy behavior.**
+
+ Multimodal LLMs suffer badly from hallucination, often generating text that is not factually grounded in the image. MiniCPM-V 2.0 is **the first end-side multimodal LLM aligned via multimodal RLHF** (using the [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] series techniques). It performs **on par with GPT-4V** on [Object HalBench](https://arxiv.org/abs/2312.00849).
-**MiniCPM-V 2.0** is an efficient version with promising performance for deployment. The model is built based on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0 has several notable features.
+- 🌟 **Efficient encoding of high-resolution images.**
-- 🔥 **State-of-the-art Performance.**
+ MiniCPM-V 2.0 accepts **images up to 1.8 million pixels at any aspect ratio** (based on the recent [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf) technique), which lets the model perceive fine-grained visual information such as small objects and dense text.
- MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
-- 🏆 **Trustworthy Behavior.**
+- ⚡️ **Efficient deployment.**
- LMMs are known for suffering from hallucination, often generating text not factually grounded in images. MiniCPM-V 2.0 is **the first end-side LMM aligned via multimodal RLHF for trustworthy behavior** (using the recent [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] series technique). This allows the model to **match GPT-4V in preventing hallucinations** on Object HalBench.
+ MiniCPM-V 2.0 can be **deployed efficiently on most consumer GPUs and personal computers**, including **end-side devices such as mobile phones**. For visual encoding, we compress the image representations into far fewer tokens via a perceiver resampler (see the sketch after this list). This allows MiniCPM-V 2.0 to run with **low memory cost and good speed during inference, even for high-resolution images**.
-- 🌟 **High-Resolution Images at Any Aspect Ratio.**
+- 🙌 **Bilingual support.**
- MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
+ MiniCPM-V 2.0 **provides leading bilingual (Chinese-English) multimodal capability**,
+ enabled by the cross-lingual generalization technique for multimodal capabilities proposed in [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24].
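+
+To make the perceiver-resampler idea concrete, here is a minimal PyTorch sketch of the mechanism: a small set of learned query tokens cross-attends to the (potentially very long) sequence of vision-encoder patch features, so the LLM always receives a fixed, small token budget. The dimensions and query count are illustrative assumptions, not the released configuration:
+
+```python
+import torch
+import torch.nn as nn
+
+class PerceiverResampler(nn.Module):
+    """Compress a variable number of image patch features into a fixed set of tokens."""
+    def __init__(self, dim=1152, num_queries=64, num_heads=8):
+        super().__init__()
+        self.query = nn.Parameter(torch.randn(num_queries, dim))  # learned latent tokens
+        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
+        self.norm = nn.LayerNorm(dim)
+
+    def forward(self, image_feats):  # (batch, n_patches, dim); n_patches may be large
+        q = self.query.unsqueeze(0).expand(image_feats.size(0), -1, -1)
+        out, _ = self.attn(q, image_feats, image_feats)  # queries attend to all patches
+        return self.norm(out)  # (batch, num_queries, dim): constant token budget
+
+resampler = PerceiverResampler()
+feats = torch.randn(1, 1024, 1152)  # e.g. patch embeddings of a high-resolution image
+print(resampler(feats).shape)       # torch.Size([1, 64, 1152])
+```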
-- ⚡️ **High Efficiency.**
+### Examples
- MiniCPM-V 2.0 can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into much fewer tokens via a perceiver resampler. This allows MiniCPM-V 2.0 to operate with **favorable memory cost and speed during inference even when dealing with high-resolution images**.
-
-- 🙌 **Bilingual Support.**
-
- MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24].
-
-### Examples
@@ -1210,7 +1211,7 @@ MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https:/
-We deploy MiniCPM-V 2.0 on end devices. The demo video is the raw screen recording on a Xiaomi 14 Pro without edition.
+We deployed MiniCPM-V 2.0 on a Xiaomi 14 Pro and recorded the following demo video, without any editing.
@@ -1221,79 +1222,84 @@ We deploy MiniCPM-V 2.0 on end devices. The demo video is the raw screen recordi
-## Legacy Models
-| Model | Introduction and Guidance |
+
+
+## Legacy Models
+
+
+| Model | Introduction and Guidance |
|:----------------------|:-------------------:|
-| MiniCPM-V 1.0 | [Document](./minicpm_v1.md) |
-| OmniLMM-12B | [Document](./omnilmm_en.md) |
+| MiniCPM-V 1.0 | [Document](./minicpm_v1.md) |
+| OmniLMM-12B | [Document](./omnilmm.md) |
-## Chat with Our Demo on Gradio 🤗
-
-We provide online and local demos powered by Hugging Face Gradio, the most popular model deployment framework nowadays. It supports streaming outputs, progress bars, queuing, alerts, and other useful features.
+## Gradio Demo 🤗
+We provide online and local demos powered by Hugging Face Gradio. Gradio is the most popular model deployment framework today, with support for streaming outputs, progress bars, queuing, alerts and other useful features.
### Online Demo
-Click here to try out the online demo of [MiniCPM-V 2.6](https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6) | [MiniCPM-Llama3-V 2.5](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) | [MiniCPM-V 2.0](https://huggingface.co/spaces/openbmb/MiniCPM-V-2).
+Try the online demos: [MiniCPM-V 2.6](https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6) | [MiniCPM-Llama3-V 2.5](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) | [MiniCPM-V 2.0](https://huggingface.co/spaces/openbmb/MiniCPM-V-2).
+
+### Local WebUI Demo
+
+You can easily build your own local WebUI demo with the following commands.
-### Local WebUI Demo
-
-You can easily build your own local WebUI demo with Gradio using the following commands.
-
```shell
pip install -r requirements.txt
```
-
+
```shell
-# For NVIDIA GPUs, run:
+# For NVIDIA GPUs, run:
python web_demo_2.6.py --device cuda
```
-## Install
-1. Clone this repository and navigate to the source folder
+## Install
+
+1. Clone this repository and navigate to the source folder
```bash
git clone https://github.com/OpenBMB/MiniCPM-V.git
cd MiniCPM-V
```
-2. Create conda environment
+2. Create a conda environment
```Shell
-conda create -n MiniCPM-V python=3.10 -y
-conda activate MiniCPM-V
+conda create -n MiniCPMV python=3.10 -y
+conda activate MiniCPMV
```
-3. Install dependencies
+3. Install dependencies
```shell
pip install -r requirements.txt
```
-## Inference
+## Inference
+### Model Zoo
-### Model Zoo
+| Model | Device | Memory | Description | Download |
+|:--------------|:-:|:----------:|:-------------------|:---------------:|
+| MiniCPM-V 2.6 | GPU | 17 GB | The latest version, with the best end-side single-image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
+| MiniCPM-V 2.6 gguf | CPU | 6 GB | The gguf version, with lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) |
+| MiniCPM-V 2.6 int4 | GPU | 7 GB | The int4 quantized version, with lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) |
+| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
+| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, with lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
+| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, with lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
+| MiniCPM-V 2.0 | GPU | 8 GB | The light version, balancing computation cost and multimodal understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
+| MiniCPM-V 1.0 | GPU | 7 GB | The lightest version, with the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
-| Model | Device | Memory | Description | Download |
-|:-----------|:--:|:-----------:|:-------------------|:---------------:|
-| MiniCPM-V 2.6| GPU | 17 GB | The latest version, achieving state-of-the-art end-side performance for single image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
-| MiniCPM-V 2.6 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) |
-| MiniCPM-V 2.6 int4 | GPU | 7 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) |
-| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
-| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
-| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
-| MiniCPM-V 2.0 | GPU | 8 GB | Light version, balancing performance and computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
-| MiniCPM-V 1.0 | GPU | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
+More [legacy models](#legacy-models)
-### Multi-turn Conversation
+### Multi-turn Conversation
-Please refer to the following codes to run.
+Please refer to the following code to run inference.

@@ -1338,7 +1344,7 @@ answer = model.chat(
print(answer)
```
-You will get the following output:
+You will get the following output:
```
"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database."
@@ -1346,9 +1352,9 @@ You will get the following output:
"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry."
```
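+
+The second answer above comes from a follow-up turn. A sketch of that step, reusing `model`, `tokenizer`, `msgs` and `answer` from the snippet above: appending the previous answer to `msgs` is what makes the conversation multi-turn.
+
+```python
+# Second round: append the previous answer and a new question to the history
+msgs.append({"role": "assistant", "content": [answer]})
+msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]})
+
+answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
+print(answer)
+```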
-#### Chat with multiple images
+#### Multi-image Understanding
- Click to view Python code running MiniCPM-V 2.6 with multiple images input.
+ Click to view a Python example of running MiniCPM-V 2.6 with multiple images
```python
import torch
@@ -1375,9 +1381,10 @@ print(answer)
```
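+
+The gist of the example is that a single user turn's `content` list may interleave several PIL images with text. A minimal sketch reusing `model` and `tokenizer` from above; the image paths are placeholders:
+
+```python
+from PIL import Image
+
+image1 = Image.open('image1.jpg').convert('RGB')  # placeholder paths
+image2 = Image.open('image2.jpg').convert('RGB')
+question = 'Compare image 1 and image 2, tell me about the differences between them.'
+
+# Multiple images and the question share one content list
+msgs = [{'role': 'user', 'content': [image1, image2, question]}]
+answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
+print(answer)
+```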
-#### In-context few-shot learning
+#### In-context Few-shot Learning
+
- Click to view Python code running MiniCPM-V 2.6 with few-shot input.
+ Click to view a Python example of running MiniCPM-V 2.6 with few-shot input
```python
import torch
@@ -1411,9 +1418,9 @@ print(answer)
```
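+
+Few-shot prompting uses the same message format: demonstration pairs are passed as alternating user/assistant turns before the real query. A minimal sketch reusing `model` and `tokenizer`; file names and answers are placeholders:
+
+```python
+from PIL import Image
+
+question = "production date"
+msgs = [
+    # Two demonstrations: (image, question) -> answer
+    {'role': 'user', 'content': [Image.open('example1.jpg').convert('RGB'), question]},
+    {'role': 'assistant', 'content': ["2023.08.04"]},
+    {'role': 'user', 'content': [Image.open('example2.jpg').convert('RGB'), question]},
+    {'role': 'assistant', 'content': ["2007.04.24"]},
+    # The actual query
+    {'role': 'user', 'content': [Image.open('test.jpg').convert('RGB'), question]},
+]
+answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
+print(answer)
+```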
-#### Chat with video
+#### Video Understanding
- Click to view Python code running MiniCPM-V 2.6 with video input.
+ Click to view a Python example of running MiniCPM-V 2.6 with video input
```python
import torch
@@ -1454,7 +1461,7 @@ msgs = [
# Set decode params for video
params = {}
params["use_image_id"] = False
-params["max_slice_nums"] = 2 # use 1 if cuda OOM and video resolution > 448*448
+params["max_slice_nums"] = 2 # 如果cuda OOM且视频分辨率大于448*448可设为1
answer = model.chat(
image=None,
@@ -1467,16 +1474,16 @@ print(answer)
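+
+For reference, a common way to build the `msgs` above is to sample frames from the video and pass them as a list of PIL images. This sketch assumes the `decord` reader and caps the clip at 64 frames; treat the sampling policy as an assumption, not the only valid choice:
+
+```python
+from PIL import Image
+from decord import VideoReader, cpu  # pip install decord
+
+MAX_NUM_FRAMES = 64
+
+def encode_video(video_path):
+    vr = VideoReader(video_path, ctx=cpu(0))
+    step = max(1, round(vr.get_avg_fps()))            # roughly one frame per second
+    frame_idx = list(range(0, len(vr), step))
+    if len(frame_idx) > MAX_NUM_FRAMES:               # uniformly subsample long videos
+        gap = len(frame_idx) / MAX_NUM_FRAMES
+        frame_idx = [frame_idx[int(i * gap + gap / 2)] for i in range(MAX_NUM_FRAMES)]
+    frames = vr.get_batch(frame_idx).asnumpy()
+    return [Image.fromarray(f.astype('uint8')) for f in frames]
+
+frames = encode_video('video.mp4')  # placeholder path
+msgs = [{'role': 'user', 'content': frames + ['Describe the video.']}]
+```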
-### Inference on Multiple GPUs
-You can run MiniCPM-Llama3-V 2.5 on multiple low VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across multiple GPUs. Please refer to this [tutorial](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) for detailed instructions on how to load the model and inference using multiple low VRAM GPUs.
+### Inference on Multiple GPUs
+You can run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across them. See this [tutorial](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) for detailed instructions on loading the model and running inference with multiple low-VRAM GPUs.
-### Inference on Mac
+### Inference on Mac
-Click to view an example, to run MiniCPM-Llama3-V 2.5 on 💻 Mac with MPS (Apple silicon or AMD GPUs).
+Click to view an example of running MiniCPM-Llama3-V 2.5 / MiniCPM-V 2.0 on a Mac with MPS (Apple silicon or AMD GPUs).
```python
-# test.py Need more than 16GB memory.
+# test.py: requires more than 16 GB of memory to run.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
@@ -1500,35 +1507,49 @@ answer, context, _ = model.chat(
)
print(answer)
```
-Run with command:
+Run with:
```shell
PYTORCH_ENABLE_MPS_FALLBACK=1 python test.py
```
-### Deployment on Mobile Phone
-MiniCPM-V 2.0 can be deployed on mobile phones with Android operating systems. 🚀 Click [MiniCPM-V 2.0](https://github.com/OpenBMB/mlc-MiniCPM) to install apk.
-### Inference with llama.cpp
-MiniCPM-V 2.6 can run with llama.cpp now! See [our fork of llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md) for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment:iPad Pro + M4).
-
-### Inference with ollama
-MiniCPM-V 2.6 can run with ollama now! See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment:iPad Pro + M4).
-
-### Inference with vLLM
+### Deployment on Mobile Phones
+MiniCPM-V 2.0 can run on Android phones. Click [MiniCPM-V 2.0](https://github.com/OpenBMB/mlc-MiniCPM) to install the APK.
+### Local WebUI Demo
- vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0, Click to see.
+Click to view how to deploy the local WebUI demo on different devices such as NVIDIA GPUs and Macs
+
+```shell
+pip install -r requirements.txt
+```
+
+```shell
+# For NVIDIA GPUs, run:
+python web_demo_2.6.py --device cuda
+```
+
-1. Install vLLM(>=0.5.4):
+### Inference with llama.cpp
+MiniCPM-V 2.6 now works with llama.cpp! See [our fork of llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md) for usage. It supports smooth inference at 16~18 tokens/s on an iPad (test environment: iPad Pro + M4).
+
+### Inference with ollama
+MiniCPM-V 2.6 now works with ollama! See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) for usage. It supports smooth inference at 16~18 tokens/s on an iPad (test environment: iPad Pro + M4).
+
+### Inference with vLLM
+
+Click to view: vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0
+
+1. Install vLLM (>= 0.5.4):
```shell
pip install vllm
```
-2. Install timm: (optional, MiniCPM-V 2.0 need timm)
+2. Install timm (optional; required by MiniCPM-V 2.0):
```shell
-pip install timm==0.9.10
+pip install timm==0.9.10
```
-3. Run the example(for image):
+3. Run the example code for images (note: if you load the model from a local path, make sure the model code is updated to the latest version on Hugging Face; a condensed sketch also follows this list):
```python
from transformers import AutoTokenizer
from PIL import Image
@@ -1600,49 +1621,49 @@ outputs = llm.generate(inputs, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```
-4. click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video*, or get more details about `vLLM`.
+4. Click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) for video inference and more details about `vLLM`.
+
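+A condensed single-image sketch of the example above. The `(<image>./</image>)` placeholder and chat-template call follow the full example; the model name, image path and sampling settings here are illustrative:
+
+```python
+from PIL import Image
+from transformers import AutoTokenizer
+from vllm import LLM, SamplingParams
+
+MODEL_NAME = "openbmb/MiniCPM-V-2_6"
+image = Image.open("example.jpg").convert("RGB")  # placeholder path
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
+llm = LLM(model=MODEL_NAME, trust_remote_code=True, max_model_len=2048)
+
+# The (<image>./</image>) placeholder marks where the image is injected
+messages = [{"role": "user", "content": "(<image>./</image>)\nWhat is in this image?"}]
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+inputs = {"prompt": prompt, "multi_modal_data": {"image": image}}
+outputs = llm.generate(inputs, sampling_params=SamplingParams(temperature=0.7, max_tokens=256))
+print(outputs[0].outputs[0].text)
+```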
-## Fine-tuning
-### Simple Fine-tuning
+## Fine-tuning
-We support simple fine-tuning with Hugging Face for MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5.
+### Simple Fine-tuning
-[Reference Document](./finetune/readme.md)
+We support simple fine-tuning of MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5 with the Hugging Face Transformers library.
-### With the SWIFT Framework
+[Reference document](./finetune/readme.md)
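+
+For orientation, LoRA fine-tuning in the Transformers ecosystem generally follows the PEFT pattern below. This is a generic sketch, not the repository's finetune script, and the target module names are assumptions; see the reference document above for the actual configuration:
+
+```python
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModel
+
+model = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)
+lora_config = LoraConfig(
+    r=64, lora_alpha=64, lora_dropout=0.05,
+    # Assumed attention projection names in the Llama3 LLM part
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()  # only the LoRA adapters are trainable
+```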
-We now support MiniCPM-V series fine-tuning with the SWIFT framework. SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs . It supports the lightweight training solutions provided by PEFT and a complete Adapters Library including techniques such as NEFTune, LoRA+ and LLaMA-PRO.
+### Fine-tuning with the SWIFT Framework
-Best Practices:[MiniCPM-V 1.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md), [MiniCPM-V 2.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md), [MiniCPM-V 2.6](https://github.com/modelscope/ms-swift/issues/1613).
+We support fine-tuning the MiniCPM-V series with the SWIFT framework. SWIFT supports the training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs. It supports the lightweight training solutions provided by PEFT and a complete adapters library covering the latest training techniques such as NEFTune, LoRA+ and LLaMA-PRO.
+
+Reference documents: [MiniCPM-V 1.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md), [MiniCPM-V 2.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md), [MiniCPM-V 2.6](https://github.com/modelscope/ms-swift/issues/1613).
## FAQs
-Click here to view the [FAQs](./docs/faqs.md)
-
-## Model License
-
-* This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
-
-* The usage of MiniCPM-V model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
-
-* The models and weights of MiniCPM are completely free for academic research. after filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use.
-
-
-## Statement
-
-As LMMs, MiniCPM-V models (including OmniLMM) generate contents by learning a large amount of multimodal corpora, but they cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-V models does not represent the views and positions of the model developers
-
-We will not be liable for any problems arising from the use of MiniCPM-V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, dissemination or misuse of the model.
+Click to view the [FAQs](./docs/faqs.md)
-## Institutions
+## Model License
-This project is developed by the following institutions:
+* The code in this repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) license.
+* Usage of the MiniCPM-V model weights must follow the [MiniCPM Model License](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%E6%A8%A1%E5%9E%8B%E5%95%86%E7%94%A8%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.md).
+* The MiniCPM model weights are fully open for academic research; free commercial use is also allowed after completing a [registration questionnaire](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g).
-- [THUNLP](https://nlp.csai.tsinghua.edu.cn/)
-- [ModelBest](https://modelbest.cn/)
-- [Zhihu](https://www.zhihu.com/ )
+## Statement
+
+As multimodal LLMs, the MiniCPM-V series models (including OmniLMM) generate content by learning from large amounts of multimodal data. They cannot comprehend or express personal opinions or make value judgements, and nothing they output represents the views or positions of the model developers.
+
+Users are therefore responsible for evaluating and verifying the content generated with these models. We will not be liable for any problems arising from the use of the open-source models in this project, including but not limited to data security issues, public-opinion risks, or any risks and problems caused by the models being misled, misused, disseminated or improperly exploited.
+
+
+## Institutions
+
+This project is jointly developed by the following institutions:
+
+- [THUNLP](https://nlp.csai.tsinghua.edu.cn/)
+- [ModelBest](https://modelbest.cn/)
+- [Zhihu](https://www.zhihu.com/ )
## 🌟 Star History
@@ -1672,16 +1693,17 @@ This project is developed by the following institutions:
/>
-->
-## Key Techniques and Other Multimodal Projects
+## Key Techniques and Other Multimodal Projects
-👏 Welcome to explore key techniques of MiniCPM-V and other multimodal projects of our team:
+👏 Welcome to explore the key techniques behind MiniCPM-V and more of our team's multimodal projects!
[VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)
-## Citation
-If you find our model/code/paper helpful, please consider cite our papers 📝 and star us ⭐️!
+## Citation
+
+If you find our model/code/paper helpful, please give us a ⭐ and a citation 📝. Thanks!
```bib
@article{yao2024minicpm,