52 Commits

Author SHA1 Message Date
qianyu chen
e41152f89c Update trainer.py 2025-09-12 15:53:48 +08:00
tc-mb
c821cbd7c8 rm ide file
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2025-09-02 16:14:14 +08:00
yiranyyu
a846468195 update readme 2025-09-02 12:16:14 +08:00
yiranyyu
f8da52c35c update readme 2025-09-02 12:15:25 +08:00
yiranyyu
67afdeb934 update readme 2025-09-01 17:51:49 +08:00
yiranyyu
3cde81287d Merge branch 'main' of https://github.com/OpenBMB/MiniCPM-o 2025-08-31 22:33:56 +08:00
yiranyyu
e45524cbf7 update readme 2025-08-31 22:33:14 +08:00
Yuan Yao
0d8b90df97 Update README.md 2025-08-31 11:24:58 +08:00
Yuan Yao
d16875b120 Update README.md 2025-08-30 10:52:45 +08:00
Yuan Yao
1c89161d65 Update README.md 2025-08-30 10:26:52 +08:00
YuzaChongyi
da79d55ad4 update readme (#986)
Co-authored-by: wangchongyi <>
2025-08-30 00:02:35 +08:00
YuzaChongyi
b9a95ee0ea update readme (#985)
Co-authored-by: wangchongyi <>
2025-08-29 23:58:10 +08:00
YuzaChongyi
02c68764d4 update readme (#984)
Co-authored-by: wangchongyi <>
2025-08-29 23:52:31 +08:00
tc-mb
509e934a59 update video link
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2025-08-29 01:00:45 +08:00
Yuan Yao
3d050a5dd4 Update README.md 2025-08-27 11:47:38 +08:00
Yuan Yao
d01532f89c Update README.md 2025-08-27 11:47:00 +08:00
tc-mb
bffc715128 update awq 2025-08-26 22:38:54 +08:00
YuzaChongyi
af96e66e01 update readme (#966)
Co-authored-by: wangchongyi <>
2025-08-26 18:58:47 +08:00
yiranyyu
eb072b30a0 update readme 2025-08-26 18:34:29 +08:00
yiranyyu
16a79219cb update readme 2025-08-26 18:33:00 +08:00
yiranyyu
663d96c887 update readme 2025-08-26 18:31:16 +08:00
yiranyyu
1dcb4e2fee update readme 2025-08-26 17:39:41 +08:00
yiranyyu
fe7b3d27de update readme 2025-08-26 17:35:17 +08:00
yiranyyu
9d0531b236 update readme 2025-08-26 17:29:42 +08:00
yiranyyu
5443a7c4d7 update readme 2025-08-26 17:27:04 +08:00
yiranyyu
fcecab8045 update readme 2025-08-26 17:21:27 +08:00
YuzaChongyi
06e220c8f4 add minicpm-v-4.5 (#963)
Co-authored-by: wangchongyi <>
2025-08-26 05:20:58 +08:00
tc-mb
2ef22c138e update qr png 2025-08-20 17:41:33 +08:00
tc-mb
51f3f36614 add modelbest license to minicpm-o repo 2025-08-12 14:14:46 +08:00
tc-mb
03111d5c5b change quan doc to cookbook 2025-08-06 16:25:53 +08:00
tc-mb
4f7eba0c29 Merge pull request #947 from ZMXJJ/minicpmv-4
Update README
2025-08-06 14:55:20 +08:00
Dennis Huang
3acd3f9891 Update README 2025-08-06 14:31:54 +08:00
tc-mb
d828902a98 update news 2025-08-06 00:21:09 +08:00
tc-mb
8438ec2147 fix png in readme 2025-08-06 00:18:31 +08:00
tc-mb
b91fff3ea8 update readme 2025-08-05 22:26:23 +08:00
tc-mb
e2559a5ca2 public MiniCPM-V 4.0 2025-08-05 22:19:54 +08:00
tc-mb
8185ac321d add gif 2025-08-05 21:40:15 +08:00
yiranyyu
539e70177c Add Cookbook 2025-08-01 01:18:47 +08:00
yiranyyu
6e8f1d7a66 Add Cookbook 2025-08-01 01:18:14 +08:00
yiranyyu
50214bfa52 Add Cookbook 2025-08-01 01:16:19 +08:00
tc-mb
2d9919ac69 Update README_zh.md 2025-06-30 11:08:15 +08:00
tc-mb
48c0611a3f Update README.md 2025-06-30 11:08:00 +08:00
tc-mb
afc3b105bd Update README.md 2025-06-25 21:17:24 +08:00
tc-mb
732f5e62e4 Update README_zh.md 2025-06-25 20:23:57 +08:00
tc-mb
949fc4e843 Update README.md 2025-06-25 20:22:42 +08:00
tc-mb
ebb1a5e0a7 Update README_zh.md 2025-06-25 17:22:07 +08:00
tc-mb
7084bbfa9f Update README.md 2025-06-25 17:21:22 +08:00
yiranyyu
523fb11263 Update README 2025-06-25 11:40:07 +08:00
yiranyyu
b2b2b7bd70 Update README 2025-06-20 14:32:01 +08:00
YuzaChongyi
0234793a3b add join us link (#929)
Co-authored-by: wangchongyi <>
2025-06-17 23:13:16 +08:00
tc-mb
4b5828acb1 Update README.md 2025-06-12 16:24:45 +08:00
tc-mb
11ca385133 Add files via upload 2025-06-12 10:52:27 +08:00
56 changed files with 3615 additions and 1772 deletions


@@ -1,5 +0,0 @@
{
"githubPullRequests.ignoredPullRequestBranches": [
"main"
]
}

41
MiniCPM Model License.md Normal file

@@ -0,0 +1,41 @@
Version 1.0, June 5, 2024
© 2024 OpenBMB. All rights reserved.
## Part One: Preamble
We are open-sourcing the entire series of the globally leading MiniCPM edge-side large language models, including the flagship edge-side models MiniCPM-2.4B and MiniCPM-1.2B, as well as the MiniCPM-V series, the world's most powerful edge-side multimodal models. The aforementioned weights are fully open for all academic research. Commercial use is also allowed after completing a registration questionnaire. Community use of the MiniCPM series models must comply with Apache 2.0 and the "MiniCPM Model Community License Agreement."
Therefore, you and the MiniCPM development team agree to the following "MiniCPM Model Community License Agreement":
## Part Two: Licensing and Redistribution
#### 1. Grant of Rights
You are granted a non-exclusive, worldwide, non-transferable, royalty-free, limited license to use, copy, distribute, reproduce, create derivative works from, and modify MiniCPM materials in accordance with OpenBMB's intellectual property rights or other rights in the MiniCPM materials.
#### 2. Distribution and Redistribution
- If you distribute or provide MiniCPM series model materials (or any derivative works thereof), or any product or service that uses any of them, you must (A) provide a copy of this agreement; and (B) prominently display "Built with 面壁MiniCPM" on the relevant website, user interface, blog post, about page, or product documentation. If you create, train, fine-tune, or improve an AI model using the MiniCPM series models, the model must include "MiniCPM" in its name.
- You must retain the following attribution statement in all distributed MiniCPM-related materials: "MiniCPM is licensed under the MiniCPM Model Community License, © OpenBMB Platforms, Inc. All rights reserved."
- Your use of MiniCPM materials must comply with applicable laws and regulations and the "MiniCPM Model Community License Agreement," which is incorporated into this agreement by reference.
- You may not use MiniCPM series models or their outputs and results to improve any other large language models (other than MiniCPM or its derivatives).
#### 3. Additional Commercial Terms
If you or your affiliates' services or products deploy the model on edge-side devices not exceeding 5,000 units, or provide applications with fewer than 1 million daily active users (DAU), you may apply to OpenBMB for permission and, after completing the registration questionnaire, may be allowed to use it commercially for free. Otherwise, please email cpm@modelbest.cn to apply for authorization from OpenBMB, which may grant permission at its sole discretion; until such written authorization is granted, you have no right to exercise any commercial rights under this agreement.
#### 4. Usage-based Restrictions
The restrictions set forth in Appendix A are considered usage-based restrictions. Therefore, you may not use the model or its derivatives for the designated restricted uses. You may use the model under this license only for lawful purposes and in compliance with its terms. Usage includes creating any content, fine-tuning, updating, running, training, evaluating, and/or re-parameterizing the model. You shall require all users of the model or its derivatives to comply with the terms of this section.
## Part Three: Other Terms
#### 5. Trademarks and Related
This license does not grant you the right to use the OpenBMB, OpenBMB Intelligence, or MiniCPM trademarks, trade names, or logos, or to otherwise imply a relationship between the parties; any rights not expressly granted herein are reserved by OpenBMB.
#### 6. Disclaimer
Unless required by applicable law or agreed to in writing, OpenBMB provides the model and supplemental materials "as is," without any warranty or condition, express or implied, including but not limited to all express and implied warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose. You are solely responsible for determining the appropriateness of using or redistributing the model, its derivatives, and supplemental materials, and assume any risks associated with exercising the permissions under this license.
## Appendix A: Usage Restrictions
You agree not to use the model or its derivatives for:
- Any use that violates applicable national or international laws or regulations or infringes upon the legal rights and interests of any third party;
- Any military purposes;
- Exploiting, harming, or attempting to exploit or harm minors in any way;
- Generating or disseminating verifiable false information and/or content with the intent to harm others;
- Generating or disseminating inappropriate content subject to applicable regulatory requirements;
- Unauthorized generation or dissemination of personally identifiable information, or unreasonable use thereof;
- Defamation, demeaning, or otherwise harassing others;
- Fully automated decision-making that adversely affects individuals' legal rights or creates or modifies binding, enforceable obligations;
- Any use intended to or having the effect of discriminating or harming individuals or groups based on online or offline social behaviors or known or predicted personal characteristics;
- Exploiting the vulnerabilities of specific groups due to their age, social, physical, or psychological characteristics, in a manner that materially distorts the behavior of group members, leading to or likely leading to physical or psychological harm to the person or others;
- Any use intended to or having the effect of discriminating against individuals or groups based on legally protected characteristics or categories.


@@ -0,0 +1,43 @@
Version 1.0, June 5, 2024
Copyright © 2024 OpenBMB
## Part One: Preamble
We are open-sourcing the entire series of the globally leading MiniCPM edge-side models, including the flagship edge-side models MiniCPM-2.4B and MiniCPM-1.2B, as well as the globally leading edge-side multimodal MiniCPM-V series. The above weights are fully open for all academic research. Commercial use is also permitted after completing a registration questionnaire. Community use of the MiniCPM series models must comply with Apache 2.0 and the "MiniCPM Model Community License Agreement."
Accordingly, you and the MiniCPM development team enter into the following "MiniCPM Model Commercial License Agreement":
## Part Two: Licensing and Redistribution
#### 1. Grant of Rights
You are granted a non-exclusive, worldwide, non-transferable, royalty-free, limited license, under OpenBMB's intellectual property or other rights in the MiniCPM materials, to use, copy, distribute, reproduce, create derivative works from, and modify the MiniCPM materials.
#### 2. Distribution and Redistribution
- If you distribute or provide MiniCPM series model materials (or any derivative works thereof), or products or services that use any of them, you must (A) provide a copy of this agreement; and (B) prominently display "Built with 面壁MiniCPM" on the relevant website, user interface, blog post, about page, or product documentation. If you use the MiniCPM series models to create, train, fine-tune, or improve an AI model, that model must include "MiniCPM" in its name.
- You must retain the following attribution statement in all distributed MiniCPM-related materials: "面壁MiniCPM is licensed under the MiniCPM Model Community License, Copyright © 面壁智能 Platforms, Inc. All rights reserved."
- Your use of the MiniCPM materials must comply with applicable laws and regulations and with the "MiniCPM Model Community License Agreement," which is incorporated into this agreement by reference.
- You may not use the MiniCPM series models, or their outputs and results, to improve any other large language model (other than MiniCPM or its derivatives).
#### 3. Additional Commercial Terms
If you or your affiliates deploy the model on edge-side devices in your services or products, with no more than 5,000 deployed devices, or provide applications with fewer than 1 million daily active users (DAU), you may apply directly to 面壁智能 (ModelBest) for permission and, after completing the registration questionnaire, may be allowed free commercial use. Otherwise, please email cpm@modelbest.cn to apply for authorization from 面壁智能, which may decide at its sole discretion whether to authorize and on what terms and scope. Before we grant written authorization, you have no right to exercise any commercial rights and may not use the model for any commercial purpose.
#### 4. Usage-based Restrictions
The restrictions set forth in Appendix A are considered usage-based restrictions. Accordingly, you may not use the model or its derivative works for the specified restricted uses. You may use the model under this license only for lawful purposes and in compliance with its terms. Usage includes creating any content, fine-tuning, updating, running, training, evaluating, and/or re-parameterizing the model. You shall require all users of the model or its derivative works to comply with the terms of this section.
## Part Three: Other Terms
#### 5. Trademarks and Related
This license does not grant you the right to use the OpenBMB, 面壁智能, or MiniCPM trademarks, trade names, or logos, or to otherwise imply a relationship between the parties; any rights not expressly granted herein are reserved by OpenBMB.
#### 6. Disclaimer
Unless required by applicable law or agreed to in writing, OpenBMB provides the model and supplemental materials "as is," without warranties or conditions of any kind, either express or implied, including but not limited to all express and implied warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose. You are solely responsible for determining the appropriateness of using or redistributing the model, its derivative works, and supplemental materials, and you assume any risks arising from exercising the rights under this license.
## Appendix A: Usage Restrictions
You agree not to use the model or its derivative works for:
- Any use that violates applicable national or international laws or regulations or infringes upon the legal rights and interests of any third party;
- Any military purposes;
- Exploiting, harming, or attempting to exploit or harm minors in any way;
- Generating or disseminating verifiably false information and/or content with the intent to harm others;
- Generating or disseminating inappropriate content subject to applicable regulatory requirements;
- Generating or disseminating personally identifiable information without authorization, or making unreasonable use of it;
- Defaming, demeaning, or otherwise harassing others;
- Fully automated decision-making that adversely affects individuals' legal rights, or creates or modifies binding, enforceable obligations;
- Any use intended to or having the effect of discriminating against or harming individuals or groups based on online or offline social behaviors or known or predicted personal characteristics;
- Exploiting the vulnerabilities of a specific group of persons based on their age or social, physical, or psychological characteristics, in a manner that materially distorts the behavior of members of that group and causes or is likely to cause physical or psychological harm to them or others;
- Any use intended to or having the effect of discriminating against individuals or groups based on legally protected characteristics or categories.

1193
README.md

File diff suppressed because it is too large

File diff suppressed because it is too large

(Binary image diffs omitted: numerous image files were added, removed, or replaced, including the new file assets/join.png; binary contents are not shown.)


@@ -13,6 +13,7 @@
- [Inference](#Inference)
## Support Models
* [openbmb/MiniCPM-V-4](https://huggingface.co/openbmb/MiniCPM-V-4)
* [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
* [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)

953
docs/minicpm_v2dot6_en.md Normal file

@@ -0,0 +1,953 @@
## MiniCPM-V 2.6
> Archived at: 2025-01-13
**MiniCPM-V 2.6** is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:
- 🔥 **Leading Performance.**
MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet** for single image understanding.
- 🖼️ **Multi Image Understanding and In-context Learning.** MiniCPM-V 2.6 can also perform **conversation and reasoning over multiple images**. It achieves **state-of-the-art performance** on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and also shows promising in-context learning capability.
- 🎬 **Video Understanding.** MiniCPM-V 2.6 can also **accept video inputs**, performing conversation and providing dense captions for spatial-temporal information. It outperforms **GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B** on Video-MME with/without subtitles.
- 💪 **Strong OCR Capability and Others.**
MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro**.
Based on the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) and [VisCPM](https://github.com/OpenBMB/VisCPM) techniques, it features **trustworthy behaviors**, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports **multilingual capabilities** in English, Chinese, German, French, Italian, Korean, etc.
- 🚀 **Superior Efficiency.**
In addition to its friendly size, MiniCPM-V 2.6 also shows **state-of-the-art token density** (i.e., number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models**. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-V 2.6 can efficiently support **real-time video understanding** on end-side devices such as iPad.
- 💫 **Easy Usage.**
MiniCPM-V 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md) and [ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#inference-with-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks, (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) online web [demo](http://120.92.209.146:8887/).
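As a concrete illustration of the vLLM route above, an offline-inference script has roughly the following shape. This is a minimal, untested sketch rather than the official recipe: it assumes a vLLM build with MiniCPM-V multimodal support, and the `(<image>./</image>)` placeholder and chat-template usage follow common vLLM multimodal examples and may differ across versions.
```python
# Hypothetical vLLM sketch for MiniCPM-V 2.6; API details vary across vLLM versions.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "openbmb/MiniCPM-V-2_6"
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
llm = LLM(model=MODEL, trust_remote_code=True, max_model_len=4096)

image = Image.open("airplane.jpeg").convert("RGB")
# Build the prompt with the model's chat template; "(<image>./</image>)" marks
# where the image is injected (assumed placeholder; check your vLLM version).
messages = [{"role": "user", "content": "(<image>./</image>)\nWhat is in this image?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```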
### Evaluation <!-- omit in toc -->
<div align="center">
<img src=../assets/radar_final.png width=66% />
</div>
<details>
<summary>Click to view single image results on OpenCompass, MME, MMVet, OCRBench, MMMU, MathVista, MMB, AI2D, TextVQA, DocVQA, HallusionBench, Object HalBench. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th>Token Density<sup>+</sup></th>
<th>OpenCompass</th>
<th>MME</th>
<th>MMVet</th>
<th>OCRBench</th>
<th>MMMU val</th>
<th>MathVista mini</th>
<th>MMB1.1 test</th>
<th>AI2D</th>
<th>TextVQA val</th>
<th>DocVQA test</th>
<th>HallusionBench</th>
<th>Object HalBench</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="15" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4o</td>
<td>-</td>
<td>1088</td>
<td>69.9</td>
<td>2328.7</td>
<td>69.1</td>
<td>736</td>
<td>69.2</td>
<td>61.3</td>
<td>82.2</td>
<td>84.6</td>
<td>-</td>
<td>92.8</td>
<td>55.0</td>
<td>17.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
<td>-</td>
<td>750</td>
<td>67.9</td>
<td>1920.0</td>
<td>66.0</td>
<td>788</td>
<td>65.9</td>
<td>61.6</td>
<td>78.5</td>
<td>80.2</td>
<td>-</td>
<td>95.2</td>
<td>49.9</td>
<td>13.8</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini 1.5 Pro</td>
<td>-</td>
<td>-</td>
<td>64.4</td>
<td>2110.6</td>
<td>64.0</td>
<td>754</td>
<td>60.6</td>
<td>57.7</td>
<td>73.9</td>
<td>79.1</td>
<td>73.5</td>
<td>86.5</td>
<td>45.6</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4o mini</td>
<td>-</td>
<td>1088</td>
<td>64.1</td>
<td>2003.4</td>
<td>66.9</td>
<td>785</td>
<td>60.0</td>
<td>52.4</td>
<td>76.0</td>
<td>77.8</td>
<td>-</td>
<td>-</td>
<td>46.1</td>
<td>12.4</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4V</td>
<td>-</td>
<td>1088</td>
<td>63.5</td>
<td>2070.2</td>
<td>67.5</td>
<td>656</td>
<td>61.7</td>
<td>54.7</td>
<td>79.8</td>
<td>78.6</td>
<td>78.0</td>
<td>87.2</td>
<td>43.9</td>
<td>14.2</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Step-1V</td>
<td>-</td>
<td>-</td>
<td>59.5</td>
<td>2206.4</td>
<td>63.3</td>
<td>625</td>
<td>49.9</td>
<td>44.8</td>
<td>78.0</td>
<td>79.2</td>
<td>71.6</td>
<td>-</td>
<td>48.4</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen-VL-Max</td>
<td>-</td>
<td>784</td>
<td>58.3</td>
<td>2281.7</td>
<td>61.8</td>
<td>684</td>
<td>52.0</td>
<td>43.4</td>
<td>74.6</td>
<td>75.7</td>
<td>79.5</td>
<td>93.1</td>
<td>41.2</td>
<td>13.4</td>
</tr>
<tr>
<td colspan="15" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-Yi-34B</td>
<td>34B</td>
<td>157</td>
<td>55.0</td>
<td>2006.5</td>
<td>50.7</td>
<td>574</td>
<td>48.8</td>
<td>40.4</td>
<td>77.8</td>
<td>78.9</td>
<td>69.3</td>
<td>-</td>
<td>34.8</td>
<td>12.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Mini-Gemini-HD-34B</td>
<td>34B</td>
<td>157</td>
<td>-</td>
<td>2141.0</td>
<td>59.3</td>
<td>518</td>
<td>48.0</td>
<td>43.3</td>
<td>-</td>
<td>80.5</td>
<td>74.1</td>
<td>78.9</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Cambrian-34B</td>
<td>34B</td>
<td>1820</td>
<td>58.3</td>
<td>2049.9</td>
<td>53.2</td>
<td>591</td>
<td>50.4</td>
<td>50.3</td>
<td>77.8</td>
<td>79.5</td>
<td>76.7</td>
<td>75.5</td>
<td>41.6</td>
<td>14.7</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GLM-4V-9B</td>
<td>13B</td>
<td>784</td>
<td>59.1</td>
<td>2018.8</td>
<td>58.0</td>
<td>776</td>
<td>46.9</td>
<td>51.1</td>
<td>67.9</td>
<td>71.2</td>
<td>-</td>
<td>-</td>
<td>45.0</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2-8B</td>
<td>8B</td>
<td>706</td>
<td>64.1</td>
<td>2215.1</td>
<td>54.3</td>
<td>794</td>
<td><strong>51.2</strong></td>
<td>58.3</td>
<td><strong>79.4</strong></td>
<td><strong>83.6</strong></td>
<td>77.4</td>
<td><strong>91.6</strong></td>
<td>45.0</td>
<td>21.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-Llama-V 2.5</td>
<td>8B</td>
<td>1882</td>
<td>58.8</td>
<td>2024.6</td>
<td>52.8</td>
<td>725</td>
<td>45.8</td>
<td>54.3</td>
<td>72.0</td>
<td>78.4</td>
<td>76.6</td>
<td>84.8</td>
<td>42.4</td>
<td>10.3</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
<td>8B</td>
<td><strong>2822</strong></td>
<td><strong>65.2</strong></td>
<td><strong>2348.4</strong>*</td>
<td><strong>60.0</strong></td>
<td><strong>852</strong>*</td>
<td>49.8*</td>
<td><strong>60.6</strong></td>
<td>78.0</td>
<td>82.1</td>
<td><strong>80.1</strong></td>
<td>90.8</td>
<td><strong>48.1</strong>*</td>
<td><strong>8.2</strong></td>
</tr>
</tbody>
</table>
</div>
* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we used this technique only for the Cognition set.
<sup>+</sup> Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens.
Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation.
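As a worked example of this definition: MiniCPM-V 2.6 encodes a 1344x1344 image (1,806,336 pixels) into 640 visual tokens, so its token density is 1,806,336 / 640 ≈ 2822, the value reported in the table above.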
</details>
<details>
<summary>Click to view multi-image results on Mantis Eval, BLINK, Mathverse mv, Sciverse mv, MIRB.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th>Mantis Eval</th>
<th>BLINK val</th>
<th>Mathverse mv</th>
<th>Sciverse mv</th>
<th>MIRB</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="7" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4V</td>
<td>-</td>
<td>62.7</td>
<td>54.6</td>
<td>60.3</td>
<td>66.9</td>
<td>53.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-Interleave-14B</td>
<td>14B</td>
<td>66.4</td>
<td>52.6</td>
<td>32.7</td>
<td>30.2</td>
<td>-</td>
</tr>
<tr>
<td colspan="7" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Emu2-Chat</td>
<td>37B</td>
<td>37.8</td>
<td>36.2</td>
<td>-</td>
<td>27.2</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">CogVLM</td>
<td>17B</td>
<td>45.2</td>
<td>41.1</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">VPG-C</td>
<td>7B</td>
<td>52.4</td>
<td>43.1</td>
<td>24.3</td>
<td>23.1</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">VILA 8B</td>
<td>8B</td>
<td>51.2</td>
<td>39.3</td>
<td>-</td>
<td>36.5</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternLM-XComposer-2.5</td>
<td>8B</td>
<td>53.1*</td>
<td>48.9</td>
<td>32.1*</td>
<td>-</td>
<td>42.5</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2-8B</td>
<td>8B</td>
<td>59.0*</td>
<td>50.9</td>
<td>30.5*</td>
<td>34.4*</td>
<td><strong>56.9*</strong></td>
</tr>
<tr style="background-color: #e6f2ff;">
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
<td>8B</td>
<td><strong>69.1</strong></td>
<td><strong>53.0</strong></td>
<td><strong>84.9</strong></td>
<td><strong>74.9</strong></td>
<td>53.8</td>
</tr>
</tbody>
</table>
</div>
* We evaluate the officially released checkpoint by ourselves.
</details>
<details>
<summary>Click to view video results on Video-MME and Video-ChatGPT.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th colspan="2">Video-MME</th>
<th colspan="5">Video-ChatGPT</th>
</tr>
<tr>
<th align="left"></th>
<th></th>
<th>w/o subs</th>
<th>w subs</th>
<th>Correctness</th>
<th>Detail</th>
<th>Context</th>
<th>Temporal</th>
<th>Consistency</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="9" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
<td>-</td>
<td>60.0</td>
<td>62.9</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4V</td>
<td>-</td>
<td>59.9</td>
<td>63.3</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td colspan="9" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-7B</td>
<td>7B</td>
<td>-</td>
<td>-</td>
<td>3.39</td>
<td>3.29</td>
<td>3.92</td>
<td>2.60</td>
<td>3.12</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-34B</td>
<td>34B</td>
<td>-</td>
<td>-</td>
<td>3.29</td>
<td>3.23</td>
<td>3.83</td>
<td>2.51</td>
<td>3.47</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">CogVLM2-Video</td>
<td>12B</td>
<td>-</td>
<td>-</td>
<td>3.49</td>
<td><strong>3.46</strong></td>
<td>3.23</td>
<td><strong>2.98</strong></td>
<td><strong>3.64</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LongVA</td>
<td>7B</td>
<td>52.4</td>
<td>54.3</td>
<td>3.05</td>
<td>3.09</td>
<td>3.77</td>
<td>2.44</td>
<td><strong>3.64</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2-8B</td>
<td>8B</td>
<td>54.0</td>
<td>56.9</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternLM-XComposer-2.5</td>
<td>8B</td>
<td>55.8</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-Video</td>
<td>32B</td>
<td>60.2</td>
<td>63.0</td>
<td>3.48</td>
<td>3.37</td>
<td><strong>3.95</strong></td>
<td>2.64</td>
<td>3.28</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
<td>8B</td>
<td><strong>60.9</strong></td>
<td><strong>63.6</strong></td>
<td><strong>3.59</strong></td>
<td>3.28</td>
<td>3.93</td>
<td>2.73</td>
<td>3.62</td>
</tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Click to view few-shot results on TextVQA, VizWiz, VQAv2, OK-VQA.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th>Shot</th>
<th>TextVQA val</th>
<th>VizWiz test-dev</th>
<th>VQAv2 test-dev</th>
<th>OK-VQA val</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td align="left" nowrap="nowrap" rowspan="3">Flamingo</td>
<td rowspan="3">80B</td>
<td>0*</td>
<td>35.0</td>
<td>31.6</td>
<td>56.3</td>
<td>40.6</td>
</tr>
<tr>
<td>4</td>
<td>36.5</td>
<td>39.6</td>
<td>63.1</td>
<td><strong>57.4</strong></td>
</tr>
<tr>
<td>8</td>
<td>37.3</td>
<td>44.8</td>
<td>65.6</td>
<td>57.5</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">IDEFICS</td>
<td rowspan="3">80B</td>
<td>0*</td>
<td>30.9</td>
<td>36.0</td>
<td>60.0</td>
<td>45.2</td>
</tr>
<tr>
<td>4</td>
<td>34.3</td>
<td>40.4</td>
<td>63.6</td>
<td>52.4</td>
</tr>
<tr>
<td>8</td>
<td>35.7</td>
<td>46.1</td>
<td>64.8</td>
<td>55.1</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">OmniCorpus</td>
<td rowspan="3">7B</td>
<td>0*</td>
<td>43.0</td>
<td>49.8</td>
<td>63.2</td>
<td>45.5</td>
</tr>
<tr>
<td>4</td>
<td>45.4</td>
<td>51.3</td>
<td>64.5</td>
<td>46.5</td>
</tr>
<tr>
<td>8</td>
<td>45.6</td>
<td>52.2</td>
<td>64.7</td>
<td>46.6</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">Emu2</td>
<td rowspan="3">37B</td>
<td>0</td>
<td>26.4</td>
<td>40.4</td>
<td>33.5</td>
<td>26.7</td>
</tr>
<tr>
<td>4</td>
<td>48.2</td>
<td>54.6</td>
<td>67.0</td>
<td>53.2</td>
</tr>
<tr>
<td>8</td>
<td>49.3</td>
<td>54.7</td>
<td>67.8</td>
<td>54.1</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="2">MM1</td>
<td rowspan="2">30B</td>
<td>0</td>
<td>26.2</td>
<td>40.4</td>
<td>48.9</td>
<td>26.7</td>
</tr>
<tr>
<td>8</td>
<td>49.3</td>
<td>54.7</td>
<td><strong>70.9</strong></td>
<td>54.1</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td align="left" nowrap="nowrap" rowspan="3">MiniCPM-V 2.6<sup>+</sup></td>
<td rowspan="3">8B</td>
<td>0</td>
<td>43.9</td>
<td>33.8</td>
<td>45.4</td>
<td>23.9</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td>4</td>
<td>63.6</td>
<td>60.5</td>
<td>65.5</td>
<td>50.1</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td>8</td>
<td><strong>64.6</strong></td>
<td><strong>63.4</strong></td>
<td>68.2</td>
<td>51.4</td>
</tr>
</tbody>
</table>
</div>
* Denotes zero image shots and two additional text shots, following Flamingo.
<sup>+</sup> We evaluate the pretraining ckpt without SFT.
</details>
### Examples <!-- omit in toc -->
<div style="display: flex; flex-direction: column; align-items: center;">
<img src="../assets/minicpmv2_6/multi_img-bike.png" alt="Bike" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multi_img-menu.png" alt="Menu" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multi_img-code.png" alt="Code" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/ICL-Mem.png" alt="Mem" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multiling-medal.png" alt="medal" style="margin-bottom: 10px;">
</div>
<details>
<summary>Click to view more cases.</summary>
<div style="display: flex; flex-direction: column; align-items: center;">
<img src="../assets/minicpmv2_6/ICL-elec.png" alt="elec" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multiling-olympic.png" alt="Menu" style="margin-bottom: 10px;">
</div>
</details>
We deploy MiniCPM-V 2.6 on end devices. The demo videos are raw screen recordings on an iPad Pro without editing.
<table align="center">
<p align="center">
<img src="../assets/gif_cases/ai.gif" width=32%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/gif_cases/beer.gif" width=32%/>
</p>
</table>
<table align="center">
<p align="center">
<img src="../assets/gif_cases/ticket.gif" width=32%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/gif_cases/wfh.gif" width=32%/>
</p>
</table>
<table align="center">
<p align="center">
<video src="https://github.com/user-attachments/assets/21f4b818-ede1-4822-920e-91281725c830" width="360" /> </video>
<!-- <video src="https://github.com/user-attachments/assets/c835f757-206b-4d9c-8e36-70d67b453628" width="360" /> </video> -->
</p>
</table>
</details>
### Multi-turn Conversation
<div align="center">
<img src="../assets/airplane.jpeg" width="500px">
</div>
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
torch.manual_seed(0)
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
image = Image.open('./assets/airplane.jpeg').convert('RGB')
# First round chat
question = "Tell me the model of this aircraft."
msgs = [{'role': 'user', 'content': [image, question]}]
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
# Second round chat
# pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": [answer]})
msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]})
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
You should get output similar to the following:
```
"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database."
"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry."
```
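The `chat` interface can also stream the answer chunk by chunk instead of returning a full string. Below is a minimal sketch, assuming the `sampling=True, stream=True` options from the model card; in streaming mode `chat` returns a generator of text chunks.
```python
# Streaming sketch; assumes stream=True is supported, in which case model.chat
# yields text chunks instead of returning a complete string.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
```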
#### Multi-image Understanding
<details>
<summary> Click to view Python example of MiniCPM-V 2.6 multi-image understanding </summary>
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
image1 = Image.open('image1.jpg').convert('RGB')
image2 = Image.open('image2.jpg').convert('RGB')
question = 'Compare image 1 and image 2, tell me about the differences between image 1 and image 2.'
msgs = [{'role': 'user', 'content': [image1, image2, question]}]
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
</details>
#### Few-shot In-Context-Learning
<details>
<summary> Click to view Python example of MiniCPM-V 2.6 few-shot in-context-learning example </summary>
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
question = "production date"
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')
msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
</details>
#### Video understanding
<details>
<summary> Click to view Python example of MiniCPM-V 2.6 video understanding </summary>
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
from decord import VideoReader, cpu # pip install decord
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
MAX_NUM_FRAMES=64 # if cuda OOM set a smaller number
def encode_video(video_path):
    def uniform_sample(l, n):
        # evenly pick n indices from the list l
        gap = len(l) / n
        idxs = [int(i * gap + gap / 2) for i in range(n)]
        return [l[i] for i in idxs]

    vr = VideoReader(video_path, ctx=cpu(0))
    sample_fps = round(vr.get_avg_fps() / 1)  # sample 1 frame per second
    frame_idx = [i for i in range(0, len(vr), sample_fps)]
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
    print('num frames:', len(frames))
    return frames
video_path="video_test.mp4"
frames = encode_video(video_path)
question = "Describe the video"
msgs = [
    {'role': 'user', 'content': frames + [question]},
]

# Set decode params for video
params = {}
params["use_image_id"] = False
params["max_slice_nums"] = 2  # use 1 if CUDA OOM and video resolution is larger than 448x448

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    **params
)
print(answer)
```
</details>
### Model Zoo
| Model | Device | Memory | &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; Description | Download |
|:-----------|:--:|:-----------:|:-------------------|:---------------:|
| MiniCPM-V 2.6| GPU | 17 GB | Strong end-side multimodal performance for single image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6) &nbsp;&nbsp; [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
| MiniCPM-V 2.6 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) &nbsp;&nbsp; [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) |
| MiniCPM-V 2.6 int4 | GPU | 7 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) &nbsp;&nbsp; [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) |

773
docs/minicpm_v2dot6_zh.md Normal file

@@ -0,0 +1,773 @@
## MiniCPM-V 2.6
> Archived at: 2025-08-25
**MiniCPM-V 2.6** is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. Compared with MiniCPM-Llama3-V 2.5, MiniCPM-V 2.6 delivers a significant performance improvement and introduces new features for multi-image and video understanding. Key features of MiniCPM-V 2.6 include:
- 🔥 **Leading Performance.**
MiniCPM-V 2.6 achieves an average score of 65.2 on the latest OpenCompass leaderboard (a comprehensive evaluation over 8 popular multimodal benchmarks). **At only the 8B scale, it surpasses widely used proprietary multimodal models such as GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet in single-image understanding**.
- 🖼️ **Multi-image Understanding and In-context Learning.**
MiniCPM-V 2.6 also supports **multi-image conversation and reasoning**. It achieves **state-of-the-art performance** on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and shows promising in-context learning capability.
- 🎬 **Video Understanding.**
MiniCPM-V 2.6 can also **accept video inputs**, performing conversation and providing detailed video descriptions covering temporal and spatial information. On Video-MME, both with and without subtitles, it outperforms proprietary models such as **GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B**.
- 💪 **Strong OCR Capability and More.**
MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V and Gemini 1.5 Pro**. Based on the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) and [VisCPM](https://github.com/OpenBMB/VisCPM) techniques, it features **trustworthy multimodal behavior**, with a hallucination rate on Object HalBench significantly lower than GPT-4o and GPT-4V, and supports **multilingual capabilities** in English, Chinese, German, French, Italian, Korean, and more.
- 🚀 **Superior Efficiency.**
In addition to its user-friendly model size, MiniCPM-V 2.6 exhibits **state-of-the-art visual token density** (i.e., the number of pixels encoded into each visual token). **It needs only 640 tokens to process a 1.8-million-pixel image, 75% fewer than most models.** This improves inference speed, first-token latency, memory usage and power consumption, so MiniCPM-V 2.6 can support efficient **real-time video understanding** on end-side devices such as iPad.
- 💫 **Easy Usage.**
MiniCPM-V 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md) and [ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#vllm-部署-) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks, (5) quick local WebUI demo setup with [Gradio](#本地-webui-demo-), and (6) an online web [demo](http://120.92.209.146:8887/).
### Evaluation <!-- omit in toc -->
<div align="center">
<img src=assets/radar_final.png width=90% />
</div>
<details>
<summary>Click to view detailed single-image results on OpenCompass, MME, MMVet, OCRBench, MMMU, MathVista, MMB, AI2D, TextVQA, DocVQA, HallusionBench and Object HalBench.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th>Token Density<sup>+</sup></th>
<th>OpenCompass</th>
<th>MME</th>
<th>MMVet</th>
<th>OCRBench</th>
<th>MMMU val</th>
<th>MathVista mini</th>
<th>MMB1.1 test</th>
<th>AI2D</th>
<th>TextVQA val</th>
<th>DocVQA test</th>
<th>HallusionBench</th>
<th>Object HalBench</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="15" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4o</td>
<td>-</td>
<td>1088</td>
<td>69.9</td>
<td>2328.7</td>
<td>69.1</td>
<td>736</td>
<td>69.2</td>
<td>61.3</td>
<td>82.2</td>
<td>84.6</td>
<td>-</td>
<td>92.8</td>
<td>55.0</td>
<td>17.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
<td>-</td>
<td>750</td>
<td>67.9</td>
<td>1920.0</td>
<td>66.0</td>
<td>788</td>
<td>65.9</td>
<td>61.6</td>
<td>78.5</td>
<td>80.2</td>
<td>-</td>
<td>95.2</td>
<td>49.9</td>
<td>13.8</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini 1.5 Pro</td>
<td>-</td>
<td>-</td>
<td>64.4</td>
<td>2110.6</td>
<td>64.0</td>
<td>754</td>
<td>60.6</td>
<td>57.7</td>
<td>73.9</td>
<td>79.1</td>
<td>73.5</td>
<td>86.5</td>
<td>45.6</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4o mini</td>
<td>-</td>
<td>1088</td>
<td>64.1</td>
<td>2003.4</td>
<td>66.9</td>
<td>785</td>
<td>60.0</td>
<td>52.4</td>
<td>76.0</td>
<td>77.8</td>
<td>-</td>
<td>-</td>
<td>46.1</td>
<td>12.4</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4V</td>
<td>-</td>
<td>1088</td>
<td>63.5</td>
<td>2070.2</td>
<td>67.5</td>
<td>656</td>
<td>61.7</td>
<td>54.7</td>
<td>79.8</td>
<td>78.6</td>
<td>78.0</td>
<td>87.2</td>
<td>43.9</td>
<td>14.2</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Step-1V</td>
<td>-</td>
<td>-</td>
<td>59.5</td>
<td>2206.4</td>
<td>63.3</td>
<td>625</td>
<td>49.9</td>
<td>44.8</td>
<td>78.0</td>
<td>79.2</td>
<td>71.6</td>
<td>-</td>
<td>48.4</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen-VL-Max</td>
<td>-</td>
<td>784</td>
<td>58.3</td>
<td>2281.7</td>
<td>61.8</td>
<td>684</td>
<td>52.0</td>
<td>43.4</td>
<td>74.6</td>
<td>75.7</td>
<td>79.5</td>
<td>93.1</td>
<td>41.2</td>
<td>13.4</td>
</tr>
<tr>
<td colspan="15" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-Yi-34B</td>
<td>34B</td>
<td>157</td>
<td>55.0</td>
<td>2006.5</td>
<td>50.7</td>
<td>574</td>
<td>48.8</td>
<td>40.4</td>
<td>77.8</td>
<td>78.9</td>
<td>69.3</td>
<td>-</td>
<td>34.8</td>
<td>12.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Mini-Gemini-HD-34B</td>
<td>34B</td>
<td>157</td>
<td>-</td>
<td>2141</td>
<td>59.3</td>
<td>518</td>
<td>48.0</td>
<td>43.3</td>
<td>-</td>
<td>80.5</td>
<td>74.1</td>
<td>78.9</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Cambrian-34B</td>
<td>34B</td>
<td>1820</td>
<td>58.3</td>
<td>2049.9</td>
<td>53.2</td>
<td>591</td>
<td>50.4</td>
<td>50.3</td>
<td>77.8</td>
<td>79.5</td>
<td>76.7</td>
<td>75.5</td>
<td>41.6</td>
<td>14.7</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GLM-4V-9B</td>
<td>13B</td>
<td>784</td>
<td>59.1</td>
<td>2018.8</td>
<td>58.0</td>
<td>776</td>
<td>46.9</td>
<td>51.1</td>
<td>67.9</td>
<td>71.2</td>
<td>-</td>
<td>-</td>
<td>45.0</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2-8B</td>
<td>8B</td>
<td>706</td>
<td>64.1</td>
<td>2215.1</td>
<td>54.3</td>
<td>794</td>
<td><strong>51.2</strong></td>
<td>58.3</td>
<td><strong>79.4</strong></td>
<td><strong>83.6</strong></td>
<td>77.4</td>
<td><strong>91.6</strong></td>
<td>45.0</td>
<td>21.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-Llama-V 2.5</td>
<td>8B</td>
<td>1882</td>
<td>58.8</td>
<td>2024.6</td>
<td>52.8</td>
<td>725</td>
<td>45.8</td>
<td>54.3</td>
<td>72.0</td>
<td>78.4</td>
<td>76.6</td>
<td>84.8</td>
<td>42.4</td>
<td>10.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
<td>8B</td>
<td><strong>2822</strong></td>
<td><strong>65.2</strong></td>
<td><strong>2348.4</strong>*</td>
<td><strong>60.0</strong></td>
<td><strong>852</strong>*</td>
<td>49.8*</td>
<td><strong>60.6</strong></td>
<td>78.0</td>
<td>82.1</td>
<td><strong>80.1</strong></td>
<td>90.8</td>
<td><strong>48.1</strong>*</td>
<td><strong>8.2</strong></td>
</tr>
</tbody>
</table>
</div>
* We evaluate these benchmarks using chain-of-thought prompting.
<sup>+</sup> Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens.
Note: the token density of proprietary models is estimated from the charging strategy of their official APIs.
</details>
<details>
<summary>Click to view detailed multi-image results on Mantis Eval, BLINK, Mathverse mv, Sciverse mv and MIRB.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th>Mantis Eval</th>
<th>BLINK val</th>
<th>Mathverse mv</th>
<th>Sciverse mv</th>
<th>MIRB</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="7" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4V</td>
<td>-</td>
<td>62.7</td>
<td>54.6</td>
<td>60.3</td>
<td>66.9</td>
<td>53.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-Interleave-14B</td>
<td>14B</td>
<td>66.4</td>
<td>52.6</td>
<td>32.7</td>
<td>30.2</td>
<td>-</td>
</tr>
<tr>
<td colspan="7" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Emu2-Chat</td>
<td>37B</td>
<td>37.8</td>
<td>36.2</td>
<td>-</td>
<td>27.2</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">CogVLM</td>
<td>17B</td>
<td>45.2</td>
<td>41.1</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">VPG-C</td>
<td>7B</td>
<td>52.4</td>
<td>43.1</td>
<td>24.3</td>
<td>23.1</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">VILA 8B</td>
<td>8B</td>
<td>51.2</td>
<td>39.3</td>
<td>-</td>
<td>36.5</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternLM-XComposer-2.5</td>
<td>8B</td>
<td>53.1*</td>
<td>48.9</td>
<td>32.1*</td>
<td>-</td>
<td>42.5</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2-8B</td>
<td>8B</td>
<td>59.0*</td>
<td>50.9</td>
<td>30.5*</td>
<td>34.4*</td>
<td><strong>56.9*</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
<td>8B</td>
<td><strong>69.1</strong></td>
<td><strong>53.0</strong></td>
<td><strong>84.9</strong></td>
<td><strong>74.9</strong></td>
<td>53.8</td>
</tr>
</tbody>
</table>
</div>
* Evaluation results of the officially released model weights.
</details>
<details>
<summary>Click to view detailed video results on Video-MME and Video-ChatGPT.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th colspan="2">Video-MME</th>
<th colspan="5">Video-ChatGPT</th>
</tr>
<tr>
<th align="left"></th>
<th></th>
<th>w/o subs</th>
<th>w subs</th>
<th>Correctness</th>
<th>Detail</th>
<th>Context</th>
<th>Temporal</th>
<th>Consistency</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="9" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
<td>-</td>
<td>60.0</td>
<td>62.9</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4V</td>
<td>-</td>
<td>59.9</td>
<td>63.3</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td colspan="9" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-7B</td>
<td>7B</td>
<td>-</td>
<td>-</td>
<td>3.39</td>
<td>3.29</td>
<td>3.92</td>
<td>2.60</td>
<td>3.12</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-34B</td>
<td>34B</td>
<td>-</td>
<td>-</td>
<td>3.29</td>
<td>3.23</td>
<td>3.83</td>
<td>2.51</td>
<td>3.47</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">CogVLM2-Video</td>
<td>12B</td>
<td>-</td>
<td>-</td>
<td>3.49</td>
<td><strong>3.46</strong></td>
<td>3.23</td>
<td><strong>2.98</strong></td>
<td><strong>3.64</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LongVA</td>
<td>7B</td>
<td>52.4</td>
<td>54.3</td>
<td>3.05</td>
<td>3.09</td>
<td>3.77</td>
<td>2.44</td>
<td><strong>3.64</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2-8B</td>
<td>8B</td>
<td>54.0</td>
<td>56.9</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternLM-XComposer-2.5</td>
<td>8B</td>
<td>55.8</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">LLaVA-NeXT-Video</td>
<td>32B</td>
<td>60.2</td>
<td>63.0</td>
<td>3.48</td>
<td>3.37</td>
<td><strong>3.95</strong></td>
<td>2.64</td>
<td>3.28</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
<td>8B</td>
<td><strong>60.9</strong></td>
<td><strong>63.6</strong></td>
<td><strong>3.59</strong></td>
<td>3.28</td>
<td>3.93</td>
<td>2.73</td>
<td>3.62</td>
</tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Click to view detailed few-shot results on TextVQA, VizWiz, VQAv2 and OK-VQA.</summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th>
<th>Size</th>
<th>Shot</th>
<th>TextVQA val</th>
<th>VizWiz test-dev</th>
<th>VQAv2 test-dev</th>
<th>OK-VQA val</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td align="left" nowrap="nowrap" rowspan="3">Flamingo</td>
<td rowspan="3">80B</td>
<td>0*</td>
<td>35.0</td>
<td>31.6</td>
<td>56.3</td>
<td>40.6</td>
</tr>
<tr>
<td>4</td>
<td>36.5</td>
<td>39.6</td>
<td>63.1</td>
<td><strong>57.4</strong></td>
</tr>
<tr>
<td>8</td>
<td>37.3</td>
<td>44.8</td>
<td>65.6</td>
<td>57.5</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">IDEFICS</td>
<td rowspan="3">80B</td>
<td>0*</td>
<td>30.9</td>
<td>36.0</td>
<td>60.0</td>
<td>45.2</td>
</tr>
<tr>
<td>4</td>
<td>34.3</td>
<td>40.4</td>
<td>63.6</td>
<td>52.4</td>
</tr>
<tr>
<td>8</td>
<td>35.7</td>
<td>46.1</td>
<td>64.8</td>
<td>55.1</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">OmniCorpus</td>
<td rowspan="3">7B</td>
<td>0*</td>
<td>43.0</td>
<td>49.8</td>
<td>63.2</td>
<td>45.5</td>
</tr>
<tr>
<td>4</td>
<td>45.4</td>
<td>51.3</td>
<td>64.5</td>
<td>46.5</td>
</tr>
<tr>
<td>8</td>
<td>45.6</td>
<td>52.2</td>
<td>64.7</td>
<td>46.6</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">Emu2</td>
<td rowspan="3">37B</td>
<td>0</td>
<td>26.4</td>
<td>40.4</td>
<td>33.5</td>
<td>26.7</td>
</tr>
<tr>
<td>4</td>
<td>48.2</td>
<td>54.6</td>
<td>67.0</td>
<td>53.2</td>
</tr>
<tr>
<td>8</td>
<td>49.3</td>
<td>54.7</td>
<td>67.8</td>
<td>54.1</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="2">MM1</td>
<td rowspan="2">30B</td>
<td>0</td>
<td>26.2</td>
<td>40.4</td>
<td>48.9</td>
<td>26.7</td>
</tr>
<tr>
<td>8</td>
<td>49.3</td>
<td>54.7</td>
<td><strong>70.9</strong></td>
<td>54.1</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" rowspan="3">MiniCPM-V 2.6<sup>+</sup></td>
<td rowspan="3">8B</td>
<td>0</td>
<td>43.9</td>
<td>33.8</td>
<td>45.4</td>
<td>23.9</td>
</tr>
<tr>
<td>4</td>
<td>63.6</td>
<td>60.5</td>
<td>65.5</td>
<td>50.1</td>
</tr>
<tr>
<td>8</td>
<td><strong>64.6</strong></td>
<td><strong>63.4</strong></td>
<td>68.2</td>
<td>51.4</td>
</tr>
</tbody>
</table>
</div>
* Zero-shot performance is evaluated following the Flamingo protocol, with zero image shots and two additional text shots.
<sup>+</sup> We evaluate the pretrained checkpoint (ckpt) without supervised fine-tuning (SFT).
</details>
### Examples <!-- omit in toc -->
<div style="display: flex; flex-direction: column; align-items: center;">
<img src="../assets/minicpmv2_6/multi_img-bike.png" alt="Bike" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multi_img-menu.png" alt="Menu" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multi_img-code.png" alt="Code" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/ICL-Mem.png" alt="Mem" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multiling-medal.png" alt="medal" style="margin-bottom: 10px;">
</div>
<details>
<summary>Click to view more cases.</summary>
<div style="display: flex; flex-direction: column; align-items: center;">
<img src="../assets/minicpmv2_6/ICL-elec.png" alt="elec" style="margin-bottom: 5px;">
<img src="../assets/minicpmv2_6/multiling-olympic.png" alt="Menu" style="margin-bottom: 10px;">
</div>
</details>
We deployed MiniCPM-V 2.6 on an iPad Pro and recorded the following demo videos.
<table align="center">
<p align="center">
<img src="../assets/gif_cases/ai.gif" width=32%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/gif_cases/beer.gif" width=32%/>
</p>
</table>
<table align="center">
<p align="center">
<video src="https://github.com/user-attachments/assets/21f4b818-ede1-4822-920e-91281725c830" width="360" /> </video>
<!-- <video src="https://github.com/user-attachments/assets/c835f757-206b-4d9c-8e36-70d67b453628" width="360" /> </video> -->
</p>
</table>
</details>
### Model Zoo
| Model | Device | Memory | &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; Description | Download |
|:--------------|:-:|:----------:|:-------------------|:---------------:|
| MiniCPM-V 2.6| GPU | 17 GB | Strong end-side single-image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6) &nbsp;&nbsp; [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
| MiniCPM-V 2.6 gguf | CPU | 6 GB | The gguf version, with lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) &nbsp;&nbsp; [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) |
| MiniCPM-V 2.6 int4 | GPU | 7 GB | The int4 quantized version, with lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) &nbsp;&nbsp; [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) |

556
docs/minicpm_v4_en.md Normal file

@@ -0,0 +1,556 @@
## MiniCPM-V 4.0
> Archived at: 2025-08-25
**MiniCPM-V 4.0** is the latest efficient model in the MiniCPM-V series. The model is built on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency. Notable features of MiniCPM-V 4.0 include:
- 🔥 **Leading Visual Capability.**
With only 4.1B parameters, MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks, **outperforming GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B params, OpenCompass 65.2) and Qwen2.5-VL-3B-Instruct (3.8B params, OpenCompass 64.5)**. It also shows good performance in multi-image understanding and video understanding.
- 🚀 **Superior Efficiency.**
Designed for on-device deployment, MiniCPM-V 4.0 runs smoothly on end devices. For example, it delivers **less than 2 s first-token latency and more than 17 tokens/s decoding on an iPhone 16 Pro Max**, without overheating. It also shows superior throughput under concurrent requests.
- 💫 **Easy Usage.**
MiniCPM-V 4.0 can be easily used in various ways, including **llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory and a local web demo**, etc. We also open-source an iOS app that runs on iPhone and iPad. Get started easily with our well-structured [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook), featuring detailed instructions and practical examples.
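For a quick start, single-image chat with MiniCPM-V 4.0 can be sketched as below. This assumes the model keeps the same `model.chat` interface as MiniCPM-V 2.6 (see the examples in the MiniCPM-V 2.6 document above); `example.jpg` is a placeholder input, and the Cookbook remains the authoritative reference.
```python
# Minimal single-image sketch for MiniCPM-V 4.0, assuming the same chat
# interface as MiniCPM-V 2.6; 'example.jpg' is a placeholder input.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-4', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-4', trust_remote_code=True)

image = Image.open('example.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': [image, 'Describe this image.']}]
print(model.chat(image=None, msgs=msgs, tokenizer=tokenizer))
```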
### Evaluation <!-- omit in toc -->
<details>
<summary>Click to view single image results on OpenCompass. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th nowrap="nowrap" align="left">Model</th>
<th>Size</th>
<th>OpenCompass</th>
<th>OCRBench</th>
<th>MathVista</th>
<th>HallusionBench</th>
<th>MMMU</th>
<th>MMVet</th>
<th>MMBench V1.1</th>
<th>MMStar</th>
<th>AI2D</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="11" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4v-20240409</td>
<td>-</td>
<td>63.5</td>
<td>656</td>
<td>55.2</td>
<td>43.9</td>
<td>61.7</td>
<td>67.5</td>
<td>79.8</td>
<td>56.0</td>
<td>78.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
<td>-</td>
<td>64.5</td>
<td>754</td>
<td>58.3</td>
<td>45.6</td>
<td>60.6</td>
<td>64.0</td>
<td>73.9</td>
<td>59.1</td>
<td>79.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td>
<td>-</td>
<td>68.9</td>
<td>840</td>
<td>70.9</td>
<td>49.3</td>
<td>55.0</td>
<td>74.3</td>
<td>80.9</td>
<td>60.9</td>
<td>76.0</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td>
<td>-</td>
<td>70.6</td>
<td>798</td>
<td>65.3</td>
<td>55.5</td>
<td>66.4</td>
<td>70.1</td>
<td>81.7</td>
<td>65.1</td>
<td>81.2</td>
</tr>
<tr>
<td colspan="11" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
<td>3.8B</td>
<td>64.5</td>
<td>828</td>
<td>61.2</td>
<td>46.6</td>
<td>51.2</td>
<td>60.0</td>
<td>76.8</td>
<td>56.3</td>
<td>81.4</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-4B</td>
<td>3.7B</td>
<td>65.1</td>
<td>820</td>
<td>60.8</td>
<td>46.6</td>
<td>51.8</td>
<td>61.5</td>
<td>78.2</td>
<td>58.7</td>
<td>81.4</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
<td>8.3B</td>
<td>70.9</td>
<td>888</td>
<td>68.1</td>
<td>51.9</td>
<td>58.0</td>
<td>69.7</td>
<td>82.2</td>
<td>64.1</td>
<td>84.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-8B</td>
<td>8.1B</td>
<td>68.1</td>
<td>821</td>
<td>64.5</td>
<td>49.0</td>
<td>56.2</td>
<td>62.8</td>
<td>82.5</td>
<td>63.2</td>
<td>84.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
<td>8.1B</td>
<td>65.2</td>
<td>852</td>
<td>60.8</td>
<td>48.1</td>
<td>49.8</td>
<td>60.0</td>
<td>78.0</td>
<td>57.5</td>
<td>82.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
<td>8.7B</td>
<td>70.2</td>
<td>889</td>
<td>73.3</td>
<td>51.1</td>
<td>50.9</td>
<td>67.2</td>
<td>80.6</td>
<td>63.3</td>
<td>86.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
<td>4.1B</td>
<td>69.0</td>
<td>894</td>
<td>66.9</td>
<td>50.8</td>
<td>51.2</td>
<td>68.0</td>
<td>79.7</td>
<td>62.8</td>
<td>82.9</td>
</tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Click to view single-image results on ChartQA, MME, RealWorldQA, TextVQA, DocVQA, MathVision, DynaMath, WeMath, Object HalBench and MM HalBench. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th nowrap="nowrap" align="left">model</th>
<th>Size</th>
<th>ChartQA</th>
<th>MME</th>
<th>RealWorldQA</th>
<th>TextVQA</th>
<th>DocVQA</th>
<th>MathVision</th>
<th>DynaMath</th>
<th>WeMath</th>
<th colspan="2">Obj Hal</th>
<th colspan="2">MM Hal</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>CHAIRs↓</td>
<td>CHAIRi↓</td>
<td nowrap="nowrap">score avg@3</td>
<td nowrap="nowrap">hall rate avg@3</td>
</tr>
<tbody align="center">
<tr>
<td colspan="14" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4v-20240409</td>
<td>-</td>
<td>78.5</td>
<td>1927</td>
<td>61.4</td>
<td>78.0</td>
<td>88.4</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
<td>-</td>
<td>87.2</td>
<td>-</td>
<td>67.5</td>
<td>78.8</td>
<td>93.1</td>
<td>41.0</td>
<td>31.5</td>
<td>50.5</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>45.3</td>
<td>47.7</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td>
<td>-</td>
<td>90.8</td>
<td>-</td>
<td>60.1</td>
<td>74.1</td>
<td>95.2</td>
<td>35.6</td>
<td>35.7</td>
<td>44.0</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td colspan="14" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
<td>3.8B</td>
<td>84.0</td>
<td>2157</td>
<td>65.4</td>
<td>79.3</td>
<td>93.9</td>
<td>21.9</td>
<td>13.2</td>
<td>22.9</td>
<td>18.3</td>
<td>10.8</td>
<td>3.9 </td>
<td>33.3 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-4B</td>
<td>3.7B</td>
<td>84.0</td>
<td>2338</td>
<td>64.3</td>
<td>76.8</td>
<td>91.6</td>
<td>18.4</td>
<td>15.2</td>
<td>21.2</td>
<td>13.7</td>
<td>8.7</td>
<td>3.2 </td>
<td>46.5 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
<td>8.3B</td>
<td>87.3</td>
<td>2347</td>
<td>68.5</td>
<td>84.9</td>
<td>95.7</td>
<td>25.4</td>
<td>21.8</td>
<td>36.2</td>
<td>13.3</td>
<td>7.9</td>
<td>4.1 </td>
<td>31.6 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-8B</td>
<td>8.1B</td>
<td>84.8</td>
<td>2344</td>
<td>70.1</td>
<td>79.1</td>
<td>93.0</td>
<td>17.0</td>
<td>9.4</td>
<td>23.5</td>
<td>18.3</td>
<td>11.6</td>
<td>3.6 </td>
<td>37.2</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
<td>8.1B</td>
<td>79.4</td>
<td>2348</td>
<td>65.0</td>
<td>80.1</td>
<td>90.8</td>
<td>17.5</td>
<td>9.0</td>
<td>20.4</td>
<td>7.3</td>
<td>4.7</td>
<td>4.0 </td>
<td>29.9 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
<td>8.7B</td>
<td>86.9</td>
<td>2372</td>
<td>68.1</td>
<td>82.0</td>
<td>93.5</td>
<td>21.7</td>
<td>10.4</td>
<td>25.2</td>
<td>6.3</td>
<td>3.4</td>
<td>4.1 </td>
<td>31.3 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
<td>4.1B</td>
<td>84.4</td>
<td>2298</td>
<td>68.5</td>
<td>80.8</td>
<td>92.9</td>
<td>20.7</td>
<td>14.2</td>
<td>32.7</td>
<td>6.3</td>
<td>3.5</td>
<td>4.1 </td>
<td>29.2 </td>
</tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Click to view multi-image and video understanding results on Mantis, Blink and Video-MME. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th nowrap="nowrap" align="left">model</th>
<th>Size</th>
<th>Mantis</th>
<th>Blink</th>
<th nowrap="nowrap" colspan="2" >Video-MME</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>w/o subs</td>
<td>w/ subs</td>
</tr>
<tbody align="center">
<tr>
<td colspan="6" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4v-20240409</td>
<td>-</td>
<td>62.7</td>
<td>54.6</td>
<td>59.9</td>
<td>63.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
<td>-</td>
<td>-</td>
<td>59.1</td>
<td>75.0</td>
<td>81.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4o-20240513</td>
<td>-</td>
<td>-</td>
<td>68.0</td>
<td>71.9</td>
<td>77.2</td>
</tr>
<tr>
<td colspan="6" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
<td>3.8B</td>
<td>-</td>
<td>47.6</td>
<td>61.5</td>
<td>67.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-4B</td>
<td>3.7B</td>
<td>62.7</td>
<td>50.8</td>
<td>62.3</td>
<td>63.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
<td>8.3B</td>
<td>-</td>
<td>56.4</td>
<td>65.1</td>
<td>71.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-8B</td>
<td>8.1B</td>
<td>67.7</td>
<td>54.8</td>
<td>64.2</td>
<td>66.9</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
<td>8.1B</td>
<td>69.1</td>
<td>53.0</td>
<td>60.9</td>
<td>63.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
<td>8.7B</td>
<td>71.9</td>
<td>56.7</td>
<td>63.9</td>
<td>69.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
<td>4.1B</td>
<td>71.4</td>
<td>54.0</td>
<td>61.2</td>
<td>65.8</td>
</tr>
</tbody>
</table>
</div>
</details>
### Examples
<div style="display: flex; flex-direction: column; align-items: center;">
<img src="../assets/minicpmv4/minicpm-v-4-case.png" alt="math" style="margin-bottom: 5px;">
</div>
We deploy MiniCPM-V 4.0 on iPhone 16 Pro Max with the [iOS demo](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/ios_demo/ios.md). The demo videos are raw screen recordings without any editing.
<table align="center">
<p align="center">
<img src="../assets/minicpmv4/iphone_en.gif" width=45%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/minicpmv4/iphone_en_information_extraction.gif" width=45%/>
</p>
<p align="center">
<img src="../assets/minicpmv4/iphone_cn.gif" width=45%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/minicpmv4/iphone_cn_funny_points.gif" width=45%/>
</p>
</table>

docs/minicpm_v4_zh.md Normal file
@@ -0,0 +1,557 @@
## MiniCPM-V 4.0
> Archived at: 2025-08-25
MiniCPM-V 4.0 is the latest model in the MiniCPM-V series. Built on SigLIP2-400M and MiniCPM4-3B with 4.1B total parameters, it inherits the strong single-image, multi-image and video understanding of MiniCPM-V 2.6 while greatly improving inference efficiency. Its main features include:
- 🔥 **Leading visual capability.**
MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, surpassing MiniCPM-V 2.6 (8.1B, 65.2), Qwen2.5-VL-3B-Instruct (3.8B, 64.5) and **the widely used proprietary model GPT-4.1-mini-20250414**. It also performs strongly on multi-image and video understanding tasks.
- 🚀 **Superior efficiency.**
Optimized for on-device deployment, MiniCPM-V 4.0 **runs smoothly on iPhone 16 Pro Max, with a first-token latency as low as 2 seconds and a decoding speed of 17.9 tokens/s**, without heating problems. It also delivers leading throughput under concurrent requests.
- 💫 **Easy to use.**
MiniCPM-V 4.0 supports many inference options, including **llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory and a local web demo**. We also open-source an iOS app that runs on iPhone and iPad. See our well-structured [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for detailed deployment guides and real-world examples; a serving sketch follows this list.
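Since vLLM and SGLang both expose an OpenAI-compatible server, the deployed model can be queried with the standard `openai` client. A sketch under stated assumptions: the server address, the served model name `openbmb/MiniCPM-V-4`, and the image URL are placeholders:

```python
# Sketch: one multimodal chat request against a vLLM/SGLang OpenAI-compatible
# endpoint (assumes a server such as `vllm serve openbmb/MiniCPM-V-4
# --trust-remote-code` is already running on localhost:8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-4",  # must match the served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
)
print(response.choices[0].message.content)
```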
### Evaluation <!-- omit in toc -->
<details>
<summary>Click to view single-image results on OpenCompass. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th nowrap="nowrap" align="left">model</th>
<th>Size</th>
<th>OpenCompass</th>
<th>OCRBench</th>
<th>MathVista</th>
<th>HallusionBench</th>
<th>MMMU</th>
<th>MMVet</th>
<th>MMBench V1.1</th>
<th>MMStar</th>
<th>AI2D</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td colspan="11" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4v-20240409</td>
<td>-</td>
<td>63.5</td>
<td>656</td>
<td>55.2</td>
<td>43.9</td>
<td>61.7</td>
<td>67.5</td>
<td>79.8</td>
<td>56.0</td>
<td>78.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
<td>-</td>
<td>64.5</td>
<td>754</td>
<td>58.3</td>
<td>45.6</td>
<td>60.6</td>
<td>64.0</td>
<td>73.9</td>
<td>59.1</td>
<td>79.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td>
<td>-</td>
<td>68.9</td>
<td>840</td>
<td>70.9</td>
<td>49.3</td>
<td>55.0</td>
<td>74.3</td>
<td>80.9</td>
<td>60.9</td>
<td>76.0</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td>
<td>-</td>
<td>70.6</td>
<td>798</td>
<td>65.3</td>
<td>55.5</td>
<td>66.4</td>
<td>70.1</td>
<td>81.7</td>
<td>65.1</td>
<td>81.2</td>
</tr>
<tr>
<td colspan="11" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
<td>3.8B</td>
<td>64.5</td>
<td>828</td>
<td>61.2</td>
<td>46.6</td>
<td>51.2</td>
<td>60.0</td>
<td>76.8</td>
<td>56.3</td>
<td>81.4</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-4B</td>
<td>3.7B</td>
<td>65.1</td>
<td>820</td>
<td>60.8</td>
<td>46.6</td>
<td>51.8</td>
<td>61.5</td>
<td>78.2</td>
<td>58.7</td>
<td>81.4</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
<td>8.3B</td>
<td>70.9</td>
<td>888</td>
<td>68.1</td>
<td>51.9</td>
<td>58.0</td>
<td>69.7</td>
<td>82.2</td>
<td>64.1</td>
<td>84.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-8B</td>
<td>8.1B</td>
<td>68.1</td>
<td>821</td>
<td>64.5</td>
<td>49.0</td>
<td>56.2</td>
<td>62.8</td>
<td>82.5</td>
<td>63.2</td>
<td>84.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
<td>8.1B</td>
<td>65.2</td>
<td>852</td>
<td>60.8</td>
<td>48.1</td>
<td>49.8</td>
<td>60.0</td>
<td>78.0</td>
<td>57.5</td>
<td>82.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
<td>8.7B</td>
<td>70.2</td>
<td>889</td>
<td>73.3</td>
<td>51.1</td>
<td>50.9</td>
<td>67.2</td>
<td>80.6</td>
<td>63.3</td>
<td>86.1</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
<td>4.1B</td>
<td>69.0</td>
<td>894</td>
<td>66.9</td>
<td>50.8</td>
<td>51.2</td>
<td>68.0</td>
<td>79.7</td>
<td>62.8</td>
<td>82.9</td>
</tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Click to view results on chart understanding, document understanding, math reasoning, hallucination and related benchmarks. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th nowrap="nowrap" align="left">model</th>
<th>Size</th>
<th>ChartQA</th>
<th>MME</th>
<th>RealWorldQA</th>
<th>TextVQA</th>
<th>DocVQA</th>
<th>MathVision</th>
<th>DynaMath</th>
<th>WeMath</th>
<th colspan="2">Obj Hal</th>
<th colspan="2">MM Hal</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>CHAIRs↓</td>
<td>CHAIRi↓</td>
<td nowrap="nowrap">score avg@3</td>
<td nowrap="nowrap">hall rate avg@3</td>
</tr>
<tbody align="center">
<tr>
<td colspan="14" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4v-20240409</td>
<td>-</td>
<td>78.5</td>
<td>1927</td>
<td>61.4</td>
<td>78.0</td>
<td>88.4</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
<td>-</td>
<td>87.2</td>
<td>-</td>
<td>67.5</td>
<td>78.8</td>
<td>93.1</td>
<td>41.0</td>
<td>31.5</td>
<td>50.5</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>45.3</td>
<td>47.7</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td>
<td>-</td>
<td>90.8</td>
<td>-</td>
<td>60.1</td>
<td>74.1</td>
<td>95.2</td>
<td>35.6</td>
<td>35.7</td>
<td>44.0</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td colspan="14" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
<td>3.8B</td>
<td>84.0</td>
<td>2157</td>
<td>65.4</td>
<td>79.3</td>
<td>93.9</td>
<td>21.9</td>
<td>13.2</td>
<td>22.9</td>
<td>18.3</td>
<td>10.8</td>
<td>3.9 </td>
<td>33.3 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-4B</td>
<td>3.7B</td>
<td>84.0</td>
<td>2338</td>
<td>64.3</td>
<td>76.8</td>
<td>91.6</td>
<td>18.4</td>
<td>15.2</td>
<td>21.2</td>
<td>13.7</td>
<td>8.7</td>
<td>3.2 </td>
<td>46.5 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
<td>8.3B</td>
<td>87.3</td>
<td>2347</td>
<td>68.5</td>
<td>84.9</td>
<td>95.7</td>
<td>25.4</td>
<td>21.8</td>
<td>36.2</td>
<td>13.3</td>
<td>7.9</td>
<td>4.1 </td>
<td>31.6 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-8B</td>
<td>8.1B</td>
<td>84.8</td>
<td>2344</td>
<td>70.1</td>
<td>79.1</td>
<td>93.0</td>
<td>17.0</td>
<td>9.4</td>
<td>23.5</td>
<td>18.3</td>
<td>11.6</td>
<td>3.6 </td>
<td>37.2</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
<td>8.1B</td>
<td>79.4</td>
<td>2348</td>
<td>65.0</td>
<td>80.1</td>
<td>90.8</td>
<td>17.5</td>
<td>9.0</td>
<td>20.4</td>
<td>7.3</td>
<td>4.7</td>
<td>4.0 </td>
<td>29.9 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
<td>8.7B</td>
<td>86.9</td>
<td>2372</td>
<td>68.1</td>
<td>82.0</td>
<td>93.5</td>
<td>21.7</td>
<td>10.4</td>
<td>25.2</td>
<td>6.3</td>
<td>3.4</td>
<td>4.1 </td>
<td>31.3 </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
<td>4.1B</td>
<td>84.4</td>
<td>2298</td>
<td>68.5</td>
<td>80.8</td>
<td>92.9</td>
<td>20.7</td>
<td>14.2</td>
<td>32.7</td>
<td>6.3</td>
<td>3.5</td>
<td>4.1 </td>
<td>29.2 </td>
</tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Click to view multi-image and video understanding results. </summary>
<div align="center">
<table style="margin: 0px auto;">
<thead>
<tr>
<th nowrap="nowrap" align="left">model</th>
<th>Size</th>
<th>Mantis</th>
<th>Blink</th>
<th nowrap="nowrap" colspan="2" >Video-MME</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>w/o subs</td>
<td>w/ subs</td>
</tr>
<tbody align="center">
<tr>
<td colspan="6" align="left"><strong>Proprietary</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4v-20240409</td>
<td>-</td>
<td>62.7</td>
<td>54.6</td>
<td>59.9</td>
<td>63.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
<td>-</td>
<td>-</td>
<td>59.1</td>
<td>75.0</td>
<td>81.3</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">GPT-4o-20240513</td>
<td>-</td>
<td>-</td>
<td>68.0</td>
<td>71.9</td>
<td>77.2</td>
</tr>
<tr>
<td colspan="6" align="left"><strong>Open-source</strong></td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
<td>3.8B</td>
<td>-</td>
<td>47.6</td>
<td>61.5</td>
<td>67.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-4B</td>
<td>3.7B</td>
<td>62.7</td>
<td>50.8</td>
<td>62.3</td>
<td>63.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
<td>8.3B</td>
<td>-</td>
<td>56.4</td>
<td>65.1</td>
<td>71.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">InternVL2.5-8B</td>
<td>8.1B</td>
<td>67.7</td>
<td>54.8</td>
<td>64.2</td>
<td>66.9</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
<td>8.1B</td>
<td>69.1</td>
<td>53.0</td>
<td>60.9</td>
<td>63.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
<td>8.7B</td>
<td>71.9</td>
<td>56.7</td>
<td>63.9</td>
<td>69.6</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
<td>4.1B</td>
<td>71.4</td>
<td>54.0</td>
<td>61.2</td>
<td>65.8</td>
</tr>
</tbody>
</table>
</div>
</details>
### Examples
<div style="display: flex; flex-direction: column; align-items: center;">
<img src="../assets/minicpmv4/minicpm-v-4-case.png" alt="math" style="margin-bottom: 5px;">
</div>
We deployed MiniCPM-V 4.0 on iPhone 16 Pro Max with the [iOS demo](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/ios_demo/ios.md) and recorded the following demos; the videos are raw screen recordings without speed-up or any other editing:
<table align="center">
<p align="center">
<img src="../assets/minicpmv4/iphone_en.gif" width=45%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/minicpmv4/iphone_en_information_extraction.gif" width=45%/>
</p>
<p align="center">
<img src="../assets/minicpmv4/iphone_cn.gif" width=45%/>
&nbsp;&nbsp;&nbsp;&nbsp;
<img src="../assets/minicpmv4/iphone_cn_funny_points.gif" width=45%/>
</p>
</table>


@@ -1,7 +1,7 @@
-# MiniCPM-V Finetuning
+# MiniCPM-V & o Finetuning
-We offer the official scripts for easy finetuning of the pretrained **MiniCPM-o-2_6**, **MiniCPM-V-2_6**, **MiniCPM-Llama3-V 2.5** and **MiniCPM-V 2.0** on downstream tasks. Our finetune scripts use transformers Trainer and DeepSpeed by default.
+We offer the official scripts for easy finetuning of the pretrained **MiniCPM-V 4.0**, **MiniCPM-o 2.6**, **MiniCPM-V 2.6**, **MiniCPM-Llama3-V 2.5** and **MiniCPM-V 2.0** on downstream tasks. Our finetune scripts use transformers Trainer and DeepSpeed by default.
### Data preparation
@@ -96,11 +96,10 @@ If the total token count exceeds `max_length`, truncation will be applied. For m
Full-parameter finetuning updates all parameters of the LLM during the whole training process. Please specify the correct MODEL path, DATA path and LLM_TYPE in the shell scripts.
```shell
MODEL="MiniCPM-o-2_6" # or "openbmb/MiniCPM-V-2_6", openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2
DATA="path/to/trainging_data" # json file
EVAL_DATA="path/to/test_data" # json file
LLM_TYPE="qwen" # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm, if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE="llama3",
# if use openbmb/MiniCPM-o-2_6 or openbmb/MiniCPM-V-2_6, please set LLM_TYPE=qwen
MODEL="MiniCPM-o-2_6" # or "openbmb/MiniCPM-V-2_6", "openbmb/MiniCPM-Llama3-V-2_5", "openbmb/MiniCPM-V-2"
DATA="path/to/training_data.json"
EVAL_DATA="path/to/test_data.json"
LLM_TYPE="qwen" # llama for MiniCPM-V-4, minicpm for MiniCPM-V-2, llama3 for MiniCPM-Llama3-V-2_5, qwen for MiniCPM-o-2_6/MiniCPM-V-2_6
```
To launch your training, run the following script:
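(The launch script itself is not shown in this excerpt.) For the `DATA`/`EVAL_DATA` files referenced above, a hedged sketch of the expected conversation-format JSON; the field names are assumed from the MiniCPM-V finetuning docs, so treat the repository's Data preparation section as authoritative:

```python
# Sketch: writing a one-sample training file in the assumed conversation format.
import json

sample = {
    "id": "0",
    "image": "path/to/image.jpg",  # hypothetical local image path
    "conversations": [
        {"role": "user", "content": "<image>\nWhat is in the image?"},
        {"role": "assistant", "content": "A commercial airplane on the tarmac."},
    ],
}

with open("path/to/training_data.json", "w") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```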


@@ -7,7 +7,7 @@ from transformers.trainer_pt_utils import nested_detach
from transformers.utils import is_sagemaker_mp_enabled
from transformers.trainer import *
from transformers.integrations import is_deepspeed_zero3_enabled
from typing import Dict, List, Optional, Tuple
class CPMTrainer(Trainer):
def compute_loss(self, model, inputs, return_outputs=False):
@@ -170,7 +170,7 @@ class CPMTrainer(Trainer):
return (loss, logits, labels)
-def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]]) -> torch.Tensor:
+def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]], num_items_in_batch=None) -> torch.Tensor:
"""
Perform a training step on a batch of inputs.
@@ -189,8 +189,7 @@ class CPMTrainer(Trainer):
`torch.Tensor`: The tensor with training loss on this batch.
"""
model.train()
-inputs = self._prepare_inputs(inputs)
inputs = self._prepare_inputs(inputs)
if is_sagemaker_mp_enabled():
loss_mb = smp_forward_backward(model, inputs, self.args.gradient_accumulation_steps)
return loss_mb.reduce_mean().detach().to(self.args.device)
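Stepping back from the diff: the signature change above is the usual pattern for staying compatible across transformers releases, since newer versions of `Trainer` pass an extra `num_items_in_batch` argument to `training_step`. A self-contained sketch of that pattern; `CompatTrainer` is a hypothetical name, not code from this repo:

```python
# Sketch: a training_step override that works on both older and newer
# transformers versions; the try/except covers base methods that do or do
# not accept the extra argument.
from typing import Any, Dict, Optional, Union

import torch
import torch.nn as nn
from transformers import Trainer


class CompatTrainer(Trainer):
    def training_step(
        self,
        model: nn.Module,
        inputs: Dict[str, Union[torch.Tensor, Any]],
        num_items_in_batch: Optional[int] = None,  # supplied by newer Trainer versions
    ) -> torch.Tensor:
        try:
            # Newer transformers: forward the argument so token-count loss
            # normalization keeps working.
            return super().training_step(model, inputs, num_items_in_batch)
        except TypeError:
            # Older transformers: the base method has no such parameter.
            return super().training_step(model, inputs)
```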


@@ -1,81 +0,0 @@
"""
the script will use bitandbytes to quantize the MiniCPM-Llama3-V-2_5 model.
the be quantized model can be finetuned by MiniCPM-Llama3-V-2_5 or not.
you only need to set the model_path 、save_path and run bash code
cd MiniCPM-V
python quantize/bnb_quantize.py
you will get the quantized model in save_path、quantized_model test time and gpu usage
"""
import os
import time

import torch
import GPUtil
from PIL import Image
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
assert torch.cuda.is_available(), "CUDA is not available, but this code requires a GPU."
device = 'cuda' # Select GPU to use
model_path = '/root/ld/ld_model_pretrained/MiniCPM-Llama3-V-2_5' # Model download path
save_path = '/root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5_int4' # Quantized model save path
image_path = './assets/airplane.jpeg'
# Create a configuration object to specify quantization parameters
quantization_config = BitsAndBytesConfig(
load_in_4bit=True, # Whether to perform 4-bit quantization
load_in_8bit=False, # Whether to perform 8-bit quantization
bnb_4bit_compute_dtype=torch.float16, # Computation precision setting
bnb_4bit_quant_storage=torch.uint8, # Storage format for quantized weights
bnb_4bit_quant_type="nf4", # Quantization format, here using normally distributed int4
bnb_4bit_use_double_quant=True, # Whether to use double quantization, i.e., quantizing zeropoint and scaling parameters
llm_int8_enable_fp32_cpu_offload=False, # Whether LLM uses int8, with fp32 parameters stored on the CPU
llm_int8_has_fp16_weight=False, # Whether mixed precision is enabled
llm_int8_skip_modules=["out_proj", "kv_proj", "lm_head"], # Modules not to be quantized
llm_int8_threshold=6.0 # Outlier threshold for the llm.int8() algorithm; activations above it are kept in higher precision
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
model_path,
device_map=device, # Allocate model to device
quantization_config=quantization_config,
trust_remote_code=True
)
gpu_usage = GPUtil.getGPUs()[0].memoryUsed
start = time.time()
response = model.chat(
image=Image.open(image_path).convert("RGB"),
msgs=[
{
"role": "user",
"content": "What is in this picture?"
}
],
tokenizer=tokenizer
)  # Model inference
print('Output after quantization:', response)
print('Inference time after quantization:', time.time() - start)
print(f"GPU memory usage after quantization: {round(gpu_usage/1024, 2)} GB")
"""
Expected output:
Output after quantization: This picture contains specific parts of an airplane, including wings, engines, and tail sections. These components are key parts of large commercial aircraft.
The wings support lift during flight, while the engines provide thrust to move the plane forward. The tail section is typically used for stabilizing flight and plays a role in airline branding.
The design and color of the airplane indicate that it belongs to Air China, likely a passenger aircraft due to its large size and twin-engine configuration.
There are no markings or insignia on the airplane indicating the specific model or registration number; such information may require additional context or a clearer perspective to discern.
Inference time after quantization: 8.583992719650269 seconds
GPU memory usage after quantization: 6.41 GB
"""
# Save the model and tokenizer
os.makedirs(save_path, exist_ok=True)
model.save_pretrained(save_path, safe_serialization=True)
tokenizer.save_pretrained(save_path)
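Once saved, the int4 checkpoint reloads like any other model directory. A minimal sketch, assuming the same `trust_remote_code` chat interface and the `save_path` set above:

```python
# Sketch: reloading the saved int4 checkpoint; the stored quantization config
# is picked up automatically by from_pretrained (recent transformers assumed).
from PIL import Image
from transformers import AutoModel, AutoTokenizer

save_path = '/root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5_int4'
tokenizer = AutoTokenizer.from_pretrained(save_path, trust_remote_code=True)
model = AutoModel.from_pretrained(save_path, device_map='cuda', trust_remote_code=True)

response = model.chat(
    image=Image.open('./assets/airplane.jpeg').convert("RGB"),
    msgs=[{"role": "user", "content": "What is in this picture?"}],
    tokenizer=tokenizer,
)
print(response)
```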