mirror of
https://github.com/OpenBMB/MiniCPM-V.git
synced 2026-02-04 17:59:18 +08:00
Merge branch 'OpenBMB:main' into main
@@ -7,7 +7,7 @@
 <strong>[中文](./README_zh.md) |
 English</strong>
 
-Join our <a href="docs/wechat.md" target="_blank"> 💬 WeChat</a>
+Join our <a href="docs/wechat.md" target="_blank"> 💬 WeChat</a> | View MiniCPM-V <a href="docs/best_practice_summary.md" target="_blank"> 📖 best practices</a>
 
 <p align="center">

@@ -29,9 +29,10 @@ Join our <a href="docs/wechat.md" target="_blank"> 💬 WeChat</a>
 
 #### 📌 Pinned
 
+* [2024.08.17] 🚀🚀🚀 MiniCPM-V 2.6 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf).
 * [2024.08.15] We now also support multi-image SFT. For more details, please refer to the [document](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune).
 * [2024.08.14] MiniCPM-V 2.6 now also supports [fine-tuning](https://github.com/modelscope/ms-swift/issues/1613) with the SWIFT framework!
-* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf). Please note that MiniCPM-V 2.6 still needs [our fork](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md).
+* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf).
 * [2024.08.06] 🔥🔥🔥 We open-source MiniCPM-V 2.6, which outperforms GPT-4V on single image, multi-image and video understanding. It advances popular features of MiniCPM-Llama3-V 2.5, and can support real-time video understanding on iPad. Try it now!
 * [2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See [here](https://arxiv.org/abs/2408.01800).
 * [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See [here](#inference-with-vllm).

@@ -535,7 +536,7 @@ Note: For proprietary models, we calculate token density based on the image enco
 <td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
 <td>-</td>
 <td>60.0</td>
 <td>-</td>
 <td>62.9</td>
 <td>-</td>
 <td>-</td>
 <td>-</td>

@@ -546,7 +547,7 @@ Note: For proprietary models, we calculate token density based on the image enco
 <td nowrap="nowrap" align="left">GPT-4V</td>
 <td>-</td>
 <td>59.9</td>
 <td>-</td>
 <td>63.3</td>
 <td>-</td>
 <td>-</td>
 <td>-</td>
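The hunk headers above carry the README note "For proprietary models, we calculate token density based on the image encoding charging strategy." Token density, as the MiniCPM-V tables use it, is roughly the number of image pixels encoded into each visual token, so higher is better. A minimal sketch of that arithmetic (the 1344×1344-into-640-tokens figures are illustrative; I am assuming the pixels-per-token definition rather than quoting official code):

```python
def token_density(width_px: int, height_px: int, num_visual_tokens: int) -> float:
    """Image pixels encoded per visual token; higher means cheaper high-res inputs."""
    return (width_px * height_px) / num_visual_tokens

# Illustrative: a 1344x1344 image encoded into 640 visual tokens.
print(token_density(1344, 1344, 640))  # 2822.4
```

For API-priced proprietary models the same ratio can be formed from pixels per billed image token, which is what the "charging strategy" wording refers to.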
12  README_zh.md
@@ -9,7 +9,9 @@
 <strong>中文 |
 [English](./README_en.md)</strong>
 
 Join our <a href="docs/wechat.md" target="_blank"> 💬 WeChat community</a>
+| Learn about MiniCPM-V <a href="docs/best_practice_summary_zh.md" target="_blank"> 📖 best practices</a>
 
 <p align="center">
 MiniCPM-V 2.6 <a href="https://huggingface.co/openbmb/MiniCPM-V-2_6">🤗</a> <a href="https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6">🤖</a> | MiniCPM-Llama3-V 2.5 <a href="https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/">🤗</a> <a href="https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5">🤖</a> |

@@ -33,9 +35,11 @@
 
 #### 📌 Pinned
 
+* [2024.08.17] 🚀🚀🚀 MiniCPM-V 2.6 is now officially supported by the llama.cpp [main repository](https://github.com/ggerganov/llama.cpp)! Click [here](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) for GGUF versions of various sizes.
+* [2024.08.15] MiniCPM-V 2.6 now supports multi-image SFT. For more details, see the [fine-tuning document](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune).
 * [2024.08.14] MiniCPM-V 2.6 can now be [fine-tuned](https://github.com/modelscope/ms-swift/issues/1613) with the SWIFT framework!
-* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now officially supported by the llama.cpp [main repository](https://github.com/ggerganov/llama.cpp)! Click [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main) for GGUF versions of various sizes. Please note that MiniCPM-V 2.6 still requires **pulling our latest fork**: [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md). We will keep working to merge these features into the official llama.cpp repository.
+* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now officially supported by the llama.cpp [main repository](https://github.com/ggerganov/llama.cpp)! Click [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main) for GGUF versions of various sizes.
 * [2024.08.06] 🔥🔥🔥 We open-sourced MiniCPM-V 2.6, which outperforms GPT-4V in single-image, multi-image and video understanding. It further improves several highlight features of MiniCPM-Llama3-V 2.5 and, for the first time, supports real-time video understanding on iPad. Try it now!
 * [2024.08.03] The MiniCPM-Llama3-V 2.5 technical report has been released! See [here](https://arxiv.org/abs/2408.01800).
 * [2024.07.19] MiniCPM-Llama3-V 2.5 now supports [vLLM](#vllm-部署-)!

@@ -541,7 +545,7 @@
 <td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
 <td>-</td>
 <td>60.0</td>
 <td>-</td>
 <td>62.9</td>
 <td>-</td>
 <td>-</td>
 <td>-</td>

@@ -552,7 +556,7 @@
 <td nowrap="nowrap" align="left">GPT-4V</td>
 <td>-</td>
 <td>59.9</td>
 <td>-</td>
 <td>63.3</td>
 <td>-</td>
 <td>-</td>
 <td>-</td>
||||
BIN
assets/minicpm-v27.png
Normal file
BIN
assets/minicpm-v27.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 148 KiB |
23  docs/best_practice_summary.md  (new file)

@@ -0,0 +1,23 @@
+# MiniCPM-V Best Practices
+
+**MiniCPM-V** is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image, video and text as inputs and provide high-quality text output, aiming to achieve **strong performance and efficient deployment**. The most notable models in this series currently include MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.6. The following sections provide detailed tutorials and guidance for each version of the MiniCPM-V models.
+
+## MiniCPM-V 2.6
+
+MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model **surpasses GPT-4V in single image, multi-image and video understanding**. It outperforms **GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet** in single image understanding, and advances MiniCPM-Llama3-V 2.5's features such as strong OCR capability, trustworthy behavior, multilingual support, and end-side deployment. Due to its superior token density, MiniCPM-V 2.6 can for the first time support real-time video understanding on end-side devices such as iPad.
+
+* [Deployment Tutorial](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf)
+* [Training Tutorial](https://modelbest.feishu.cn/wiki/GeHMwLMa0i2FhUkV0f6cz3HWnV1)
+* [Quantization Tutorial](https://modelbest.feishu.cn/wiki/YvsPwnPwWiqUjlkmW0scQ76TnBb)
+
+## MiniCPM-Llama3-V 2.5
+
+MiniCPM-Llama3-V 2.5 is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0.
+
+* [Quantization Tutorial](https://modelbest.feishu.cn/wiki/Kc7ywV4X1ipSaAkuPFOc9SFun8b)
+* [Training Tutorial](https://modelbest.feishu.cn/wiki/UpSiw63o9iGDhIklmwScX4a6nhW)
+* [End-side Deployment](https://modelbest.feishu.cn/wiki/Lwr9wpOQdinr6AkLzHrc9LlgnJD)
+* [Deployment Tutorial](https://modelbest.feishu.cn/wiki/LTOKw3Hz7il9kGkCLX9czsennKe)
+* [HD Decoding Tutorial](https://modelbest.feishu.cn/wiki/Ug8iwdXfhiHVsDk2gGEco6xnnVg)
+* [Model Structure](https://modelbest.feishu.cn/wiki/ACtAw9bOgiBQ9lkWyafcvtVEnQf)
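The deployment tutorials linked above walk through full setups. For orientation only: the Hugging Face model card for MiniCPM-V 2.6 exposes a `chat` API whose `msgs` payload interleaves images and text inside a single content list. A hedged sketch of just the payload construction (the helper name `build_msgs` is mine; strings stand in for `PIL.Image` objects, and the actual model call appears only as an unverified comment):

```python
def build_msgs(question: str, images: list) -> list:
    """One user turn whose content list holds the images first, then the prompt,
    following the multi-image usage shown on the MiniCPM-V 2.6 model card."""
    return [{"role": "user", "content": list(images) + [question]}]

msgs = build_msgs("Compare the two charts.", ["<image_1>", "<image_2>"])
print(msgs[0]["role"], len(msgs[0]["content"]))  # user 3

# With the real model (assumption, not verified here):
# answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
```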
22  docs/best_practice_summary_zh.md  (new file)

@@ -0,0 +1,22 @@
+# MiniCPM-V Best Practices
+
+**MiniCPM-V** is a series of end-side multimodal LLMs for vision-language understanding. The models take image and text inputs and provide high-quality text output. Since February 2024 we have released five model versions, aiming for **leading performance and efficient deployment**. The most notable models in the series currently include:
+
+## MiniCPM-V 2.6
+
+The latest and best-performing model in the MiniCPM-V series. With a total of 8B parameters, it **surpasses GPT-4V** in single-image, multi-image and video understanding. In single-image understanding it outperforms commercial closed-source models such as **GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet**, and further improves many features of MiniCPM-Llama3-V 2.5, including OCR, trustworthy behavior, multilingual support and end-side deployment. Thanks to its leading visual token density, MiniCPM-V 2.6 became the first multimodal LLM to support real-time video understanding on end-side devices such as iPad.
+
+* [Deployment Tutorial](https://modelbest.feishu.cn/wiki/LZxLwp4Lzi29vXklYLFchwN5nCf)
+* [Training Tutorial](https://modelbest.feishu.cn/wiki/HvfLwYzlIihqzXkmeCdczs6onmd)
+* [Quantization Tutorial](https://modelbest.feishu.cn/wiki/PAsHw6N6xiEy0DkJWpJcIocRnz9)
+
+## MiniCPM-Llama3-V 2.5
+
+MiniCPM-Llama3-V 2.5 is built on SigLip-400M and Llama3-8B-Instruct, with a total of 8B parameters. Its performance improves significantly over MiniCPM-V 2.0.
+
+* [Quantization Tutorial](https://modelbest.feishu.cn/wiki/O0KTwQV5piUPzTkRXl9cSFyHnQb)
+* [Training Tutorial](https://modelbest.feishu.cn/wiki/MPkPwvONEiZm3BkWMnyc83Tin4d)
+* [End-side Deployment](https://modelbest.feishu.cn/wiki/CZZJw1EDGitSSZka664cZwbWnrb)
+* [Deployment Tutorial](https://modelbest.feishu.cn/wiki/BcHIwjOLGihJXCkkSdMc2WhbnZf)
+* [HD Decoding Tutorial](https://modelbest.feishu.cn/wiki/L0ajwm8VAiiPY6kDZfJce3B7nRg)
+* [Model Structure](https://modelbest.feishu.cn/wiki/X15nwGzqpioxlikbi2RcXDpJnjd)
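Both best-practice summaries link quantization tutorials, and the pinned news above points at GGUF builds "of various sizes". As a back-of-the-envelope companion: a quantized checkpoint weighs roughly parameters × bits-per-weight ÷ 8 bytes. The sketch below uses the 8B parameter count stated in the text; the ~4.5 bits-per-weight figure for a mid-range quant is my assumption, not a documented value:

```python
def approx_quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough checkpoint size: params * bits / 8 bytes, reported in GB (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# 8B-parameter model (MiniCPM-V 2.6 / MiniCPM-Llama3-V 2.5) at an assumed ~4.5 bits/weight.
print(approx_quantized_size_gb(8e9, 4.5))  # 4.5
```

Real GGUF files also carry the vision encoder and metadata, so treat this only as a sizing heuristic.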
@@ -1,5 +1,5 @@
 <div align="center">
-<img src="../assets/minicpm-v26.png" width="60%"/>
+<img src="../assets/minicpm-v27.png" width="60%"/>
 
 <p> 扫码加入「MiniCPM-V 交流群」 </p>
 <p> Scan the QR code to join the "MiniCPM-V Discussion Group" </p>