Mirror of https://github.com/OpenBMB/MiniCPM-V.git, synced 2026-02-04 17:59:18 +08:00

Commit: Merge remote-tracking branch 'origin/main'

README.md (26 lines changed)
@@ -25,12 +25,13 @@

## News <!-- omit in toc -->

* [2024.05.23] We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, covering benchmark evaluations and multilingual capabilities 🌟📊🌍. Click [here](./docs/compare_with_phi-3_vision.md) to view more details.
* [2024.05.20] We open-source MiniCPM-Llama3-V 2.5. It has improved OCR capability and supports 30+ languages, making it the first edge-side MLLM to achieve GPT-4V-level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
* [2024.04.23] MiniCPM-V 2.0 now supports vLLM! Click [here](#vllm) to view more details.
* [2024.04.18] We created a HuggingFace Space to host the demo of MiniCPM-V 2.0 [here](https://huggingface.co/spaces/openbmb/MiniCPM-V-2)!
* [2024.04.17] MiniCPM-V 2.0 now supports deploying a [WebUI Demo](#webui-demo)!
* [2024.04.15] MiniCPM-V 2.0 now also supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md) with the SWIFT framework!
* [2024.04.12] We open-source MiniCPM-V 2.0, which achieves performance comparable to Gemini Pro in understanding scene text and outperforms the strong Qwen-VL-Chat 9.6B and Yi-VL 34B on <a href="https://rank.opencompass.org.cn/leaderboard-multimodal">OpenCompass</a>, a comprehensive evaluation over 11 popular benchmarks. Click <a href="https://openbmb.vercel.app/minicpm-v-2">here</a> to view the MiniCPM-V 2.0 technical blog.
* [2024.03.14] MiniCPM-V now supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md) with the SWIFT framework. Thanks to [Jintao](https://github.com/Jintao-Huang) for the contribution!
* [2024.03.01] MiniCPM-V can now be deployed on Mac!
* [2024.02.01] We open-source MiniCPM-V and OmniLMM-12B, which support efficient end-side deployment and powerful multimodal capabilities, respectively.
@@ -40,6 +41,7 @@

- [MiniCPM-Llama3-V 2.5](#minicpm-llama3-v-25)
  - [Evaluation](#evaluation)
- [MiniCPM-V 2.0](#minicpm-v-20)
- [Online Demo](#online-demo)
- [Install](#install)
@@ -74,7 +76,7 @@

- 🚀 **Efficient Deployment.**
  MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations** to achieve high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 realizes a **150-fold speedup in edge-side image encoding for multimodal large models** and a **3-fold increase in language decoding speed**.

### Evaluation

<div align="center">
<img src="assets/MiniCPM-Llama3-V-2.5-peformance.png" width="66%" />
@@ -285,6 +287,22 @@

<td>60.0</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Phi-3-vision-128k-instruct</td>
<td>4.2B</td>
<td>639*</td>
<td>70.9</td>
<td>-</td>
<td>-</td>
<td>1537.5*</td>
<td>-</td>
<td>-</td>
<td>40.4</td>
<td>44.5</td>
<td>64.2*</td>
<td>58.8*</td>
<td>-</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td nowrap="nowrap" align="left">MiniCPM-V 1.0</td>
<td>2.8B</td>
@@ -343,7 +361,7 @@

</details>

<div align="center">
-<img src="assets/llavabench_compare.png" width="66%" />
+<img src="assets/llavabench_compare_3.png" width="85%" />
<br>
Evaluation results of LLaVABench in multiple languages
</div>
@@ -504,7 +522,7 @@ answer = chat_model.chat(inputs)

print(answer)
```

You will get the following output:

```
"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database."
```
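For context on the snippet this hunk touches: `chat_model.chat(inputs)` takes a dict carrying a base64-encoded image and a JSON-encoded message list. The sketch below illustrates that input format only; the helper name `build_inputs` and the exact field names are assumptions for illustration, not a confirmed repo API.

```python
import json

# Hypothetical helper sketching the inputs dict consumed by chat_model.chat()
# (field names "image"/"question" and the helper itself are assumptions).
def build_inputs(image_b64: str, question: str) -> dict:
    msgs = [{"role": "user", "content": question}]          # single-turn chat
    return {"image": image_b64, "question": json.dumps(msgs)}

inputs = build_inputs("<base64-encoded image>", "Tell me the model of this aircraft.")
# inputs would then be passed as: answer = chat_model.chat(inputs)
print(json.loads(inputs["question"])[0]["content"])
```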
README_zh.md (19 lines changed)
@@ -28,6 +28,7 @@

## Changelog <!-- omit in toc -->

* [2024.05.23] We added a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, covering benchmark evaluations and multilingual capabilities 🌟📊🌍. Click [here](./docs/compare_with_phi-3_vision.md) for details.
<!-- * [2024.05.22] We further improved edge-side inference speed, achieving a smooth 6-8 tokens/s experience. Give it a try! -->
* [2024.05.20] We open-source MiniCPM-Llama3-V 2.5, with enhanced OCR capability and support for 30+ languages, achieving GPT-4V-level multimodal capability on the edge for the first time! We provide [efficient inference](#手机端部署) and [simple fine-tuning](./finetune/readme.md). Give it a try!
* [2024.04.23] We added support for [vLLM](#vllm). Give it a try!
@@ -292,6 +293,22 @@

<td>60.0</td>
<td>-</td>
</tr>
<tr>
<td nowrap="nowrap" align="left">Phi-3-vision-128k-instruct</td>
<td>4.2B</td>
<td>639*</td>
<td>70.9</td>
<td>-</td>
<td>-</td>
<td>1537.5*</td>
<td>-</td>
<td>-</td>
<td>40.4</td>
<td>44.5</td>
<td>64.2*</td>
<td>58.8*</td>
<td>-</td>
</tr>
<tr style="background-color: #e6f2ff;">
<td nowrap="nowrap" align="left">MiniCPM-V 1.0</td>
<td>2.8B</td>
@@ -348,7 +365,7 @@

</details>

<div align="center">
-<img src="assets/llavabench_compare.png" width="66%" />
+<img src="assets/llavabench_compare_3.png" width="80%" />
<br>
Evaluation results of LLaVA Bench in multiple languages
</div>
Binary file not shown (before: 348 KiB).

assets/llavabench_compare_3.png (new file; binary not shown; after: 492 KiB)

assets/llavabench_compare_phi3.png (new file; binary not shown; after: 427 KiB)
docs/compare_with_phi-3_vision.md

@@ -1,14 +1,14 @@

## Phi-3-vision-128K-Instruct vs MiniCPM-Llama3-V 2.5

-Comparison results of Phi-3-vision-128K-Instruct and MiniCPM-Llama3-V 2.5, regarding the model size, hardware requirements, and performances on multiple popular benchmarks.
+Comparison results of Phi-3-vision-128K-Instruct and MiniCPM-Llama3-V 2.5, covering model size, hardware requirements, and performance.

## Hardware Requirements (硬件需求)

-With in4 quantization, MiniCPM-Llama3-V 2.5 delivers smooth inference of 6-8 tokens/s with only 8GB of GPU memory.
+With int4 quantization, MiniCPM-Llama3-V 2.5 runs inference with only 8GB of GPU memory.

| Model(模型) | GPU Memory(显存) |
|:----------------------|:-------------------:|
@@ -18,14 +18,32 @@ With in4 quantization, MiniCPM-Llama3-V 2.5 delivers smooth inference of 6-8 tok

## Model Size and Performance (模型参数和性能)

On most benchmarks, MiniCPM-Llama3-V 2.5 achieves **better performance** than Phi-3-vision-128K-Instruct.

|  | Phi-3-vision-128K-Instruct | MiniCPM-Llama3-V 2.5 |
|:-|:----------:|:-------------------:|
| Size(参数) | **4B** | 8B |
-| OpenCompass | 53.7 | **58.8** |
+| OpenCompass 2024/05 | 53.7 | **58.8** |
| OCRBench | 639.0 | **725.0** |
| RealworldQA | 58.8 | **63.5** |
| TextVQA | 72.2 | **76.6** |
| ScienceQA | **90.8** | 89.0 |
| POPE | 83.4 | **87.2** |
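The "most benchmarks" claim can be tallied directly from the scores in the table above; a minimal sketch:

```python
# Scores transcribed from the table above: (Phi-3-vision-128K-Instruct, MiniCPM-Llama3-V 2.5)
scores = {
    "OpenCompass 2024/05": (53.7, 58.8),
    "OCRBench": (639.0, 725.0),
    "RealworldQA": (58.8, 63.5),
    "TextVQA": (72.2, 76.6),
    "ScienceQA": (90.8, 87.2 + 1.8),  # 89.0; Phi-3-vision leads on this one
    "POPE": (83.4, 87.2),
}
# Count benchmarks where MiniCPM-Llama3-V 2.5 scores higher.
wins = sum(minicpm > phi for phi, minicpm in scores.values())
print(f"MiniCPM-Llama3-V 2.5 leads on {wins} of {len(scores)} benchmarks")  # 5 of 6
```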
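The 8GB-GPU-memory figure quoted for int4 inference is consistent with a back-of-envelope estimate; the sketch below assumes an 8B-class parameter count (an approximation, for illustration only):

```python
# int4 quantization stores each weight in 4 bits, i.e. 0.5 bytes.
params = 8.5e9                 # assumed 8B-class parameter count (approximation)
bytes_per_param = 0.5          # 4-bit weights
weight_gib = params * bytes_per_param / 1024**3
print(f"~{weight_gib:.1f} GiB for weights")  # roughly 4 GiB, leaving headroom
# within 8 GB for activations and the KV cache.
```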
## Multilingual Capabilities

MiniCPM-Llama3-V 2.5 exhibits **stronger multilingual capabilities** than Phi-3-vision-128K-Instruct on the conversation-and-reasoning benchmark LLaVA Bench.

<div align="center">
<img src="../assets/llavabench_compare_phi-3.png" width="85%" />
<br>
Evaluation results of LLaVA Bench in multiple languages
</div>