update readme

yiranyyu
2024-05-23 19:20:40 +08:00
parent 980a1e37e2
commit 6dde8b97ea
2 changed files with 2 additions and 2 deletions


@@ -18,7 +18,7 @@
 **MiniCPM-V** is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. Models take image and text as inputs and provide high-quality text outputs. Since February 2024, we have released 4 versions of the model, aiming to achieve **strong performance and efficient deployment**! The most noteworthy models in this series currently include:
-- **MiniCPM-Llama3-V 2.5**: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max, and Claude 3 in overall performance. Equipped with enhanced OCR and instruction-following capabilities, the model also supports multimodal conversation in over 30 languages, including English, Chinese, French, Spanish, and German. With the help of quantization, compilation optimizations, and several efficient inference techniques on CPUs and NPUs, MiniCPM-Llama3-V 2.5 can be efficiently deployed on end-side devices.
+- **MiniCPM-Llama3-V 2.5**: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model **surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max, and Claude 3** in overall performance. Equipped with enhanced OCR and instruction-following capabilities, the model also supports multimodal conversation in **over 30 languages**, including English, Chinese, French, Spanish, and German. With the help of quantization, compilation optimizations, and several efficient inference techniques on CPUs and NPUs, MiniCPM-Llama3-V 2.5 can be **efficiently deployed on end-side devices**.
 - **MiniCPM-V 2.0**: The lightest model in the MiniCPM-V series. With 2B parameters, it surpasses larger models such as Yi-VL 34B, CogVLM-Chat 17B, and Qwen-VL-Chat 10B in overall performance. It accepts image inputs of any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving performance comparable to Gemini Pro in scene-text understanding and matching GPT-4V in low hallucination rates.


@@ -20,7 +20,7 @@
 **MiniCPM-V** is a series of end-side multimodal large models for vision-language understanding. The models in this series accept image and text inputs and produce high-quality text outputs. Since February 2024, we have released 4 model versions, aiming for **leading performance and efficient deployment**. The most noteworthy models in the series currently include:
-- **MiniCPM-Llama3-V 2.5**: 🔥🔥🔥 The latest and best-performing model in the MiniCPM-V series. With 8B total parameters, its overall multimodal performance surpasses commercial closed-source models such as GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max. Its OCR and instruction-following capabilities are further improved, and it supports multimodal interaction in more than 30 languages. Through the systematic use of efficient inference techniques such as model quantization, CPU/NPU optimizations, and compilation optimizations, MiniCPM-Llama3-V 2.5 achieves efficient deployment on end-side devices.
+- **MiniCPM-Llama3-V 2.5**: 🔥🔥🔥 The latest and best-performing model in the MiniCPM-V series. With 8B total parameters, its overall multimodal performance **surpasses commercial closed-source models such as GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max**. Its OCR and instruction-following capabilities are further improved, and it **supports multimodal interaction in more than 30 languages**. Through the systematic use of efficient inference techniques such as model quantization, CPU/NPU optimizations, and compilation optimizations, MiniCPM-Llama3-V 2.5 achieves **efficient deployment on end-side devices**.
 - **MiniCPM-V 2.0**: The lightest model in the MiniCPM-V series. With 2B total parameters, its overall multimodal performance surpasses larger models such as Yi-VL 34B, CogVLM-Chat 17B, and Qwen-VL-Chat 10B. It accepts images of any aspect ratio with up to 1.8 million pixels, achieving scene-text recognition comparable to Gemini Pro and a low hallucination rate matching GPT-4V.