## MiniCPM-Llama3-V 2.5

> Archived at: 2025-01-13

**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:

- 🔥 **Leading Performance.** MiniCPM-Llama3-V 2.5 achieves an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs.
- 💪 **Strong OCR Capabilities.** MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 now offers improved full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities. Its instruction-following and complex reasoning abilities have also been strengthened, improving the multimodal interaction experience.
- 🏆 **Trustworthy Behavior.** Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), which is the best level within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset).
- 🌏 **Multilingual Support.** Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages including German, French, Spanish, Italian, and Korean**. [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md).
- 🚀 **Efficient Deployment.** MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150x acceleration in end-side MLLM image encoding** and a **3x speedup in language decoding**.
- 💫 **Easy Usage.** MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).

### Evaluation
Results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, and Object HalBench:

| Model | Size | OCRBench | TextVQA val | DocVQA test | OpenCompass | MME | MMB test (en) | MMB test (cn) | MMMU val | MathVista | LLaVA Bench | RealWorld QA | Object HalBench |
|:------|:----:|:--------:|:-----------:|:-----------:|:-----------:|:---:|:-------------:|:-------------:|:--------:|:---------:|:-----------:|:------------:|:---------------:|
| **Proprietary** | | | | | | | | | | | | | |
| Gemini Pro | - | 680 | 74.6 | 88.1 | 62.9 | 2148.9 | 73.6 | 74.3 | 48.9 | 45.8 | 79.9 | 60.4 | - |
| GPT-4V (2023.11.06) | - | 645 | 78.0 | 88.4 | 63.5 | 1771.5 | 77.0 | 74.4 | 53.8 | 47.8 | 93.1 | 63.0 | 86.4 |
| **Open-source** | | | | | | | | | | | | | |
| Mini-Gemini | 2.2B | - | 56.2 | 34.2* | - | 1653.0 | - | - | 31.7 | - | - | - | - |
| Qwen-VL-Chat | 9.6B | 488 | 61.5 | 62.6 | 51.6 | 1860.0 | 61.8 | 56.3 | 37.0 | 33.8 | 67.7 | 49.3 | 56.2 |
| DeepSeek-VL-7B | 7.3B | 435 | 64.7* | 47.0* | 54.6 | 1765.4 | 73.8 | 71.4 | 38.3 | 36.8 | 77.8 | 54.2 | - |
| Yi-VL-34B | 34B | 290 | 43.4* | 16.9* | 52.2 | 2050.2 | 72.4 | 70.7 | 45.1 | 30.7 | 62.3 | 54.8 | 79.3 |
| CogVLM-Chat | 17.4B | 590 | 70.4 | 33.3* | 54.2 | 1736.6 | 65.8 | 55.9 | 37.3 | 34.7 | 73.9 | 60.3 | 73.6 |
| TextMonkey | 9.7B | 558 | 64.3 | 66.7 | - | - | - | - | - | - | - | - | - |
| Idefics2 | 8.0B | - | 73.0 | 74.0 | 57.2 | 1847.6 | 75.7 | 68.6 | 45.2 | 52.2 | 49.1 | 60.7 | - |
| Bunny-LLama-3-8B | 8.4B | - | - | - | 54.3 | 1920.3 | 77.0 | 73.9 | 41.3 | 31.5 | 61.2 | 58.8 | - |
| LLaVA-NeXT Llama-3-8B | 8.4B | - | - | 78.2 | - | 1971.5 | - | - | 41.7 | 37.5 | 80.1 | 60.0 | - |
| Phi-3-vision-128k-instruct | 4.2B | 639* | 70.9 | - | - | 1537.5* | - | - | 40.4 | 44.5 | 64.2* | 58.8* | - |
| MiniCPM-V 1.0 | 2.8B | 366 | 60.6 | 38.2 | 47.5 | 1650.2 | 64.1 | 62.6 | 38.3 | 28.9 | 51.3 | 51.2 | 78.4 |
| MiniCPM-V 2.0 | 2.8B | 605 | 74.1 | 71.9 | 54.5 | 1808.6 | 69.1 | 66.5 | 38.2 | 38.7 | 69.2 | 55.8 | 85.5 |
| **MiniCPM-Llama3-V 2.5** | 8.5B | 725 | 76.6 | 84.8 | 65.1 | 2024.6 | 77.2 | 74.2 | 45.8 | 54.3 | 86.7 | 63.5 | 89.7 |

\* We evaluated the officially released checkpoint ourselves.
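
The feature list quotes Object HalBench as a hallucination rate (10.3% for MiniCPM-Llama3-V 2.5, 13.6% for GPT-4V-1106), while the table lists 89.7 and 86.4 for the same two models. The figures agree if the table column is read as 100 minus the hallucination rate, so that higher is better. That reading is inferred from the numbers and is not stated in the text; the helper name below is hypothetical and only illustrates the arithmetic.

```python
# Assumption: the Object HalBench column reports 100 - hallucination rate (%).
# This is inferred from the published numbers, not documented in the text.


def halbench_table_score(hallucination_rate_pct: float) -> float:
    """Convert a hallucination rate in percent to the assumed table score."""
    return round(100.0 - hallucination_rate_pct, 1)


# Rates quoted in the feature list, keyed by the model names used in the table.
quoted_rates = {
    "MiniCPM-Llama3-V 2.5": 10.3,
    "GPT-4V (2023.11.06)": 13.6,
}

table_scores = {
    name: halbench_table_score(rate) for name, rate in quoted_rates.items()
}

for name, score in table_scores.items():
    print(f"{name}: {score}")
```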

Evaluation results of multilingual LLaVA Bench
### Examples

### Model Zoo

| Model | Device | Memory | Description | Download |
|:-----------|:--:|:-----------:|:-------------------|:---------------:|
| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) [ModelScope](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
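
As a rough guide to choosing between the checkpoints listed above, the sketch below encodes the Model Zoo rows and picks a variant for a given device and memory budget. The `Variant` class and `pick_variant` function are hypothetical names introduced for illustration, and preferring the largest variant that fits is an assumed heuristic, not a recommendation from the authors. The memory figures and Hugging Face repository IDs are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    device: str      # "GPU" or "CPU", as listed in the Model Zoo table
    memory_gb: int   # memory figure from the table
    repo_id: str     # Hugging Face repository from the Download column


VARIANTS = [
    Variant("MiniCPM-Llama3-V 2.5", "GPU", 19, "openbmb/MiniCPM-Llama3-V-2_5"),
    Variant("MiniCPM-Llama3-V 2.5 int4", "GPU", 8, "openbmb/MiniCPM-Llama3-V-2_5-int4"),
    Variant("MiniCPM-Llama3-V 2.5 gguf", "CPU", 6, "openbmb/MiniCPM-Llama3-V-2_5-gguf"),
]


def pick_variant(device: str, available_gb: float) -> Optional[Variant]:
    """Return the largest listed variant for `device` that fits in `available_gb`.

    Assumed heuristic: a larger memory footprint means less aggressive
    quantization, so it is preferred whenever it fits. Returns None if
    no listed variant fits.
    """
    candidates = [
        v for v in VARIANTS
        if v.device == device.upper() and v.memory_gb <= available_gb
    ]
    return max(candidates, key=lambda v: v.memory_gb, default=None)


if __name__ == "__main__":
    choice = pick_variant("gpu", 12)
    print(choice.repo_id if choice else "no listed variant fits")
```

With a 12 GB GPU this selects the int4 checkpoint, since the full-precision model is listed at 19 GB.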