diff --git a/README.md b/README.md
index 8c9ab92..7f723db 100644
--- a/README.md
+++ b/README.md
@@ -480,7 +480,7 @@ pip install -r requirements.txt
 | Model | Memory |          Description | Download |
 |:-----------|:-----------:|:-------------------|:---------------:|
 | MiniCPM-Llama3-V 2.5 | 19 GB | The lastest version, achieving state-of-the end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
-| MiniCPM-Llama3-V 2.5 gguf | 5 GB | The gguf version, lower GPU memory and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)          |
+| MiniCPM-Llama3-V 2.5 gguf | 5 GB | The gguf version, lower GPU memory and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)   [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
 | MiniCPM-Llama3-V 2.5 int4 | 8 GB | The int4 quantized version,lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
 | MiniCPM-V 2.0 | 8 GB | Light version, balance the performance the computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
 | MiniCPM-V 1.0 | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
diff --git a/README_en.md b/README_en.md
index 3c6d7a6..7f723db 100644
--- a/README_en.md
+++ b/README_en.md
@@ -25,6 +25,7 @@
 ## News
+* [2024.05.24] We release the [MiniCPM-Llama3-V 2.5 gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](#inference-with-llamacpp) inference and provides a 6~8 token/s smooth decoding on mobile phones. Try it now!
 * [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmarks evaluations, and multilingual capabilities 🌟📊🌍. Click [here](./docs/compare_with_phi-3_vision.md) to view more details.
 * [2024.05.20] We open-soure MiniCPM-Llama3-V 2.5, it has improved OCR capability and supports 30+ languages, representing the first end-side MLLM achieving GPT-4V level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
 * [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#vllm) to view more details.
@@ -51,7 +52,7 @@
  - [Inference on Mac](#inference-on-mac)
  - [Deployment on Mobile Phone](#deployment-on-mobile-phone)
  - [WebUI Demo](#webui-demo)
- - [Inference with llama.cpp](#llamacpp)
+ - [Inference with llama.cpp](#inference-with-llamacpp)
  - [Inference with vLLM](#inference-with-vllm)
  - [Fine-tuning](#fine-tuning)
  - [TODO](#todo)
@@ -476,10 +477,11 @@ pip install -r requirements.txt
 ### Model Zoo
-| Model | GPU Memory |          Description | Download Link |
+| Model | Memory |          Description | Download |
 |:-----------|:-----------:|:-------------------|:---------------:|
 | MiniCPM-Llama3-V 2.5 | 19 GB | The lastest version, achieving state-of-the end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
-| MiniCPM-Llama3-V 2.5 int4 | 8 GB | int4 quantized version,lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
+| MiniCPM-Llama3-V 2.5 gguf | 5 GB | The gguf version, lower GPU memory and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)   [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
+| MiniCPM-Llama3-V 2.5 int4 | 8 GB | The int4 quantized version,lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
 | MiniCPM-V 2.0 | 8 GB | Light version, balance the performance the computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
 | MiniCPM-V 1.0 | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
@@ -586,8 +588,12 @@ PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps
 ```
-### Inference with llama.cpp
-MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more detail.
+### Inference with llama.cpp
+MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more detail. This implementation supports smooth inference of 6~8 token/s on mobile phones1.
+
+
+1. Test environment:Xiaomi 14 pro + Snapdragon 8 Gen 3
+
 ### Inference with vLLM
diff --git a/README_zh.md b/README_zh.md
index 2853caf..1c138ef 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -491,7 +491,7 @@ pip install -r requirements.txt
 | 模型 | 显存占用 |          简介 | 下载链接 |
 |:--------------|:--------:|:-------------------|:---------------:|
 | MiniCPM-Llama3-V 2.5| 19 GB | 最新版本,提供最佳的端侧多模态理解能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
-| MiniCPM-Llama3-V 2.5 gguf | 5 GB | gguf 版本,更低的内存占用和更高的推理效率。 | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)         |
+| MiniCPM-Llama3-V 2.5 gguf | 5 GB | gguf 版本,更低的内存占用和更高的推理效率。 | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
 | MiniCPM-Llama3-V 2.5 int4 | 8 GB | int4量化版,更低显存占用。 | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
 | MiniCPM-V 2.0 | 8 GB | 轻量级版本,平衡计算开销和多模态理解能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
 | MiniCPM-V 1.0 | 7 GB | 最轻量版本, 提供最快的推理速度。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |