update readme
README_en.md (16 lines changed)
@@ -25,6 +25,7 @@
## News <!-- omit in toc -->
* [2024.05.24] We release the [MiniCPM-Llama3-V 2.5 gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](#inference-with-llamacpp) inference and provides smooth decoding at 6~8 tokens/s on mobile phones. Try it now!
* [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, covering benchmark evaluations and multilingual capabilities 🌟📊🌍. Click [here](./docs/compare_with_phi-3_vision.md) to view more details.
* [2024.05.20] We open-source MiniCPM-Llama3-V 2.5! It has improved OCR capability, supports 30+ languages, and is the first end-side MLLM to achieve GPT-4V-level performance. We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
* [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#vllm) to view more details.
@@ -51,7 +52,7 @@
- [Inference on Mac](#inference-on-mac)
- [Deployment on Mobile Phone](#deployment-on-mobile-phone)
- [WebUI Demo](#webui-demo)
- [Inference with llama.cpp](#llamacpp)
- [Inference with llama.cpp](#inference-with-llamacpp)
- [Inference with vLLM](#inference-with-vllm)
- [Fine-tuning](#fine-tuning)
- [TODO](#todo)
@@ -476,10 +477,11 @@ pip install -r requirements.txt
### Model Zoo
| Model | GPU Memory | Description | Download Link |
| Model | Memory | Description | Download |
|:-----------|:-----------:|:-------------------|:---------------:|
| MiniCPM-Llama3-V 2.5 | 19 GB | The latest version, achieving state-of-the-art end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/) [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
| MiniCPM-Llama3-V 2.5 int4 | 8 GB | int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
| MiniCPM-Llama3-V 2.5 gguf | 5 GB | The gguf version, with lower GPU memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
| MiniCPM-Llama3-V 2.5 int4 | 8 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
| MiniCPM-V 2.0 | 8 GB | Light version, balancing performance and computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2) [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
| MiniCPM-V 1.0 | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V) [<img src="./assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
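
As a usage reference for the checkpoints listed above, below is a minimal sketch of loading MiniCPM-Llama3-V 2.5 with Hugging Face Transformers, roughly following the pattern on the model card. The image path and question are placeholders, and exact `chat()` arguments may differ between model versions.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"  # or the int4 repo above for lower GPU memory

# trust_remote_code pulls in the model's custom modeling code, which provides chat().
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float16)
model = model.to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": "What is in the image?"}]

answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)
print(answer)
```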
@@ -586,8 +588,12 @@ PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps
```
</details>
### Inference with llama.cpp<a id="llamacpp"></a>
MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more details.
### Inference with llama.cpp<a id="inference-with-llamacpp"></a>
MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more details. This implementation supports smooth inference at 6~8 tokens/s on mobile phones<sup>1</sup>.
<small>
1. Test environment: Xiaomi 14 Pro + Snapdragon 8 Gen 3
</small>
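
One way to fetch the gguf weights referenced above before running the llama.cpp fork is `huggingface_hub`. This is only a sketch; the filename below is a placeholder, so check the gguf repository's file list for the quantization you actually want.

```python
from huggingface_hub import hf_hub_download

# Placeholder filename: substitute an actual gguf file listed in the repository.
local_path = hf_hub_download(
    repo_id="openbmb/MiniCPM-Llama3-V-2_5-gguf",
    filename="ggml-model-Q4_K_M.gguf",
)
print(local_path)  # local cache path to pass to the llama.cpp example binary
```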
### Inference with vLLM<a id="vllm"></a>
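
A rough sketch of image-question inference through vLLM's generic multimodal interface is shown below. This is not necessarily the exact API the supported vLLM version expects: the model id, the `(<image>./</image>)` prompt placeholder, and the sampling arguments are assumptions, and in practice the model's chat template should be applied to the prompt.

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Assumption: MiniCPM-V 2.0 served through vLLM's generic multimodal generate() API.
llm = LLM(model="openbmb/MiniCPM-V-2", trust_remote_code=True, max_model_len=2048)

image = Image.open("example.jpg").convert("RGB")       # placeholder image path
prompt = "(<image>./</image>)\nWhat is in the image?"  # placeholder image token format

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```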