From c0c5581f227a7e01f675c37348d05f5d6ad19be7 Mon Sep 17 00:00:00 2001
From: yiranyyu <2606375857@qq.com>
Date: Fri, 24 May 2024 11:57:33 +0800
Subject: [PATCH] update readme

---
 README.md    | 11 ++++++++---
 README_zh.md | 12 ++++++++----
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 3c6d7a6..c39e121 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@
 
 ## News
 
+* [2024.05.24] MiniCPM-Llama3-V 2.5 supports [llama.cpp](#inference-with-llamacpp) now, providing smooth inference of 6-8 tokens/s on mobile phones. Try it now!
 * [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmarks evaluations, and multilingual capabilities 🌟📊🌍. Click [here](./docs/compare_with_phi-3_vision.md) to view more details.
 * [2024.05.20] We open-soure MiniCPM-Llama3-V 2.5, it has improved OCR capability and supports 30+ languages, representing the first end-side MLLM achieving GPT-4V level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
 * [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#vllm) to view more details.
@@ -51,7 +52,7 @@
   - [Inference on Mac](#inference-on-mac)
   - [Deployment on Mobile Phone](#deployment-on-mobile-phone)
   - [WebUI Demo](#webui-demo)
-  - [Inference with llama.cpp](#llamacpp)
+  - [Inference with llama.cpp](#inference-with-llamacpp)
   - [Inference with vLLM](#inference-with-vllm)
 - [Fine-tuning](#fine-tuning)
 - [TODO](#todo)
@@ -586,8 +587,12 @@ PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps
 ```
 
 
-### Inference with llama.cpp
-MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more detail.
+### Inference with llama.cpp
+MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more detail. This implementation supports smooth inference of 6-8 tokens/s on mobile phones<sup>1</sup>.
+
+
+1. Test environment: Xiaomi 14 Pro + Snapdragon 8 Gen 3
+
 
 ### Inference with vLLM
 
diff --git a/README_zh.md b/README_zh.md
index c35d8c7..fefdb21 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -28,8 +28,8 @@
 
 ## 更新日志
 
+* [2024.05.24] MiniCPM-Llama3-V 2.5 现在支持 [llama.cpp](#llamacpp-部署) 推理了！实现端侧 6-8 tokens/s 的流畅推理，欢迎试用！
 * [2024.05.23] 🔍 我们添加了Phi-3-vision-128k-instruct 与 MiniCPM-Llama3-V 2.5的全面对比，包括基准测试评估和多语言能力 🌟📊🌍。点击[这里](./docs/compare_with_phi-3_vision.md)查看详细信息。
-
 * [2024.05.20] 我们开源了 MiniCPM-Llama3-V 2.5，增强了 OCR 能力，支持 30 多种语言，并首次在端侧实现了 GPT-4V 级的多模态能力！我们提供了[高效推理](#手机端部署)和[简易微调](./finetune/readme.md)的支持，欢迎试用！
 * [2024.04.23] 我们增加了对 [vLLM](#vllm) 的支持，欢迎体验！
 * [2024.04.18] 我们在 HuggingFace Space 新增了 MiniCPM-V 2.0 的 [demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-2)，欢迎体验！
@@ -55,7 +55,7 @@
   - [Mac 推理](#mac-推理)
   - [手机端部署](#手机端部署)
   - [本地WebUI Demo部署](#本地webui-demo部署)
-  - [llama.cpp部署](#llamacpp)
+  - [llama.cpp 部署](#llamacpp-部署)
   - [vLLM 部署 ](#vllm-部署-)
 - [微调](#微调)
 - [未来计划](#未来计划)
@@ -601,8 +601,12 @@ PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps
 ```
 
 
-### llama.cpp 部署
-MiniCPM-Llama3-V 2.5 现在支持llama.cpp啦! 用法请参考我们的fork [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) .
+### llama.cpp 部署
+MiniCPM-Llama3-V 2.5 现在支持 llama.cpp 啦！用法请参考我们的 fork [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv)，在手机上可以支持 6-8 tokens/s 的流畅推理<sup>1</sup>。
+
+
+1. 测试环境：Xiaomi 14 Pro + Snapdragon 8 Gen 3
+
 
 ### vLLM 部署
 