diff --git a/README.md b/README.md
index f0250e8..5fc5b0f 100644
--- a/README.md
+++ b/README.md
@@ -135,7 +135,7 @@
 **MiniCPM-V 4.5** is the latest and most capable model in the MiniCPM-V series. The model is built on Qwen3-8B and SigLIP2-400M with a total of 8B parameters. It exhibits a significant performance improvement over previous MiniCPM-V and MiniCPM-o models, and introduces new useful features. Notable features of MiniCPM-V 4.5 include:

 - 🔥 **State-of-the-art Vision-Language Capability.**
-  MiniCPM-V 4.5 achieves an average score of 77.2 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest, Gemini-2.0 Pro, and strong open-source models like Qwen2.5-VL 72B** for vision-language capabilities, making it the most performant MLLM under 30B parameters.
+  MiniCPM-V 4.5 achieves an average score of 77.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest, Gemini-2.0 Pro, and strong open-source models like Qwen2.5-VL 72B** for vision-language capabilities, making it the most performant MLLM under 30B parameters.

 - 🎬 **Efficient High Refresh Rate and Long Video Understanding.**
   Powered by a new unified 3D-Resampler over images and videos, MiniCPM-V 4.5 can now achieve 96x compression rate for video tokens, where 6 448x448 video frames can be jointly compressed into 64 video tokens (normally 1,536 tokens for most MLLMs). This means that the model can percieve significantly more video frames without increasing the LLM inference cost. This brings state-of-the-art high refresh rate (up to 10FPS) video understanding and long video understanding capabilities on Video-MME, LVBench, MLVU, MotionBench, FavorBench, etc., efficiently.
diff --git a/README_zh.md b/README_zh.md
index 1efd0e1..4e1195d 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -127,7 +127,7 @@


 - 🔥 **领先的视觉理解能力**
-  MiniCPM-V 4.5 在 OpenCompass 综合评测(涵盖 8 个主流评测基准)中取得了 77.2 的高分。**在仅 8B 参数的情况下超越了广泛使用的闭源模型(如 GPT-4o-latest、Gemini-2.0 Pro)以及强大的开源模型(如 Qwen2.5-VL 72B)**,成为 30B 参数规模以下最强的多模态大模型。
+  MiniCPM-V 4.5 在 OpenCompass 综合评测(涵盖 8 个主流评测基准)中取得了 77.0 的高分。**在仅 8B 参数的情况下超越了广泛使用的闭源模型(如 GPT-4o-latest、Gemini-2.0 Pro)以及强大的开源模型(如 Qwen2.5-VL 72B)**,成为 30B 参数规模以下最强的多模态大模型。

 - 🎬 **高效的高帧率与长视频理解**
   借助全新的图像-视频统一 3D-Resampler,MiniCPM-V 4.5 能够实现 96 倍视频 token 压缩率,即将 6 帧 448x448 视频帧联合压缩为 64 个 token(大多数多模态大模型需约 1536 个 token)。这意味着模型在语言模型推理成本不增加的情况下,可以感知显著更多的视频帧,从而实现业界领先的 高帧率(最高 10FPS)视频理解与长视频理解,并在 Video-MME、LVBench、MLVU、MotionBench、FavorBench 等基准上高效率地展现出色性能。