diff --git a/README.md b/README.md
index 05176b9..b73b2f9 100644
--- a/README.md
+++ b/README.md
@@ -44,6 +44,7 @@
 - [Online Demo](#online-demo)
 - [Install](#install)
 - [Inference](#inference)
+  - [Hardware Requirements](#hardware-requirements)
   - [Model Zoo](#model-zoo)
   - [Multi-turn Conversation](#multi-turn-conversation)
   - [Inference on Mac](#inference-on-mac)
@@ -453,6 +454,15 @@ pip install -r requirements.txt

 ## Inference

+### Hardware Requirements
+
+| Model | GPU Memory |
+|:----------------------|:-------------------:|
+| MiniCPM-Llama3-V 2.5 | 19 GB |
+| MiniCPM-Llama3-V 2.5 (int4) | 8 GB |
+| MiniCPM-V 2.0 | 8 GB |
+
+
 ### Model Zoo

 | Model | Description | Download Link |
 |:----------------------|:-------------------|:---------------:|
@@ -589,13 +599,13 @@ python examples/minicpmv_example.py

 ### Simple Fine-tuning

-We supports simple fine-tuning with Hugging Face for MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5.
+We support simple fine-tuning with Hugging Face for MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5.

 [Reference Document](./finetune/readme.md)

 ### With the SWIFT Framework

-We now support finetune MiniCPM-V series with the SWIFT framework. SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs . It supports the lightweight training solutions provided by PEFT and a complete Adapters Library including techniques such as NEFTune, LoRA+ and LLaMA-PRO.
+We now support fine-tuning the MiniCPM-V series with the SWIFT framework. SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs. It supports the lightweight training solutions provided by PEFT and a complete Adapters Library, including techniques such as NEFTune, LoRA+ and LLaMA-PRO.

 Best Practices:[MiniCPM-V 1.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md), [MiniCPM-V 2.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md)
@@ -618,9 +628,9 @@ Please contact cpm@modelbest.cn to obtain written authorization for commercial u

 ## Statement

-As LMMs, OmniLMMs generate contents by learning a large amount of multimodal corpora, but they cannot comprehend, express personal opinions or make value judgement. Anything generated by OmniLMMs does not represent the views and positions of the model developers
+As LMMs, MiniCPM-V models (including OmniLMM) generate content by learning from large amounts of multimodal corpora, but they cannot comprehend, express personal opinions, or make value judgements. Anything generated by MiniCPM-V models does not represent the views and positions of the model developers.

-We will not be liable for any problems arising from the use of OmniLMM open source models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, dissemination or misuse of the model.
+We will not be liable for any problems arising from the use of MiniCPM-V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model.

 ## Institutions
diff --git a/README_zh.md b/README_zh.md
index 29f737c..9fbb527 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -28,6 +28,7 @@
 ## 更新日志

+* [2024.05.20] 我们开源了 MiniCPM-Llama3-V 2.5，增强了 OCR 能力，支持 30 多种语言，并首次在端侧实现了 GPT-4V 级的多模态能力！我们提供了[高效推理](#手机端部署)和[简易微调](./finetune/readme.md)的支持，欢迎试用！
 * [2024.04.23] 我们增加了对 [vLLM](#vllm) 的支持，欢迎体验!
 * [2024.04.18] 我们在 HuggingFace Space 新增了 MiniCPM-V 2.0 的 [demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-2)，欢迎体验!
diff --git a/docs/compare_with_phi-3_vision.md b/docs/compare_with_phi-3_vision.md
new file mode 100644
index 0000000..af7ad50
--- /dev/null
+++ b/docs/compare_with_phi-3_vision.md
@@ -0,0 +1,31 @@
+## Phi-3-vision-128K-Instruct vs MiniCPM-Llama3-V 2.5
+
+Comparison of Phi-3-vision-128K-Instruct and MiniCPM-Llama3-V 2.5 in terms of model size, hardware requirements, and performance on multiple popular benchmarks.
+
+我们提供了从模型参数、硬件需求、全面性能指标等方面对比 Phi-3-vision-128K-Instruct 和 MiniCPM-Llama3-V 2.5 的结果。
+
+## Hardware Requirements (硬件需求)
+
+With int4 quantization, MiniCPM-Llama3-V 2.5 delivers smooth inference of 6-8 tokens/s with only 8 GB of GPU memory.
+
+通过 int4 量化，MiniCPM-Llama3-V 2.5 仅需 8GB 显存即可提供 6-8 tokens/s 的流畅推理。
+
+| Model(模型) | GPU Memory(显存) |
+|:----------------------|:-------------------:|
+| [MiniCPM-Llama3-V 2.5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/) | 19 GB |
+| Phi-3-vision-128K-Instruct | 12 GB |
+| [MiniCPM-Llama3-V 2.5 (int4)](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) | 8 GB |
+
+## Model Size and Performance (模型参数和性能)
+
+
+
+| | Phi-3-vision-128K-Instruct | MiniCPM-Llama3-V 2.5 |
+|:-|:----------:|:-------------------:|
+| Size(参数) | **4B** | 8B |
+| OpenCompass | 53.7 | **58.8** |
+| OCRBench | 639.0 | **725.0** |
+| RealworldQA | 58.8 | **63.5** |
+| TextVQA | 72.2 | **76.6** |
+| ScienceQA | **90.8** | 89.0 |
+| POPE | 83.4 | **87.2** |
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index a7de2d2..dce85f9 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -29,5 +29,5 @@ uvicorn==0.24.0.post1
 sentencepiece==0.1.99
 accelerate==0.30.1
 socksio==1.0.0
-gradio==4.31.4
-gradio_client==0.16.4
\ No newline at end of file
+gradio
+gradio_client
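As context for the Hardware Requirements tables and the int4 claim added above, here is a minimal sketch of what int4 inference could look like with Hugging Face transformers. It assumes the `chat()` interface provided by the model's remote code (loaded via `trust_remote_code=True`) and that the int4 checkpoint's quantization dependencies (e.g. bitsandbytes) are installed; the image path and sampling parameters are placeholders, and the model card remains the authoritative reference.

```python
# Minimal sketch: int4 inference with MiniCPM-Llama3-V 2.5 (~8 GB GPU memory per the tables above).
# Assumes the chat() helper exposed by the model's remote code; "example.jpg" is a placeholder path.
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5-int4"

# trust_remote_code=True pulls in the model-specific modeling and chat code from the Hub repo.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder input image
msgs = [{"role": "user", "content": "What is in the image?"}]

# Sampling parameters are illustrative, not prescribed by the repo.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)
print(answer)
```

Swapping `model_id` for the unquantized `openbmb/MiniCPM-Llama3-V-2_5` checkpoint would correspond to the 19 GB row of the table rather than the 8 GB int4 row.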