From b2943a6f57bc4aa72dedad195bce556637079d37 Mon Sep 17 00:00:00 2001
From: Alphi <52458637+HwwwwwwwH@users.noreply.github.com>
Date: Wed, 7 Aug 2024 18:38:45 +0800
Subject: [PATCH] Update README_en.md

---
 README_en.md | 90 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 77 insertions(+), 13 deletions(-)

diff --git a/README_en.md b/README_en.md
index e352647..3801cd1 100644
--- a/README_en.md
+++ b/README_en.md
@@ -30,7 +30,7 @@ Join our 💬 WeChat
 #### 📌 Pinned
 * [2024.08.06] 🔥🔥🔥 We open-source MiniCPM-V 2.6, which outperforms GPT-4V on single image, multi-image and video understanding. It advances popular features of MiniCPM-Llama3-V 2.5, and can support real-time video understanding on iPad. Try it now!
 * [2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See [here](https://arxiv.org/abs/2408.01800).
-* [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See [here](#vllm).
+* [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See [here](#inference-with-vllm).
 * [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 now fully supports its features in llama.cpp and ollama! Please pull the latest code **of our provided forks** ([llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md), [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5)). GGUF models in various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main). The MiniCPM-Llama3-V 2.5 series is **not supported by the official repositories yet**, and we are working hard to merge PRs. Please stay tuned!
 * [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics [here](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics).
 * [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click [here](./docs/compare_with_phi-3_vision.md) to view more details.
@@ -45,7 +45,7 @@ Join our 💬 WeChat
 * [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage)!
 * [2024.05.24] We release the MiniCPM-Llama3-V 2.5 [gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](#inference-with-llamacpp) inference and provides smooth 6~8 token/s decoding on mobile phones. Try it now!
 * [2024.05.20] We open-source MiniCPM-Llama3-V 2.5, which has improved OCR capability and supports 30+ languages, representing the first end-side MLLM achieving GPT-4V level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
-* [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#vllm) to view more details.
+* [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#inference-with-vllm) to view more details.
 * [2024.04.18] We created a HuggingFace Space to host the demo of MiniCPM-V 2.0 [here](https://huggingface.co/spaces/openbmb/MiniCPM-V-2)!
 * [2024.04.17] MiniCPM-V-2.0 supports deploying [WebUI Demo](#webui-demo) now!
 * [2024.04.15] MiniCPM-V-2.0 now also supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md) with the SWIFT framework!
@@ -1517,23 +1517,87 @@ MiniCPM-V 2.6 can run with ollama now! See [our fork of ollama](https://github.c
 vLLM now officially supports MiniCPM-V 2.0, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.6. Click to see.
-1. Clone the official vLLM:
+1. Install vLLM (v0.5.4):
 ```shell
-git clone https://github.com/vllm-project/vllm.git
+pip install vllm==0.5.4
 ```
-2. Install vLLM:
-```shell
-cd vllm
-pip install -e .
-```
-3. Install timm: (optional, MiniCPM-V 2.0 need timm)
+2. Install timm (optional; only MiniCPM-V 2.0 needs timm):
 ```shell
 pip install timm==0.9.10
 ```
-4. Run the example:（Attention: If you use model in local path, please update the model code to the latest version on Hugging Face.)
-```shell
-python examples/minicpmv_example.py
+3. Run the example (for image input):
+```python
+from transformers import AutoTokenizer
+from PIL import Image
+from vllm import LLM, SamplingParams
+
+MODEL_NAME = "openbmb/MiniCPM-V-2_6"
+# Also available for previous models
+# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
+# MODEL_NAME = "HwwwH/MiniCPM-V-2"
+
+image = Image.open("xxx.png").convert("RGB")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
+llm = LLM(
+    model=MODEL_NAME,
+    trust_remote_code=True,
+    gpu_memory_utilization=1,
+    max_model_len=2048
+)
+
+messages = [{
+    "role": "user",
+    # Use one "(<image>./</image>)" placeholder per input image
+    "content": "(<image>./</image>)" + "\nWhat is the content of this image?"
+}]
+prompt = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+# Single inference
+inputs = {
+    "prompt": prompt,
+    "multi_modal_data": {
+        "image": image
+        # For multiple images, the number of images must equal the number of `(<image>./</image>)` placeholders
+        # "image": [image, image]
+    },
+}
+# Batch inference
+# inputs = [{
+#     "prompt": prompt,
+#     "multi_modal_data": {
+#         "image": image
+#     },
+# } for _ in range(2)]
+
+# Stop tokens for MiniCPM-V 2.6
+stop_tokens = ['<|im_end|>', '<|endoftext|>']
+stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
+# For MiniCPM-V 2.0
+# stop_token_ids = [tokenizer.eos_id]
+# For MiniCPM-Llama3-V 2.5
+# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
+
+sampling_params = SamplingParams(
+    stop_token_ids=stop_token_ids,
+    use_beam_search=True,
+    temperature=0,
+    best_of=3,
+    max_tokens=1024
+)
+
+outputs = llm.generate(inputs, sampling_params=sampling_params)
+
+print(outputs[0].outputs[0].text)
 ```
+4. Click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video* input or to get more details about `vLLM`; a minimal video sketch is also shown below.
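+The following is a rough, illustrative sketch of the video path, not the authoritative recipe (see the wiki linked above): MiniCPM-V 2.6 can consume a video as a list of sampled frames passed through `multi_modal_data`, with one `(<image>./</image>)` placeholder per frame. It reuses the `tokenizer`, `llm`, and `sampling_params` objects from the example above; the frame count, the uniform sampling, the `decord` dependency, and the file name are assumptions, and `max_model_len` may need to be increased to fit many frames.
+```python
+import numpy as np
+from PIL import Image
+from decord import VideoReader, cpu  # assumption: decord is installed (pip install decord)
+
+MAX_NUM_FRAMES = 16  # illustrative; more frames require a larger max_model_len
+
+def sample_frames(video_path, num_frames=MAX_NUM_FRAMES):
+    # Uniformly sample `num_frames` frames and return them as PIL images
+    vr = VideoReader(video_path, ctx=cpu(0))
+    indices = np.linspace(0, len(vr) - 1, num_frames).astype(int)
+    return [Image.fromarray(frame) for frame in vr.get_batch(indices).asnumpy()]
+
+frames = sample_frames("xxx.mp4")  # hypothetical local video file
+video_messages = [{
+    "role": "user",
+    # One image placeholder per sampled frame
+    "content": "(<image>./</image>)" * len(frames) + "\nDescribe this video."
+}]
+video_prompt = tokenizer.apply_chat_template(
+    video_messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+video_inputs = {
+    "prompt": video_prompt,
+    "multi_modal_data": {
+        "image": frames  # list of frames, matching the number of placeholders
+    },
+}
+outputs = llm.generate(video_inputs, sampling_params=sampling_params)
+print(outputs[0].outputs[0].text)
+```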
## Fine-tuning