diff --git a/README.md b/README.md index e8e6c29..a498122 100644 --- a/README.md +++ b/README.md @@ -1,43 +1,54 @@
- + -**A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone** +**A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone** [中文](./README_zh.md) | English -Join our 💬 WeChat | View MiniCPM-V 📖 best practices + + + + WeChat + WeChat  | + +  + +Discord + Discord + +

- MiniCPM-V 2.6 🤗 🤖 | MiniCPM-Llama3-V 2.5 🤗 🤖 | - MiniCPM-Llama3-V 2.5 Technical Report + MiniCPM-o 2.6 🤗 CN🤖 US🤖 | MiniCPM-V 2.6 🤗 🤖 | + Technical Blog Coming Soon

+**MiniCPM-o** is the latest series of end-side multimodal LLMs (MLLMs) upgraded from MiniCPM-V. The models can now take image, video, text, and audio as inputs and provide high-quality text and speech outputs in an end-to-end fashion. Since February 2024, we have released 6 versions of the model, aiming to achieve **strong performance and efficient deployment**. The most notable models in the series currently include: -**MiniCPM-V** is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image, video and text as inputs and provide high-quality text outputs. Since February 2024, we have released 5 versions of the model, aiming to achieve **strong performance and efficient deployment**. The most notable models in this series currently include: +- **MiniCPM-o 2.6**: 🔥🔥🔥 The latest and most capable model in the MiniCPM-o series. With a total of 8B parameters, this end-to-end model **achieves comparable performance to GPT-4o-202405 in vision, speech, and multimodal live streaming**, making it one of the most versatile and performant models in the open-source community. For the new voice mode, MiniCPM-o 2.6 **supports bilingual real-time speech conversation with configurable voices**, and also allows for fun capabilities such as emotion/speed/style control, end-to-end voice cloning, role play, etc. It also advances MiniCPM-V 2.6's visual capabilities such as **strong OCR capability, trustworthy behavior, multilingual support, and video understanding**. Due to its superior token density, MiniCPM-o 2.6 can for the first time **support multimodal live streaming on end-side devices** such as iPad. -- **MiniCPM-V 2.6**: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model **surpasses GPT-4V in single image, multi-image and video understanding**. 
It outperforms **GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet** in single image understanding, and advances MiniCPM-Llama3-V 2.5's features such as strong OCR capability, trustworthy behavior, multilingual support, and end-side deployment. Due to its superior token density, MiniCPM-V 2.6 can for the first time support real-time video understanding on end-side devices such as iPad. +- **MiniCPM-V 2.6**: The most capable model in the MiniCPM-V series. With a total of 8B parameters, the model **surpasses GPT-4V in single image, multi-image and video understanding**. It outperforms **GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet** in single image understanding, and can for the first time support real-time video understanding on iPad. -- **MiniCPM-V 2.0**: The lightest model in the MiniCPM-V series. With 2B parameters, it surpasses larger models such as Yi-VL 34B, CogVLM-Chat 17B, and Qwen-VL-Chat 10B in overall performance. It can accept image inputs of any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving comparable performance with Gemini Pro in understanding scene-text and matches GPT-4V in low hallucination rates. ## News #### 📌 Pinned +* [2025.01.13] 🔥🔥🔥 We open-source MiniCPM-o 2.6, which matches GPT-4o-202405 on vision, speech and multimodal live streaming. It advances popular capabilities of MiniCPM-V 2.6, and supports various new fun features. Try it now! + + * [2024.08.17] 🚀🚀🚀 MiniCPM-V 2.6 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf). -* [2024.08.15] We now also support multi-image SFT. For more details, please refer to the [document](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune). -* [2024.08.14] MiniCPM-V 2.6 now also supports [fine-tuning](https://github.com/modelscope/ms-swift/issues/1613) with the SWIFT framework! 
-* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf). + * [2024.08.06] 🔥🔥🔥 We open-source MiniCPM-V 2.6, which outperforms GPT-4V on single image, multi-image and video understanding. It advances popular features of MiniCPM-Llama3-V 2.5, and can support real-time video understanding on iPad. Try it now! + * [2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See [here](https://arxiv.org/abs/2408.01800). -* [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See [here](#inference-with-vllm). -* [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics [here](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics). -* [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmarks evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click [here](./docs/compare_with_phi-3_vision.md) to view more details. + * [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available [here](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5). Come and try it out!
@@ -45,10 +56,22 @@ Join our 💬 WeChat | View MiniC
Click to view more news. +* [2024.08.15] We now also support multi-image SFT. For more details, please refer to the [document](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune). +* [2024.08.14] MiniCPM-V 2.6 now also supports [fine-tuning](https://github.com/modelscope/ms-swift/issues/1613) with the SWIFT framework! +* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf). + +* [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See [here](#inference-with-vllm). + * [2024.06.03] Now you can run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across them. For more details, check this [link](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md). * [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported in llama.cpp and ollama! Please pull the latest code **of our provided forks** ([llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md), [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5)). GGUF models in various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main). The MiniCPM-Llama3-V 2.5 series is **not supported by the official repositories yet**, and we are working hard to merge PRs. Please stay tuned! + +* [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics [here](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics). + * [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage)! 
* [2024.05.24] We release the MiniCPM-Llama3-V 2.5 [gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](#inference-with-llamacpp) inference and provides smooth decoding at 6~8 tokens/s on mobile phones. Try it now! + +* [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click [here](./docs/compare_with_phi-3_vision.md) to view more details. + * [2024.05.20] We open-source MiniCPM-Llama3-V 2.5, which has improved OCR capability and supports 30+ languages, representing the first end-side MLLM to achieve GPT-4V level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now! * [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#inference-with-vllm) to view more details. * [2024.04.18] We created a Hugging Face Space to host the demo of MiniCPM-V 2.0 [here](https://huggingface.co/spaces/openbmb/MiniCPM-V-2)! 
@@ -64,29 +87,943 @@ Join our 💬 WeChat | View MiniC ## Contents +- [MiniCPM-o 2.6](#minicpm-o-26) - [MiniCPM-V 2.6](#minicpm-v-26) -- [MiniCPM-Llama3-V 2.5](#minicpm-llama3-v-25) -- [MiniCPM-V 2.0](#minicpm-v-20) - [Chat with Our Demo on Gradio 🤗](#chat-with-our-demo-on-gradio-) - [Install](#install) - [Inference](#inference) - [Model Zoo](#model-zoo) - [Multi-turn Conversation](#multi-turn-conversation) - - [Chat with multiple images](#chat-with-multiple-images) - - [In-context few-shot learning](#in-context-few-shot-learning) - - [Chat with video](#chat-with-video) + - [Chat with Multiple Images](#chat-with-multiple-images) + - [In-context Few-shot Learning](#in-context-few-shot-learning) + - [Chat with Video](#chat-with-video) + - [Speech Conversation](#speech-conversation) + - [Mimick](#mimick) + - [General Speech Conversation with Configurable Voices](#general-speech-conversation-with-configurable-voices) + - [Addressing Various Audio Tasks](#addressing-various-audio-tasks) + - [Multimodal Live Streaming](#multimodal-live-streaming) - [Inference on Multiple GPUs](#inference-on-multiple-gpus) - [Inference on Mac](#inference-on-mac) - [Deployment on Mobile Phone](#deployment-on-mobile-phone) - - [Inference with llama.cpp](#inference-with-llamacpp) - - [Inference with ollama](#inference-with-ollama) - - [Inference with vLLM](#inference-with-vllm) + - [Efficient Inference with llama.cpp, ollama, vLLM](#efficient-inference-with-llamacpp-ollama-vllm) - [Fine-tuning](#fine-tuning) - [FAQs](#faqs) +- [Limitations](#limitations) + + +## MiniCPM-o 2.6 + +**MiniCPM-o 2.6** is the latest and most capable model in the MiniCPM-o series. The model is built in an end-to-end fashion based on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.6, and introduces new features for real-time speech conversation and multimodal live streaming. 
Notable features of MiniCPM-o 2.6 include: + +- 🔥 **Leading Visual Capability.** + MiniCPM-o 2.6 achieves an average score of 70.2 on OpenCompass, a comprehensive evaluation over 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-202405, Gemini 1.5 Pro, and Claude 3.5 Sonnet** for single image understanding. It also **outperforms GPT-4V and Claude 3.5 Sonnet** in multi-image and video understanding, and shows promising in-context learning capability. + +- 🎙 **State-of-the-art Speech Capability.** MiniCPM-o 2.6 supports **bilingual real-time speech conversation with configurable voices** in English and Chinese. It **outperforms GPT-4o-realtime on audio understanding tasks** such as ASR and STT translation, and shows **state-of-the-art performance on speech conversation in both semantic and acoustic evaluations in the open-source community**. It also allows for fun features such as emotion/speed/style control, end-to-end voice cloning, role play, etc. + +- 🎬 **Strong Multimodal Live Streaming Capability.** As a new feature, MiniCPM-o 2.6 can **accept continuous video and audio streams independent of user queries, and support real-time speech interaction**. It **outperforms GPT-4o-202408 and Claude 3.5 Sonnet and shows state-of-the-art performance in the open-source community on StreamingBench**, a comprehensive benchmark for real-time video understanding, omni-source (video & audio) understanding, and multimodal contextual understanding. + +- 💪 **Strong OCR Capability and Others.** +Advancing popular visual capabilities from the MiniCPM-V series, MiniCPM-o 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench for models under 25B, surpassing proprietary models such as GPT-4o-202405**. 
+ Based on the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) and [VisCPM](https://github.com/OpenBMB/VisCPM) techniques, it features **trustworthy behaviors**, outperforming GPT-4o and Claude 3.5 Sonnet on MMHal-Bench, and supports **multilingual capabilities** in more than 30 languages. + + +- 🚀 **Superior Efficiency.** + In addition to its friendly size, MiniCPM-o 2.6 also shows **state-of-the-art token density** (i.e., number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models**. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-o 2.6 can efficiently support **multimodal live streaming** on end-side devices such as iPad. + +- 💫 **Easy Usage.** +MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [LLaMA-Factory](./docs/llamafactory_train.md), (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) online web demos on the [CN](https://minicpm-omni-webdemo.modelbest.cn/) server and [US](https://minicpm-omni-webdemo-us.modelbest.cn/) server. + + +**Model Architecture.** + +- **End-to-end Omni-modal Architecture.** Different modality encoders/decoders are connected and trained in an **end-to-end** fashion to fully exploit rich multimodal knowledge. 
+- **Omni-modal Live Streaming Mechanism.** (1) We change the offline modality encoders/decoders into online ones for **streaming inputs/outputs.** (2) We devise a **time-division multiplexing (TDM) mechanism** for omni-modality streaming processing in the LLM backbone. It divides parallel omni-modality streams into sequential information within small periodic time slices. +- **Configurable Speech Modeling Design.** We devise a multimodal system prompt, including a traditional text system prompt and **a new audio system prompt that determines the assistant's voice**. This enables flexible voice configuration at inference time, and also facilitates end-to-end voice cloning and description-based voice creation. +
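To make the TDM idea above concrete, here is a small, purely illustrative Python sketch (not the actual implementation): timestamped chunks from parallel modality streams are bucketed into periodic time slices, and each slice's contents are then emitted sequentially for the LLM backbone.

```python
def tdm_interleave(streams, period=1.0):
    """Illustrative time-division multiplexing: bucket timestamped chunks
    from parallel modality streams into periodic time slices, then emit
    each slice's contents as one sequential stream."""
    horizon = max(t for chunks in streams.values() for t, _ in chunks)
    sequence = []
    for i in range(int(horizon // period) + 1):
        lo, hi = i * period, (i + 1) * period
        for modality, chunks in streams.items():
            sequence += [(modality, c) for t, c in chunks if lo <= t < hi]
    return sequence

streams = {
    "video": [(0.0, "frame0"), (1.1, "frame1")],
    "audio": [(0.2, "chunk0"), (0.7, "chunk1"), (1.4, "chunk2")],
}
print(tdm_interleave(streams))
# → [('video', 'frame0'), ('audio', 'chunk0'), ('audio', 'chunk1'),
#    ('video', 'frame1'), ('audio', 'chunk2')]
```

The real mechanism operates on encoded multimodal tokens inside the LLM, but the interleaving pattern is the same: parallel streams become sequential information within small periodic time slices.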
+ +
+ + +### Evaluation + +
+ +
+ +
+Click to view visual understanding results. + +**Image Understanding** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ModelSizeToken Density+OpenCompassOCRBenchMathVista miniChartQAMMVetMMStarMMEMMB1.1 testAI2DMMMU valHallusionBenchTextVQA valDocVQA testMathVerse miniMathVisionMMHal Score
Proprietary
GPT-4o-20240513-108869.973661.385.769.163.92328.782.284.669.255.0-92.850.230.43.6
Claude3.5-Sonnet-75067.978861.690.866.062.21920.078.580.265.949.9-95.2--3.4
Gemini-1.5-Pro--64.475457.781.364.059.12110.673.979.160.645.673.586.5-19.2-
GPT-4o-mini-20240718-108864.178552.4-66.954.82003.476.077.860.046.1----3.3
Open Source
Cambrian-34B34B182058.359150.375.653.254.22049.977.879.550.441.676.775.5---
GLM-4V-9B13B78459.177651.1-58.054.82018.867.971.246.945.0-----
Pixtral-12B12B25661.068556.981.858.554.5-72.779.051.147.075.790.7---
DeepSeek-VL2-27B (4B)27B67266.480963.986.060.061.92253.081.283.854.045.384.293.3--3.0
Qwen2-VL-7B8B78467.186658.283.062.060.72326.081.883.054.150.684.394.531.916.33.2
LLaVA-OneVision-72B72B18268.174167.583.760.665.82261.085.085.656.849.080.591.339.1-3.5
InternVL-2.5-8B8B70668.382264.484.862.862.82344.083.684.556.050.179.193.039.519.73.4
MiniCPM-V 2.68B282265.2852*60.679.460.057.52348.4*78.082.149.8*48.1*80.190.825.718.33.6
MiniCPM-o 2.68B282270.2897*71.9*86.9*67.564.02372.0*80.585.850.4*51.982.093.541.4*23.1*3.8
+
+* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we used this technique only for the Cognition set. + + ++ Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens. + +Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation. + + +**Multi-image and Video Understanding** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ModelSizeBLINK-valMantis-EvalMIRBVideo-MME (wo / w subs)
Proprietary
GPT-4o-20240513-68.0--71.9/77.2
GPT4V-54.662.753.159.9/63.3
Open-source
LLaVA-NeXT-Interleave 14B14B52.666.430.2-
LLaVA-One-Vision-72B72B55.477.6-66.2/69.5
MANTIS 8B8B49.159.534.8-
Qwen2-VL-7B8B53.269.6*67.6*63.3/69.0
InternVL-2.5-8B8B54.867.752.564.2/66.9
MiniCPM-V 2.68B53.069.153.860.9/63.6
MiniCPM-o 2.68B56.771.958.663.9/67.9
+ +
+* We evaluate officially released checkpoints by ourselves. + +
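As a sanity check, the Token Density column in the image understanding table above follows directly from its stated definition (# pixels at maximum resolution / # visual tokens), and the MiniCPM-o 2.6 figure can be reproduced in a couple of lines:

```python
def token_density(width, height, num_visual_tokens):
    """Token density = # pixels at maximum resolution / # visual tokens."""
    return width * height / num_visual_tokens

# MiniCPM-o 2.6 encodes a 1.8M-pixel image (e.g., 1344x1344) into only 640 visual tokens.
print(round(token_density(1344, 1344, 640)))  # → 2822, matching the table
```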
+ + +
+Click to view audio understanding and speech conversation results. + +**Audio Understanding** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TaskSizeASR (zh)ASR (en)ASTEmotion
MetricCER↓WER↓BLEU↑ACC↑
DatasetAISHELL-1Fleurs zhWenetSpeech test-netLibriSpeech test-cleanGigaSpeechTED-LIUMCoVoST en2zhCoVoST zh2enMELD emotion
Proprietary
GPT-4o-Realtime-7.3*5.4*28.9*2.6*12.9*4.8*37.1*15.7*33.2*
Gemini-1.5-Pro-4.5*5.9*14.3*2.9*10.6*3.0*47.3*22.6*48.4*
Open-Source
Qwen2-Audio-Base8B-7.5-1.6--45.224.455.3
Qwen2-Audio-Instruction8B2.6*6.9*10.3*3.1*9.7*5.9*39.5*22.9*17.4*
GLM-4-Voice-Base9B2.5--2.8----
MiniCPM-o 2.68B1.64.46.91.78.73.048.227.252.4
+
+* We evaluate officially released checkpoints by ourselves.
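The CER and WER columns above are edit-distance metrics: the Levenshtein distance between hypothesis and reference (over words for WER, over characters for CER), divided by the reference length. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words.
    Computing the same thing over characters instead of words gives CER."""
    r, h = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance (substitution, insertion, deletion).
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / len(r)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # ≈ 0.333
```

Production evaluations typically also normalize text (casing, punctuation) before scoring, which this sketch omits.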

+ +**Speech Generation** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TaskSizeSpeechQA
MetricACC↑G-Eval (10 point)↑Semantic ELO score↑Acoustic ELO score↑Overall ELO score↑UTMOS↑ASR-WER↓
DatasetSpeech Llama Q.Speech Web Q.Speech Trivia QASpeech AlpacaEvalAudioArena
Proprietary
GPT-4o-Realtime71.751.669.77.41157120312004.22.3
Open-Source
GLM-4-Voice9B50.032.036.45.1999114710354.111.7
Llama-Omni8B45.322.910.73.99608788973.224.3
Moshi7B43.723.816.72.48718088752.88.2
Mini-Omni1B22.012.86.92.59268038653.410.0
MiniCPM-o 2.68B61.040.040.25.11088116311314.29.8
+
+All results are from AudioEvals; the evaluation methods and further details can be found in the AudioEvals repository.
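The ELO columns above come from pairwise preference battles on AudioArena. Under the standard Elo model, a rating gap maps to an expected win rate, which is a handy way to read the table (illustrative only; the actual ratings are computed by AudioEvals):

```python
def elo_expected_win_rate(rating_a, rating_b):
    """Expected probability that system A is preferred over system B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Overall ELO from the table: MiniCPM-o 2.6 = 1131, GPT-4o-Realtime = 1200.
# A 69-point gap corresponds to roughly a 40% expected win rate.
print(round(elo_expected_win_rate(1131, 1200), 2))  # → 0.4
```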

+ +**End-to-end Voice Cloning** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TaskVoice cloning
MetricSIMO↑SIMO↑
DatasetSeed-TTS test-zhSeed-TTS test-en
F5-TTS7667
CosyVoice7564
FireRedTTS6346
MiniCPM-o 2.65747
+
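SIMO in the table above is a speaker-similarity score (higher means the cloned voice is closer to the reference speaker), typically computed as the cosine similarity between speaker embeddings of the generated and reference audio. A toy sketch with made-up 4-dimensional embeddings (a real SIMO evaluation extracts embeddings with a pretrained speaker encoder):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings only; a real evaluation uses a pretrained speaker encoder
# on the cloned and reference utterances, often scaling the score to 0-100.
cloned = [0.9, 0.1, 0.3, 0.2]
reference = [1.0, 0.0, 0.4, 0.1]
print(round(100 * cosine_similarity(cloned, reference), 1))
```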
+ +
+ +
+Click to view multimodal live streaming results. + +**Multimodal Live Streaming**: results on StreamingBench + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ModelSizeReal-Time Video UnderstandingOmni-Source UnderstandingContextual UnderstandingOverall
Proprietary
Gemini 1.5 Pro-77.467.851.170.3
GPT-4o-202408-74.551.048.064.1
Claude-3.5-Sonnet-74.041.437.859.7
Open-source
VILA-1.58B61.537.526.749.5
LongVA7B63.135.930.250.7
LLaVA-Next-Video-34B34B69.841.734.356.7
Qwen2-VL-7B8B71.240.733.157.0
InternVL2-8B8B70.142.734.157.0
VITA-1.58B70.940.835.857.4
LLaVA-OneVision-7B8B74.340.831.058.4
InternLM-XC2.5-OL-7B8B75.446.233.660.8
MiniCPM-V 2.68B72.440.233.457.7
MiniCPM-o 2.68B79.953.438.566.0
+ +
+ + +### Examples + +We deploy MiniCPM-o 2.6 on end devices. The demo videos are raw, real-speed recordings on an iPad Pro and in a web demo. + +
+ +
+ +
+ math + diagram + bike +
## MiniCPM-V 2.6 +
+Click to view more details of MiniCPM-V 2.6 + **MiniCPM-V 2.6** is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include: - 🔥 **Leading Performance.** @@ -364,7 +1301,7 @@ MiniCPM-V 2.6 can be easily used in various ways: (1) [llama.cpp](https://github 42.4 10.3 - + MiniCPM-V 2.6 8B 2822 @@ -489,7 +1426,7 @@ Note: For proprietary models, we calculate token density based on the image enco 34.4* 56.9* - + MiniCPM-V 2.6 8B 69.1 @@ -634,7 +1571,7 @@ Note: For proprietary models, we calculate token density based on the image enco 2.64 3.28 - + MiniCPM-V 2.6 8B 60.9 @@ -775,7 +1712,7 @@ Note: For proprietary models, we calculate token density based on the image enco 70.9 54.1 - + MiniCPM-V 2.6+ 8B 0 @@ -784,14 +1721,14 @@ Note: For proprietary models, we calculate token density based on the image enco 45.4 23.9 - + 4 63.6 60.5 65.5 50.1 - + 8 64.6 63.4 @@ -850,408 +1787,46 @@ We deploy MiniCPM-V 2.6 on end devices. The demo video is the raw screen recordi

-## MiniCPM-Llama3-V 2.5 - -
-Click to view more details of MiniCPM-Llama3-V 2.5 - -**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include: - -- 🔥 **Leading Performance.** - MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs. - -- 💪 **Strong OCR Capabilities.** - MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences. - -- 🏆 **Trustworthy Behavior.** - Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), achieving the best-level performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). 
- -- 🌏 **Multilingual Support.** - Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages including German, French, Spanish, Italian, Korean etc.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md). - -- 🚀 **Efficient Deployment.** - MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150x acceleration in end-side MLLM image encoding** and a **3x speedup in language decoding**. - -- 💫 **Easy Usage.** -MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5). - -### Evaluation - -
- -
-
-Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench. -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ModelSizeOCRBenchTextVQA valDocVQA testOpen-CompassMMEMMB test (en)MMB test (cn)MMMU valMath-VistaLLaVA BenchRealWorld QAObject HalBench
Proprietary
Gemini Pro-68074.688.162.92148.973.674.348.945.879.960.4-
GPT-4V (2023.11.06)-64578.088.463.51771.577.074.453.847.893.163.086.4
Open-source
Mini-Gemini2.2B-56.234.2*-1653.0--31.7----
Qwen-VL-Chat9.6B48861.562.651.61860.061.856.337.033.867.749.356.2
DeepSeek-VL-7B7.3B43564.7*47.0*54.61765.473.871.438.336.877.854.2-
Yi-VL-34B34B29043.4*16.9*52.22050.272.470.745.130.762.354.879.3
CogVLM-Chat17.4B59070.433.3*54.21736.665.855.937.334.773.960.373.6
TextMonkey9.7B55864.366.7---------
Idefics28.0B-73.074.057.21847.675.768.645.252.249.160.7-
Bunny-LLama-3-8B8.4B---54.31920.377.073.941.331.561.258.8-
LLaVA-NeXT Llama-3-8B8.4B--78.2-1971.5--41.737.580.160.0-
Phi-3-vision-128k-instruct4.2B639*70.9--1537.5*--40.444.564.2*58.8*-
MiniCPM-V 1.02.8B36660.638.247.51650.264.162.638.328.951.351.278.4
MiniCPM-V 2.02.8B60574.171.954.51808.669.166.538.238.769.255.885.5
MiniCPM-Llama3-V 2.58.5B72576.684.865.12024.677.274.245.854.386.763.589.7
- - -
-* We evaluate the officially released checkpoint by ourselves. - -
- -
- -
- Evaluation results of multilingual LLaVA Bench -
- -### Examples - - -

- -

-
- -
- - -## MiniCPM-V 2.0 - -
-Click to view more details of MiniCPM-V 2.0 - - -**MiniCPM-V 2.0** is an efficient version with promising performance for deployment. The model is built based on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0 has several notable features. - -- 🔥 **State-of-the-art Performance.** - - MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models. - -- 🏆 **Trustworthy Behavior.** - - LMMs are known for suffering from hallucination, often generating text not factually grounded in images. MiniCPM-V 2.0 is **the first end-side LMM aligned via multimodal RLHF for trustworthy behavior** (using the recent [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] series technique). This allows the model to **match GPT-4V in preventing hallucinations** on Object HalBench. - -- 🌟 **High-Resolution Images at Any Aspect Raito.** - - MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf). - -- ⚡️ **High Efficiency.** - - MiniCPM-V 2.0 can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into much fewer tokens via a perceiver resampler. 
This allows MiniCPM-V 2.0 to operate with **favorable memory cost and speed during inference even when dealing with high-resolution images**. - -- 🙌 **Bilingual Support.** - - MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24]. - -### Examples - - -

- -

-
- -We deploy MiniCPM-V 2.0 on end devices. The demo video is the raw screen recording on a Xiaomi 14 Pro without edition. - - -

- - -

-
-
## Legacy Models | Model | Introduction and Guidance | |:----------------------|:-------------------:| -| MiniCPM-V 1.0 | [Document](./minicpm_v1.md) | -| OmniLMM-12B | [Document](./omnilmm_en.md) | +| MiniCPM-Llama3-V 2.5 | [Document](./docs/minicpm_llama3_v2dot5.md) | +| MiniCPM-V 2.0 | [Document](./docs/minicpm_v2.md) | +| MiniCPM-V 1.0 | [Document](./docs/minicpm_v1.md) | +| OmniLMM-12B | [Document](./docs/omnilmm_en.md) | ## Chat with Our Demo on Gradio 🤗 -We provide online and local demos powered by Hugging Face Gradio , the most popular model deployment framework nowadays. It supports streaming outputs, progress bars, queuing, alerts, and other useful features. +We provide online and local demos powered by Hugging Face Gradio, the most popular model deployment framework. It supports streaming outputs, progress bars, queuing, alerts, and other useful features. ### Online Demo -Click here to try out the online demo of [MiniCPM-V 2.6](http://120.92.209.146:8887/) | [MiniCPM-Llama3-V 2.5](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) | [MiniCPM-V 2.0](https://huggingface.co/spaces/openbmb/MiniCPM-V-2). +Click here to try out the online demo of MiniCPM-o 2.6 ([CN](https://minicpm-omni-webdemo.modelbest.cn) | [US](https://minicpm-omni-webdemo-us.modelbest.cn/)) | [MiniCPM-V 2.6](http://120.92.209.146:8887/) | [MiniCPM-Llama3-V 2.5](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) | [MiniCPM-V 2.0](https://huggingface.co/spaces/openbmb/MiniCPM-V-2). ### Local WebUI Demo -You can easily build your own local WebUI demo with Gradio using the following commands. - -```shell -pip install -r requirements.txt -``` - -```shell -# For NVIDIA GPUs, run: -python web_demo_2.6.py --device cuda +You can easily build your own local WebUI demo using the following commands. +1. Launch the model server: +```shell +pip install -r requirements_o2.6.txt + +python web_demos/minicpm-o_2.6/model_server.py +``` + +2. 
Launch the web server:
+
+```shell
+# Make sure Node and PNPM are installed.
+cd web_demos/minicpm-o_2.6/web_server
+pnpm install # install requirements
+
+pnpm run dev # start server
+```

@@ -1284,21 +1859,19 @@ pip install -r requirements.txt

| Model | Device | Memory |          Description | Download |
|:-----------|:--:|:-----------:|:-------------------|:---------------:|
-| MiniCPM-V 2.6| GPU | 17 GB | The latest version, achieving state-of-the-art end-side performance for single image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
+| MiniCPM-o 2.6 | GPU | 18 GB | The latest version, achieving GPT-4o level performance for vision, speech and multimodal live streaming on end-side devices. | [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6) |
+| MiniCPM-o 2.6 gguf | CPU | 8 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-gguf) |
+| MiniCPM-o 2.6 int4 | GPU | 9 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-int4) |
+| MiniCPM-V 2.6 | GPU | 17 GB | Strong end-side multimodal performance for single image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
| MiniCPM-V 2.6 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) |
| MiniCPM-V 2.6 int4 | GPU | 7 GB | The int4 quantized version, lower GPU memory usage. 
| [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) | -| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) | -| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)   [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) | -| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) | -| MiniCPM-V 2.0 | GPU | 8 GB | Light version, balance the performance the computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) | -| MiniCPM-V 1.0 | GPU | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) | ### Multi-turn Conversation Please refer to the following codes to run.
- +
@@ -1307,33 +1880,30 @@ import torch from PIL import Image from transformers import AutoModel, AutoTokenizer -torch.manual_seed(0) +torch.manual_seed(100) -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) -image = Image.open('./assets/airplane.jpeg').convert('RGB') +image = Image.open('./assets/minicpmo2_6/show_demo.jpg').convert('RGB') # First round chat -question = "Tell me the model of this aircraft." +question = "What is the landform in the picture?" msgs = [{'role': 'user', 'content': [image, question]}] answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) print(answer) -# Second round chat -# pass history context of multi-turn conversation +# Second round chat, pass history context of multi-turn conversation msgs.append({"role": "assistant", "content": [answer]}) -msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]}) +msgs.append({"role": "user", "content": ["What should I pay attention to when traveling here?"]}) answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) @@ -1343,24 +1913,24 @@ print(answer) You will get the following output: ``` -"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. 
The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database." +"The landform in the picture is a mountain range. The mountains appear to be karst formations, characterized by their steep, rugged peaks and smooth, rounded shapes. These types of mountains are often found in regions with limestone bedrock and are shaped by processes such as erosion and weathering. The reflection of the mountains in the water adds to the scenic beauty of the landscape." -"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry." +"When traveling to this scenic location, it's important to pay attention to the weather conditions, as the area appears to be prone to fog and mist, especially during sunrise or sunset. Additionally, ensure you have proper footwear for navigating the potentially slippery terrain around the water. Lastly, respect the natural environment by not disturbing the local flora and fauna." ``` -#### Chat with multiple images +#### Chat with Multiple Images
- Click to view Python code running MiniCPM-V 2.6 with multiple images input. + Click to view Python code running MiniCPM-o 2.6 with multiple images input. ```python import torch from PIL import Image from transformers import AutoModel, AutoTokenizer -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) image1 = Image.open('image1.jpg').convert('RGB') image2 = Image.open('image2.jpg').convert('RGB') @@ -1369,7 +1939,6 @@ question = 'Compare image 1 and image 2, tell me about the differences between i msgs = [{'role': 'user', 'content': [image1, image2, question]}] answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) @@ -1377,19 +1946,19 @@ print(answer) ```
-#### In-context few-shot learning +#### In-context Few-shot Learning
- Click to view Python code running MiniCPM-V 2.6 with few-shot input. + Click to view Python code running MiniCPM-o 2.6 with few-shot input. ```python import torch from PIL import Image from transformers import AutoModel, AutoTokenizer -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) question = "production date" image1 = Image.open('example1.jpg').convert('RGB') @@ -1405,7 +1974,6 @@ msgs = [ ] answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) @@ -1413,9 +1981,9 @@ print(answer) ```
-#### Chat with video +#### Chat with Video
- Click to view Python code running MiniCPM-V 2.6 with video input. + Click to view Python code running MiniCPM-o 2.6 with video input. ```python import torch @@ -1423,10 +1991,10 @@ from PIL import Image from transformers import AutoModel, AutoTokenizer from decord import VideoReader, cpu # pip install decord -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) MAX_NUM_FRAMES=64 # if cuda OOM set a smaller number @@ -1459,7 +2027,6 @@ params["use_image_id"] = False params["max_slice_nums"] = 2 # use 1 if cuda OOM and video resolution > 448*448 answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer, **params @@ -1469,6 +2036,295 @@ print(answer)
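The video example above caps input at `MAX_NUM_FRAMES` by sampling frames uniformly. That capping step is plain uniform index selection; here is a stand-alone sketch of one way to do it (an illustration in plain Python, not necessarily the repo's exact helper):

```python
def uniform_sample(items, n):
    # Pick n evenly spaced elements (centered within each interval).
    gap = len(items) / n
    idxs = [int(i * gap + gap / 2) for i in range(n)]
    return [items[i] for i in idxs]

# e.g. a 5-minute clip pre-sampled at 1 fps -> 300 candidate frames
frame_idx = list(range(300))
MAX_NUM_FRAMES = 64  # if cuda OOM set a smaller number
if len(frame_idx) > MAX_NUM_FRAMES:
    frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
print(len(frame_idx))  # 64
```

The centering term `gap / 2` avoids always favoring the very first frames of each interval, so the sampled frames cover the whole clip evenly.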
+#### Speech Conversation +
Model initialization + +```python +import torch +import librosa +from transformers import AutoModel, AutoTokenizer + +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, + attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager +model = model.eval().cuda() +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) + +model.init_tts() +model.tts.float() +``` + +
+ +##### Mimick + +
Click here to experience the capability of end-to-end audio understanding and generation.
+
+The `Mimick` task reflects a model's end-to-end speech modeling capability: the model takes audio input, outputs an ASR transcription, and then reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original, the stronger the model's foundational capability in end-to-end speech modeling.
+
+```python
+mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
+audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)
+msgs = [{'role': 'user', 'content': [mimick_prompt, audio_input]}]
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    temperature=0.3,
+    generate_audio=True,
+    output_audio_path='output.wav', # save the tts result to output_audio_path
+)
+```
+
+
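Mimick quality hinges on how similar the reconstructed audio is to the original. This README ships no similarity script, but a crude spectral check like the following (a hypothetical metric for local sanity-checking only, not an official evaluation) can catch badly degraded reconstructions:

```python
import numpy as np

def spectral_similarity(a, b, frame=512, hop=256):
    # Cosine similarity between time-averaged magnitude spectra of two mono signals.
    def avg_spectrum(x):
        frames = [np.abs(np.fft.rfft(x[i:i + frame])) for i in range(0, len(x) - frame, hop)]
        return np.mean(frames, axis=0)
    sa, sb = avg_spectrum(a), avg_spectrum(b)
    return float(np.dot(sa, sb) / (np.linalg.norm(sa) * np.linalg.norm(sb)))

sr = 16000
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t)  # 1 s, 440 Hz tone
reconstruction = original + 0.01 * np.random.default_rng(0).normal(size=sr)
print(round(spectral_similarity(original, reconstruction), 3))
```

A score near 1.0 means the spectra match closely; this ignores phase and timing, so treat it only as a coarse filter.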
+ +##### General Speech Conversation with Configurable Voices +
Click to view the Python code for enabling MiniCPM-o 2.6 to interact with you in a specified voice.

```python
ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True) # load the reference audio

# Audio RolePlay: # With this mode, the model will role-play the character based on the audio prompt.
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_roleplay', language='en')
user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}

# Audio Assistant: # With this mode, the model will speak with the voice in ref_audio as an AI assistant.
# sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
# user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # Try to ask something!
```
```python
msgs = [sys_prompt, user_question]
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    generate_audio=True,
    temperature=0.3,
    output_audio_path='result.wav',
)

# round two: append the assistant's answer and the next user turn to the history
# (note: list.append mutates in place and returns None, so do not reassign its result)
msgs.append({'role': 'assistant', 'content': res})
user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}
msgs.append(user_question)
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    generate_audio=True,
    temperature=0.3,
    output_audio_path='result_round_2.wav',
)
print(res)
```

+ +##### Addressing Various Audio Tasks +
+
 Click to show Python code running MiniCPM-o 2.6 on specific audio QA tasks.
+
+```python
+'''
+Audio Understanding Task Prompt:
+Speech:
+    ASR with ZH(same as AST en2zh): 请仔细听这段音频片段,并将其内容逐字记录。
+    ASR with EN(same as AST zh2en): Please listen to the audio snippet carefully and transcribe the content.
+    Speaker Analysis: Based on the speaker's content, speculate on their gender, condition, age range, and health status.
+General Audio:
+    Audio Caption: Summarize the main content of the audio.
+    Sound Scene Tagging: Utilize one keyword to convey the audio's content or the associated scene.
+'''
+task_prompt = "\n"
+audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)
+
+msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]
+
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    generate_audio=True,
+    temperature=0.3,
+    output_audio_path='result.wav',
+)
+print(res)
+```
+```python
+'''
+Speech Generation Task Prompt:
+    Human Instruction-to-Speech: see https://voxinstruct.github.io/VoxInstruct/
+    Example:
+        # 在新闻中,一个年轻男性兴致勃勃地说:“祝福亲爱的祖国母亲美丽富强!”他用低音调和低音量,慢慢地说出了这句话。
+        # Delighting in a surprised tone, an adult male with low pitch and low volume comments:"One even gave my little dog a biscuit" This dialogue takes place at a leisurely pace, delivering a sense of excitement and surprise in the context.
+
+    Voice Cloning or Voice Creation: With this mode, model will act like a TTS model.
+'''
+# Human Instruction-to-Speech:
+task_prompt = '' # Try writing a Human Instruction-to-Speech prompt here
+msgs = [{'role': 'user', 'content': [task_prompt]}] # you can try to use the same audio question. (Voice Creation)
+
+# Voice Cloning mode: With this mode, model will act like a TTS model.
+# sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='voice_cloning', language='en')
+# text_prompt = f"Please read the text below." 
+# user_question = {'role': 'user', 'content': [text_prompt, "content that you want to read"]} # using same voice in sys_prompt to read the text. (Voice Cloning) +# user_question = {'role': 'user', 'content': [text_prompt, librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # using same voice in sys_prompt to read 'xxx.wav'. (Voice Conversion) + +msgs = [sys_prompt, user_question] +res = model.chat( + msgs=msgs, + tokenizer=tokenizer, + sampling=True, + max_new_tokens=128, + use_tts_template=True, + generate_audio=True, + temperature=0.3, + output_audio_path='result.wav', +) + + +``` + +
+ +#### Multimodal Live Streaming +
+
 Click to view Python code running MiniCPM-o 2.6 with chat inference.
+
+```python
+import math
+import numpy as np
+from PIL import Image
+from moviepy.editor import VideoFileClip
+import tempfile
+import librosa
+import soundfile as sf
+
+## Make sure the model has been initialized and `model.init_tts()` has been executed
+
+def get_video_chunk_content(video_path, flatten=True):
+    video = VideoFileClip(video_path)
+    print('video_duration:', video.duration)
+
+    with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_audio_file:
+        temp_audio_file_path = temp_audio_file.name
+        video.audio.write_audiofile(temp_audio_file_path, codec="pcm_s16le", fps=16000)
+        audio_np, sr = librosa.load(temp_audio_file_path, sr=16000, mono=True)
+    num_units = math.ceil(video.duration)
+
+    # 1 frame + 1s audio chunk
+    contents = []
+    for i in range(num_units):
+        frame = video.get_frame(i+1)
+        image = Image.fromarray((frame).astype(np.uint8))
+        audio = audio_np[sr*i:sr*(i+1)]
+        if flatten:
+            contents.extend(["<unit>", image, audio])
+        else:
+            contents.append(["<unit>", image, audio])
+
+    return contents
+
+video_path = "/path/to/video"
+sys_msg = model.get_sys_prompt(mode='omni', language='en')
+# if use voice clone prompt, please set ref_audio
+# ref_audio_path = '/path/to/ref_audio'
+# ref_audio, _ = librosa.load(ref_audio_path, sr=16000, mono=True)
+# sys_msg = model.get_sys_prompt(ref_audio=ref_audio, mode='omni', language='en')
+
+contents = get_video_chunk_content(video_path)
+msg = {"role":"user", "content": contents}
+msgs = [sys_msg, msg]
+
+# please set generate_audio=True and output_audio_path to save the tts result
+generate_audio = True
+output_audio_path = 'output.wav'
+
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    temperature=0.5,
+    max_new_tokens=4096,
+    omni_input=True, # please set omni_input=True when omni inference
+    use_tts_template=True,
+    generate_audio=generate_audio,
+    output_audio_path=output_audio_path,
+    max_slice_nums=1,
+    
use_image_id=False, + return_dict=True +) +print(res) +``` +
+ +
+
 Click to view Python code running MiniCPM-o 2.6 with streaming inference.
+
+Note: streaming inference incurs a slight performance degradation because the audio encoding is not global.
+```python
+# A new conversation needs to reset the session first; this resets the KV cache.
+model.reset_session()
+
+contents = get_video_chunk_content(video_path, flatten=False)
+session_id = '123'
+generate_audio = True
+
+# 1. prefill system prompt
+res = model.streaming_prefill(
+    session_id=session_id,
+    msgs=[sys_msg],
+    tokenizer=tokenizer
+)
+
+# 2. prefill video/audio chunks
+for content in contents:
+    msgs = [{"role":"user", "content": content}]
+    res = model.streaming_prefill(
+        session_id=session_id,
+        msgs=msgs,
+        tokenizer=tokenizer
+    )
+
+# 3. generate
+res = model.streaming_generate(
+    session_id=session_id,
+    tokenizer=tokenizer,
+    temperature=0.5,
+    generate_audio=generate_audio
+)
+
+audios = []
+text = ""
+
+if generate_audio:
+    for r in res:
+        audio_wav = r.audio_wav
+        sampling_rate = r.sampling_rate
+        txt = r.text
+
+        audios.append(audio_wav)
+        text += txt
+
+    res = np.concatenate(audios)
+    sf.write("output.wav", res, samplerate=sampling_rate)
+    print("text:", text)
+    print("audio saved to output.wav")
+else:
+    for r in res:
+        text += r['text']
+    print("text:", text)
+```
+
+
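`get_video_chunk_content` above builds 1-second video+audio units with moviepy. For audio-only streams, the same chunking idea reduces to slicing the waveform into fixed-length units (a hypothetical helper for illustration, not part of the repo):

```python
import math
import numpy as np

def chunk_audio(audio, sr=16000, unit_secs=1):
    # Split a mono waveform into fixed-length units suitable for per-chunk prefill.
    unit = sr * unit_secs
    num_units = math.ceil(len(audio) / unit)
    return [audio[unit * i: unit * (i + 1)] for i in range(num_units)]

wave = np.zeros(16000 * 3 + 4000, dtype=np.float32)  # 3.25 s of audio
chunks = chunk_audio(wave)
print(len(chunks))  # 4 units; the last one is a partial 0.25 s chunk
```

Each unit can then be wrapped in a user message and fed to `streaming_prefill` one chunk at a time, mirroring the loop in the streaming example.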
+
### Inference on Multiple GPUs
You can run MiniCPM-Llama3-V 2.5 on multiple low VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across multiple GPUs. Please refer to this [tutorial](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) for detailed instructions on how to load the model and run inference using multiple low VRAM GPUs.

@@ -1511,108 +2367,129 @@ PYTORCH_ENABLE_MPS_FALLBACK=1 python test.py

### Deployment on Mobile Phone
MiniCPM-V 2.0 can be deployed on mobile phones with Android operating systems. 🚀 Click [MiniCPM-V 2.0](https://github.com/OpenBMB/mlc-MiniCPM) to install apk.

-### Inference with llama.cpp
-MiniCPM-V 2.6 can run with llama.cpp now! See [our fork of llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md) for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment:iPad Pro + M4).
+### Efficient Inference with llama.cpp, ollama, vLLM

-### Inference with ollama
-MiniCPM-V 2.6 can run with ollama now! See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment:iPad Pro + M4).
+See [our fork of llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md) for more details. This implementation supports smooth inference of 16~18 token/s on iPad (test environment: iPad Pro + M4).
+
+See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) for more details. This implementation supports smooth inference of 16~18 token/s on iPad (test environment: iPad Pro + M4).

-### Inference with vLLM
- vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0, Click to see.
+ vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0, and you can use our fork to run MiniCPM-o 2.6 for now. Click to see.

-1. Install vLLM(>=0.5.4):
-```shell
-pip install vllm
-```
-2. Install timm: (optional, MiniCPM-V 2.0 need timm)
-```shell
-pip install timm==0.9.10
-```
-3. Run the example(for image):
-```python
-from transformers import AutoTokenizer
-from PIL import Image
-from vllm import LLM, SamplingParams
+1. For MiniCPM-o 2.6
+    1. Clone our fork of vLLM:
+    ```shell
+    git clone https://github.com/OpenBMB/vllm.git
+    cd vllm
+    git checkout minicpmo
+    ```
+    2. Install vLLM from source:
+    ```shell
+    VLLM_USE_PRECOMPILED=1 pip install --editable .
+    ```
+    3. Run MiniCPM-o 2.6 in the same way as the previous models (shown in the following example).

-MODEL_NAME = "openbmb/MiniCPM-V-2_6"
-# Also available for previous models
-# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
-# MODEL_NAME = "HwwwH/MiniCPM-V-2"
+2. For previous MiniCPM-V models
+    1. Install vLLM (>= 0.5.4):
+    ```shell
+    pip install vllm
+    ```
+    2. Install timm (optional, MiniCPM-V 2.0 needs timm):
+    ```shell
+    pip install timm==0.9.10
+    ```
+    3. Run the example (for image):
+    ```python
+    from transformers import AutoTokenizer
+    from PIL import Image
+    from vllm import LLM, SamplingParams

-image = Image.open("xxx.png").convert("RGB")
-tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-llm = LLM(
-    model=MODEL_NAME,
-    trust_remote_code=True,
-    gpu_memory_utilization=1,
-    max_model_len=2048
-)
+    MODEL_NAME = "openbmb/MiniCPM-V-2_6"
+    # MODEL_NAME = "openbmb/MiniCPM-o-2_6"
+    # Also available for previous models
+    # MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
+    # MODEL_NAME = "HwwwH/MiniCPM-V-2"

-messages = [{
-    "role":
-    "user",
-    "content":
-    # Number of images
-    "(<image>./</image>)" + \
-    "\nWhat is the content of this image?" 
-}]
-prompt = tokenizer.apply_chat_template(
-    messages,
-    tokenize=False,
-    add_generation_prompt=True
-)
+    image = Image.open("xxx.png").convert("RGB")
+    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
+    llm = LLM(
+        model=MODEL_NAME,
+        trust_remote_code=True,
+        gpu_memory_utilization=1,
+        max_model_len=2048
+    )

-# Single Inference
-inputs = {
-    "prompt": prompt,
-    "multi_modal_data": {
-        "image": image
-        # Multi images, the number of images should be equal to that of `(<image>./</image>)`
-        # "image": [image, image]
-    },
-}
-# Batch Inference
-# inputs = [{
-#     "prompt": prompt,
-#     "multi_modal_data": {
-#         "image": image
-#     },
-# } for _ in 2]
+    messages = [{
+        "role":
+        "user",
+        "content":
+        # Number of images
+        "(<image>./</image>)" + \
+        "\nWhat is the content of this image?"
+    }]
+    prompt = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+
+    # Single Inference
+    inputs = {
+        "prompt": prompt,
+        "multi_modal_data": {
+            "image": image
+            # Multi images, the number of images should be equal to that of `(<image>./</image>)`
+            # "image": [image, image]
+        },
+    }
+    # Batch Inference
+    # inputs = [{
+    #     "prompt": prompt,
+    #     "multi_modal_data": {
+    #         "image": image
+    #     },
+    # } for _ in range(2)]

-# 2.6
-stop_tokens = ['<|im_end|>', '<|endoftext|>']
-stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
-# 2.0
-# stop_token_ids = [tokenizer.eos_id]
-# 2.5
-# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
+    # 2.6
+    stop_tokens = ['<|im_end|>', '<|endoftext|>']
+    stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
+    # 2.0
+    # stop_token_ids = [tokenizer.eos_id]
+    # 2.5
+    # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]

-sampling_params = SamplingParams(
-    stop_token_ids=stop_token_ids,
-    use_beam_search=True,
-    temperature=0,
-    best_of=3,
-    max_tokens=1024
-)
+    sampling_params = SamplingParams(
+        stop_token_ids=stop_token_ids,
+        use_beam_search=True,
+        temperature=0,
+        best_of=3,
+        
max_tokens=1024 + ) -outputs = llm.generate(inputs, sampling_params=sampling_params) + outputs = llm.generate(inputs, sampling_params=sampling_params) -print(outputs[0].outputs[0].text) -``` -4. click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video*, or get more details about `vLLM`. -
+    print(outputs[0].outputs[0].text)
+    ```
+    4. Click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video*, or get more details about `vLLM`.
+
## Fine-tuning

### Simple Fine-tuning

-We support simple fine-tuning with Hugging Face for MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5.
+We support simple fine-tuning with Hugging Face for MiniCPM-o 2.6, MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0.

[Reference Document](./finetune/readme.md)

+### With LLaMA-Factory
+
+We support fine-tuning MiniCPM-o 2.6 and MiniCPM-V 2.6 with the LLaMA-Factory framework. LLaMA-Factory provides a solution for flexibly customizing the fine-tuning (LoRA/Full/QLoRA) of 200+ LLMs without the need for coding, through the built-in web UI LLaMABoard. It supports various training methods like SFT/PPO/DPO/KTO and advanced algorithms like GaLore/BAdam/LLaMA-Pro/PiSSA/LongLoRA.
+
+Best Practices: [MiniCPM-V-2.6 | MiniCPM-o-2.6](./docs/llamafactory_train.md).
+
+
### With the SWIFT Framework

We now support MiniCPM-V series fine-tuning with the SWIFT framework. SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs. It supports the lightweight training solutions provided by PEFT and a complete Adapters Library including techniques such as NEFTune, LoRA+ and LLaMA-PRO.

@@ -1622,20 +2499,25 @@ Best Practices:[MiniCPM-V 1.0](https://github.com/modelscope/swift/blob/main/d

## FAQs
Click here to view the [FAQs](./docs/faqs.md)

+## Limitations
+As an experimental trial, we find MiniCPM-o 2.6 has notable limitations worth further investigation and improvement.
+- **Unstable speech output.** Speech generation can be flawed, with noisy backgrounds and meaningless sounds.
+- **Repeated responses.** The model tends to repeat its response when encountering similar consecutive user queries.
+- **High latency on the web demo.** Users may experience unusually high latency when using the web demo hosted on overseas servers. We recommend deploying the demo locally or with a good network connection.
+
## Model License

* This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License. 
-* The usage of MiniCPM-V model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
+* The usage of MiniCPM-o/V model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).

-* The models and weights of MiniCPM are completely free for academic research. after filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use.
-
+* The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, they are also available for free commercial use.

## Statement

-As LMMs, MiniCPM-V models (including OmniLMM) generate contents by learning a large amount of multimodal corpora, but they cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-V models does not represent the views and positions of the model developers
+As MLLMs, MiniCPM-o/V models generate contents by learning a large amount of multimodal corpora, but they cannot comprehend, express personal opinions or make value judgements. Anything generated by MiniCPM-o/V models does not represent the views and positions of the model developers.

-We will not be liable for any problems arising from the use of MiniCPM-V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, dissemination or misuse of the model.
+We will not be liable for any problems arising from the use of MiniCPM-o/V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model. 
## Institutions

@@ -1644,7 +2526,6 @@ This project is developed by the following institutions:

- [THUNLP](https://nlp.csai.tsinghua.edu.cn/)
- [ModelBest](https://modelbest.cn/)
-- [Zhihu](https://www.zhihu.com/ )

## 🌟 Star History

@@ -1676,14 +2557,14 @@ This project is developed by the following institutions:

## Key Techniques and Other Multimodal Projects

-👏 Welcome to explore key techniques of MiniCPM-V and other multimodal projects of our team:
+👏 Welcome to explore key techniques of MiniCPM-o/V and other multimodal projects of our team:

[VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)

## Citation

-If you find our model/code/paper helpful, please consider cite our papers 📝 and star us ⭐️!
+If you find our model/code/paper helpful, please consider citing our papers 📝 and starring us ⭐️!

```bib
@article{yao2024minicpm,
diff --git a/README_en.md b/README_en.md
deleted file mode 100644
index e8e6c29..0000000
--- a/README_en.md
+++ /dev/null
@@ -1,1695 +0,0 @@
-
- - - -**A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone** - - [中文](./README_zh.md) | - English - -Join our 💬 WeChat | View MiniCPM-V 📖 best practices - - -

- MiniCPM-V 2.6 🤗 🤖 | MiniCPM-Llama3-V 2.5 🤗 🤖 | - MiniCPM-Llama3-V 2.5 Technical Report -

- -
- - -**MiniCPM-V** is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image, video and text as inputs and provide high-quality text outputs. Since February 2024, we have released 5 versions of the model, aiming to achieve **strong performance and efficient deployment**. The most notable models in this series currently include: - -- **MiniCPM-V 2.6**: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model **surpasses GPT-4V in single image, multi-image and video understanding**. It outperforms **GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet** in single image understanding, and advances MiniCPM-Llama3-V 2.5's features such as strong OCR capability, trustworthy behavior, multilingual support, and end-side deployment. Due to its superior token density, MiniCPM-V 2.6 can for the first time support real-time video understanding on end-side devices such as iPad. - -- **MiniCPM-V 2.0**: The lightest model in the MiniCPM-V series. With 2B parameters, it surpasses larger models such as Yi-VL 34B, CogVLM-Chat 17B, and Qwen-VL-Chat 10B in overall performance. It can accept image inputs of any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving comparable performance with Gemini Pro in understanding scene-text and matches GPT-4V in low hallucination rates. - - -## News - -#### 📌 Pinned - -* [2024.08.17] 🚀🚀🚀 MiniCPM-V 2.6 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf). -* [2024.08.15] We now also support multi-image SFT. For more details, please refer to the [document](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune). -* [2024.08.14] MiniCPM-V 2.6 now also supports [fine-tuning](https://github.com/modelscope/ms-swift/issues/1613) with the SWIFT framework! 
-* [2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by [official](https://github.com/ggerganov/llama.cpp) llama.cpp! GGUF models of various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf). -* [2024.08.06] 🔥🔥🔥 We open-source MiniCPM-V 2.6, which outperforms GPT-4V on single image, multi-image and video understanding. It advances popular features of MiniCPM-Llama3-V 2.5, and can support real-time video understanding on iPad. Try it now! -* [2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See [here](https://arxiv.org/abs/2408.01800). -* [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See [here](#inference-with-vllm). -* [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics [here](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics). -* [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmarks evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click [here](./docs/compare_with_phi-3_vision.md) to view more details. -* [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available [here](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5). Come and try it out! - -
- -
-
-Click to view more news.
-
-* [2024.06.03] Now, you can run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across multiple GPUs. For more details, check this [link](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md).
-* [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported in llama.cpp and ollama! Please pull the latest code **of our provided forks** ([llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md), [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5)). GGUF models in various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main). The MiniCPM-Llama3-V 2.5 series is **not supported by the official repositories yet**, and we are working hard to merge PRs. Please stay tuned!
-* [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage)!
-* [2024.05.24] We release the MiniCPM-Llama3-V 2.5 [gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](#inference-with-llamacpp) inference and provides smooth decoding at 6~8 tokens/s on mobile phones. Try it now!
-* [2024.05.20] We open-source MiniCPM-Llama3-V 2.5, which has improved OCR capability and supports 30+ languages, representing the first end-side MLLM achieving GPT-4V level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
-* [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click [here](#inference-with-vllm) to view more details.
-* [2024.04.18] We created a Hugging Face Space to host the demo of MiniCPM-V 2.0 [here](https://huggingface.co/spaces/openbmb/MiniCPM-V-2)!
-* [2024.04.17] MiniCPM-V-2.0 supports deploying [WebUI Demo](#webui-demo) now!
-* [2024.04.15] MiniCPM-V-2.0 now also supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md) with the SWIFT framework!
-* [2024.04.12] We open-source MiniCPM-V 2.0, which achieves performance comparable to Gemini Pro in understanding scene text and outperforms strong Qwen-VL-Chat 9.6B and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. Click here to view the MiniCPM-V 2.0 technical blog.
-* [2024.03.14] MiniCPM-V now supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md) with the SWIFT framework. Thanks to [Jintao](https://github.com/Jintao-Huang) for the contribution!
-* [2024.03.01] MiniCPM-V can now be deployed on Mac!
-* [2024.02.01] We open-source MiniCPM-V and OmniLMM-12B, which support efficient end-side deployment and powerful multimodal capabilities, respectively.
- - -## Contents - - -- [MiniCPM-V 2.6](#minicpm-v-26) -- [MiniCPM-Llama3-V 2.5](#minicpm-llama3-v-25) -- [MiniCPM-V 2.0](#minicpm-v-20) -- [Chat with Our Demo on Gradio 🤗](#chat-with-our-demo-on-gradio-) -- [Install](#install) -- [Inference](#inference) - - [Model Zoo](#model-zoo) - - [Multi-turn Conversation](#multi-turn-conversation) - - [Chat with multiple images](#chat-with-multiple-images) - - [In-context few-shot learning](#in-context-few-shot-learning) - - [Chat with video](#chat-with-video) - - [Inference on Multiple GPUs](#inference-on-multiple-gpus) - - [Inference on Mac](#inference-on-mac) - - [Deployment on Mobile Phone](#deployment-on-mobile-phone) - - [Inference with llama.cpp](#inference-with-llamacpp) - - [Inference with ollama](#inference-with-ollama) - - [Inference with vLLM](#inference-with-vllm) -- [Fine-tuning](#fine-tuning) -- [FAQs](#faqs) - - -## MiniCPM-V 2.6 - -**MiniCPM-V 2.6** is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include: - -- 🔥 **Leading Performance.** - MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet** for single image understanding. - -- 🖼️ **Multi Image Understanding and In-context Learning.** MiniCPM-V 2.6 can also perform **conversation and reasoning over multiple images**. It achieves **state-of-the-art performance** on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and also shows promising in-context learning capability. 
-
-- 🎬 **Video Understanding.** MiniCPM-V 2.6 can also **accept video inputs**, performing conversation and providing dense captions for spatial-temporal information. It outperforms **GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B** on Video-MME with/without subtitles.
-
-- 💪 **Strong OCR Capability and Others.**
- MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro**.
- Based on the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) and [VisCPM](https://github.com/OpenBMB/VisCPM) techniques, it features **trustworthy behaviors**, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports **multilingual capabilities** in English, Chinese, German, French, Italian, Korean, etc.
-
-- 🚀 **Superior Efficiency.**
- In addition to its friendly size, MiniCPM-V 2.6 also shows **state-of-the-art token density** (i.e., the number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M-pixel image, which is 75% fewer than most models.** This directly improves inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-V 2.6 can efficiently support **real-time video understanding** on end-side devices such as iPad.
- -- 💫 **Easy Usage.** -MiniCPM-V 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md) and [ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#inference-with-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks, (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) online web [demo](http://120.92.209.146:8887/). - -### Evaluation -
- -
- -
-Click to view single image results on OpenCompass, MME, MMVet, OCRBench, MMMU, MathVista, MMB, AI2D, TextVQA, DocVQA, HallusionBench, Object HalBench. -
| Model | Size | Token Density+ | OpenCompass | MME | MMVet | OCRBench | MMMU val | MathVista mini | MMB1.1 test | AI2D | TextVQA val | DocVQA test | HallusionBench | Object HalBench |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | | | | | | | | | |
| GPT-4o | - | 1088 | 69.9 | 2328.7 | 69.1 | 736 | 69.2 | 61.3 | 82.2 | 84.6 | - | 92.8 | 55.0 | 17.6 |
| Claude 3.5 Sonnet | - | 750 | 67.9 | 1920.0 | 66.0 | 788 | 65.9 | 61.6 | 78.5 | 80.2 | - | 95.2 | 49.9 | 13.8 |
| Gemini 1.5 Pro | - | - | 64.4 | 2110.6 | 64.0 | 754 | 60.6 | 57.7 | 73.9 | 79.1 | 73.5 | 86.5 | 45.6 | - |
| GPT-4o mini | - | 1088 | 64.1 | 2003.4 | 66.9 | 785 | 60.0 | 52.4 | 76.0 | 77.8 | - | - | 46.1 | 12.4 |
| GPT-4V | - | 1088 | 63.5 | 2070.2 | 67.5 | 656 | 61.7 | 54.7 | 79.8 | 78.6 | 78.0 | 87.2 | 43.9 | 14.2 |
| Step-1V | - | - | 59.5 | 2206.4 | 63.3 | 625 | 49.9 | 44.8 | 78.0 | 79.2 | 71.6 | - | 48.4 | - |
| Qwen-VL-Max | - | 784 | 58.3 | 2281.7 | 61.8 | 684 | 52.0 | 43.4 | 74.6 | 75.7 | 79.5 | 93.1 | 41.2 | 13.4 |
| **Open-source** | | | | | | | | | | | | | | |
| LLaVA-NeXT-Yi-34B | 34B | 157 | 55.0 | 2006.5 | 50.7 | 574 | 48.8 | 40.4 | 77.8 | 78.9 | 69.3 | - | 34.8 | 12.6 |
| Mini-Gemini-HD-34B | 34B | 157 | - | 2141.0 | 59.3 | 518 | 48.0 | 43.3 | - | 80.5 | 74.1 | 78.9 | - | - |
| Cambrian-34B | 34B | 1820 | 58.3 | 2049.9 | 53.2 | 591 | 50.4 | 50.3 | 77.8 | 79.5 | 76.7 | 75.5 | 41.6 | 14.7 |
| GLM-4V-9B | 13B | 784 | 59.1 | 2018.8 | 58.0 | 776 | 46.9 | 51.1 | 67.9 | 71.2 | - | - | 45.0 | - |
| InternVL2-8B | 8B | 706 | 64.1 | 2215.1 | 54.3 | 794 | 51.2 | 58.3 | 79.4 | 83.6 | 77.4 | 91.6 | 45.0 | 21.3 |
| MiniCPM-Llama3-V 2.5 | 8B | 1882 | 58.8 | 2024.6 | 52.8 | 725 | 45.8 | 54.3 | 72.0 | 78.4 | 76.6 | 84.8 | 42.4 | 10.3 |
| MiniCPM-V 2.6 | 8B | 2822 | 65.2 | 2348.4* | 60.0 | 852* | 49.8* | 60.6 | 78.0 | 82.1 | 80.1 | 90.8 | 48.1* | 8.2 |
- -
-* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we used this technique only for the Cognition set. - -+ Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens. - -Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation. - -
- - -
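As a quick sanity check on the token-density column above, the MiniCPM-V 2.6 figure follows directly from the definition and the numbers quoted in this README (a 1344x1344, ~1.8M-pixel image encoded into 640 visual tokens). The snippet below only illustrates the arithmetic, not the model code:

```python
# Token density = number of pixels at maximum resolution / number of visual tokens.
def token_density(width: int, height: int, num_visual_tokens: int) -> float:
    return width * height / num_visual_tokens

# MiniCPM-V 2.6 encodes a 1344x1344 (~1.8M pixel) image into 640 visual tokens.
density = token_density(1344, 1344, 640)
print(round(density))  # 2822, matching the table entry for MiniCPM-V 2.6
```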
-Click to view multi-image results on Mantis Eval, BLINK, Mathverse mv, Sciverse mv, MIRB. -
| Model | Size | Mantis Eval | BLINK val | Mathverse mv | Sciverse mv | MIRB |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | |
| GPT-4V | - | 62.7 | 54.6 | 60.3 | 66.9 | 53.1 |
| LLaVA-NeXT-Interleave-14B | 14B | 66.4 | 52.6 | 32.7 | 30.2 | - |
| **Open-source** | | | | | | |
| Emu2-Chat | 37B | 37.8 | 36.2 | - | 27.2 | - |
| CogVLM | 17B | 45.2 | 41.1 | - | - | - |
| VPG-C | 7B | 52.4 | 43.1 | 24.3 | 23.1 | - |
| VILA 8B | 8B | 51.2 | 39.3 | - | 36.5 | - |
| InternLM-XComposer-2.5 | 8B | 53.1* | 48.9 | 32.1* | - | 42.5 |
| InternVL2-8B | 8B | 59.0* | 50.9 | 30.5* | 34.4* | 56.9* |
| MiniCPM-V 2.6 | 8B | 69.1 | 53.0 | 84.9 | 74.9 | 53.8 |
- -
-* We evaluate the officially released checkpoint by ourselves. -
- -
-Click to view video results on Video-MME and Video-ChatGPT. -
| Model | Size | Video-MME (w/o subs) | Video-MME (w subs) | Video-ChatGPT Correctness | Video-ChatGPT Detail | Video-ChatGPT Context | Video-ChatGPT Temporal | Video-ChatGPT Consistency |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | | | |
| Claude 3.5 Sonnet | - | 60.0 | 62.9 | - | - | - | - | - |
| GPT-4V | - | 59.9 | 63.3 | - | - | - | - | - |
| **Open-source** | | | | | | | | |
| LLaVA-NeXT-7B | 7B | - | - | 3.39 | 3.29 | 3.92 | 2.60 | 3.12 |
| LLaVA-NeXT-34B | 34B | - | - | 3.29 | 3.23 | 3.83 | 2.51 | 3.47 |
| CogVLM2-Video | 12B | - | - | 3.49 | 3.46 | 3.23 | 2.98 | 3.64 |
| LongVA | 7B | 52.4 | 54.3 | 3.05 | 3.09 | 3.77 | 2.44 | 3.64 |
| InternVL2-8B | 8B | 54.0 | 56.9 | - | - | - | - | - |
| InternLM-XComposer-2.5 | 8B | 55.8 | - | - | - | - | - | - |
| LLaVA-NeXT-Video | 32B | 60.2 | 63.0 | 3.48 | 3.37 | 3.95 | 2.64 | 3.28 |
| MiniCPM-V 2.6 | 8B | 60.9 | 63.6 | 3.59 | 3.28 | 3.93 | 2.73 | 3.62 |
-
-
- - -
-Click to view few-shot results on TextVQA, VizWiz, VQAv2, OK-VQA. -
| Model | Size | Shot | TextVQA val | VizWiz test-dev | VQAv2 test-dev | OK-VQA val |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|
| Flamingo | 80B | 0* | 35.0 | 31.6 | 56.3 | 40.6 |
| | | 4 | 36.5 | 39.6 | 63.1 | 57.4 |
| | | 8 | 37.3 | 44.8 | 65.6 | 57.5 |
| IDEFICS | 80B | 0* | 30.9 | 36.0 | 60.0 | 45.2 |
| | | 4 | 34.3 | 40.4 | 63.6 | 52.4 |
| | | 8 | 35.7 | 46.1 | 64.8 | 55.1 |
| OmniCorpus | 7B | 0* | 43.0 | 49.8 | 63.2 | 45.5 |
| | | 4 | 45.4 | 51.3 | 64.5 | 46.5 |
| | | 8 | 45.6 | 52.2 | 64.7 | 46.6 |
| Emu2 | 37B | 0 | 26.4 | 40.4 | 33.5 | 26.7 |
| | | 4 | 48.2 | 54.6 | 67.0 | 53.2 |
| | | 8 | 49.3 | 54.7 | 67.8 | 54.1 |
| MM1 | 30B | 0 | 26.2 | 40.4 | 48.9 | 26.7 |
| | | 8 | 49.3 | 54.7 | 70.9 | 54.1 |
| MiniCPM-V 2.6+ | 8B | 0 | 43.9 | 33.8 | 45.4 | 23.9 |
| | | 4 | 63.6 | 60.5 | 65.5 | 50.1 |
| | | 8 | 64.6 | 63.4 | 68.2 | 51.4 |
- - -
-* denotes zero image shot and two additional text shots following Flamingo. - -+ We evaluate the pretraining ckpt without SFT. -
- -### Examples - -
- Bike - Menu - Code - Mem - medal -
-
- Click to view more cases. -
- elec - Menu -
-
-
-We deploy MiniCPM-V 2.6 on end devices. The demo video is a raw screen recording on an iPad Pro, without editing.
-
-
-
- -## MiniCPM-Llama3-V 2.5 - -
-Click to view more details of MiniCPM-Llama3-V 2.5 - -**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include: - -- 🔥 **Leading Performance.** - MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs. - -- 💪 **Strong OCR Capabilities.** - MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences. - -- 🏆 **Trustworthy Behavior.** - Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), achieving the best-level performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). 
- -- 🌏 **Multilingual Support.** - Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages including German, French, Spanish, Italian, Korean etc.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md). - -- 🚀 **Efficient Deployment.** - MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150x acceleration in end-side MLLM image encoding** and a **3x speedup in language decoding**. - -- 💫 **Easy Usage.** -MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5). - -### Evaluation - -
- -
-
-Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench. -
| Model | Size | OCRBench | TextVQA val | DocVQA test | OpenCompass | MME | MMB test (en) | MMB test (cn) | MMMU val | MathVista | LLaVA Bench | RealWorld QA | Object HalBench |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | | | | | | | | |
| Gemini Pro | - | 680 | 74.6 | 88.1 | 62.9 | 2148.9 | 73.6 | 74.3 | 48.9 | 45.8 | 79.9 | 60.4 | - |
| GPT-4V (2023.11.06) | - | 645 | 78.0 | 88.4 | 63.5 | 1771.5 | 77.0 | 74.4 | 53.8 | 47.8 | 93.1 | 63.0 | 86.4 |
| **Open-source** | | | | | | | | | | | | | |
| Mini-Gemini | 2.2B | - | 56.2 | 34.2* | - | 1653.0 | - | - | 31.7 | - | - | - | - |
| Qwen-VL-Chat | 9.6B | 488 | 61.5 | 62.6 | 51.6 | 1860.0 | 61.8 | 56.3 | 37.0 | 33.8 | 67.7 | 49.3 | 56.2 |
| DeepSeek-VL-7B | 7.3B | 435 | 64.7* | 47.0* | 54.6 | 1765.4 | 73.8 | 71.4 | 38.3 | 36.8 | 77.8 | 54.2 | - |
| Yi-VL-34B | 34B | 290 | 43.4* | 16.9* | 52.2 | 2050.2 | 72.4 | 70.7 | 45.1 | 30.7 | 62.3 | 54.8 | 79.3 |
| CogVLM-Chat | 17.4B | 590 | 70.4 | 33.3* | 54.2 | 1736.6 | 65.8 | 55.9 | 37.3 | 34.7 | 73.9 | 60.3 | 73.6 |
| TextMonkey | 9.7B | 558 | 64.3 | 66.7 | - | - | - | - | - | - | - | - | - |
| Idefics2 | 8.0B | - | 73.0 | 74.0 | 57.2 | 1847.6 | 75.7 | 68.6 | 45.2 | 52.2 | 49.1 | 60.7 | - |
| Bunny-LLama-3-8B | 8.4B | - | - | - | 54.3 | 1920.3 | 77.0 | 73.9 | 41.3 | 31.5 | 61.2 | 58.8 | - |
| LLaVA-NeXT Llama-3-8B | 8.4B | - | - | 78.2 | - | 1971.5 | - | - | 41.7 | 37.5 | 80.1 | 60.0 | - |
| Phi-3-vision-128k-instruct | 4.2B | 639* | 70.9 | - | - | 1537.5* | - | - | 40.4 | 44.5 | 64.2* | 58.8* | - |
| MiniCPM-V 1.0 | 2.8B | 366 | 60.6 | 38.2 | 47.5 | 1650.2 | 64.1 | 62.6 | 38.3 | 28.9 | 51.3 | 51.2 | 78.4 |
| MiniCPM-V 2.0 | 2.8B | 605 | 74.1 | 71.9 | 54.5 | 1808.6 | 69.1 | 66.5 | 38.2 | 38.7 | 69.2 | 55.8 | 85.5 |
| MiniCPM-Llama3-V 2.5 | 8.5B | 725 | 76.6 | 84.8 | 65.1 | 2024.6 | 77.2 | 74.2 | 45.8 | 54.3 | 86.7 | 63.5 | 89.7 |
- - -
-* We evaluate the officially released checkpoint by ourselves. - -
- -
- -
- Evaluation results of multilingual LLaVA Bench -
- -### Examples - - -

- -

-
- -
- - -## MiniCPM-V 2.0 - -
-
-Click to view more details of MiniCPM-V 2.0
-
-
-**MiniCPM-V 2.0** is an efficient version with promising performance for deployment. The model is built on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0, has several notable features.
-
-- 🔥 **State-of-the-art Performance.**
-
-  MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc.) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
-
-- 🏆 **Trustworthy Behavior.**
-
-  LMMs are known to suffer from hallucination, often generating text not factually grounded in images. MiniCPM-V 2.0 is **the first end-side LMM aligned via multimodal RLHF for trustworthy behavior** (using the recent [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] series technique). This allows the model to **match GPT-4V in preventing hallucinations** on Object HalBench.
-
-- 🌟 **High-Resolution Images at Any Aspect Ratio.**
-
-  MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
-
-- ⚡️ **High Efficiency.**
-
-  MiniCPM-V 2.0 can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into much fewer tokens via a perceiver resampler. 
This allows MiniCPM-V 2.0 to operate with **favorable memory cost and speed during inference even when dealing with high-resolution images**. - -- 🙌 **Bilingual Support.** - - MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24]. - -### Examples - - -

- -

-
-
-We deploy MiniCPM-V 2.0 on end devices. The demo video is a raw screen recording on a Xiaomi 14 Pro, without editing.
-
-

- - -

-
- -
-
-
-## Legacy Models
-
-| Model | Introduction and Guidance |
-|:----------------------|:-------------------:|
-| MiniCPM-V 1.0 | [Document](./minicpm_v1.md) |
-| OmniLMM-12B | [Document](./omnilmm_en.md) |
-
-
-## Chat with Our Demo on Gradio 🤗
-
-We provide online and local demos powered by Hugging Face Gradio, one of the most popular model deployment frameworks. It supports streaming outputs, progress bars, queuing, alerts, and other useful features.
-
-
-### Online Demo
-
-Click here to try out the online demo of [MiniCPM-V 2.6](http://120.92.209.146:8887/) | [MiniCPM-Llama3-V 2.5](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) | [MiniCPM-V 2.0](https://huggingface.co/spaces/openbmb/MiniCPM-V-2).
-
-### Local WebUI Demo
-
-You can easily build your own local WebUI demo with Gradio using the following commands.
-
-```shell
-pip install -r requirements.txt
-```
-
-```shell
-# For NVIDIA GPUs, run:
-python web_demo_2.6.py --device cuda
-```
-
-
-## Install
-
-1. Clone this repository and navigate to the source folder
-
-```bash
-git clone https://github.com/OpenBMB/MiniCPM-V.git
-cd MiniCPM-V
-```
-
-2. Create conda environment
-
-```Shell
-conda create -n MiniCPM-V python=3.10 -y
-conda activate MiniCPM-V
-```
-
-3. Install dependencies
-
-```shell
-pip install -r requirements.txt
-```
-
-## Inference
-
-
-### Model Zoo
-
-| Model | Device | Memory | Description | Download |
-|:-----------|:--:|:-----------:|:-------------------|:---------------:|
-| MiniCPM-V 2.6 | GPU | 17 GB | The latest version, achieving state-of-the-art end-side performance for single image, multi-image and video understanding. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
-| MiniCPM-V 2.6 gguf | CPU | 6 GB | The gguf version, with lower memory usage and faster inference. 
| [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) |
-| MiniCPM-V 2.6 int4 | GPU | 7 GB | The int4 quantized version, with lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) |
-| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
-| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, with lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)   [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
-| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, with lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
-| MiniCPM-V 2.0 | GPU | 8 GB | Light version, balancing performance and computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
-| MiniCPM-V 1.0 | GPU | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
-
-### Multi-turn Conversation
-
-Please refer to the following code to run.
- -
- - -```python -import torch -from PIL import Image -from transformers import AutoModel, AutoTokenizer - -torch.manual_seed(0) - -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, - attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager -model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) - -image = Image.open('./assets/airplane.jpeg').convert('RGB') - -# First round chat -question = "Tell me the model of this aircraft." -msgs = [{'role': 'user', 'content': [image, question]}] - -answer = model.chat( - image=None, - msgs=msgs, - tokenizer=tokenizer -) -print(answer) - -# Second round chat -# pass history context of multi-turn conversation -msgs.append({"role": "assistant", "content": [answer]}) -msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]}) - -answer = model.chat( - image=None, - msgs=msgs, - tokenizer=tokenizer -) -print(answer) -``` - -You will get the following output: - -``` -"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database." - -"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. 
The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry." -``` - -#### Chat with multiple images -
- Click to view Python code running MiniCPM-V 2.6 with multiple images input. - -```python -import torch -from PIL import Image -from transformers import AutoModel, AutoTokenizer - -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, - attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager -model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) - -image1 = Image.open('image1.jpg').convert('RGB') -image2 = Image.open('image2.jpg').convert('RGB') -question = 'Compare image 1 and image 2, tell me about the differences between image 1 and image 2.' - -msgs = [{'role': 'user', 'content': [image1, image2, question]}] - -answer = model.chat( - image=None, - msgs=msgs, - tokenizer=tokenizer -) -print(answer) -``` -
- -#### In-context few-shot learning -
- Click to view Python code running MiniCPM-V 2.6 with few-shot input. - -```python -import torch -from PIL import Image -from transformers import AutoModel, AutoTokenizer - -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, - attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager -model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) - -question = "production date" -image1 = Image.open('example1.jpg').convert('RGB') -answer1 = "2023.08.04" -image2 = Image.open('example2.jpg').convert('RGB') -answer2 = "2007.04.24" -image_test = Image.open('test.jpg').convert('RGB') - -msgs = [ - {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]}, - {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]}, - {'role': 'user', 'content': [image_test, question]} -] - -answer = model.chat( - image=None, - msgs=msgs, - tokenizer=tokenizer -) -print(answer) -``` -
- -#### Chat with video -
- Click to view Python code running MiniCPM-V 2.6 with video input. - -```python -import torch -from PIL import Image -from transformers import AutoModel, AutoTokenizer -from decord import VideoReader, cpu # pip install decord - -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, - attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager -model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) - -MAX_NUM_FRAMES=64 # if cuda OOM set a smaller number - -def encode_video(video_path): - def uniform_sample(l, n): - gap = len(l) / n - idxs = [int(i * gap + gap / 2) for i in range(n)] - return [l[i] for i in idxs] - - vr = VideoReader(video_path, ctx=cpu(0)) - sample_fps = round(vr.get_avg_fps() / 1) # FPS - frame_idx = [i for i in range(0, len(vr), sample_fps)] - if len(frame_idx) > MAX_NUM_FRAMES: - frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES) - frames = vr.get_batch(frame_idx).asnumpy() - frames = [Image.fromarray(v.astype('uint8')) for v in frames] - print('num frames:', len(frames)) - return frames - -video_path="video_test.mp4" -frames = encode_video(video_path) -question = "Describe the video" -msgs = [ - {'role': 'user', 'content': frames + [question]}, -] - -# Set decode params for video -params = {} -params["use_image_id"] = False -params["max_slice_nums"] = 2 # use 1 if cuda OOM and video resolution > 448*448 - -answer = model.chat( - image=None, - msgs=msgs, - tokenizer=tokenizer, - **params -) -print(answer) -``` -
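The `uniform_sample` helper in the video example above simply picks `n` evenly spaced frame indices (the midpoint of each of `n` equal-sized buckets), so the sampled frames cover the whole video. As a standalone illustration:

```python
def uniform_sample(l, n):
    # Split the list into n equal-sized buckets and take the middle of each,
    # so the sampled frames are spread evenly across the whole sequence.
    gap = len(l) / n
    idxs = [int(i * gap + gap / 2) for i in range(n)]
    return [l[i] for i in idxs]

print(uniform_sample(list(range(10)), 5))  # [1, 3, 5, 7, 9]
```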
- - -### Inference on Multiple GPUs -You can run MiniCPM-Llama3-V 2.5 on multiple low VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across multiple GPUs. Please refer to this [tutorial](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) for detailed instructions on how to load the model and inference using multiple low VRAM GPUs. - - -### Inference on Mac -
-
-Click to view an example of running MiniCPM-Llama3-V 2.5 on 💻 Mac with MPS (Apple silicon or AMD GPUs).
-
-```python
-# test.py: requires more than 16 GB of memory.
-import torch
-from PIL import Image
-from transformers import AutoModel, AutoTokenizer
-
-model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, low_cpu_mem_usage=True)
-model = model.to(device='mps')
-
-tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
-model.eval()
-
-image = Image.open('./assets/hk_OCR.jpg').convert('RGB')
-question = 'Where is this photo taken?'
-msgs = [{'role': 'user', 'content': question}]
-
-answer, context, _ = model.chat(
-    image=image,
-    msgs=msgs,
-    context=None,
-    tokenizer=tokenizer,
-    sampling=True
-)
-print(answer)
-```
-Run with the command:
-```shell
-PYTORCH_ENABLE_MPS_FALLBACK=1 python test.py
-```
- -### Deployment on Mobile Phone -MiniCPM-V 2.0 can be deployed on mobile phones with Android operating systems. 🚀 Click [MiniCPM-V 2.0](https://github.com/OpenBMB/mlc-MiniCPM) to install apk. - -### Inference with llama.cpp -MiniCPM-V 2.6 can run with llama.cpp now! See [our fork of llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md) for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment:iPad Pro + M4). - -### Inference with ollama -MiniCPM-V 2.6 can run with ollama now! See [our fork of ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment:iPad Pro + M4). - -### Inference with vLLM - -
- vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0, Click to see. - -1. Install vLLM(>=0.5.4): -```shell -pip install vllm -``` -2. Install timm: (optional, MiniCPM-V 2.0 need timm) -```shell -pip install timm==0.9.10 -``` -3. Run the example(for image): -```python -from transformers import AutoTokenizer -from PIL import Image -from vllm import LLM, SamplingParams - -MODEL_NAME = "openbmb/MiniCPM-V-2_6" -# Also available for previous models -# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5" -# MODEL_NAME = "HwwwH/MiniCPM-V-2" - -image = Image.open("xxx.png").convert("RGB") -tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) -llm = LLM( - model=MODEL_NAME, - trust_remote_code=True, - gpu_memory_utilization=1, - max_model_len=2048 -) - -messages = [{ - "role": - "user", - "content": - # Number of images - "(./)" + \ - "\nWhat is the content of this image?" -}] -prompt = tokenizer.apply_chat_template( - messages, - tokenize=False, - add_generation_prompt=True -) - -# Single Inference -inputs = { - "prompt": prompt, - "multi_modal_data": { - "image": image - # Multi images, the number of images should be equal to that of `(./)` - # "image": [image, image] - }, -} -# Batch Inference -# inputs = [{ -# "prompt": prompt, -# "multi_modal_data": { -# "image": image -# }, -# } for _ in 2] - - -# 2.6 -stop_tokens = ['<|im_end|>', '<|endoftext|>'] -stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens] -# 2.0 -# stop_token_ids = [tokenizer.eos_id] -# 2.5 -# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id] - -sampling_params = SamplingParams( - stop_token_ids=stop_token_ids, - use_beam_search=True, - temperature=0, - best_of=3, - max_tokens=1024 -) - -outputs = llm.generate(inputs, sampling_params=sampling_params) - -print(outputs[0].outputs[0].text) -``` -4. 
click [here](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink) if you want to use it with *video*, or get more details about `vLLM`. -
- -## Fine-tuning - -### Simple Fine-tuning - -We support simple fine-tuning with Hugging Face for MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5. - -[Reference Document](./finetune/readme.md) - -### With the SWIFT Framework - -We now support MiniCPM-V series fine-tuning with the SWIFT framework. SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs . It supports the lightweight training solutions provided by PEFT and a complete Adapters Library including techniques such as NEFTune, LoRA+ and LLaMA-PRO. - -Best Practices:[MiniCPM-V 1.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md), [MiniCPM-V 2.0](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md), [MiniCPM-V 2.6](https://github.com/modelscope/ms-swift/issues/1613). - -## FAQs -Click here to view the [FAQs](./docs/faqs.md) - -## Model License - -* This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License. - -* The usage of MiniCPM-V model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md). - -* The models and weights of MiniCPM are completely free for academic research. after filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use. - - -## Statement - -As LMMs, MiniCPM-V models (including OmniLMM) generate contents by learning a large amount of multimodal corpora, but they cannot comprehend, express personal opinions or make value judgement. 
Anything generated by MiniCPM-V models does not represent the views and positions of the model developers - -We will not be liable for any problems arising from the use of MiniCPM-V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, dissemination or misuse of the model. - - -## Institutions - -This project is developed by the following institutions: - -- [THUNLP](https://nlp.csai.tsinghua.edu.cn/) -- [ModelBest](https://modelbest.cn/) -- [Zhihu](https://www.zhihu.com/ ) - -## 🌟 Star History - - - -

- -

-
- - - -## Key Techniques and Other Multimodal Projects - -👏 Welcome to explore key techniques of MiniCPM-V and other multimodal projects of our team: - -[VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V) - - -## Citation - -If you find our model/code/paper helpful, please consider cite our papers 📝 and star us ⭐️! - -```bib -@article{yao2024minicpm, - title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone}, - author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others}, - journal={arXiv preprint arXiv:2408.01800}, - year={2024} -} -``` diff --git a/README_zh.md b/README_zh.md index 0399a37..35d8040 100644 --- a/README_zh.md +++ b/README_zh.md @@ -1,34 +1,36 @@
- + - - -**端侧可用的 GPT-4V 级单图、多图、视频多模态大模型** +**端侧可用的 GPT-4o 级视觉、语音、多模态实时流式大模型** 中文 | - [English](./README_en.md) + [English](./README.md) + + + + + 微信社区  | + + + MiniCPM-V   📖 最佳实践 + - 加入我们的 💬 微信社区 -| 了解 MiniCPM-V 📖 最佳实践 - -

- MiniCPM-V 2.6 🤗 🤖 | MiniCPM-Llama3-V 2.5 🤗 🤖 | - MiniCPM-Llama3-V 2.5 技术报告 +

+ MiniCPM-o 2.6 🤗 国内🤖 国外🤖 | MiniCPM-V 2.6 🤗 🤖 | + 技术报告近期将发布

-
-**MiniCPM-V**是面向图文理解的端侧多模态大模型系列。该系列模型接受图像和文本输入,并提供高质量的文本输出。自2024年2月以来,我们共发布了5个版本模型,旨在实现**领先的性能和高效的部署**,目前该系列最值得关注的模型包括: +**MiniCPM-o** 是从 MiniCPM-V 升级的最新端侧多模态大模型系列。该系列模型可以以端到端方式,接受图像、视频、文本、音频作为输入,并生成高质量文本和语音输出。自2024年2月以来,我们以实现高性能和高效部署为目标,发布了6个版本的模型。目前系列中最值得关注的模型包括: -- **MiniCPM-V 2.6**: 🔥🔥🔥 MiniCPM-V系列的最新、性能最佳模型。总参数量 8B,单图、多图和视频理解性能**超越了 GPT-4V**。在单图理解上,它取得了优于 **GPT-4o mini、Gemini 1.5 Pro 和 Claude 3.5 Sonnet**等商用闭源模型的表现,并进一步优化了 MiniCPM-Llama3-V 2.5 的 OCR、可信行为、多语言支持以及端侧部署等诸多特性。基于其领先的视觉 token 密度,MiniCPM-V 2.6 成为了首个支持在 iPad 等端侧设备上进行实时视频理解的多模态大模型。 - -- **MiniCPM-V 2.0**:MiniCPM-V系列的最轻量级模型。总参数量2B,多模态综合性能超越 Yi-VL 34B、CogVLM-Chat 17B、Qwen-VL-Chat 10B 等更大参数规模的模型,可接受 180 万像素的任意长宽比图像输入,实现了和 Gemini Pro 相近的场景文字识别能力以及和 GPT-4V 相匹的低幻觉率。 +- **MiniCPM-o 2.6**: 🔥🔥🔥 MiniCPM-o 系列的最新、性能最佳模型。总参数量 8B,**视觉、语音和多模态流式能力达到了 GPT-4o-202405 级别**,是开源社区中模态支持最丰富、性能最佳的模型之一。在新的语音模式中,MiniCPM-o 2.6 **支持可配置声音的中英双语语音对话,还具备情感/语速/风格控制、端到端声音克隆、角色扮演等进阶能力**。模型也进一步提升了 MiniCPM-V 2.6 的 **OCR、可信行为、多语言支持和视频理解等视觉能力**。基于其领先的视觉 token 密度,MiniCPM-V 2.6 成为了**首个支持在 iPad 等端侧设备上进行多模态实时流式交互**的多模态大模型。 +- **MiniCPM-V 2.6**: MiniCPM-V 系列中性能最佳的模型。总参数量 8B,单图、多图和视频理解性能**超越了 GPT-4V**。它取得了优于 **GPT-4o mini、Gemini 1.5 Pro 和 Claude 3.5 Sonnet**等的单图理解表现,并成为了首个支持在 iPad 等端侧设备上进行实时视频理解的多模态大模型。 ## 更新日志 @@ -36,15 +38,10 @@ #### 📌 置顶 +* [2025.01.13] 🔥🔥🔥 我们开源了 MiniCPM-o 2.6,该模型视觉、语音和多模态流式能力达到了 GPT-4o-202405 级别,进一步优化了 MiniCPM-V 2.6 的众多亮点能力,还支持了很多有趣的新功能。欢迎试用! * [2024.08.17] 🚀🚀🚀 llama.cpp [官方仓库](https://github.com/ggerganov/llama.cpp)正式支持 MiniCPM-V 2.6 啦!点击[这里](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf)查看各种大小的 GGUF 版本。 -* [2024.08.15] MiniCPM-V 2.6 现在支持多图像 SFT。有关更多详细信息,请参阅[微调文档](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune) -* [2024.08.14] MiniCPM-V 2.6 现在可以通过 SWIFT 框架 [微调](https://github.com/modelscope/ms-swift/issues/1613) 了! 
-* [2024.08.10] 🚀🚀🚀 llama.cpp [官方仓库](https://github.com/ggerganov/llama.cpp)正式支持 MiniCPM-Llama3-V 2.5 啦!点击[这里](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main)查看各种大小的 GGUF 版本。 * [2024.08.06] 🔥🔥🔥 我们开源了 MiniCPM-V 2.6,该模型在单图、多图和视频理解方面取得了优于 GPT-4V 的表现。我们还进一步提升了 MiniCPM-Llama3-V 2.5 的多项亮点能力,并首次支持了 iPad 上的实时视频理解。欢迎试用! * [2024.08.03] MiniCPM-Llama3-V 2.5 技术报告已发布!欢迎点击[这里](https://arxiv.org/abs/2408.01800)查看。 -* [2024.07.19] MiniCPM-Llama3-V 2.5 现已支持[vLLM](#vllm-部署-) ! -* [2024.05.28] 💫 我们现在支持 MiniCPM-Llama3-V 2.5 的 LoRA 微调,更多内存使用统计信息可以在[这里](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics)找到。 -* [2024.05.23] 🔍 我们添加了Phi-3-vision-128k-instruct 与 MiniCPM-Llama3-V 2.5的全面对比,包括基准测试评估、多语言能力和推理效率 🌟📊🌍🚀。点击[这里](./docs/compare_with_phi-3_vision.md)查看详细信息。 * [2024.05.23] 🔥🔥🔥 MiniCPM-V 在 GitHub Trending 和 Hugging Face Trending 上登顶!MiniCPM-Llama3-V 2.5 Demo 被 Hugging Face 的 Gradio 官方账户推荐,欢迎点击[这里](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5)体验! @@ -53,10 +50,16 @@
点击查看完整更新日志。

+* [2024.08.15] MiniCPM-V 2.6 现在支持多图像 SFT。有关更多详细信息,请参阅[微调文档](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune)
+* [2024.08.14] MiniCPM-V 2.6 现在可以通过 SWIFT 框架[微调](https://github.com/modelscope/ms-swift/issues/1613)了!
+* [2024.08.10] 🚀🚀🚀 llama.cpp [官方仓库](https://github.com/ggerganov/llama.cpp)正式支持 MiniCPM-Llama3-V 2.5 啦!点击[这里](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main)查看各种大小的 GGUF 版本。
+* [2024.07.19] MiniCPM-Llama3-V 2.5 现已支持 [vLLM](#基于-llamacppollamavllm-的高效推理)!
 * [2024.06.03] 现在,你可以利用多张低显存显卡(12G/16G)进行GPU串行推理。详情请参见该[文档](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md)配置。
+* [2024.05.28] 💫 我们现在支持 MiniCPM-Llama3-V 2.5 的 LoRA 微调,更多内存使用统计信息可以在[这里](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics)找到。
 * [2024.05.28] 💥 MiniCPM-Llama3-V 2.5 现在在 llama.cpp 和 ollama 中完全支持其功能!**请拉取我们最新的 fork 来使用**:[llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) & [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5)。我们还发布了各种大小的 GGUF 版本,请点击[这里](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main)查看。请注意,**目前官方仓库尚未支持 MiniCPM-Llama3-V 2.5**,我们也正积极推进将这些功能合并到 llama.cpp & ollama 官方仓库,敬请关注!
 * [2024.05.25] MiniCPM-Llama3-V 2.5 [支持流式输出和自定义系统提示词](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage)了,欢迎试用!
 * [2024.05.24] 我们开源了 MiniCPM-Llama3-V 2.5 [gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf),支持 [llama.cpp](#llamacpp-部署) 推理!实现端侧 6-8 tokens/s 的流畅解码,欢迎试用!
+* [2024.05.23] 🔍 我们添加了 Phi-3-vision-128k-instruct 与 MiniCPM-Llama3-V 2.5 的全面对比,包括基准测试评估、多语言能力和推理效率 🌟📊🌍🚀。点击[这里](./docs/compare_with_phi-3_vision.md)查看详细信息。
 * [2024.05.20] 我们开源了 MiniCPM-Llama3-V 2.5,增强了 OCR 能力,支持 30 多种语言,并首次在端侧实现了 GPT-4V 级的多模态能力!我们提供了[高效推理](#手机端部署)和[简易微调](./finetune/readme.md)的支持,欢迎试用!
 * [2024.04.23] 我们增加了MiniCPM-V 2.0对 [vLLM](#vllm-部署-) 的支持,欢迎体验!
* [2024.04.18] 我们在 HuggingFace Space 新增了 MiniCPM-V 2.0 的 [demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-2),欢迎体验! @@ -71,26 +74,945 @@ ## 目录 +- [MiniCPM-o 2.6](#minicpm-o-26) - [MiniCPM-V 2.6](#minicpm-v-26) -- [MiniCPM-Llama3-V 2.5](#minicpm-llama3-v-25) -- [MiniCPM-V 2.0](#minicpm-v-20) - [Gradio Demo 🤗](#gradio-demo-) - [安装](#安装) - [推理](#推理) - [模型库](#模型库) - [多轮对话](#多轮对话) - - [多图理解](#多图理解) - - [少样本上下文学习](#少样本上下文学习) - - [视频理解](#视频理解) + - [多图对话](#多图对话) + - [少样本上下文对话](#少样本上下文对话) + - [视频对话](#视频对话) + - [语音对话](#语音对话) + - [Mimick](#mimick) + - [可配置声音的语音对话](#可配置声音的语音对话) + - [更多语音任务](#更多语音任务) + - [多模态流式交互](#多模态流式交互) - [多卡推理](#多卡推理) - [Mac 推理](#mac-推理) - [手机端部署](#手机端部署) - [本地WebUI Demo部署](#本地webui-demo部署) - - [llama.cpp 部署](#llamacpp-部署) - - [ollama 部署](#ollama-部署) - - [vLLM 部署 ](#vllm-部署-) + - [基于 llama.cpp、ollama、vLLM 的高效推理](#基于-llamacppollamavllm-的高效推理) - [微调](#微调) - [FAQs](#faqs) +- [模型局限性](#模型局限性) + +## MiniCPM-o 2.6 + + +MiniCPM-o 2.6 是 MiniCPM-o 系列的最新、性能最佳模型。该模型基于 SigLip-400M、Whisper-medium-300M、ChatTTS-200M 和 Qwen2.5-7B 构建,共 8B 参数,通过端到端方式训练和推理。相比 MiniCPM-V 2.6,该模型在性能上有了显著提升,并支持了实时语音对话和多模态流式交互的新功能。MiniCPM-o 2.6 的主要特性包括: + + +- 🔥 **领先的视觉能力。** +MiniCPM-o 2.6 在 OpenCompass 榜单上(综合 8 个主流多模态评测基准)平均得分 70.2,**以 8B 量级的大小在单图理解方面超越了 GPT-4o-202405、Gemini 1.5 Pro 和 Claude 3.5 Sonnet 等主流商用闭源多模态大模型**。此外,它的多图和视频理解表现也**优于 GPT-4V 和 Claude 3.5 Sonnet**,并展现出了优秀的上下文学习能力。 + +- 🎙 **出色的语音能力。** +MiniCPM-o 2.6 **支持可配置声音的中英双语实时对话**。MiniCPM-o 2.6 在语音理解任务(如 ASR 和 STT 等)**优于 GPT-4o-realtime**,并在语音对话的语义和声学评估中展现了**开源模型中最高的语音生成性能**。它还支持情绪/语速/风格控制、语音克隆、角色扮演等进阶能力。 + +- 🎬 **强大的多模态流式交互能力。** +作为一项新功能,MiniCPM-o 2.6 能够**接受连续的视频和音频流,并和用户进行实时语音交互**。在针对实时视频理解、全模态视音频理解、多模态上下文理解的综合评测基准 StreamingBench 中,MiniCPM-o 2.6 取得开源社区最佳水平,并**超过了 GPT-4o-202408 和 Claude 3.5 Sonnet**。 + +- 💪 **强大的 OCR 能力及其他功能。** +MiniCPM-o 2.6 进一步优化了 MiniCPM-V 2.6 的众多视觉理解能力,其可以处理任意长宽比的图像,像素数可达 180 万(如 1344x1344)。在 OCRBench 上取得**25B 以下最佳水平,超过 GPT-4o-202405 等商用闭源模型**。基于最新的 
[RLHF-V](https://rlhf-v.github.io/)、[RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) 和 [VisCPM](https://github.com/OpenBMB/VisCPM) 技术,其具备了**可信的多模态行为**,在 MMHal-Bench 上超过了 GPT-4o 和 Claude 3.5,并支持英语、中文、德语、法语、意大利语、韩语等**30多种语言**。 + +- 🚀 **卓越的效率。** +除了对个人用户友好的模型大小,MiniCPM-o 2.6 还表现出**最先进的视觉 token 密度**(即每个视觉 token 编码的像素数量)。它**仅需 640 个 token 即可处理 180 万像素图像,比大多数模型少 75%**。这一特性优化了模型的推理速度、首 token 延迟、内存占用和功耗。因此,MiniCPM-o 2.6 可以支持 iPad 等终端设备上的高效**多模态实时流式交互**。 + + +- 💫 **易于使用。** +MiniCPM-o 2.6 可以通过多种方式轻松使用:(1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md) 支持在本地设备上进行高效的 CPU 推理,(2) [int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) 和 [GGUF](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) 格式的量化模型,有 16 种尺寸,(3) [vLLM](#基于-llamacppollamavllm-的高效推理) 支持高吞吐量和内存高效的推理,(4) 通过[LLaMA-Factory](./docs/llamafactory_train.md)框架针对新领域和任务进行微调,(5) 使用 [Gradio](#本地-webui-demo-) 快速设置本地 WebUI 演示,(6) 部署于[国内](https://minicpm-omni-webdemo.modelbest.cn/ +) 或 [国外](https://minicpm-omni-webdemo-us.modelbest.cn/)服务器的在线 demo。 + + +**模型架构。** + +- **端到端全模态架构。** 通过**端到端**的方式连接和训练不同模态的编/解码模块以充分利用丰富的多模态知识。 +- **全模态流式机制。** (1) 我们将不同模态的离线编/解码器改造为适用于**流式输入/输出**的在线模块。 (2) 我们针对大语言模型基座设计了**时分复用的全模态流式信息处理机制**,将平行的不同模态的信息流拆分重组为周期性时间片序列。 +- **可配置的声音方案。** 我们设计了新的多模态系统提示,包含传统文本系统提示词,和**用于指定模型声音的语音系统提示词**。模型可在推理时灵活地通过文字或语音样例控制声音风格,并支持端到端声音克隆和音色创建等高级能力。 + +
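上文效率部分给出的 Token Density 可以按其定义直接验算:最大分辨率下的像素数除以视觉 token 数。下面是一个最小的 Python 验算示例,其中 1344×1344 的最大分辨率与 640 个视觉 token 均取自上文描述,仅作演示:

```python
def token_density(width: int, height: int, num_visual_tokens: int) -> float:
    """Token Density = 最大分辨率下的像素数 / 视觉 token 数。"""
    return width * height / num_visual_tokens

# MiniCPM-o 2.6:约 180 万像素(如 1344x1344)仅编码为 640 个视觉 token
print(round(token_density(1344, 1344, 640)))  # 2822
```

计算结果约为 2822,与性能表中 MiniCPM-o 2.6 的 Token Density 一致。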
+ +
+ +
+ + + +### 性能评估 + +
+ +
+ +
+点击查看视觉理解能力详细评测结果。 + +**图像理解能力** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | Token Density+ | OpenCompass | OCRBench | MathVista mini | ChartQA | MMVet | MMStar | MME | MMB1.1 test | AI2D | MMMU val | HallusionBench | TextVQA val | DocVQA test | MathVerse mini | MathVision | MMHal Score |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Proprietary** | | | | | | | | | | | | | | | | | | |
| GPT-4o-20240513 | - | 1088 | 69.9 | 736 | 61.3 | 85.7 | 69.1 | 63.9 | 2328.7 | 82.2 | 84.6 | 69.2 | 55.0 | - | 92.8 | 50.2 | 30.4 | 3.6 |
| Claude3.5-Sonnet | - | 750 | 67.9 | 788 | 61.6 | 90.8 | 66.0 | 62.2 | 1920.0 | 78.5 | 80.2 | 65.9 | 49.9 | - | 95.2 | - | - | 3.4 |
| Gemini-1.5-Pro | - | - | 64.4 | 754 | 57.7 | 81.3 | 64.0 | 59.1 | 2110.6 | 73.9 | 79.1 | 60.6 | 45.6 | 73.5 | 86.5 | - | 19.2 | - |
| GPT-4o-mini-20240718 | - | 1088 | 64.1 | 785 | 52.4 | - | 66.9 | 54.8 | 2003.4 | 76.0 | 77.8 | 60.0 | 46.1 | - | - | - | - | 3.3 |
| **Open Source** | | | | | | | | | | | | | | | | | | |
| Cambrian-34B | 34B | 1820 | 58.3 | 591 | 50.3 | 75.6 | 53.2 | 54.2 | 2049.9 | 77.8 | 79.5 | 50.4 | 41.6 | 76.7 | 75.5 | - | - | - |
| GLM-4V-9B | 13B | 784 | 59.1 | 776 | 51.1 | - | 58.0 | 54.8 | 2018.8 | 67.9 | 71.2 | 46.9 | 45.0 | - | - | - | - | - |
| Pixtral-12B | 12B | 256 | 61.0 | 685 | 56.9 | 81.8 | 58.5 | 54.5 | - | 72.7 | 79.0 | 51.1 | 47.0 | 75.7 | 90.7 | - | - | - |
| DeepSeek-VL2-27B (4B) | 27B | 672 | 66.4 | 809 | 63.9 | 86.0 | 60.0 | 61.9 | 2253.0 | 81.2 | 83.8 | 54.0 | 45.3 | 84.2 | 93.3 | - | - | 3.0 |
| Qwen2-VL-7B | 8B | 784 | 67.1 | 866 | 58.2 | 83.0 | 62.0 | 60.7 | 2326.0 | 81.8 | 83.0 | 54.1 | 50.6 | 84.3 | 94.5 | 31.9 | 16.3 | 3.2 |
| LLaVA-OneVision-72B | 72B | 182 | 68.1 | 741 | 67.5 | 83.7 | 60.6 | 65.8 | 2261.0 | 85.0 | 85.6 | 56.8 | 49.0 | 80.5 | 91.3 | 39.1 | - | 3.5 |
| InternVL-2.5-8B | 8B | 706 | 68.3 | 822 | 64.4 | 84.8 | 62.8 | 62.8 | 2344.0 | 83.6 | 84.5 | 56.0 | 50.1 | 79.1 | 93.0 | 39.5 | 19.7 | 3.4 |
| MiniCPM-V 2.6 | 8B | 2822 | 65.2 | 852* | 60.6 | 79.4 | 60.0 | 57.5 | 2348.4* | 78.0 | 82.1 | 49.8* | 48.1* | 80.1 | 90.8 | 25.7 | 18.3 | 3.6 |
| MiniCPM-o 2.6 | 8B | 2822 | 70.2 | 897* | 71.9* | 86.9* | 67.5 | 64.0 | 2372.0* | 80.5 | 85.8 | 50.4* | 51.9 | 82.0 | 93.5 | 41.4* | 23.1* | 3.8 |
+
+* 我们使用思维链提示词来评估这些基准,对于 MME 我们只在 Cognition 任务上使用了思维链。 ++ Token Density:每个视觉 token 在最大分辨率下编码的像素数,即最大分辨率下的像素数 / 视觉 token 数。 + +注意:闭源模型的 Token Density 由 API 收费方式估算得到。 + +**多图和视频理解能力** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | BLINK-val | Mantis-Eval | MIRB | Video-MME (wo / w subs) |
|:---|:---:|:---:|:---:|:---:|:---:|
| **Proprietary** | | | | | |
| GPT-4o-20240513 | - | 68 | - | - | 71.9/77.2 |
| GPT4V | - | 54.6 | 62.7 | 53.1 | 59.9/63.3 |
| **Open-source** | | | | | |
| LLaVA-NeXT-Interleave 14B | 14B | 52.6 | 66.4 | 30.2 | - |
| LLaVA-One-Vision-72B | 72B | 55.4 | 77.6 | - | 66.2/69.5 |
| MANTIS 8B | 8B | 49.1 | 59.5 | 34.8 | - |
| Qwen2-VL-7B | 8B | 53.2 | 69.6* | 67.6* | 63.3/69.0 |
| InternVL-2.5-8B | 8B | 54.8 | 67.7 | 52.5 | 64.2/66.9 |
| MiniCPM-V 2.6 | 8B | 53 | 69.1 | 53.8 | 60.9/63.6 |
| MiniCPM-o 2.6 | 8B | 56.7 | 71.9 | 58.6 | 63.9/67.9 |
+ +
+* 正式开源模型权重的评测结果。 + +
+ + +
+点击查看语音理解和生成能力的详细评测结果。 + +**语音理解能力** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
任务分组:ASR (zh) 用 CER↓,ASR (en) 用 WER↓,AST 用 BLEU↑,Emotion 用 ACC↑。

| Model | Size | AISHELL-1 (CER↓) | Fleurs zh (CER↓) | WenetSpeech test-net (CER↓) | LibriSpeech test-clean (WER↓) | GigaSpeech (WER↓) | TED-LIUM (WER↓) | CoVoST en2zh (BLEU↑) | CoVoST zh2en (BLEU↑) | MELD emotion (ACC↑) |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Proprietary** | | | | | | | | | | |
| GPT-4o-Realtime | - | 7.3* | 5.4* | 28.9* | 2.6* | 12.9* | 4.8* | 37.1* | 15.7* | 33.2* |
| Gemini-1.5-Pro | - | 4.5* | 5.9* | 14.3* | 2.9* | 10.6* | 3.0* | 47.3* | 22.6* | 48.4* |
| **Open-Source** | | | | | | | | | | |
| Qwen2-Audio-Base | 8B | - | 7.5 | - | 1.6 | - | - | 45.2 | 24.4 | 55.3 |
| Qwen2-Audio-Instruction | 8B | 2.6* | 6.9* | 10.3* | 3.1* | 9.7* | 5.9* | 39.5* | 22.9* | 17.4* |
| GLM-4-Voice-Base | 9B | 2.5 | - | - | 2.8 | - | - | - | - | - |
| MiniCPM-o 2.6 | 8B | 1.6 | 4.4 | 6.9 | 1.7 | 8.7 | 3.0 | 48.2 | 27.2 | 52.4 |
+
+* 正式开源模型权重的评测结果。

+ +**语音生成能力。** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
任务:SpeechQA。前三列数据集用 ACC↑,Speech AlpacaEval 用 G-Eval (10 point)↑,AudioArena 下为 Semantic/Acoustic/Overall ELO score↑、UTMOS↑ 和 ASR-WER↓。

| Model | Size | Speech Llama Q. (ACC↑) | Speech Web Q. (ACC↑) | Speech Trivia QA (ACC↑) | Speech AlpacaEval (G-Eval↑) | AudioArena Semantic ELO↑ | AudioArena Acoustic ELO↑ | AudioArena Overall ELO↑ | AudioArena UTMOS↑ | AudioArena ASR-WER↓ |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Proprietary** | | | | | | | | | | |
| GPT-4o-Realtime | - | 71.7 | 51.6 | 69.7 | 7.4 | 1157 | 1203 | 1200 | 4.2 | 2.3 |
| **Open-Source** | | | | | | | | | | |
| GLM-4-Voice | 9B | 50.0 | 32.0 | 36.4 | 5.1 | 999 | 1147 | 1035 | 4.1 | 11.7 |
| Llama-Omni | 8B | 45.3 | 22.9 | 10.7 | 3.9 | 960 | 878 | 897 | 3.2 | 24.3 |
| Moshi | 7B | 43.7 | 23.8 | 16.7 | 2.4 | 871 | 808 | 875 | 2.8 | 8.2 |
| Mini-Omni | 1B | 22.0 | 12.8 | 6.9 | 2.5 | 926 | 803 | 865 | 3.4 | 10.0 |
| MiniCPM-o 2.6 | 8B | 61.0 | 40.0 | 40.2 | 5.1 | 1088 | 1163 | 1131 | 4.2 | 9.8 |
+
+所有的结果都基于 AudioEvals

+ +**端到端声音克隆能力。** + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
任务:TTS;指标:SIMO↑。

| Model | Seed-TTS test-zh (SIMO↑) | Seed-TTS test-en (SIMO↑) |
|:---|:---:|:---:|
| F5-TTS | 76 | 67 |
| CosyVoice | 75 | 64 |
| FireRedTTS | 63 | 46 |
| MiniCPM-o 2.6 | 57 | 47 |
+
+ +
+ +
+点击查看多模态流式交互能力评测详细结果。 + +**多模态流式交互能力**: StreamingBench 分数 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | Real-Time Video Understanding | Omni-Source Understanding | Contextual Understanding | Overall |
|:---|:---:|:---:|:---:|:---:|:---:|
| **Proprietary** | | | | | |
| Gemini 1.5 Pro | - | 77.4 | 67.8 | 51.1 | 70.3 |
| GPT-4o-202408 | - | 74.5 | 51.0 | 48.0 | 64.1 |
| Claude-3.5-Sonnet | - | 74.0 | 41.4 | 37.8 | 59.7 |
| **Open-source** | | | | | |
| VILA-1.5 | 8B | 61.5 | 37.5 | 26.7 | 49.5 |
| LongVA | 7B | 63.1 | 35.9 | 30.2 | 50.7 |
| LLaVA-Next-Video-34B | 34B | 69.8 | 41.7 | 34.3 | 56.7 |
| Qwen2-VL-7B | 8B | 71.2 | 40.7 | 33.1 | 57.0 |
| InternVL2-8B | 8B | 70.1 | 42.7 | 34.1 | 57.0 |
| VITA-1.5 | 8B | 70.9 | 40.8 | 35.8 | 57.4 |
| LLaVA-OneVision-7B | 8B | 74.3 | 40.8 | 31.0 | 58.4 |
| InternLM-XC2.5-OL-7B | 8B | 75.4 | 46.2 | 33.6 | 60.8 |
| MiniCPM-V 2.6 | 8B | 72.4 | 40.2 | 33.4 | 57.7 |
| MiniCPM-o 2.6 | 8B | 79.9 | 53.4 | 38.5 | 66.0 |
+ +
+ + +### 典型示例 + +以下为 MiniCPM-o 2.6 的 iPad Pro 实机演示和 web demo 演示样例: + + +
+ +
+ +
+ math + diagram + bike +
+ + +
+Click to view more details of MiniCPM-V 2.6 + ## MiniCPM-V 2.6 @@ -372,7 +1294,7 @@ 42.4 10.3 - + MiniCPM-V 2.6 8B 2822 @@ -496,7 +1418,7 @@ 34.4* 56.9* - + MiniCPM-V 2.6 8B 69.1 @@ -643,7 +1565,7 @@ 2.64 3.28 - + MiniCPM-V 2.6 8B 60.9 @@ -785,7 +1707,7 @@ 70.9 54.1 - + MiniCPM-V 2.6+ 8B 0 @@ -794,14 +1716,14 @@ 45.4 23.9 - + 4 63.6 60.5 65.5 50.1 - + 8 64.6 63.4 @@ -852,390 +1774,16 @@

-## MiniCPM-Llama3-V 2.5 - -
-查看 MiniCPM-Llama3-V 2.5 的详细信息 - -**MiniCPM-Llama3-V 2.5** 是 MiniCPM-V 系列的最新版本模型,基于 SigLip-400M 和 Llama3-8B-Instruct 构建,共 8B 参数量,相较于 MiniCPM-V 2.0 性能取得较大幅度提升。MiniCPM-Llama3-V 2.5 值得关注的特点包括: - -- 🔥 **领先的性能。** - MiniCPM-Llama3-V 2.5 在综合了 11 个主流多模态大模型评测基准的 OpenCompass 榜单上平均得分 65.1,**以 8B 量级的大小超过了 GPT-4V-1106、Gemini Pro、Claude 3、Qwen-VL-Max 等主流商用闭源多模态大模型**,大幅超越基于Llama 3构建的其他多模态大模型。 - -- 💪 **优秀的 OCR 能力。** - MiniCPM-Llama3-V 2.5 可接受 180 万像素的任意宽高比图像输入,**OCRBench 得分达到 725,超越 GPT-4o、GPT-4V、Gemini Pro、Qwen-VL-Max 等商用闭源模型**,达到最佳水平。基于近期用户反馈建议,MiniCPM-Llama3-V 2.5 增强了全文 OCR 信息提取、表格图像转 markdown 等高频实用能力,并且进一步加强了指令跟随、复杂推理能力,带来更好的多模态交互体感。 - -- 🏆 **可信行为。** - 借助最新的 [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) 对齐技术([RLHF-V](https://github.com/RLHF-V/) [CVPR'24]系列的最新技术),MiniCPM-Llama3-V 2.5 具有更加可信的多模态行为,在 Object HalBench 的幻觉率降低到了 **10.3%**,显著低于 GPT-4V-1106 (13.6%),达到开源社区最佳水平。[数据集已发布](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset)。 - -- 🌏 **多语言支持。** - 得益于 Llama 3 强大的多语言能力和 VisCPM 的跨语言泛化技术,MiniCPM-Llama3-V 2.5 在中英双语多模态能力的基础上,仅通过少量翻译的多模态数据的指令微调,高效泛化支持了**德语、法语、西班牙语、意大利语、韩语等 30+ 种语言**的多模态能力,并表现出了良好的多语言多模态对话性能。[查看所有支持语言](./assets/minicpm-llama-v-2-5_languages.md) - -- 🚀 **高效部署。** - MiniCPM-Llama3-V 2.5 较为系统地通过**模型量化、CPU、NPU、编译优化**等高效加速技术,实现高效的终端设备部署。对于高通芯片的移动手机,我们首次将 NPU 加速框架 QNN 整合进了 llama.cpp。经过系统优化后,MiniCPM-Llama3-V 2.5 实现了多模态大模型端侧**语言解码速度 3 倍加速**、**图像编码 150 倍加速**的巨大提升。 - -- 💫 **易于使用。** - MiniCPM-Llama3-V 2.5 可以通过多种方式轻松使用:(1)[llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) 和 [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) 支持在本地设备上进行高效的 CPU 推理;(2)提供 16 种尺寸的 [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) 格式量化模型;(3)仅需 2 张 V100 GPU 即可进行高效的 [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) 微调;( 4)支持[流式输出](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage);(5)快速搭建 [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) 和 
[Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py) 本地 WebUI demo;( 6.)[HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) 交互式 demo。 - -### 性能评估 - -
- -
-
-TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg Score, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench上的详细评测结果。 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ModelSizeOCRBenchTextVQA valDocVQA testOpen-CompassMMEMMB test (en)MMB test (cn)MMMU valMath-VistaLLaVA BenchRealWorld QAObject HalBench
Proprietary
Gemini Pro-68074.688.162.92148.973.674.348.945.879.960.4-
GPT-4V (2023.11.06)-64578.088.463.51771.577.074.453.847.893.163.086.4
Open-source
Mini-Gemini2.2B-56.234.2*-1653.0--31.7----
Qwen-VL-Chat9.6B48861.562.651.61860.061.856.337.033.867.749.356.2
DeepSeek-VL-7B7.3B43564.7*47.0*54.61765.473.871.438.336.877.854.2-
Yi-VL-34B34B29043.4*16.9*52.22050.272.470.745.130.762.354.879.3
CogVLM-Chat17.4B59070.433.3*54.21736.665.855.937.334.773.960.373.6
TextMonkey9.7B55864.366.7---------
Idefics28.0B-73.074.057.21847.675.768.645.252.249.160.7-
Bunny-LLama-3-8B8.4B---54.31920.377.073.941.331.561.258.8-
LLaVA-NeXT Llama-3-8B8.4B----1971.5--41.7-80.160.0-
Phi-3-vision-128k-instruct4.2B639*70.9--1537.5*--40.444.564.2*58.8*-
MiniCPM-V 1.02.8B36660.638.247.51650.264.162.638.328.951.351.278.4
MiniCPM-V 2.02.8B60574.171.954.51808.669.166.538.238.769.255.885.5
MiniCPM-Llama3-V 2.58.5B72576.684.865.12024.677.274.245.854.386.763.589.7
- -
-* 正式开源模型权重的评测结果。
-
- -
- 多语言LLaVA Bench评测结果 -
- - -### 典型示例 - -

- -

-
- - -
- -## MiniCPM-V 2.0 - -
-查看 MiniCPM-V 2.0 的详细信息 - -**MiniCPM-V 2.0**可以高效部署到终端设备。该模型基于 SigLip-400M 和 [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/)构建,通过perceiver resampler连接。其特点包括: - -- 🔥 **优秀的性能。** - - MiniCPM-V 2.0 在多个测试基准(如 OCRBench, TextVQA, MME, MMB, MathVista 等)中实现了 7B 以下模型的**最佳性能**。**在综合了 11 个主流多模态大模型评测基准的 OpenCompass 榜单上超过了 Qwen-VL-Chat 9.6B、CogVLM-Chat 17.4B 和 Yi-VL 34B 等更大参数规模的模型**。MiniCPM-V 2.0 还展现出**领先的 OCR 能力**,在场景文字识别能力上**接近 Gemini Pro**,OCRBench 得分达到**开源模型第一**。 - - -- 🏆 **可信行为。** - - 多模态大模型深受幻觉问题困扰,模型经常生成和图像中的事实不符的文本。MiniCPM-V 2.0 是 **第一个通过多模态 RLHF 对齐的端侧多模态大模型**(借助 [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] 系列技术)。该模型在 [Object HalBench](https://arxiv.org/abs/2312.00849) 达到**和 GPT-4V 相仿**的性能。 - - -- 🌟 **高清图像高效编码。** - - MiniCPM-V 2.0 可以接受 **180 万像素的任意长宽比图像输入**(基于最新的[LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf) 技术),这使得模型可以感知到小物体、密集文字等更加细粒度的视觉信息。 - - -- ⚡️ **高效部署。** - - MiniCPM-V 2.0 可以**高效部署在大多数消费级显卡和个人电脑上**,包括**移动手机等终端设备**。在视觉编码方面,我们通过perceiver resampler将图像表示压缩为更少的 token。这使得 MiniCPM-V 2.0 即便是**面对高分辨率图像,也能占用较低的存储并展现优秀的推理速度**。 - -- 🙌 **双语支持。** - - MiniCPM-V 2.0 **提供领先的中英双语多模态能力支持**。 - 该能力通过 [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24] 论文中提出的多模态能力的跨语言泛化技术实现。 - -### 典型示例 - - - -

- -

-
- -我们将 MiniCPM-V 2.0 部署在小米 14 Pro 上,并录制了以下演示视频,未经任何视频剪辑。 - - -

- - -

-
- -
- - - - ## 历史版本模型 | 模型 | 介绍信息和使用教程 | |:----------------------|:-------------------:| -| MiniCPM-V 1.0 | [文档](./minicpm_v1.md) | +| MiniCPM-Llama3-V 2.5 | [文档](./docs/minicpm_llama3_v2dot5.md) | +| MiniCPM-V 2.0 | [文档](./docs/minicpm_v2.md) | +| MiniCPM-V 1.0 | [文档](./docs/minicpm_v1.md) | | OmniLMM-12B | [文档](./omnilmm.md) | @@ -1291,23 +1839,21 @@ pip install -r requirements.txt | 模型 | 设备 | 资源 |          简介 | 下载链接 | |:--------------|:-:|:----------:|:-------------------|:---------------:| -| MiniCPM-V 2.6| GPU | 17 GB | 最新版本,提供最佳的端侧单图、多图、视频理解能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) | +| MiniCPM-o 2.6| GPU | 18 GB | 最新版本,提供端侧 GPT-4o 级的视觉、语音、多模态流式交互能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6) | +| MiniCPM-o 2.6 gguf | CPU | 8 GB | gguf 版本,更低的内存占用和更高的推理效率。 | [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-gguf) | +| MiniCPM-o 2.6 int4 | GPU | 9 GB | int4量化版,更低显存占用。 | [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-int4) | +| MiniCPM-V 2.6| GPU | 17 GB | 提供出色的端侧单图、多图、视频理解能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) | | MiniCPM-V 2.6 gguf | CPU | 6 GB | gguf 版本,更低的内存占用和更高的推理效率。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf) | | MiniCPM-V 2.6 int4 | GPU | 7 GB | int4量化版,更低显存占用。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-int4) | -| MiniCPM-Llama3-V 2.5| GPU | 19 GB | 提供出色的端侧多模态理解能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) | -| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | gguf 版本,更低的内存占用和更高的推理效率。 | 
[🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) | -| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | int4量化版,更低显存占用。 | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) | -| MiniCPM-V 2.0 | GPU | 8 GB | 轻量级版本,平衡计算开销和多模态理解能力。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) | -| MiniCPM-V 1.0 | GPU | 7 GB | 最轻量版本, 提供最快的推理速度。 | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) | 更多[历史版本模型](#legacy-models) + ### 多轮对话 -请参考以下代码进行推理。
- +
@@ -1316,60 +1862,57 @@ import torch from PIL import Image from transformers import AutoModel, AutoTokenizer -torch.manual_seed(0) +torch.manual_seed(100) -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) -image = Image.open('./assets/airplane.jpeg').convert('RGB') +image = Image.open('./assets/minicpmo2_6/show_demo.jpg').convert('RGB') # First round chat -question = "Tell me the model of this aircraft." +question = "What is the landform in the picture?" msgs = [{'role': 'user', 'content': [image, question]}] answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) print(answer) -# Second round chat -# pass history context of multi-turn conversation +# Second round chat, pass history context of multi-turn conversation msgs.append({"role": "assistant", "content": [answer]}) -msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]}) +msgs.append({"role": "user", "content": ["What should I pay attention to when traveling here?"]}) answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) print(answer) ``` -可以得到以下输出: +你可以得到如下推理结果: ``` -"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. 
The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database." +"The landform in the picture is a mountain range. The mountains appear to be karst formations, characterized by their steep, rugged peaks and smooth, rounded shapes. These types of mountains are often found in regions with limestone bedrock and are shaped by processes such as erosion and weathering. The reflection of the mountains in the water adds to the scenic beauty of the landscape." -"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry." +"When traveling to this scenic location, it's important to pay attention to the weather conditions, as the area appears to be prone to fog and mist, especially during sunrise or sunset. Additionally, ensure you have proper footwear for navigating the potentially slippery terrain around the water. Lastly, respect the natural environment by not disturbing the local flora and fauna." ``` -#### 多图理解 +#### 多图对话
- 点击查看使用 MiniCPM-V 2.6 进行多图理解的Python示例 + 点击查看 MiniCPM-o 2.6 多图输入的 Python 代码。 ```python import torch from PIL import Image from transformers import AutoModel, AutoTokenizer -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) image1 = Image.open('image1.jpg').convert('RGB') image2 = Image.open('image2.jpg').convert('RGB') @@ -1378,7 +1921,6 @@ question = 'Compare image 1 and image 2, tell me about the differences between i msgs = [{'role': 'user', 'content': [image1, image2, question]}] answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) @@ -1386,20 +1928,19 @@ print(answer) ```
-#### 少样本上下文学习
-
+#### Few-shot In-context Conversation
- 点击查看使用 MiniCPM-V 2.6 进行few-shot推理的Python示例 + 点击查看 MiniCPM-o 2.6 少样本上下文对话的 Python 代码。 ```python import torch from PIL import Image from transformers import AutoModel, AutoTokenizer -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) question = "production date" image1 = Image.open('example1.jpg').convert('RGB') @@ -1415,7 +1956,6 @@ msgs = [ ] answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer ) @@ -1423,9 +1963,9 @@ print(answer) ```
-#### 视频理解
+#### Video Conversation
- 点击查看使用 MiniCPM-V 2.6 进行视频理解的Python示例 + 点击查看 MiniCPM-o 2.6 视频输入的 Python 代码。 ```python import torch @@ -1433,10 +1973,10 @@ from PIL import Image from transformers import AutoModel, AutoTokenizer from decord import VideoReader, cpu # pip install decord -model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, +model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager model = model.eval().cuda() -tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True) MAX_NUM_FRAMES=64 # if cuda OOM set a smaller number @@ -1466,10 +2006,9 @@ msgs = [ # Set decode params for video params = {} params["use_image_id"] = False -params["max_slice_nums"] = 2 # 如果cuda OOM且视频分辨率大于448*448可设为1 +params["max_slice_nums"] = 2 # use 1 if cuda OOM and video resolution > 448*448 answer = model.chat( - image=None, msgs=msgs, tokenizer=tokenizer, **params @@ -1479,6 +2018,295 @@ print(answer)
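The `encode_video` helper that selects `MAX_NUM_FRAMES` frames is elided by the diff hunk above. Its core is uniform index selection, sketched below as a hypothetical standalone reconstruction (the function names `uniform_sample` and `choose_frame_idx` are ours, not from the repo):

```python
MAX_NUM_FRAMES = 64  # if CUDA OOM, set a smaller number

def uniform_sample(seq, n):
    # Keep n evenly spaced elements of seq (one from the middle of each of n bins).
    gap = len(seq) / n
    return [seq[int(i * gap + gap / 2)] for i in range(n)]

def choose_frame_idx(total_frames, avg_fps, max_frames=MAX_NUM_FRAMES):
    # Sample roughly one frame per second, then cap the count at max_frames.
    frame_idx = list(range(0, total_frames, max(1, round(avg_fps))))
    if len(frame_idx) > max_frames:
        frame_idx = uniform_sample(frame_idx, max_frames)
    return frame_idx

print(choose_frame_idx(300, 30))        # 10 indices, one per second of a 10 s clip
print(len(choose_frame_idx(9000, 30)))  # 64
```

The cap keeps the prompt length bounded for long videos while preserving even temporal coverage.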
+#### Speech Conversation
+
Initialize the model
+
+```python
+import torch
+import librosa
+from transformers import AutoModel, AutoTokenizer
+
+model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True,
+    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
+model = model.eval().cuda()
+tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)
+
+model.init_tts()
+model.tts.float()
+```
+
+ +##### Mimick + +
Click to view Python code for end-to-end speech understanding and generation with MiniCPM-o 2.6.
+
+- The `Mimick` task probes the model's end-to-end speech modeling ability: given an audio input, the model outputs the ASR transcription and then reconstructs the original audio with high similarity. The higher the similarity between the reconstructed and the original audio, the stronger the model's foundational end-to-end speech modeling capability.
+
+```python
+mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
+audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)
+msgs = [{'role': 'user', 'content': [mimick_prompt, audio_input]}]
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    temperature=0.3,
+    generate_audio=True,
+    output_audio_path='output.wav', # save the TTS result to output_audio_path
+)
+```
+
+
+##### Speech Conversation with Configurable Voices
+
Click to view Python code for customizing the conversation voice of MiniCPM-o 2.6.
+
+```python
+ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True) # load the reference audio
+
+# Audio RolePlay: in this mode, the model role-plays the character from the audio prompt.
+sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_roleplay', language='en')
+user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}
+
+# Audio Assistant: in this mode, the model speaks with the voice in ref_audio as an AI assistant.
+# sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
+# user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # Try to ask something!
+```
+```python
+msgs = [sys_prompt, user_question]
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    generate_audio=True,
+    temperature=0.3,
+    output_audio_path='result.wav',
+)
+
+# round two: list.append returns None, so mutate the history in place
+msgs.append({'role': 'assistant', 'content': res})
+msgs.append({'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]})
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    generate_audio=True,
+    temperature=0.3,
+    output_audio_path='result_round_2.wav',
+)
+print(res)
+```
+
+
+##### More Speech Tasks
+
+ Click to view Python code for more speech tasks with MiniCPM-o 2.6.

+```python
+'''
+Audio Understanding Task Prompt:
+Speech:
+    ASR with ZH (same as AST en2zh): 请仔细听这段音频片段,并将其内容逐字记录。
+    ASR with EN (same as AST zh2en): Please listen to the audio snippet carefully and transcribe the content.
+    Speaker Analysis: Based on the speaker's content, speculate on their gender, condition, age range, and health status.
+General Audio:
+    Audio Caption: Summarize the main content of the audio.
+    Sound Scene Tagging: Utilize one keyword to convey the audio's content or the associated scene.
+'''
+task_prompt = "\n" # prepend one of the task prompts above
+audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)
+
+msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]
+
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    generate_audio=True,
+    temperature=0.3,
+    output_audio_path='result.wav',
+)
+print(res)
+```
+```python
+'''
+Speech Generation Task Prompt:
+    Human Instruction-to-Speech: see https://voxinstruct.github.io/VoxInstruct/
+    Example:
+        # 在新闻中,一个年轻男性兴致勃勃地说:“祝福亲爱的祖国母亲美丽富强!”他用低音调和低音量,慢慢地说出了这句话。
+        # Delighting in a surprised tone, an adult male with low pitch and low volume comments: "One even gave my little dog a biscuit" This dialogue takes place at a leisurely pace, delivering a sense of excitement and surprise in the context.
+
+    Voice Cloning or Voice Creation: in this mode, the model acts like a TTS model.
+'''
+# Human Instruction-to-Speech:
+task_prompt = '' # try writing your own Human Instruction-to-Speech prompt
+msgs = [{'role': 'user', 'content': [task_prompt]}] # you can also reuse the audio question above
+
+# Voice Cloning mode: in this mode, the model acts like a TTS model.
+# sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='voice_cloning', language='en')
+# text_prompt = "Please read the text below."
+# user_question = {'role': 'user', 'content': [text_prompt, "content that you want to read"]} # use the voice in sys_prompt to read this text (Voice Cloning)
+# user_question = {'role': 'user', 'content': [text_prompt, librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # use the voice in sys_prompt to read 'xxx.wav' (Voice Creation)
+
+# msgs = [sys_prompt, user_question] # for Voice Cloning / Voice Creation, uncomment this line together with the sys_prompt lines above
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    max_new_tokens=128,
+    use_tts_template=True,
+    generate_audio=True,
+    temperature=0.3,
+    output_audio_path='result.wav',
+)
+```
+
+
+#### Multimodal Streaming Interaction
+
+ Click to view Python code for multimodal streaming interaction with MiniCPM-o 2.6.

+```python
+import math
+import numpy as np
+from PIL import Image
+from moviepy.editor import VideoFileClip
+import tempfile
+import librosa
+import soundfile as sf
+
+# make sure the model has been initialized and `model.init_tts()` has been executed
+
+def get_video_chunk_content(video_path, flatten=True):
+    video = VideoFileClip(video_path)
+    print('video_duration:', video.duration)
+
+    with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_audio_file:
+        temp_audio_file_path = temp_audio_file.name
+        video.audio.write_audiofile(temp_audio_file_path, codec="pcm_s16le", fps=16000)
+        audio_np, sr = librosa.load(temp_audio_file_path, sr=16000, mono=True)
+    num_units = math.ceil(video.duration)
+
+    # 1 frame + 1 s audio chunk per unit
+    contents = []
+    for i in range(num_units):
+        frame = video.get_frame(i+1)
+        image = Image.fromarray(frame.astype(np.uint8))
+        audio = audio_np[sr*i:sr*(i+1)]
+        if flatten:
+            contents.extend(["<unit>", image, audio])
+        else:
+            contents.append(["<unit>", image, audio])
+
+    return contents
+
+video_path = "/path/to/video"
+sys_msg = model.get_sys_prompt(mode='omni', language='en')
+# to use a voice-clone prompt, set ref_audio
+# ref_audio_path = '/path/to/ref_audio'
+# ref_audio, _ = librosa.load(ref_audio_path, sr=16000, mono=True)
+# sys_msg = model.get_sys_prompt(ref_audio=ref_audio, mode='omni', language='en')
+
+contents = get_video_chunk_content(video_path)
+msg = {"role": "user", "content": contents}
+msgs = [sys_msg, msg]
+
+# set generate_audio=True and output_audio_path to save the TTS result
+generate_audio = True
+output_audio_path = 'output.wav'
+
+res = model.chat(
+    msgs=msgs,
+    tokenizer=tokenizer,
+    sampling=True,
+    temperature=0.5,
+    max_new_tokens=4096,
+    omni_input=True, # set omni_input=True for omni (multimodal streaming) inference
+    use_tts_template=True,
+    generate_audio=generate_audio,
+    output_audio_path=output_audio_path,
+    max_slice_nums=1,
+    use_image_id=False,
+    return_dict=True
+)
+print(res)
+```
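`get_video_chunk_content` pairs each sampled frame with the matching one-second slice of 16 kHz audio; that slicing is plain index arithmetic, shown here on synthetic data as a sketch independent of any video decoding (`one_second_chunks` is a hypothetical helper name):

```python
import math
import numpy as np

SR = 16000  # the examples load audio at 16 kHz mono

def one_second_chunks(audio_np, sr=SR):
    # One chunk per unit, mirroring audio_np[sr*i : sr*(i+1)] in
    # get_video_chunk_content; the final chunk may be shorter.
    num_units = math.ceil(len(audio_np) / sr)
    return [audio_np[sr * i : sr * (i + 1)] for i in range(num_units)]

audio = np.zeros(int(2.5 * SR), dtype=np.float32)  # 2.5 s of silence
chunks = one_second_chunks(audio)
print(len(chunks))       # 3
print(len(chunks[-1]))   # 8000 (the trailing half second)
```

Each chunk then travels with one frame, so the model sees audio and vision aligned per second.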
+ +
+ Click to view the multimodal streaming inference setup.

+Note: streaming inference has a slight performance drop, because the audio encoding is not global.
+```python
+# a new conversation needs reset_session() first; this clears the KV cache
+model.reset_session()
+
+contents = get_video_chunk_content(video_path, flatten=False)
+session_id = '123'
+generate_audio = True
+
+# 1. prefill system prompt
+res = model.streaming_prefill(
+    session_id=session_id,
+    msgs=[sys_msg],
+    tokenizer=tokenizer
+)
+
+# 2. prefill video/audio chunks
+for content in contents:
+    msgs = [{"role": "user", "content": content}]
+    res = model.streaming_prefill(
+        session_id=session_id,
+        msgs=msgs,
+        tokenizer=tokenizer
+    )
+
+# 3. generate
+res = model.streaming_generate(
+    session_id=session_id,
+    tokenizer=tokenizer,
+    temperature=0.5,
+    generate_audio=generate_audio
+)
+
+audios = []
+text = ""
+
+if generate_audio:
+    for r in res:
+        audio_wav = r.audio_wav
+        sampling_rate = r.sampling_rate
+        txt = r.text
+
+        audios.append(audio_wav)
+        text += txt
+
+    res = np.concatenate(audios)
+    sf.write("output.wav", res, samplerate=sampling_rate)
+    print("text:", text)
+    print("audio saved to output.wav")
+else:
+    for r in res:
+        text += r['text']
+    print("text:", text)
+```
+
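The generate step above consumes a stream of chunks, each carrying a text fragment and its waveform piece. The accumulation pattern can be exercised without the model by using stand-in chunk objects (`fake_stream` below is hypothetical and not part of the MiniCPM-o API):

```python
import numpy as np
from types import SimpleNamespace

def fake_stream():
    # Stand-in for model.streaming_generate(...): yields chunks that carry a
    # text fragment plus the matching audio piece, as the loop above expects.
    for txt in ["Hello", ", ", "world"]:
        yield SimpleNamespace(text=txt,
                              audio_wav=np.zeros(160, dtype=np.float32),
                              sampling_rate=16000)

audios, text = [], ""
for r in fake_stream():
    text += r.text          # accumulate the transcript incrementally
    audios.append(r.audio_wav)

wav = np.concatenate(audios)  # stitch the waveform pieces into one array
print(text)      # Hello, world
print(len(wav))  # 480
```

Concatenating at the end (rather than per chunk) keeps the loop O(n) instead of repeatedly reallocating the waveform buffer.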
+ + ### 多卡推理 您可以通过将模型的层分布在多个低显存显卡(12 GB 或 16 GB)上,运行 MiniCPM-Llama3-V 2.5。请查看该[教程](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md),详细了解如何使用多张低显存显卡载入模型并进行推理。 @@ -1536,97 +2364,110 @@ python web_demo_2.6.py --device cuda ```
-### llama.cpp 部署
-MiniCPM-V 2.6 现在支持llama.cpp啦! 用法请参考[我们的fork llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md), 在iPad上可以支持 16~18 token/s 的流畅推理(测试环境:iPad Pro + M4)。
+### Efficient Inference with llama.cpp, ollama, and vLLM

-### ollama 部署
-MiniCPM-V 2.6 现在支持ollama啦! 用法请参考[我们的fork ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md), 在iPad上可以支持 16~18 token/s 的流畅推理(测试环境:iPad Pro + M4)。
+For llama.cpp usage, see [our llama.cpp fork](https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md); it supports smooth inference at 16~18 tokens/s on iPad (test environment: iPad Pro + M4).
+
+For ollama usage, see [our ollama fork](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md); it supports smooth inference at 16~18 tokens/s on iPad (test environment: iPad Pro + M4).

-### vLLM 部署
-点击查看, vLLM 现已官方支持MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0 +点击查看, vLLM 现已官方支持MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0,MiniCPM-o 2.6 模型也可以临时用我们的 fork 仓库运行。 +1. MiniCPM-o 2.6 + 1. 克隆我们的 vLLM fork 仓库: + ```shell + git clone https://github.com/OpenBMB/vllm.git + cd vllm + git checkout minicpmo + ``` + 2. 从源码进行安装: + ```shell + VLLM_USE_PRECOMPILED=1 pip install --editable . + ``` + 3. 用和之前同样的方式运行(下有样例). -1. 安装 vLLM(>=0.5.4): -```shell -pip install vllm -``` -3. 安装 timm 库: (可选,MiniCPM-V 2.0需安装) -```shell -pip install timm=0.9.10 -``` -4. 运行示例代码:(注意:如果使用本地路径的模型,请确保模型代码已更新到Hugging Face上的最新版) -```python -from transformers import AutoTokenizer -from PIL import Image -from vllm import LLM, SamplingParams +2. 之前版本的 MiniCPM-V + 1. 安装 vLLM(>=0.5.4): + ```shell + pip install vllm + ``` + 3. 安装 timm 库: (可选,MiniCPM-V 2.0需安装) + ```shell + pip install timm=0.9.10 + ``` + 4. 运行示例代码:(注意:如果使用本地路径的模型,请确保模型代码已更新到Hugging Face上的最新版) + ```python + from transformers import AutoTokenizer + from PIL import Image + from vllm import LLM, SamplingParams -MODEL_NAME = "openbmb/MiniCPM-V-2_6" -# Also available for previous models -# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5" -# MODEL_NAME = "HwwwH/MiniCPM-V-2" + MODEL_NAME = "openbmb/MiniCPM-V-2_6" + # MODEL_NAME = "openbmb/MiniCPM-O-2_6" + # Also available for previous models + # MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5" + # MODEL_NAME = "HwwwH/MiniCPM-V-2" -image = Image.open("xxx.png").convert("RGB") -tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) -llm = LLM( - model=MODEL_NAME, - trust_remote_code=True, - gpu_memory_utilization=1, - max_model_len=2048 -) + image = Image.open("xxx.png").convert("RGB") + tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) + llm = LLM( + model=MODEL_NAME, + trust_remote_code=True, + gpu_memory_utilization=1, + max_model_len=2048 + ) -messages = [{ - "role": - "user", - "content": - # Number of images - "(./)" + \ - "\nWhat is the 
content of this image?" -}] -prompt = tokenizer.apply_chat_template( - messages, - tokenize=False, - add_generation_prompt=True -) + messages = [{ + "role": + "user", + "content": + # Number of images + "(./)" + \ + "\nWhat is the content of this image?" + }] + prompt = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True + ) -# Single Inference -inputs = { - "prompt": prompt, - "multi_modal_data": { - "image": image - # Multi images, the number of images should be equal to that of `(./)` - # "image": [image, image] - }, -} -# Batch Inference -# inputs = [{ -# "prompt": prompt, -# "multi_modal_data": { -# "image": image -# }, -# } for _ in 2] + # Single Inference + inputs = { + "prompt": prompt, + "multi_modal_data": { + "image": image + # Multi images, the number of images should be equal to that of `(./)` + # "image": [image, image] + }, + } + # Batch Inference + # inputs = [{ + # "prompt": prompt, + # "multi_modal_data": { + # "image": image + # }, + # } for _ in 2] -# 2.6 -stop_tokens = ['<|im_end|>', '<|endoftext|>'] -stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens] -# 2.0 -# stop_token_ids = [tokenizer.eos_id] -# 2.5 -# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id] + # 2.6 + stop_tokens = ['<|im_end|>', '<|endoftext|>'] + stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens] + # 2.0 + # stop_token_ids = [tokenizer.eos_id] + # 2.5 + # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id] -sampling_params = SamplingParams( - stop_token_ids=stop_token_ids, - use_beam_search=True, - temperature=0, - best_of=3, - max_tokens=1024 -) + sampling_params = SamplingParams( + stop_token_ids=stop_token_ids, + use_beam_search=True, + temperature=0, + best_of=3, + max_tokens=1024 + ) -outputs = llm.generate(inputs, sampling_params=sampling_params) + outputs = llm.generate(inputs, sampling_params=sampling_params) -print(outputs[0].outputs[0].text) -``` -4. 
[点击此处](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink)查看带视频推理和其他有关 `vLLM` 的信息。 + print(outputs[0].outputs[0].text) + ``` + 4. [点击此处](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf?from=from_copylink)查看带视频推理和其他有关 `vLLM` 的信息。
@@ -1635,10 +2476,17 @@ print(outputs[0].outputs[0].text) ### 简易微调 -我们支持使用 Huggingface Transformers 库简易地微调 MiniCPM-V 2.0 和 MiniCPM-Llama3-V 2.5 模型。 +我们支持使用 Huggingface Transformers 库简易地微调 MiniCPM-o 2.6、MiniCPM-V 2.6、MiniCPM-Llama3-V 2.5 和 MiniCPM-V 2.0 模型。 [参考文档](./finetune/readme.md) +### 使用 LLaMA-Factory + +我们支持使用 LLaMA-Factory 微调 MiniCPM-o-2.6 和 MiniCPM-V 2.6。LLaMA-Factory 提供了一种灵活定制 200 多个大型语言模型(LLM)微调(Lora/Full/Qlora)解决方案,无需编写代码,通过内置的 Web 用户界面 LLaMABoard 即可实现训练/推理/评估。它支持多种训练方法,如 sft/ppo/dpo/kto,并且还支持如 Galore/BAdam/LLaMA-Pro/Pissa/LongLoRA 等高级算法。 + +最佳实践: [MiniCPM-V-2.6 | MiniCPM-o-2.6](https://github.com/openbmb/MiniCPM-V/blob/main/docs/llamafactory_train.md). + + ### 使用 SWIFT 框架 我们支持使用 SWIFT 框架微调 MiniCPM-V 系列模型。SWIFT 支持近 200 种大语言模型和多模态大模型的训练、推理、评测和部署。支持 PEFT 提供的轻量训练方案和完整的 Adapters 库支持的最新训练技术如 NEFTune、LoRA+、LLaMA-PRO 等。 @@ -1649,15 +2497,23 @@ print(outputs[0].outputs[0].text) 点击查看 [FAQs](./docs/faqs.md) +## 模型局限性 + +我们实验发现 MiniCPM-o 2.6 存在一些显著的局限性,需要进一步研究和改进: +- **不稳定的语音输出。** 语音生成可能会受到背景噪音和无意义声音的影响,表现不稳定。 +- **重复响应。** 当遇到连续相似的用户请求时,模型往往会重复相同的回答。 +- **Web Demo 延迟较高。** 用户在使用远程服务器上部署的 web demo 时可能会产生较高延迟。我们推荐用户在本地部署来获得更低延迟的体验。 + + ## 模型协议 * 本仓库中代码依照 [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) 协议开源 -* MiniCPM-V 模型权重的使用则需要遵循 [“MiniCPM模型商用许可协议.md”](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%E6%A8%A1%E5%9E%8B%E5%95%86%E7%94%A8%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.md)。 +* MiniCPM-o/V 模型权重的使用则需要遵循 [“MiniCPM模型商用许可协议.md”](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%E6%A8%A1%E5%9E%8B%E5%95%86%E7%94%A8%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.md)。 * MiniCPM 模型权重对学术研究完全开放,在填写[“问卷”](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g)进行登记后亦允许免费商业使用。 ## 声明 -作为多模态大模型,MiniCPM-V 系列模型(包括 OmniLMM)通过学习大量的多模态数据来生成内容,但它无法理解、表达个人观点或价值判断,它所输出的任何内容都不代表模型开发者的观点和立场。 +作为多模态大模型,MiniCPM-o/V 系列模型(包括 OmniLMM)通过学习大量的多模态数据来生成内容,但它无法理解、表达个人观点或价值判断,它所输出的任何内容都不代表模型开发者的观点和立场。 
因此用户在使用本项目的系列模型生成的内容时,应自行负责对其进行评估和验证。如果由于使用本项目的系列开源模型而导致的任何问题,包括但不限于数据安全问题、公共舆论风险,或模型被误导、滥用、传播或不当利用所带来的任何风险和问题,我们将不承担任何责任。 @@ -1668,7 +2524,6 @@ print(outputs[0].outputs[0].text) - [清华大学自然语言处理实验室](https://nlp.csai.tsinghua.edu.cn/) - [面壁智能](https://modelbest.cn/) -- [知乎](https://www.zhihu.com/ ) ## 🌟 Star History @@ -1700,7 +2555,7 @@ print(outputs[0].outputs[0].text) ## 支持技术和其他多模态项目 -👏 欢迎了解 MiniCPM-V 背后的支持技术和更多我们的多模态项目! +👏 欢迎了解 MiniCPM-o/V 背后的支持技术和更多我们的多模态项目! [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V) diff --git a/assets/MiniCPM-o.png b/assets/MiniCPM-o.png new file mode 100644 index 0000000..20fa91e Binary files /dev/null and b/assets/MiniCPM-o.png differ diff --git a/assets/discord.png b/assets/discord.png new file mode 100644 index 0000000..c3067a4 Binary files /dev/null and b/assets/discord.png differ diff --git a/assets/logo.html b/assets/logo.html new file mode 100644 index 0000000..71257de --- /dev/null +++ b/assets/logo.html @@ -0,0 +1,3 @@ + + MiniCPM-o + \ No newline at end of file diff --git a/assets/minicpm-o-26-framework.png b/assets/minicpm-o-26-framework.png new file mode 100644 index 0000000..459887e Binary files /dev/null and b/assets/minicpm-o-26-framework.png differ diff --git a/assets/minicpmo2_6/minicpmo2_6_diagram_train_NN.png b/assets/minicpmo2_6/minicpmo2_6_diagram_train_NN.png new file mode 100644 index 0000000..eeef5f2 Binary files /dev/null and b/assets/minicpmo2_6/minicpmo2_6_diagram_train_NN.png differ diff --git a/assets/minicpmo2_6/minicpmo2_6_math_intersect.png b/assets/minicpmo2_6/minicpmo2_6_math_intersect.png new file mode 100644 index 0000000..f526b1c Binary files /dev/null and b/assets/minicpmo2_6/minicpmo2_6_math_intersect.png differ diff --git a/assets/minicpmo2_6/minicpmo2_6_multi-image_bike.png b/assets/minicpmo2_6/minicpmo2_6_multi-image_bike.png new file mode 100644 
index 0000000..090337b
Binary files /dev/null and b/assets/minicpmo2_6/minicpmo2_6_multi-image_bike.png differ
diff --git a/assets/minicpmo2_6/show_demo.jpg b/assets/minicpmo2_6/show_demo.jpg
new file mode 100644
index 0000000..40ec4fb
Binary files /dev/null and b/assets/minicpmo2_6/show_demo.jpg differ
diff --git a/assets/o-2dot6-demo-video-preview.png b/assets/o-2dot6-demo-video-preview.png
new file mode 100644
index 0000000..8e34ab4
Binary files /dev/null and b/assets/o-2dot6-demo-video-preview.png differ
diff --git a/assets/radar.jpg b/assets/radar.jpg
new file mode 100644
index 0000000..51f75bc
Binary files /dev/null and b/assets/radar.jpg differ
diff --git a/assets/ref_audios/default.wav b/assets/ref_audios/default.wav
new file mode 100644
index 0000000..8171eee
Binary files /dev/null and b/assets/ref_audios/default.wav differ
diff --git a/assets/ref_audios/female_example.wav b/assets/ref_audios/female_example.wav
new file mode 100644
index 0000000..4f795b2
Binary files /dev/null and b/assets/ref_audios/female_example.wav differ
diff --git a/assets/ref_audios/male_example.wav b/assets/ref_audios/male_example.wav
new file mode 100644
index 0000000..09e725b
Binary files /dev/null and b/assets/ref_audios/male_example.wav differ
diff --git a/assets/ref_audios/video_default.wav b/assets/ref_audios/video_default.wav
new file mode 100644
index 0000000..2e6061b
Binary files /dev/null and b/assets/ref_audios/video_default.wav differ
diff --git a/assets/wechat.png b/assets/wechat.png
new file mode 100644
index 0000000..8a109ef
Binary files /dev/null and b/assets/wechat.png differ
diff --git a/docs/minicpm_llama3_v2dot5.md b/docs/minicpm_llama3_v2dot5.md
new file mode 100644
index 0000000..7ab8700
--- /dev/null
+++ b/docs/minicpm_llama3_v2dot5.md
@@ -0,0 +1,333 @@
+## MiniCPM-Llama3-V 2.5
+
+> Archived at: 2025-01-13
+
+
+**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series.
The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include: + +- 🔥 **Leading Performance.** + MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs. + +- 💪 **Strong OCR Capabilities.** + MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences. + +- 🏆 **Trustworthy Behavior.** + Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), achieving the best-level performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). 
+ +- 🌏 **Multilingual Support.** + Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages including German, French, Spanish, Italian, Korean etc.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md). + +- 🚀 **Efficient Deployment.** + MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150x acceleration in end-side MLLM image encoding** and a **3x speedup in language decoding**. + +- 💫 **Easy Usage.** +MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5). + +### Evaluation + +
+ +
+
+Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench. +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Model | Size | OCRBench | TextVQA val | DocVQA test | OpenCompass | MME | MMB test (en) | MMB test (cn) | MMMU val | MathVista | LLaVA Bench | RealWorld QA | Object HalBench |
+|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| *Proprietary* | | | | | | | | | | | | | |
+| Gemini Pro | - | 680 | 74.6 | 88.1 | 62.9 | 2148.9 | 73.6 | 74.3 | 48.9 | 45.8 | 79.9 | 60.4 | - |
+| GPT-4V (2023.11.06) | - | 645 | 78.0 | 88.4 | 63.5 | 1771.5 | 77.0 | 74.4 | 53.8 | 47.8 | 93.1 | 63.0 | 86.4 |
+| *Open-source* | | | | | | | | | | | | | |
+| Mini-Gemini | 2.2B | - | 56.2 | 34.2* | - | 1653.0 | - | - | 31.7 | - | - | - | - |
+| Qwen-VL-Chat | 9.6B | 488 | 61.5 | 62.6 | 51.6 | 1860.0 | 61.8 | 56.3 | 37.0 | 33.8 | 67.7 | 49.3 | 56.2 |
+| DeepSeek-VL-7B | 7.3B | 435 | 64.7* | 47.0* | 54.6 | 1765.4 | 73.8 | 71.4 | 38.3 | 36.8 | 77.8 | 54.2 | - |
+| Yi-VL-34B | 34B | 290 | 43.4* | 16.9* | 52.2 | 2050.2 | 72.4 | 70.7 | 45.1 | 30.7 | 62.3 | 54.8 | 79.3 |
+| CogVLM-Chat | 17.4B | 590 | 70.4 | 33.3* | 54.2 | 1736.6 | 65.8 | 55.9 | 37.3 | 34.7 | 73.9 | 60.3 | 73.6 |
+| TextMonkey | 9.7B | 558 | 64.3 | 66.7 | - | - | - | - | - | - | - | - | - |
+| Idefics2 | 8.0B | - | 73.0 | 74.0 | 57.2 | 1847.6 | 75.7 | 68.6 | 45.2 | 52.2 | 49.1 | 60.7 | - |
+| Bunny-LLama-3-8B | 8.4B | - | - | - | 54.3 | 1920.3 | 77.0 | 73.9 | 41.3 | 31.5 | 61.2 | 58.8 | - |
+| LLaVA-NeXT Llama-3-8B | 8.4B | - | - | 78.2 | - | 1971.5 | - | - | 41.7 | 37.5 | 80.1 | 60.0 | - |
+| Phi-3-vision-128k-instruct | 4.2B | 639* | 70.9 | - | - | 1537.5* | - | - | 40.4 | 44.5 | 64.2* | 58.8* | - |
+| MiniCPM-V 1.0 | 2.8B | 366 | 60.6 | 38.2 | 47.5 | 1650.2 | 64.1 | 62.6 | 38.3 | 28.9 | 51.3 | 51.2 | 78.4 |
+| MiniCPM-V 2.0 | 2.8B | 605 | 74.1 | 71.9 | 54.5 | 1808.6 | 69.1 | 66.5 | 38.2 | 38.7 | 69.2 | 55.8 | 85.5 |
+| MiniCPM-Llama3-V 2.5 | 8.5B | 725 | 76.6 | 84.8 | 65.1 | 2024.6 | 77.2 | 74.2 | 45.8 | 54.3 | 86.7 | 63.5 | 89.7 |
+* We evaluate the officially released checkpoint by ourselves. + +
+ +
+ +
+ Evaluation results of multilingual LLaVA Bench +
+ +### Examples + + +

+ +

+
+ +
+ + +### Model Zoo + +| Model | Device | Memory |          Description | Download | +|:-----------|:--:|:-----------:|:-------------------|:---------------:| +| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) | +| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)   [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) | +| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) | diff --git a/minicpm_v1.md b/docs/minicpm_v1.md similarity index 100% rename from minicpm_v1.md rename to docs/minicpm_v1.md diff --git a/docs/minicpm_v2.md b/docs/minicpm_v2.md new file mode 100644 index 0000000..9dcb5a0 --- /dev/null +++ b/docs/minicpm_v2.md @@ -0,0 +1,294 @@ +## MiniCPM-V 2.0 + + +> Archive at:2025-01-13 + + + +**MiniCPM-V 2.0** is an efficient version with promising performance for deployment. The model is built based on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0 has several notable features. + +- 🔥 **State-of-the-art Performance.** + + MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. 
Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
+
+- 🏆 **Trustworthy Behavior.**
+
+  LMMs are known for suffering from hallucination, often generating text not factually grounded in images. MiniCPM-V 2.0 is **the first end-side LMM aligned via multimodal RLHF for trustworthy behavior** (using the recent [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] series technique). This allows the model to **match GPT-4V in preventing hallucinations** on Object HalBench.
+
+- 🌟 **High-Resolution Images at Any Aspect Ratio.**
+
+  MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
+
+- ⚡️ **High Efficiency.**
+
+  MiniCPM-V 2.0 can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into much fewer tokens via a perceiver resampler. This allows MiniCPM-V 2.0 to operate with **favorable memory cost and speed during inference even when dealing with high-resolution images**.
+
+- 🙌 **Bilingual Support.**
+
+  MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24].
+
+
+### Evaluation
+
+ +
+
+Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, Object HalBench. +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Model | Size | TextVQA val | DocVQA test | OCRBench | OpenCompass | MME | MMB dev (en) | MMB dev (zh) | MMMU val | MathVista | LLaVA Bench | Object HalBench |
+|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| *Proprietary models* | | | | | | | | | | | | |
+| Gemini Pro Vision | - | 74.6 | 88.1 | 680 | 63.8 | 2148.9 | 75.2 | 74.0 | 48.9 | 45.8 | 79.9 | - |
+| GPT-4V | - | 78.0 | 88.4 | 645 | 63.2 | 1771.5 | 75.1 | 75.0 | 53.8 | 47.8 | 93.1 | 86.4 / 92.7 |
+| *Open-source models 6B~34B* | | | | | | | | | | | | |
+| Yi-VL-6B | 6.7B | 45.5* | 17.1* | 290 | 49.3 | 1915.1 | 68.6 | 68.3 | 40.3 | 28.8 | 51.9 | - |
+| Qwen-VL-Chat | 9.6B | 61.5 | 62.6 | 488 | 52.1 | 1860.0 | 60.6 | 56.7 | 37.0 | 33.8 | 67.7 | 56.2 / 80.0 |
+| Yi-VL-34B | 34B | 43.4* | 16.9* | 290 | 52.6 | 2050.2 | 71.1 | 71.4 | 45.1 | 30.7 | 62.3 | - |
+| DeepSeek-VL-7B | 7.3B | 64.7* | 47.0* | 435 | 55.6 | 1765.4 | 74.1 | 72.8 | 38.3 | 36.8 | 77.8 | - |
+| TextMonkey | 9.7B | 64.3 | 66.7 | 558 | - | - | - | - | - | - | - | - |
+| CogVLM-Chat | 17.4B | 70.4 | 33.3* | 590 | 52.5 | 1736.6 | 63.7 | 53.8 | 37.3 | 34.7 | 73.9 | 73.6 / 87.4 |
+| *Open-source models 1B~3B* | | | | | | | | | | | | |
+| DeepSeek-VL-1.3B | 1.7B | 58.4* | 37.9* | 413 | 46.0 | 1531.6 | 64.0 | 61.2 | 33.8 | 29.4 | 51.1 | - |
+| MobileVLM V2 | 3.1B | 57.5 | 19.4* | - | - | 1440.5(P) | 63.2 | - | - | - | - | - |
+| Mini-Gemini | 2.2B | 56.2 | 34.2* | - | - | 1653.0 | 59.8 | - | 31.7 | - | - | - |
+| MiniCPM-V | 2.8B | 60.6 | 38.2 | 366 | 47.6 | 1650.2 | 67.9 | 65.3 | 38.3 | 28.9 | 51.3 | 78.4 / 88.5 |
+| MiniCPM-V 2.0 | 2.8B | 74.1 | 71.9 | 605 | 55.0 | 1808.6 | 69.6 | 68.1 | 38.2 | 38.7 | 69.2 | 85.5 / 92.2 |
+ +
+* We evaluate the officially released checkpoint by ourselves. +
+ +### Examples + + +

+ +

+
+
+We deploy MiniCPM-V 2.0 on end devices. The demo video is an unedited raw screen recording on a Xiaomi 14 Pro.
+

+ + +

+
+
+
+
+### Model Zoo
+
+| Model | Device | Memory | Description | Download |
+|:-----------|:--:|:-----------:|:-------------------|:---------------:|
+| MiniCPM-V 2.0 | GPU | 8 GB | Light version, balancing performance and computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
+| MiniCPM-V 1.0 | GPU | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
diff --git a/docs/minicpm_v2dot6.md b/docs/minicpm_v2dot6.md
new file mode 100644
index 0000000..9ef6dac
--- /dev/null
+++ b/docs/minicpm_v2dot6.md
@@ -0,0 +1,945 @@
+## MiniCPM-V 2.6
+
+> Archived at: 2025-01-13
+
+**MiniCPM-V 2.6** is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:
+
+- 🔥 **Leading Performance.**
+  MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet** for single image understanding.
+
+- 🖼️ **Multi Image Understanding and In-context Learning.** MiniCPM-V 2.6 can also perform **conversation and reasoning over multiple images**. It achieves **state-of-the-art performance** on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and also shows promising in-context learning capability.
+
+- 🎬 **Video Understanding.** MiniCPM-V 2.6 can also **accept video inputs**, performing conversation and providing dense captions for spatial-temporal information.
It outperforms **GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B** on Video-MME, both with and without subtitles.
+
+- 💪 **Strong OCR Capability and Others.**
+  MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro**.
+  Based on the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) and [VisCPM](https://github.com/OpenBMB/VisCPM) techniques, it features **trustworthy behaviors**, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports **multilingual capabilities** in English, Chinese, German, French, Italian, Korean, etc.
+
+
+- 🚀 **Superior Efficiency.**
+  In addition to its friendly size, MiniCPM-V 2.6 also shows **state-of-the-art token density** (i.e., the number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M-pixel image, which is 75% fewer than most models**. This directly improves inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-V 2.6 can efficiently support **real-time video understanding** on end-side devices such as the iPad.
+ +- 💫 **Easy Usage.** +MiniCPM-V 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md) and [ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#inference-with-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks, (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) online web [demo](http://120.92.209.146:8887/). + +### Evaluation +
+ +
+ +
+Click to view single image results on OpenCompass, MME, MMVet, OCRBench, MMMU, MathVista, MMB, AI2D, TextVQA, DocVQA, HallusionBench, Object HalBench. +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | Token Density+ | OpenCompass | MME | MMVet | OCRBench | MMMU val | MathVista mini | MMB1.1 test | AI2D | TextVQA val | DocVQA test | HallusionBench | Object HalBench |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | | | | | | | | | |
| GPT-4o | - | 1088 | 69.9 | 2328.7 | 69.1 | 736 | 69.2 | 61.3 | 82.2 | 84.6 | - | 92.8 | 55.0 | 17.6 |
| Claude 3.5 Sonnet | - | 750 | 67.9 | 1920.0 | 66.0 | 788 | 65.9 | 61.6 | 78.5 | 80.2 | - | 95.2 | 49.9 | 13.8 |
| Gemini 1.5 Pro | - | - | 64.4 | 2110.6 | 64.0 | 754 | 60.6 | 57.7 | 73.9 | 79.1 | 73.5 | 86.5 | 45.6 | - |
| GPT-4o mini | - | 1088 | 64.1 | 2003.4 | 66.9 | 785 | 60.0 | 52.4 | 76.0 | 77.8 | - | - | 46.1 | 12.4 |
| GPT-4V | - | 1088 | 63.5 | 2070.2 | 67.5 | 656 | 61.7 | 54.7 | 79.8 | 78.6 | 78.0 | 87.2 | 43.9 | 14.2 |
| Step-1V | - | - | 59.5 | 2206.4 | 63.3 | 625 | 49.9 | 44.8 | 78.0 | 79.2 | 71.6 | - | 48.4 | - |
| Qwen-VL-Max | - | 784 | 58.3 | 2281.7 | 61.8 | 684 | 52.0 | 43.4 | 74.6 | 75.7 | 79.5 | 93.1 | 41.2 | 13.4 |
| **Open-source** | | | | | | | | | | | | | | |
| LLaVA-NeXT-Yi-34B | 34B | 157 | 55.0 | 2006.5 | 50.7 | 574 | 48.8 | 40.4 | 77.8 | 78.9 | 69.3 | - | 34.8 | 12.6 |
| Mini-Gemini-HD-34B | 34B | 157 | - | 2141.0 | 59.3 | 518 | 48.0 | 43.3 | - | 80.5 | 74.1 | 78.9 | - | - |
| Cambrian-34B | 34B | 1820 | 58.3 | 2049.9 | 53.2 | 591 | 50.4 | 50.3 | 77.8 | 79.5 | 76.7 | 75.5 | 41.6 | 14.7 |
| GLM-4V-9B | 13B | 784 | 59.1 | 2018.8 | 58.0 | 776 | 46.9 | 51.1 | 67.9 | 71.2 | - | - | 45.0 | - |
| InternVL2-8B | 8B | 706 | 64.1 | 2215.1 | 54.3 | 794 | 51.2 | 58.3 | 79.4 | 83.6 | 77.4 | 91.6 | 45.0 | 21.3 |
| MiniCPM-Llama3-V 2.5 | 8B | 1882 | 58.8 | 2024.6 | 52.8 | 725 | 45.8 | 54.3 | 72.0 | 78.4 | 76.6 | 84.8 | 42.4 | 10.3 |
| MiniCPM-V 2.6 | 8B | 2822 | 65.2 | 2348.4* | 60.0 | 852* | 49.8* | 60.6 | 78.0 | 82.1 | 80.1 | 90.8 | 48.1* | 8.2 |
+ +
+* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we use this technique only on the Cognition set.
+
++ Token Density: the number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens.
+
+Note: For proprietary models, we calculate token density based on the image-encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation.
+
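The token density definition above can be sanity-checked with a few lines of arithmetic; this sketch uses the 1344x1344 maximum resolution and 640 visual tokens quoted for MiniCPM-V 2.6:

```python
# Token density = # pixels at maximum resolution / # visual tokens.
# MiniCPM-V 2.6 encodes a ~1.8M-pixel image (e.g., 1344x1344) into 640 visual tokens.
def token_density(width: int, height: int, num_visual_tokens: int) -> float:
    return width * height / num_visual_tokens

print(round(token_density(1344, 1344, 640)))  # 2822, the value reported in the table
```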
+ + +
+Click to view multi-image results on Mantis Eval, BLINK, Mathverse mv, Sciverse mv, MIRB. +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | Mantis Eval | BLINK val | Mathverse mv | Sciverse mv | MIRB |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | |
| GPT-4V | - | 62.7 | 54.6 | 60.3 | 66.9 | 53.1 |
| LLaVA-NeXT-Interleave-14B | 14B | 66.4 | 52.6 | 32.7 | 30.2 | - |
| **Open-source** | | | | | | |
| Emu2-Chat | 37B | 37.8 | 36.2 | - | 27.2 | - |
| CogVLM | 17B | 45.2 | 41.1 | - | - | - |
| VPG-C | 7B | 52.4 | 43.1 | 24.3 | 23.1 | - |
| VILA 8B | 8B | 51.2 | 39.3 | - | 36.5 | - |
| InternLM-XComposer-2.5 | 8B | 53.1* | 48.9 | 32.1* | - | 42.5 |
| InternVL2-8B | 8B | 59.0* | 50.9 | 30.5* | 34.4* | 56.9* |
| MiniCPM-V 2.6 | 8B | 69.1 | 53.0 | 84.9 | 74.9 | 53.8 |
+ +
+* We evaluate the officially released checkpoint by ourselves. +
+ +
+Click to view video results on Video-MME and Video-ChatGPT. +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | Video-MME (w/o subs) | Video-MME (w subs) | Video-ChatGPT Correctness | Video-ChatGPT Detail | Video-ChatGPT Context | Video-ChatGPT Temporal | Video-ChatGPT Consistency |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| **Proprietary** | | | | | | | | |
| Claude 3.5 Sonnet | - | 60.0 | 62.9 | - | - | - | - | - |
| GPT-4V | - | 59.9 | 63.3 | - | - | - | - | - |
| **Open-source** | | | | | | | | |
| LLaVA-NeXT-7B | 7B | - | - | 3.39 | 3.29 | 3.92 | 2.60 | 3.12 |
| LLaVA-NeXT-34B | 34B | - | - | 3.29 | 3.23 | 3.83 | 2.51 | 3.47 |
| CogVLM2-Video | 12B | - | - | 3.49 | 3.46 | 3.23 | 2.98 | 3.64 |
| LongVA | 7B | 52.4 | 54.3 | 3.05 | 3.09 | 3.77 | 2.44 | 3.64 |
| InternVL2-8B | 8B | 54.0 | 56.9 | - | - | - | - | - |
| InternLM-XComposer-2.5 | 8B | 55.8 | - | - | - | - | - | - |
| LLaVA-NeXT-Video | 32B | 60.2 | 63.0 | 3.48 | 3.37 | 3.95 | 2.64 | 3.28 |
| MiniCPM-V 2.6 | 8B | 60.9 | 63.6 | 3.59 | 3.28 | 3.93 | 2.73 | 3.62 |
+
+
+ + +
+Click to view few-shot results on TextVQA, VizWiz, VQAv2, OK-VQA. +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Model | Size | Shot | TextVQA val | VizWiz test-dev | VQAv2 test-dev | OK-VQA val |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|
| Flamingo | 80B | 0* | 35.0 | 31.6 | 56.3 | 40.6 |
| | | 4 | 36.5 | 39.6 | 63.1 | 57.4 |
| | | 8 | 37.3 | 44.8 | 65.6 | 57.5 |
| IDEFICS | 80B | 0* | 30.9 | 36.0 | 60.0 | 45.2 |
| | | 4 | 34.3 | 40.4 | 63.6 | 52.4 |
| | | 8 | 35.7 | 46.1 | 64.8 | 55.1 |
| OmniCorpus | 7B | 0* | 43.0 | 49.8 | 63.2 | 45.5 |
| | | 4 | 45.4 | 51.3 | 64.5 | 46.5 |
| | | 8 | 45.6 | 52.2 | 64.7 | 46.6 |
| Emu2 | 37B | 0 | 26.4 | 40.4 | 33.5 | 26.7 |
| | | 4 | 48.2 | 54.6 | 67.0 | 53.2 |
| | | 8 | 49.3 | 54.7 | 67.8 | 54.1 |
| MM1 | 30B | 0 | 26.2 | 40.4 | 48.9 | 26.7 |
| | | 8 | 49.3 | 54.7 | 70.9 | 54.1 |
| MiniCPM-V 2.6+ | 8B | 0 | 43.9 | 33.8 | 45.4 | 23.9 |
| | | 4 | 63.6 | 60.5 | 65.5 | 50.1 |
| | | 8 | 64.6 | 63.4 | 68.2 | 51.4 |
+ + +
+* denotes zero image shots and two additional text shots, following Flamingo.
+
++ We evaluate the pretraining checkpoint without SFT.
+
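For clarity, the 0* protocol above (zero image shots plus two text-only shots, following Flamingo) can be sketched in the same message-list format that `model.chat` uses elsewhere in this document. The questions, answers, and image below are illustrative placeholders, not evaluation data:

```python
from PIL import Image

# Two text-only QA shots (no images), per the 0* protocol.
# All questions and answers here are placeholders, not evaluation data.
text_shots = [
    {'role': 'user', 'content': ["Question: What is the capital of France? Short answer:"]},
    {'role': 'assistant', 'content': ["Paris"]},
    {'role': 'user', 'content': ["Question: How many legs does a spider have? Short answer:"]},
    {'role': 'assistant', 'content': ["eight"]},
]

# The actual query carries the only image in the prompt.
test_image = Image.new('RGB', (448, 448))  # placeholder image
query = [{'role': 'user', 'content': [test_image, "Question: What is shown? Short answer:"]}]

msgs = text_shots + query  # would be passed as model.chat(image=None, msgs=msgs, ...)
print(len(msgs))  # 5 messages: 2 QA shots (4 turns) + 1 query
```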
+ +### Examples + +
+ Bike + Menu + Code + Mem + medal +
+
+ Click to view more cases. +
+ elec + Menu +
+
+
+We deploy MiniCPM-V 2.6 on end devices. The demo video is an unedited raw screen recording on an iPad Pro.
+

+ +      + +

+
+ + +

+ +      + +

+
+ + +

+ + +

+
+ +
+ + + +### Multi-turn Conversation + + +
+ +
+ + +```python +import torch +from PIL import Image +from transformers import AutoModel, AutoTokenizer + +torch.manual_seed(0) + +model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, + attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager +model = model.eval().cuda() +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) + +image = Image.open('./assets/airplane.jpeg').convert('RGB') + +# First round chat +question = "Tell me the model of this aircraft." +msgs = [{'role': 'user', 'content': [image, question]}] + +answer = model.chat( + image=None, + msgs=msgs, + tokenizer=tokenizer +) +print(answer) + +# Second round chat +# pass history context of multi-turn conversation +msgs.append({"role": "assistant", "content": [answer]}) +msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]}) + +answer = model.chat( + image=None, + msgs=msgs, + tokenizer=tokenizer +) +print(answer) +``` + +You could get the following output: + +``` +"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database." + +"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. 
The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry." +``` + +#### Multi-image Understanding +
+ Click to view Python example of MiniCPM-V 2.6 multi-image understanding + +```python +import torch +from PIL import Image +from transformers import AutoModel, AutoTokenizer + +model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, + attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager +model = model.eval().cuda() +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) + +image1 = Image.open('image1.jpg').convert('RGB') +image2 = Image.open('image2.jpg').convert('RGB') +question = 'Compare image 1 and image 2, tell me about the differences between image 1 and image 2.' + +msgs = [{'role': 'user', 'content': [image1, image2, question]}] + +answer = model.chat( + image=None, + msgs=msgs, + tokenizer=tokenizer +) +print(answer) +``` +
+ +#### Few-shot In-Context-Learning + +
+ Click to view Python example of MiniCPM-V 2.6 few-shot in-context-learning example + +```python +import torch +from PIL import Image +from transformers import AutoModel, AutoTokenizer + +model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, + attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager +model = model.eval().cuda() +tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) + +question = "production date" +image1 = Image.open('example1.jpg').convert('RGB') +answer1 = "2023.08.04" +image2 = Image.open('example2.jpg').convert('RGB') +answer2 = "2007.04.24" +image_test = Image.open('test.jpg').convert('RGB') + +msgs = [ + {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]}, + {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]}, + {'role': 'user', 'content': [image_test, question]} +] + +answer = model.chat( + image=None, + msgs=msgs, + tokenizer=tokenizer +) +print(answer) +``` +
+ +#### Video understanding +
+ Click to view Python example of MiniCPM-V 2.6 video understanding
+
+```python
+import torch
+from PIL import Image
+from transformers import AutoModel, AutoTokenizer
+from decord import VideoReader, cpu  # pip install decord
+
+model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
+    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
+model = model.eval().cuda()
+tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
+
+MAX_NUM_FRAMES = 64  # if CUDA OOM occurs, set a smaller number
+
+def encode_video(video_path):
+    def uniform_sample(l, n):
+        gap = len(l) / n
+        idxs = [int(i * gap + gap / 2) for i in range(n)]
+        return [l[i] for i in idxs]
+
+    vr = VideoReader(video_path, ctx=cpu(0))
+    sample_fps = round(vr.get_avg_fps() / 1)  # sample one frame per second of video
+    frame_idx = [i for i in range(0, len(vr), sample_fps)]
+    if len(frame_idx) > MAX_NUM_FRAMES:
+        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
+    frames = vr.get_batch(frame_idx).asnumpy()
+    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
+    print('num frames:', len(frames))
+    return frames
+
+video_path = "video_test.mp4"
+frames = encode_video(video_path)
+question = "Describe the video"
+msgs = [
+    {'role': 'user', 'content': frames + [question]},
+]
+
+# Set decode params for video
+params = {}
+params["use_image_id"] = False
+params["max_slice_nums"] = 2  # use 1 if CUDA OOM occurs and video resolution is larger than 448x448
+
+answer = model.chat(
+    image=None,
+    msgs=msgs,
+    tokenizer=tokenizer,
+    **params
+)
+print(answer)
+```
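The `uniform_sample` helper in the video example caps long videos at `MAX_NUM_FRAMES` by picking one index from the center of each of the n equal-width segments of the frame list. Its behavior can be checked in isolation:

```python
# Same helper as in the video example above: evenly sample n items from a list,
# taking the element at the center of each of the n equal-width segments.
def uniform_sample(l, n):
    gap = len(l) / n
    idxs = [int(i * gap + gap / 2) for i in range(n)]
    return [l[i] for i in idxs]

print(uniform_sample(list(range(100)), 4))  # [12, 37, 62, 87]
```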
diff --git a/omnilmm.md b/docs/omnilmm.md similarity index 100% rename from omnilmm.md rename to docs/omnilmm.md diff --git a/omnilmm_en.md b/docs/omnilmm_en.md similarity index 98% rename from omnilmm_en.md rename to docs/omnilmm_en.md index 62f7992..6782d44 100644 --- a/omnilmm_en.md +++ b/docs/omnilmm_en.md @@ -1,6 +1,6 @@ ## OmniLMM-12B -> OmniLMM-12B is released at early time of this project. We recommond you to use our [recently released models](./README_en.md), for better performance and efficiency. +> OmniLMM-12B is released at early time of this project. We recommond you to use our [recently released models](./README.md), for better performance and efficiency. > Archieve at: 2024-05-19 diff --git a/finetune/dataset.py b/finetune/dataset.py index 885ae14..5012ec6 100644 --- a/finetune/dataset.py +++ b/finetune/dataset.py @@ -7,7 +7,6 @@ import re import random from dataclasses import dataclass, field from typing import Dict, List, Optional -from decord import VideoReader, cpu # pip install decord import numpy as np import torch @@ -21,26 +20,6 @@ logger = logging.getLogger(__name__) llama3_chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}" -MAX_NUM_FRAMES=64 -def encode_video(video_path, max_num_frames=64): - max_num_frames = min(max_num_frames, MAX_NUM_FRAMES) - def uniform_sample(l, n): - gap = len(l) / n - idxs = [int(i * gap + gap / 2) for i in range(n)] - return [l[i] for i in idxs] - - vr = VideoReader(video_path, ctx=cpu(0)) - sample_fps = round(vr.get_avg_fps() / 1) # FPS - frame_idx = [i for i in range(0, len(vr), sample_fps)] - if len(frame_idx) > max_num_frames: - if max_num_frames==1: - frame_idx = [frame_idx[len(frame_idx)//2]] - else: - frame_idx = uniform_sample(frame_idx, 
max_num_frames) - frames = vr.get_batch(frame_idx).asnumpy() - frames = [Image.fromarray(v.astype('uint8')) for v in frames] - return frames - class SupervisedDataset(Dataset): """Dataset for supervised fine-tuning.""" @@ -55,8 +34,6 @@ class SupervisedDataset(Dataset): query_nums=64, batch_vision=False, max_length=2048, - video_max_slice_nums=2, - max_num_frames=1, ): super(SupervisedDataset, self).__init__() self.raw_data = raw_data @@ -68,58 +45,17 @@ class SupervisedDataset(Dataset): self.query_nums=query_nums self.batch_vision = batch_vision self.max_length = max_length - # video config - self.video_slice_config = copy.deepcopy(slice_config) - self.video_slice_config['max_slice_nums'] = video_max_slice_nums - self.max_num_frames = max_num_frames def __len__(self): return len(self.raw_data) def __getitem__(self, i) -> Dict[str, torch.Tensor]: try: - # default: sft image - use_image_id = True - slice_config = self.slice_config - if "image" in self.raw_data[i]: - if isinstance(self.raw_data[i]["image"], str): - images_dict = { "" : Image.open(self.raw_data[i]["image"]).convert("RGB") } - elif isinstance(self.raw_data[i]["image"], Dict): - ### for multi-images input, the template for every image is , such as , - images_dict = {img_name : Image.open(img_path).convert("RGB") for img_name, img_path in self.raw_data[i]["image"].items()} - elif "video" in self.raw_data[i]: - if isinstance(self.raw_data[i]["video"], str): - frames = encode_video(self.raw_data[i]["video"], max_num_frames=self.max_num_frames) - image_names = [] - images_dict = {} - for j, frame in enumerate(frames): - image_name = "".format(j) - images_dict[image_name] = frame - image_names.append(image_name) - for j in range(len(self.raw_data[i]["conversations"])): - content = self.raw_data[i]["conversations"][j]['content'] - self.raw_data[i]["conversations"][j]['content'] = content.replace("