Update to MiniCPM-o 2.6

README_en.md (1695 changes)
README_zh.md (1923 changes)
assets/MiniCPM-o.png (new binary file, 373 KiB)
assets/discord.png (new binary file, 272 B)
assets/logo.html (new file, 3 lines)

@@ -0,0 +1,3 @@
<span style="color:#56A7DA; font-size: 10em; font-weight: bold;">
MiniCPM-<span>o</span>
</span>

assets/minicpm-o-26-framework.png (new binary file, 1023 KiB)
assets/minicpmo2_6/minicpmo2_6_diagram_train_NN.png (new binary file, 1.8 MiB)
assets/minicpmo2_6/minicpmo2_6_math_intersect.png (new binary file, 785 KiB)
assets/minicpmo2_6/minicpmo2_6_multi-image_bike.png (new binary file, 8.6 MiB)
assets/minicpmo2_6/show_demo.jpg (new binary file, 100 KiB)
assets/o-2dot6-demo-video-preview.png (new binary file, 2.6 MiB)
assets/radar.jpg (new binary file, 842 KiB)
assets/ref_audios/default.wav (new binary file)
assets/ref_audios/female_example.wav (new binary file)
assets/ref_audios/male_example.wav (new binary file)
assets/ref_audios/video_default.wav (new binary file)
assets/wechat.png (new binary file, 245 B)
docs/minicpm_llama3_v2dot5.md (new file, 333 lines)

@@ -0,0 +1,333 @@
## MiniCPM-Llama3-V 2.5

> Archived at: 2025-01-13

**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:

- 🔥 **Leading Performance.**
  MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs.

- 💪 **Strong OCR Capabilities.**
  MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench and surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max, and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex-reasoning abilities, improving the multimodal interaction experience.

- 🏆 **Trustworthy Behavior.**
  Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best-level performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset).

- 🌏 **Multilingual Support.**
  Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages, including German, French, Spanish, Italian, and Korean.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md).

- 🚀 **Efficient Deployment.**
  MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations, and compilation optimizations**, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 realizes a **150x acceleration in end-side MLLM image encoding** and a **3x speedup in language decoding**.

- 💫 **Easy Usage.**
  MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).
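As a minimal sketch of local usage via Hugging Face `transformers`, the snippet below follows the `model.chat` interface described on the model card; treat the exact argument names as assumptions and check the card before relying on them:

```python
# Sketch of local inference for MiniCPM-Llama3-V 2.5 via transformers.
# The model.chat signature follows the Hugging Face model card; argument
# names here are assumptions, not a definitive API reference.

def build_msgs(question: str) -> list:
    # The chat interface takes a list of role/content conversation turns.
    return [{"role": "user", "content": question}]

def answer(image_path: str, question: str) -> str:
    # Heavy imports kept local so build_msgs stays importable without them.
    from PIL import Image
    from transformers import AutoModel, AutoTokenizer

    model_id = "openbmb/MiniCPM-Llama3-V-2_5"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
    image = Image.open(image_path).convert("RGB")
    return model.chat(image=image, msgs=build_msgs(question), tokenizer=tokenizer)

# Example (downloads the 8B checkpoint on first run):
# answer("receipt.jpg", "Extract all text from this image.")
```

Note that loading this model requires `trust_remote_code=True`, since the chat interface lives in custom code shipped with the checkpoint.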
### Evaluation <!-- omit in toc -->

<div align="center">
<img src="../assets/MiniCPM-Llama3-V-2.5-peformance.png" width="66%" />
</div>

<details>
<summary>Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench.</summary>
<div align="center">

<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th><th>Size</th><th>OCRBench</th><th>TextVQA val</th><th>DocVQA test</th><th>Open-Compass</th><th>MME</th><th>MMB test (en)</th><th>MMB test (cn)</th><th>MMMU val</th><th>Math-Vista</th><th>LLaVA Bench</th><th>RealWorld QA</th><th>Object HalBench</th>
</tr>
</thead>
<tbody align="center">
<tr><td colspan="14" align="left"><strong>Proprietary</strong></td></tr>
<tr><td nowrap="nowrap" align="left">Gemini Pro</td><td>-</td><td>680</td><td>74.6</td><td>88.1</td><td>62.9</td><td>2148.9</td><td>73.6</td><td>74.3</td><td>48.9</td><td>45.8</td><td>79.9</td><td>60.4</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">GPT-4V (2023.11.06)</td><td>-</td><td>645</td><td>78.0</td><td>88.4</td><td>63.5</td><td>1771.5</td><td>77.0</td><td>74.4</td><td>53.8</td><td>47.8</td><td>93.1</td><td>63.0</td><td>86.4</td></tr>
<tr><td colspan="14" align="left"><strong>Open-source</strong></td></tr>
<tr><td nowrap="nowrap" align="left">Mini-Gemini</td><td>2.2B</td><td>-</td><td>56.2</td><td>34.2*</td><td>-</td><td>1653.0</td><td>-</td><td>-</td><td>31.7</td><td>-</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Qwen-VL-Chat</td><td>9.6B</td><td>488</td><td>61.5</td><td>62.6</td><td>51.6</td><td>1860.0</td><td>61.8</td><td>56.3</td><td>37.0</td><td>33.8</td><td>67.7</td><td>49.3</td><td>56.2</td></tr>
<tr><td nowrap="nowrap" align="left">DeepSeek-VL-7B</td><td>7.3B</td><td>435</td><td>64.7*</td><td>47.0*</td><td>54.6</td><td>1765.4</td><td>73.8</td><td>71.4</td><td>38.3</td><td>36.8</td><td>77.8</td><td>54.2</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Yi-VL-34B</td><td>34B</td><td>290</td><td>43.4*</td><td>16.9*</td><td>52.2</td><td><strong>2050.2</strong></td><td>72.4</td><td>70.7</td><td>45.1</td><td>30.7</td><td>62.3</td><td>54.8</td><td>79.3</td></tr>
<tr><td nowrap="nowrap" align="left">CogVLM-Chat</td><td>17.4B</td><td>590</td><td>70.4</td><td>33.3*</td><td>54.2</td><td>1736.6</td><td>65.8</td><td>55.9</td><td>37.3</td><td>34.7</td><td>73.9</td><td>60.3</td><td>73.6</td></tr>
<tr><td nowrap="nowrap" align="left">TextMonkey</td><td>9.7B</td><td>558</td><td>64.3</td><td>66.7</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Idefics2</td><td>8.0B</td><td>-</td><td>73.0</td><td>74.0</td><td>57.2</td><td>1847.6</td><td>75.7</td><td>68.6</td><td>45.2</td><td>52.2</td><td>49.1</td><td>60.7</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Bunny-LLama-3-8B</td><td>8.4B</td><td>-</td><td>-</td><td>-</td><td>54.3</td><td>1920.3</td><td>77.0</td><td>73.9</td><td>41.3</td><td>31.5</td><td>61.2</td><td>58.8</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">LLaVA-NeXT Llama-3-8B</td><td>8.4B</td><td>-</td><td>-</td><td>78.2</td><td>-</td><td>1971.5</td><td>-</td><td>-</td><td>41.7</td><td>37.5</td><td>80.1</td><td>60.0</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Phi-3-vision-128k-instruct</td><td>4.2B</td><td>639*</td><td>70.9</td><td>-</td><td>-</td><td>1537.5*</td><td>-</td><td>-</td><td>40.4</td><td>44.5</td><td>64.2*</td><td>58.8*</td><td>-</td></tr>
<tr style="background-color: #e6f2ff;"><td nowrap="nowrap" align="left">MiniCPM-V 1.0</td><td>2.8B</td><td>366</td><td>60.6</td><td>38.2</td><td>47.5</td><td>1650.2</td><td>64.1</td><td>62.6</td><td>38.3</td><td>28.9</td><td>51.3</td><td>51.2</td><td>78.4</td></tr>
<tr style="background-color: #e6f2ff;"><td nowrap="nowrap" align="left">MiniCPM-V 2.0</td><td>2.8B</td><td>605</td><td>74.1</td><td>71.9</td><td>54.5</td><td>1808.6</td><td>69.1</td><td>66.5</td><td>38.2</td><td>38.7</td><td>69.2</td><td>55.8</td><td>85.5</td></tr>
<tr style="background-color: #e6f2ff;"><td nowrap="nowrap" align="left">MiniCPM-Llama3-V 2.5</td><td>8.5B</td><td><strong>725</strong></td><td><strong>76.6</strong></td><td><strong>84.8</strong></td><td><strong>65.1</strong></td><td>2024.6</td><td><strong>77.2</strong></td><td><strong>74.2</strong></td><td><strong>45.8</strong></td><td><strong>54.3</strong></td><td><strong>86.7</strong></td><td><strong>63.5</strong></td><td><strong>89.7</strong></td></tr>
</tbody>
</table>

</div>

* We evaluate the officially released checkpoints by ourselves.

</details>
<div align="center">
<img src="../assets/llavabench_compare_3.png" width="100%" />
<br>
Evaluation results of multilingual LLaVA Bench
</div>

### Examples <!-- omit in toc -->

<table align="center">
<p align="center">
<img src="../assets/minicpmv-llama3-v2.5/cases_all.png" />
</p>
</table>

### Model Zoo

| Model | Device | Memory | Description | Download |
|:-----------|:--:|:-----------:|:-------------------|:---------------:|
| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | Strong end-side multimodal performance. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/) [<img src="../assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5) |
| MiniCPM-Llama3-V 2.5 gguf | CPU | 6 GB | The gguf version, with lower memory usage and faster inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) [<img src="../assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf) |
| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, with lower GPU memory usage. | [🤗](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4/) [<img src="../assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4) |
docs/minicpm_v2.md (new file, 294 lines)

@@ -0,0 +1,294 @@
|
|||||||
|
## MiniCPM-V 2.0
|
||||||
|
|
||||||
|
|
||||||
|
> Archive at:2025-01-13
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
**MiniCPM-V 2.0** is an efficient version with promising performance for deployment. The model is built based on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0 has several notable features.
|
||||||
|
|
||||||
|
- 🔥 **State-of-the-art Performance.**
|
||||||
|
|
||||||
|
MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
|
||||||
|
|
||||||
|
- 🏆 **Trustworthy Behavior.**
|
||||||
|
|
||||||
|
LMMs are known for suffering from hallucination, often generating text not factually grounded in images. MiniCPM-V 2.0 is **the first end-side LMM aligned via multimodal RLHF for trustworthy behavior** (using the recent [RLHF-V](https://rlhf-v.github.io/) [CVPR'24] series technique). This allows the model to **match GPT-4V in preventing hallucinations** on Object HalBench.
|
||||||
|
|
||||||
|
- 🌟 **High-Resolution Images at Any Aspect Raito.**
|
||||||
|
|
||||||
|
MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
|
||||||
|
|
||||||
|
- ⚡️ **High Efficiency.**
|
||||||
|
|
||||||
|
MiniCPM-V 2.0 can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into much fewer tokens via a perceiver resampler. This allows MiniCPM-V 2.0 to operate with **favorable memory cost and speed during inference even when dealing with high-resolution images**.
|
||||||
|
|
||||||
|
- 🙌 **Bilingual Support.**
|
||||||
|
|
||||||
|
MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24].
|
||||||
|
|
||||||
|
|
||||||
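The perceiver-resampler idea mentioned above — cross-attending a small, fixed set of learned queries to a variable number of image-patch embeddings so the LLM always sees a fixed token budget — can be sketched in NumPy. The dimensions and single-head attention below are illustrative assumptions, not the model's actual configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_resample(patch_embeds, queries):
    """Cross-attend learned queries to variable-length patch embeddings.

    patch_embeds: (n_patches, d) -- vision-encoder output (n varies with image size)
    queries:      (n_queries, d) -- learned, fixed-size query bank
    returns:      (n_queries, d) -- compressed visual tokens, independent of n_patches
    """
    d = queries.shape[-1]
    scores = queries @ patch_embeds.T / np.sqrt(d)  # (n_queries, n_patches)
    attn = softmax(scores, axis=-1)
    return attn @ patch_embeds                      # (n_queries, d)

rng = np.random.default_rng(0)
d, n_queries = 64, 8  # toy sizes; the real model uses far larger dimensions
queries = rng.standard_normal((n_queries, d))

# Regardless of how many patches the encoder emits, the LLM sees n_queries tokens.
for n_patches in (196, 1024, 4096):
    out = perceiver_resample(rng.standard_normal((n_patches, d)), queries)
    assert out.shape == (n_queries, d)
```

This fixed output size is what keeps memory cost and decoding speed stable as input resolution grows.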
### Evaluation <!-- omit in toc -->

<div align="center">
<img src="../assets/minicpmv-2-peformance.png" width="66%" />
</div>

<details>
<summary>Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, Object HalBench.</summary>
<div align="center">

<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th><th>Size</th><th>TextVQA val</th><th>DocVQA test</th><th>OCRBench</th><th>OpenCompass</th><th nowrap="nowrap">MME</th><th>MMB dev (en)</th><th>MMB dev (zh)</th><th>MMMU val</th><th>MathVista</th><th>LLaVA Bench</th><th nowrap="nowrap">Object HalBench</th>
</tr>
</thead>
<tbody align="center">
<tr><td colspan="13" align="left"><strong>Proprietary models</strong></td></tr>
<tr><td nowrap="nowrap" align="left">Gemini Pro Vision</td><td>-</td><td>74.6</td><td>88.1</td><td>680</td><td>63.8</td><td>2148.9</td><td>75.2</td><td>74.0</td><td>48.9</td><td>45.8</td><td>79.9</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">GPT-4V</td><td>-</td><td>78.0</td><td>88.4</td><td>645</td><td>63.2</td><td>1771.5</td><td>75.1</td><td>75.0</td><td>53.8</td><td>47.8</td><td>93.1</td><td>86.4 / 92.7</td></tr>
<tr><td colspan="13" align="left"><strong>Open-source models 6B~34B</strong></td></tr>
<tr><td nowrap="nowrap" align="left">Yi-VL-6B</td><td align="right">6.7B</td><td>45.5*</td><td>17.1*</td><td>290</td><td>49.3</td><td>1915.1</td><td>68.6</td><td>68.3</td><td>40.3</td><td>28.8</td><td>51.9</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Qwen-VL-Chat</td><td align="right">9.6B</td><td>61.5</td><td>62.6</td><td>488</td><td>52.1</td><td>1860.0</td><td>60.6</td><td>56.7</td><td>37.0</td><td>33.8</td><td>67.7</td><td>56.2 / 80.0</td></tr>
<tr><td nowrap="nowrap" align="left">Yi-VL-34B</td><td align="right">34B</td><td>43.4*</td><td>16.9*</td><td>290</td><td>52.6</td><td>2050.2</td><td>71.1</td><td>71.4</td><td>45.1</td><td>30.7</td><td>62.3</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">DeepSeek-VL-7B</td><td align="right">7.3B</td><td>64.7*</td><td>47.0*</td><td>435</td><td>55.6</td><td>1765.4</td><td>74.1</td><td>72.8</td><td>38.3</td><td>36.8</td><td>77.8</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">TextMonkey</td><td align="right">9.7B</td><td>64.3</td><td>66.7</td><td>558</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">CogVLM-Chat</td><td align="right">17.4B</td><td>70.4</td><td>33.3*</td><td>590</td><td>52.5</td><td>1736.6</td><td>63.7</td><td>53.8</td><td>37.3</td><td>34.7</td><td>73.9</td><td>73.6 / 87.4</td></tr>
<tr><td colspan="13" align="left"><strong>Open-source models 1B~3B</strong></td></tr>
<tr><td nowrap="nowrap" align="left">DeepSeek-VL-1.3B</td><td align="right">1.7B</td><td>58.4*</td><td>37.9*</td><td>413</td><td>46.0</td><td>1531.6</td><td>64.0</td><td>61.2</td><td>33.8</td><td>29.4</td><td>51.1</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">MobileVLM V2</td><td align="right">3.1B</td><td>57.5</td><td>19.4*</td><td>-</td><td>-</td><td>1440.5 (P)</td><td>63.2</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Mini-Gemini</td><td align="right">2.2B</td><td>56.2</td><td>34.2*</td><td>-</td><td>-</td><td>1653.0</td><td>59.8</td><td>-</td><td>31.7</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">MiniCPM-V</td><td align="right">2.8B</td><td>60.6</td><td>38.2</td><td>366</td><td>47.6</td><td>1650.2</td><td>67.9</td><td>65.3</td><td><strong>38.3</strong></td><td>28.9</td><td>51.3</td><td>78.4 / 88.5</td></tr>
<tr><td nowrap="nowrap" align="left"><strong>MiniCPM-V 2.0</strong></td><td align="right">2.8B</td><td><strong>74.1</strong></td><td><strong>71.9</strong></td><td><strong>605</strong></td><td><strong>55.0</strong></td><td><strong>1808.6</strong></td><td><strong>69.6</strong></td><td><strong>68.1</strong></td><td>38.2</td><td><strong>38.7</strong></td><td><strong>69.2</strong></td><td><strong>85.5 / 92.2</strong></td></tr>
</tbody>
</table>

</div>

* We evaluate the officially released checkpoints by ourselves.

</details>
### Examples <!-- omit in toc -->

<table align="center">
<p align="center">
<img src="../assets/minicpmv2-cases_2.png" width="95%"/>
</p>
</table>

We deploy MiniCPM-V 2.0 on end devices. The demo video below is an unedited screen recording on a Xiaomi 14 Pro.

<table align="center">
<p align="center">
<img src="../assets/gif_cases/station.gif" width="36%"/>
<img src="../assets/gif_cases/london_car.gif" width="36%"/>
</p>
</table>

### Model Zoo

| Model | Device | Memory | Description | Download |
|:-----------|:--:|:-----------:|:-------------------|:---------------:|
| MiniCPM-V 2.0 | GPU | 8 GB | Light version, balancing performance and computation cost. | [🤗](https://huggingface.co/openbmb/MiniCPM-V-2) [<img src="../assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
| MiniCPM-V 1.0 | GPU | 7 GB | Lightest version, achieving the fastest inference. | [🤗](https://huggingface.co/openbmb/MiniCPM-V) [<img src="../assets/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-V) |
docs/minicpm_v2dot6.md (new file, 945 lines)

@@ -0,0 +1,945 @@
## MiniCPM-V 2.6

> Archived at: 2025-01-13

**MiniCPM-V 2.6** is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:

- 🔥 **Leading Performance.**
  MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet** for single-image understanding.

- 🖼️ **Multi-Image Understanding and In-context Learning.**
  MiniCPM-V 2.6 can also perform **conversation and reasoning over multiple images**. It achieves **state-of-the-art performance** on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv, and Sciverse mv, and also shows promising in-context learning capability.

- 🎬 **Video Understanding.**
  MiniCPM-V 2.6 can also **accept video inputs**, performing conversation and providing dense captions for spatial-temporal information. It outperforms **GPT-4V, Claude 3.5 Sonnet, and LLaVA-NeXT-Video-34B** on Video-MME both with and without subtitles.

- 💪 **Strong OCR Capability and Others.**
  MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro**.
  Based on the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) and [VisCPM](https://github.com/OpenBMB/VisCPM) techniques, it features **trustworthy behaviors**, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports **multilingual capabilities** in English, Chinese, German, French, Italian, Korean, etc.

- 🚀 **Superior Efficiency.**
  In addition to its friendly size, MiniCPM-V 2.6 also shows **state-of-the-art token density** (i.e., the number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M pixel image, 75% fewer than most models**. This directly improves inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-V 2.6 can efficiently support **real-time video understanding** on end-side devices such as the iPad.

- 💫 **Easy Usage.**
  MiniCPM-V 2.6 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md) and [ollama](https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4) and [GGUF](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) format quantized models in 16 sizes, (3) [vLLM](#inference-with-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks, (5) quick local WebUI demo setup with [Gradio](#chat-with-our-demo-on-gradio), and (6) an online web [demo](http://120.92.209.146:8887/).
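The token-density claim above is straightforward arithmetic; a quick sketch (the 1344x1344 example resolution and 640-token figure come from the text, while the 4x-token comparison baseline is purely illustrative, not a measured figure for any specific model):

```python
def token_density(width: int, height: int, n_visual_tokens: int) -> float:
    """Pixels encoded into each visual token (higher = more compact encoding)."""
    return width * height / n_visual_tokens

# The 1.8M-pixel example from the text: 1344 x 1344 = 1,806,336 pixels -> 640 tokens.
minicpm = token_density(1344, 1344, 640)
assert round(minicpm) == 2822

# "75% fewer tokens" means a model emitting 4x as many tokens for the same
# image (an assumed baseline) has one quarter of the token density.
baseline = token_density(1344, 1344, 4 * 640)
assert baseline == minicpm / 4
```

Fewer visual tokens per image shrink the KV cache and the prefill work, which is why token density translates directly into latency and memory savings.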
### Evaluation <!-- omit in toc -->

<div align="center">
<img src="../assets/radar_final.png" width="66%" />
</div>

<details>
<summary>Click to view single-image results on OpenCompass, MME, MMVet, OCRBench, MMMU, MathVista, MMB, AI2D, TextVQA, DocVQA, HallusionBench, Object HalBench.</summary>
<div align="center">

<table style="margin: 0px auto;">
<thead>
<tr>
<th align="left">Model</th><th>Size</th><th>Token Density<sup>+</sup></th><th>OpenCompass</th><th>MME</th><th>MMVet</th><th>OCRBench</th><th>MMMU val</th><th>MathVista mini</th><th>MMB1.1 test</th><th>AI2D</th><th>TextVQA val</th><th>DocVQA test</th><th>HallusionBench</th><th>Object HalBench</th>
</tr>
</thead>
<tbody align="center">
<tr><td colspan="15" align="left"><strong>Proprietary</strong></td></tr>
<tr><td nowrap="nowrap" align="left">GPT-4o</td><td>-</td><td>1088</td><td>69.9</td><td>2328.7</td><td>69.1</td><td>736</td><td>69.2</td><td>61.3</td><td>82.2</td><td>84.6</td><td>-</td><td>92.8</td><td>55.0</td><td>17.6</td></tr>
<tr><td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td><td>-</td><td>750</td><td>67.9</td><td>1920.0</td><td>66.0</td><td>788</td><td>65.9</td><td>61.6</td><td>78.5</td><td>80.2</td><td>-</td><td>95.2</td><td>49.9</td><td>13.8</td></tr>
<tr><td nowrap="nowrap" align="left">Gemini 1.5 Pro</td><td>-</td><td>-</td><td>64.4</td><td>2110.6</td><td>64.0</td><td>754</td><td>60.6</td><td>57.7</td><td>73.9</td><td>79.1</td><td>73.5</td><td>86.5</td><td>45.6</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">GPT-4o mini</td><td>-</td><td>1088</td><td>64.1</td><td>2003.4</td><td>66.9</td><td>785</td><td>60.0</td><td>52.4</td><td>76.0</td><td>77.8</td><td>-</td><td>-</td><td>46.1</td><td>12.4</td></tr>
<tr><td nowrap="nowrap" align="left">GPT-4V</td><td>-</td><td>1088</td><td>63.5</td><td>2070.2</td><td>67.5</td><td>656</td><td>61.7</td><td>54.7</td><td>79.8</td><td>78.6</td><td>78.0</td><td>87.2</td><td>43.9</td><td>14.2</td></tr>
<tr><td nowrap="nowrap" align="left">Step-1V</td><td>-</td><td>-</td><td>59.5</td><td>2206.4</td><td>63.3</td><td>625</td><td>49.9</td><td>44.8</td><td>78.0</td><td>79.2</td><td>71.6</td><td>-</td><td>48.4</td><td>-</td></tr>
<tr><td nowrap="nowrap" align="left">Qwen-VL-Max</td><td>-</td><td>784</td><td>58.3</td><td>2281.7</td><td>61.8</td><td>684</td><td>52.0</td><td>43.4</td><td>74.6</td><td>75.7</td><td>79.5</td><td>93.1</td><td>41.2</td><td>13.4</td></tr>
<tr><td colspan="15" align="left"><strong>Open-source</strong></td></tr>
<tr><td nowrap="nowrap" align="left">LLaVA-NeXT-Yi-34B</td>
||||||
|
<td>34B</td>
|
||||||
|
<td>157</td>
|
||||||
|
<td>55.0</td>
|
||||||
|
<td>2006.5</td>
|
||||||
|
<td>50.7</td>
|
||||||
|
<td>574</td>
|
||||||
|
<td>48.8</td>
|
||||||
|
<td>40.4</td>
|
||||||
|
<td>77.8</td>
|
||||||
|
<td>78.9</td>
|
||||||
|
<td>69.3</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>34.8</td>
|
||||||
|
<td>12.6</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td nowrap="nowrap" align="left">Mini-Gemini-HD-34B</td>
|
||||||
|
<td>34B</td>
|
||||||
|
<td>157</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>2141.0</td>
|
||||||
|
<td>59.3</td>
|
||||||
|
<td>518</td>
|
||||||
|
<td>48.0</td>
|
||||||
|
<td>43.3</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>80.5</td>
|
||||||
|
<td>74.1</td>
|
||||||
|
<td>78.9</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>-</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td nowrap="nowrap" align="left">Cambrian-34B</td>
|
||||||
|
<td>34B</td>
|
||||||
|
<td>1820</td>
|
||||||
|
<td>58.3</td>
|
||||||
|
<td>2049.9</td>
|
||||||
|
<td>53.2</td>
|
||||||
|
<td>591</td>
|
||||||
|
<td>50.4</td>
|
||||||
|
<td>50.3</td>
|
||||||
|
<td>77.8</td>
|
||||||
|
<td>79.5</td>
|
||||||
|
<td>76.7</td>
|
||||||
|
<td>75.5</td>
|
||||||
|
<td>41.6</td>
|
||||||
|
<td>14.7</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td nowrap="nowrap" align="left">GLM-4V-9B</td>
|
||||||
|
<td>13B</td>
|
||||||
|
<td>784</td>
|
||||||
|
<td>59.1</td>
|
||||||
|
<td>2018.8</td>
|
||||||
|
<td>58.0</td>
|
||||||
|
<td>776</td>
|
||||||
|
<td>46.9</td>
|
||||||
|
<td>51.1</td>
|
||||||
|
<td>67.9</td>
|
||||||
|
<td>71.2</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>45.0</td>
|
||||||
|
<td>-</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td nowrap="nowrap" align="left">InternVL2-8B</td>
|
||||||
|
<td>8B</td>
|
||||||
|
<td>706</td>
|
||||||
|
<td>64.1</td>
|
||||||
|
<td>2215.1</td>
|
||||||
|
<td>54.3</td>
|
||||||
|
<td>794</td>
|
||||||
|
<td><strong>51.2</strong></td>
|
||||||
|
<td>58.3</td>
|
||||||
|
<td><strong>79.4</strong></td>
|
||||||
|
<td><strong>83.6</strong></td>
|
||||||
|
<td>77.4</td>
|
||||||
|
<td><strong>91.6</strong></td>
|
||||||
|
<td>45.0</td>
|
||||||
|
<td>21.3</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td nowrap="nowrap" align="left">MiniCPM-Llama-V 2.5</td>
|
||||||
|
<td>8B</td>
|
||||||
|
<td>1882</td>
|
||||||
|
<td>58.8</td>
|
||||||
|
<td>2024.6</td>
|
||||||
|
<td>52.8</td>
|
||||||
|
<td>725</td>
|
||||||
|
<td>45.8</td>
|
||||||
|
<td>54.3</td>
|
||||||
|
<td>72.0</td>
|
||||||
|
<td>78.4</td>
|
||||||
|
<td>76.6</td>
|
||||||
|
<td>84.8</td>
|
||||||
|
<td>42.4</td>
|
||||||
|
<td>10.3</td>
|
||||||
|
</tr>
|
||||||
|
<tr style="background-color: #e6f2ff;">
|
||||||
|
<td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
|
||||||
|
<td>8B</td>
|
||||||
|
<td><strong>2822</strong></td>
|
||||||
|
<td><strong>65.2</strong></td>
|
||||||
|
<td><strong>2348.4</strong>*</td>
|
||||||
|
<td><strong>60.0</strong></td>
|
||||||
|
<td><strong>852</strong>*</td>
|
||||||
|
<td>49.8*</td>
|
||||||
|
<td><strong>60.6</strong></td>
|
||||||
|
<td>78.0</td>
|
||||||
|
<td>82.1</td>
|
||||||
|
<td><strong>80.1<strong></td>
|
||||||
|
<td>90.8</td>
|
||||||
|
<td><strong>48.1</strong>*</td>
|
||||||
|
<td><strong>8.2</strong></td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we used this technique only for the Cognition set.
|
||||||
|
|
||||||
|
<sup>+</sup> Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens.
|
||||||
|
|
||||||
|
Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation.
|
||||||
|
|
||||||
|
</details>
|
||||||
|
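The Token Density definition above is a simple ratio and can be sanity-checked in one line. The concrete numbers below (a 1344x1344 maximum input resolution encoded into 640 visual tokens) are illustrative assumptions chosen to reproduce MiniCPM-V 2.6's table entry, not values stated in the table itself:

```python
def token_density(max_pixels: int, num_visual_tokens: int) -> float:
    """Token Density = pixels at maximum resolution / number of visual tokens."""
    return max_pixels / num_visual_tokens

# Assumed illustrative values: a 1344x1344 image encoded into 640 visual tokens.
print(round(token_density(1344 * 1344, 640)))  # 2822
```

A higher token density means each visual token carries more image information, so fewer tokens (and less KV-cache memory) are needed per image at the same resolution.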

<details>
<summary>Click to view multi-image results on Mantis Eval, BLINK, Mathverse mv, Sciverse mv, MIRB.</summary>
<div align="center">

<table style="margin: 0px auto;">
  <thead>
    <tr>
      <th align="left">Model</th>
      <th>Size</th>
      <th>Mantis Eval</th>
      <th>BLINK val</th>
      <th>Mathverse mv</th>
      <th>Sciverse mv</th>
      <th>MIRB</th>
    </tr>
  </thead>
  <tbody align="center">
    <tr>
      <td colspan="7" align="left"><strong>Proprietary</strong></td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">GPT-4V</td>
      <td>-</td><td>62.7</td><td>54.6</td><td>60.3</td><td>66.9</td><td>53.1</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">LLaVA-NeXT-Interleave-14B</td>
      <td>14B</td><td>66.4</td><td>52.6</td><td>32.7</td><td>30.2</td><td>-</td>
    </tr>
    <tr>
      <td colspan="7" align="left"><strong>Open-source</strong></td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">Emu2-Chat</td>
      <td>37B</td><td>37.8</td><td>36.2</td><td>-</td><td>27.2</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">CogVLM</td>
      <td>17B</td><td>45.2</td><td>41.1</td><td>-</td><td>-</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">VPG-C</td>
      <td>7B</td><td>52.4</td><td>43.1</td><td>24.3</td><td>23.1</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">VILA 8B</td>
      <td>8B</td><td>51.2</td><td>39.3</td><td>-</td><td>36.5</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">InternLM-XComposer-2.5</td>
      <td>8B</td><td>53.1*</td><td>48.9</td><td>32.1*</td><td>-</td><td>42.5</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">InternVL2-8B</td>
      <td>8B</td><td>59.0*</td><td>50.9</td><td>30.5*</td><td>34.4*</td><td><strong>56.9*</strong></td>
    </tr>
    <tr style="background-color: #e6f2ff;">
      <td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
      <td>8B</td><td><strong>69.1</strong></td><td><strong>53.0</strong></td><td><strong>84.9</strong></td><td><strong>74.9</strong></td><td>53.8</td>
    </tr>
  </tbody>
</table>

</div>

* We evaluate the officially released checkpoint by ourselves.

</details>

<details>
<summary>Click to view video results on Video-MME and Video-ChatGPT.</summary>
<div align="center">

<table style="margin: 0px auto;">
  <thead>
    <tr>
      <th align="left">Model</th>
      <th>Size</th>
      <th colspan="2">Video-MME</th>
      <th colspan="5">Video-ChatGPT</th>
    </tr>
    <tr>
      <th align="left"></th>
      <th></th>
      <th>w/o subs</th>
      <th>w subs</th>
      <th>Correctness</th>
      <th>Detail</th>
      <th>Context</th>
      <th>Temporal</th>
      <th>Consistency</th>
    </tr>
  </thead>
  <tbody align="center">
    <tr>
      <td colspan="9" align="left"><strong>Proprietary</strong></td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">Claude 3.5 Sonnet</td>
      <td>-</td><td>60.0</td><td>62.9</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">GPT-4V</td>
      <td>-</td><td>59.9</td><td>63.3</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td>
    </tr>
    <tr>
      <td colspan="9" align="left"><strong>Open-source</strong></td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">LLaVA-NeXT-7B</td>
      <td>7B</td><td>-</td><td>-</td><td>3.39</td><td>3.29</td><td>3.92</td><td>2.60</td><td>3.12</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">LLaVA-NeXT-34B</td>
      <td>34B</td><td>-</td><td>-</td><td>3.29</td><td>3.23</td><td>3.83</td><td>2.51</td><td>3.47</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">CogVLM2-Video</td>
      <td>12B</td><td>-</td><td>-</td><td>3.49</td><td><strong>3.46</strong></td><td>3.23</td><td><strong>2.98</strong></td><td><strong>3.64</strong></td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">LongVA</td>
      <td>7B</td><td>52.4</td><td>54.3</td><td>3.05</td><td>3.09</td><td>3.77</td><td>2.44</td><td><strong>3.64</strong></td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">InternVL2-8B</td>
      <td>8B</td><td>54.0</td><td>56.9</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">InternLM-XComposer-2.5</td>
      <td>8B</td><td>55.8</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td>
    </tr>
    <tr>
      <td nowrap="nowrap" align="left">LLaVA-NeXT-Video</td>
      <td>32B</td><td>60.2</td><td>63.0</td><td>3.48</td><td>3.37</td><td><strong>3.95</strong></td><td>2.64</td><td>3.28</td>
    </tr>
    <tr style="background-color: #e6f2ff;">
      <td nowrap="nowrap" align="left">MiniCPM-V 2.6</td>
      <td>8B</td><td><strong>60.9</strong></td><td><strong>63.6</strong></td><td><strong>3.59</strong></td><td>3.28</td><td>3.93</td><td>2.73</td><td>3.62</td>
    </tr>
  </tbody>
</table>

</div>
</details>

<details>
<summary>Click to view few-shot results on TextVQA, VizWiz, VQAv2, OK-VQA.</summary>
<div align="center">

<table style="margin: 0px auto;">
  <thead>
    <tr>
      <th align="left">Model</th>
      <th>Size</th>
      <th>Shot</th>
      <th>TextVQA val</th>
      <th>VizWiz test-dev</th>
      <th>VQAv2 test-dev</th>
      <th>OK-VQA val</th>
    </tr>
  </thead>
  <tbody align="center">
    <tr>
      <td align="left" nowrap="nowrap" rowspan="3">Flamingo</td>
      <td rowspan="3">80B</td>
      <td>0*</td><td>35.0</td><td>31.6</td><td>56.3</td><td>40.6</td>
    </tr>
    <tr>
      <td>4</td><td>36.5</td><td>39.6</td><td>63.1</td><td><strong>57.4</strong></td>
    </tr>
    <tr>
      <td>8</td><td>37.3</td><td>44.8</td><td>65.6</td><td>57.5</td>
    </tr>
    <tr>
      <td align="left" nowrap="nowrap" rowspan="3">IDEFICS</td>
      <td rowspan="3">80B</td>
      <td>0*</td><td>30.9</td><td>36.0</td><td>60.0</td><td>45.2</td>
    </tr>
    <tr>
      <td>4</td><td>34.3</td><td>40.4</td><td>63.6</td><td>52.4</td>
    </tr>
    <tr>
      <td>8</td><td>35.7</td><td>46.1</td><td>64.8</td><td>55.1</td>
    </tr>
    <tr>
      <td align="left" nowrap="nowrap" rowspan="3">OmniCorpus</td>
      <td rowspan="3">7B</td>
      <td>0*</td><td>43.0</td><td>49.8</td><td>63.2</td><td>45.5</td>
    </tr>
    <tr>
      <td>4</td><td>45.4</td><td>51.3</td><td>64.5</td><td>46.5</td>
    </tr>
    <tr>
      <td>8</td><td>45.6</td><td>52.2</td><td>64.7</td><td>46.6</td>
    </tr>
    <tr>
      <td align="left" nowrap="nowrap" rowspan="3">Emu2</td>
      <td rowspan="3">37B</td>
      <td>0</td><td>26.4</td><td>40.4</td><td>33.5</td><td>26.7</td>
    </tr>
    <tr>
      <td>4</td><td>48.2</td><td>54.6</td><td>67.0</td><td>53.2</td>
    </tr>
    <tr>
      <td>8</td><td>49.3</td><td>54.7</td><td>67.8</td><td>54.1</td>
    </tr>
    <tr>
      <td align="left" nowrap="nowrap" rowspan="2">MM1</td>
      <td rowspan="2">30B</td>
      <td>0</td><td>26.2</td><td>40.4</td><td>48.9</td><td>26.7</td>
    </tr>
    <tr>
      <td>8</td><td>49.3</td><td>54.7</td><td><strong>70.9</strong></td><td>54.1</td>
    </tr>
    <tr style="background-color: #e6f2ff;">
      <td align="left" nowrap="nowrap" rowspan="3">MiniCPM-V 2.6<sup>+</sup></td>
      <td rowspan="3">8B</td>
      <td>0</td><td>43.9</td><td>33.8</td><td>45.4</td><td>23.9</td>
    </tr>
    <tr style="background-color: #e6f2ff;">
      <td>4</td><td>63.6</td><td>60.5</td><td>65.5</td><td>50.1</td>
    </tr>
    <tr style="background-color: #e6f2ff;">
      <td>8</td><td><strong>64.6</strong></td><td><strong>63.4</strong></td><td>68.2</td><td>51.4</td>
    </tr>
  </tbody>
</table>

</div>

* denotes zero image shot and two additional text shots following Flamingo.

<sup>+</sup> We evaluate the pretraining ckpt without SFT.

</details>

### Examples <!-- omit in toc -->

<div style="display: flex; flex-direction: column; align-items: center;">
  <img src="../assets/minicpmv2_6/multi_img-bike.png" alt="Bike" style="margin-bottom: 5px;">
  <img src="../assets/minicpmv2_6/multi_img-menu.png" alt="Menu" style="margin-bottom: 5px;">
  <img src="../assets/minicpmv2_6/multi_img-code.png" alt="Code" style="margin-bottom: 5px;">
  <img src="../assets/minicpmv2_6/ICL-Mem.png" alt="Mem" style="margin-bottom: 5px;">
  <img src="../assets/minicpmv2_6/multiling-medal.png" alt="Medal" style="margin-bottom: 10px;">
</div>
<details>
<summary>Click to view more cases.</summary>
<div style="display: flex; flex-direction: column; align-items: center;">
  <img src="../assets/minicpmv2_6/ICL-elec.png" alt="Elec" style="margin-bottom: 5px;">
  <img src="../assets/minicpmv2_6/multiling-olympic.png" alt="Olympic" style="margin-bottom: 10px;">
</div>
</details>

We deploy MiniCPM-V 2.6 on end devices. The demo videos below are raw screen recordings on an iPad Pro, without any editing.

<table align="center">
<p align="center">
  <img src="../assets/gif_cases/ai.gif" width=32%/>
  <img src="../assets/gif_cases/beer.gif" width=32%/>
</p>
</table>

<table align="center">
<p align="center">
  <img src="../assets/gif_cases/ticket.gif" width=32%/>
  <img src="../assets/gif_cases/wfh.gif" width=32%/>
</p>
</table>

<table align="center">
<p align="center">
  <video src="https://github.com/user-attachments/assets/21f4b818-ede1-4822-920e-91281725c830" width="360" /> </video>
  <!-- <video src="https://github.com/user-attachments/assets/c835f757-206b-4d9c-8e36-70d67b453628" width="360" /> </video> -->
</p>
</table>

</details>

### Multi-turn Conversation

<div align="center">
<img src="../assets/airplane.jpeg" width="500px">
</div>

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

torch.manual_seed(0)

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

image = Image.open('./assets/airplane.jpeg').convert('RGB')

# First round chat
question = "Tell me the model of this aircraft."
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)

# Second round chat: pass the history context of the multi-turn conversation
msgs.append({"role": "assistant", "content": [answer]})
msgs.append({"role": "user", "content": ["Introduce something about Airbus A380."]})

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```

You could get the following output:

```
"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database."

"The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry."
```

#### Multi-image Understanding
<details>
<summary> Click to view Python example of MiniCPM-V 2.6 multi-image understanding </summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

image1 = Image.open('image1.jpg').convert('RGB')
image2 = Image.open('image2.jpg').convert('RGB')
question = 'Compare image 1 and image 2, tell me about the differences between image 1 and image 2.'

msgs = [{'role': 'user', 'content': [image1, image2, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
</details>

#### Few-shot In-Context-Learning

<details>
<summary> Click to view Python example of MiniCPM-V 2.6 few-shot in-context-learning </summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

question = "production date"
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

# Few-shot examples are given as alternating user/assistant turns before the test query
msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
</details>

#### Video Understanding
<details>
<summary> Click to view Python example of MiniCPM-V 2.6 video understanding </summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
from decord import VideoReader, cpu # pip install decord

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

MAX_NUM_FRAMES = 64 # reduce this if you hit CUDA OOM

def encode_video(video_path):
    def uniform_sample(l, n):
        gap = len(l) / n
        idxs = [int(i * gap + gap / 2) for i in range(n)]
        return [l[i] for i in idxs]

    vr = VideoReader(video_path, ctx=cpu(0))
    sample_fps = round(vr.get_avg_fps() / 1)  # sample 1 frame per second
    frame_idx = [i for i in range(0, len(vr), sample_fps)]
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
    print('num frames:', len(frames))
    return frames

video_path = "video_test.mp4"
frames = encode_video(video_path)
question = "Describe the video"
msgs = [
    {'role': 'user', 'content': frames + [question]},
]

# Set decode params for video
params = {}
params["use_image_id"] = False
params["max_slice_nums"] = 2 # use 1 if you hit CUDA OOM and the video resolution is larger than 448*448

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    **params
)
print(answer)
```
</details>
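The frame-sampling arithmetic in `encode_video` can be checked without a real video file. This is a hypothetical sketch that mirrors the logic above (one frame per second, then a uniform cap at `MAX_NUM_FRAMES`); the durations and fps are made-up example values:

```python
MAX_NUM_FRAMES = 64  # same cap as in the video-understanding example

def num_sampled_frames(duration_s: int, fps: int = 30) -> int:
    """How many frames encode_video would feed to the model for a given clip."""
    total_frames = duration_s * fps
    one_per_second = len(range(0, total_frames, fps))  # frame_idx step = fps
    return min(one_per_second, MAX_NUM_FRAMES)         # uniform_sample caps the count

print(num_sampled_frames(30))   # 30 -- a short clip keeps all of its 1-fps frames
print(num_sampled_frames(600))  # 64 -- a long video is capped at MAX_NUM_FRAMES
```

So inference cost grows linearly with video length only up to 64 seconds; beyond that, frames are spread uniformly over the whole clip.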
```diff
@@ -1,6 +1,6 @@
 ## OmniLMM-12B
 
-> OmniLMM-12B is released at early time of this project. We recommond you to use our [recently released models](./README_en.md), for better performance and efficiency.
+> OmniLMM-12B is released at early time of this project. We recommond you to use our [recently released models](./README.md), for better performance and efficiency.
 
 > Archieve at: 2024-05-19
```
```diff
@@ -7,7 +7,6 @@ import re
 import random
 from dataclasses import dataclass, field
 from typing import Dict, List, Optional
-from decord import VideoReader, cpu # pip install decord
 import numpy as np
 import torch
```
```diff
@@ -21,26 +20,6 @@ logger = logging.getLogger(__name__)
 
 llama3_chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}"
 
-MAX_NUM_FRAMES=64
-def encode_video(video_path, max_num_frames=64):
-    max_num_frames = min(max_num_frames, MAX_NUM_FRAMES)
-    def uniform_sample(l, n):
-        gap = len(l) / n
-        idxs = [int(i * gap + gap / 2) for i in range(n)]
-        return [l[i] for i in idxs]
-
-    vr = VideoReader(video_path, ctx=cpu(0))
-    sample_fps = round(vr.get_avg_fps() / 1)  # FPS
-    frame_idx = [i for i in range(0, len(vr), sample_fps)]
-    if len(frame_idx) > max_num_frames:
-        if max_num_frames==1:
-            frame_idx = [frame_idx[len(frame_idx)//2]]
-        else:
-            frame_idx = uniform_sample(frame_idx, max_num_frames)
-    frames = vr.get_batch(frame_idx).asnumpy()
-    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
-    return frames
-
 class SupervisedDataset(Dataset):
     """Dataset for supervised fine-tuning."""
```
```diff
@@ -55,8 +34,6 @@ class SupervisedDataset(Dataset):
         query_nums=64,
         batch_vision=False,
         max_length=2048,
-        video_max_slice_nums=2,
-        max_num_frames=1,
     ):
         super(SupervisedDataset, self).__init__()
         self.raw_data = raw_data
```
```diff
@@ -68,58 +45,17 @@ class SupervisedDataset(Dataset):
         self.query_nums=query_nums
         self.batch_vision = batch_vision
         self.max_length = max_length
-        # video config
-        self.video_slice_config = copy.deepcopy(slice_config)
-        self.video_slice_config['max_slice_nums'] = video_max_slice_nums
-        self.max_num_frames = max_num_frames
 
     def __len__(self):
         return len(self.raw_data)
 
     def __getitem__(self, i) -> Dict[str, torch.Tensor]:
         try:
-            # default: sft image
-            use_image_id = True
-            slice_config = self.slice_config
-            if "image" in self.raw_data[i]:
-                if isinstance(self.raw_data[i]["image"], str):
-                    images_dict = { "<image>" : Image.open(self.raw_data[i]["image"]).convert("RGB") }
-                elif isinstance(self.raw_data[i]["image"], Dict):
-                    ### for multi-images input, the template for every image is <image_xx>, such as <image_00>, <image_01>
-                    images_dict = {img_name : Image.open(img_path).convert("RGB") for img_name, img_path in self.raw_data[i]["image"].items()}
-            elif "video" in self.raw_data[i]:
-                if isinstance(self.raw_data[i]["video"], str):
-                    frames = encode_video(self.raw_data[i]["video"], max_num_frames=self.max_num_frames)
-                    image_names = []
-                    images_dict = {}
-                    for j, frame in enumerate(frames):
-                        image_name = "<image_{:02d}>".format(j)
-                        images_dict[image_name] = frame
-                        image_names.append(image_name)
-                    for j in range(len(self.raw_data[i]["conversations"])):
-                        content = self.raw_data[i]["conversations"][j]['content']
-                        self.raw_data[i]["conversations"][j]['content'] = content.replace("<video>", "".join(image_names))
-                elif isinstance(self.raw_data[i]["video"], Dict):
-                    videos = self.raw_data[i]["video"]
-                    images_dict = {}
-                    video_names = {}
-                    cnt = 0
-                    for video_name in videos:
-                        video_id = video_name.split("_")[-1].strip(">")
-                        video = videos[video_name]
-                        frames = encode_video(video, max_num_frames=self.max_num_frames)
-                        image_names = []
-                        for j, frame in enumerate(frames):
-                            image_name = "<image_{:02d}>".format(cnt)
-                            cnt += 1
-                            images_dict[image_name] = frame
-                            image_names.append(image_name)
-                        for j in range(len(self.raw_data[i]["conversations"])):
-                            content = self.raw_data[i]["conversations"][j]['content']
-                            self.raw_data[i]["conversations"][j]['content'] = content.replace(video_name, "".join(image_names))
-                    # video: modify config
-                    slice_config = self.video_slice_config
-                    use_image_id = False
+            if isinstance(self.raw_data[i]["image"], str):
+                images_dict = { "<image>" : Image.open(self.raw_data[i]["image"]).convert("RGB") }
+            elif isinstance(self.raw_data[i]["image"], Dict):
+                ### for multi-images input, the template for every image is <image_xx>, such as <image_00>, <image_01>
+                images_dict = {img_name : Image.open(img_path).convert("RGB") for img_name, img_path in self.raw_data[i]["image"].items()}
 
             ret = preprocess(
                 images_dict,
```
|
images_dict,
|
||||||
@@ -131,8 +67,7 @@ class SupervisedDataset(Dataset):
|
|||||||
llm_type=self.llm_type,
|
llm_type=self.llm_type,
|
||||||
patch_size=self.patch_size,
|
patch_size=self.patch_size,
|
||||||
batch_vision=self.batch_vision,
|
batch_vision=self.batch_vision,
|
||||||
max_length=self.max_length,
|
max_length=self.max_length
|
||||||
use_image_id=use_image_id
|
|
||||||
)
|
)
|
||||||
ret = dict(
|
ret = dict(
|
||||||
input_ids=ret["input_ids"],
|
input_ids=ret["input_ids"],
|
||||||
@@ -197,7 +132,7 @@ def conversation_to_ids(conversation, tokenizer, llm_type=None, new_schema=False
|
|||||||
input_ids, context, raw_msg = conversation_to_ids_llama3(
|
input_ids, context, raw_msg = conversation_to_ids_llama3(
|
||||||
conversation, tokenizer
|
conversation, tokenizer
|
||||||
)
|
)
|
||||||
elif llm_type == "qwen2":
|
elif llm_type == "qwen":
|
||||||
input_ids, context, raw_msg = conversation_to_ids_qwen2(
|
input_ids, context, raw_msg = conversation_to_ids_qwen2(
|
||||||
conversation, tokenizer
|
conversation, tokenizer
|
||||||
)
|
)
|
||||||
@@ -383,7 +318,6 @@ def preprocess(
|
|||||||
patch_size=14,
|
patch_size=14,
|
||||||
batch_vision=False,
|
batch_vision=False,
|
||||||
max_length=2048,
|
max_length=2048,
|
||||||
use_image_id=True
|
|
||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
single(multi) image(s) preprocess, the image(s) will be placed at the top of the conversation
|
single(multi) image(s) preprocess, the image(s) will be placed at the top of the conversation
|
||||||
@@ -402,9 +336,9 @@ def preprocess(
|
|||||||
)
|
)
|
||||||
new_schema = False
|
new_schema = False
|
||||||
use_image_id = False
|
use_image_id = False
|
||||||
if llm_type=='qwen2':
|
if llm_type=='qwen':
|
||||||
new_schema = True
|
new_schema = True
|
||||||
use_image_id = use_image_id
|
use_image_id = True
|
||||||
image_placeholder_dict = {}
|
image_placeholder_dict = {}
|
||||||
images = []
|
images = []
|
||||||
image_id_cnt = 0
|
image_id_cnt = 0
|
||||||
|
|||||||
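The multi-image placeholder convention that the retained `__getitem__` relies on (each key of the `"image"` dict, e.g. `<image_00>`, names the spot in the conversation where that image is inserted) can be illustrated with a small stand-alone sketch. The file names and the bracketed replacement text below are illustrative only, not the model's actual token expansion:

```python
# Sketch of the <image_xx> placeholder convention used by the dataset:
# each dict key marks where the corresponding image's content goes.
def expand_placeholders(conversation: str, images_dict: dict) -> str:
    """Replace each placeholder with a marker showing where its image lands."""
    for name, path in images_dict.items():
        conversation = conversation.replace(name, f"[tokens of {path}]")
    return conversation

sample = {
    "image": {"<image_00>": "a.jpg", "<image_01>": "b.jpg"},  # hypothetical paths
    "content": "<image_00>\n<image_01>\nCompare the two photos.",
}
print(expand_placeholders(sample["content"], sample["image"]))
```

In the real dataset the placeholders are resolved by `preprocess` into image feature slots; this sketch only shows the naming scheme.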
@@ -14,7 +14,7 @@ from accelerate.utils import DistributedType
 from deepspeed import zero
 from deepspeed.runtime.zero.partition_parameters import ZeroParamStatus

-from transformers import AutoModel, AutoTokenizer, AutoProcessor
+from transformers import AutoModel, AutoTokenizer
 from transformers.integrations import deepspeed
 from transformers import AutoModel, AutoTokenizer

@@ -53,8 +53,6 @@ class TrainingArguments(transformers.TrainingArguments):
     llm_type: str = field(default="minicpm")
     use_lora: Optional[bool] = field(default=False)
     max_slice_nums: Optional[int] = field(default=9)
-    video_max_slice_nums: Optional[int] = field(default=2)
-    max_num_frames: Optional[int] = field(default=1)


 @dataclass
@@ -94,8 +92,6 @@ def make_supervised_data_module(
     query_nums=64,
     batch_vision=False,
     max_length=2048,
-    video_max_slice_nums=2,
-    max_num_frames=1,
 ) -> Dict:
     """Make dataset and collator for supervised fine-tuning."""
     dataset_cls = SupervisedDataset
@@ -113,8 +109,6 @@ def make_supervised_data_module(
         query_nums=query_nums,
         batch_vision=batch_vision,
         max_length=max_length,
-        video_max_slice_nums=video_max_slice_nums,
-        max_num_frames=max_num_frames,
     )

     if data_args.eval_data_path:
@@ -129,8 +123,6 @@ def make_supervised_data_module(
             query_nums=query_nums,
             batch_vision=batch_vision,
             max_length=max_length,
-            video_max_slice_nums=video_max_slice_nums,
-            max_num_frames=max_num_frames,
         )
     else:
         eval_dataset = None
@@ -210,10 +202,10 @@ def train():
         trust_remote_code=True,
         torch_dtype=compute_dtype,
         device_map=device_map,
+        init_vision=True,
+        init_audio=False,
+        init_tts=False,
     )
-    model.__class__.register_for_auto_class()
-
-    model.processor = AutoProcessor.from_pretrained(model_args.model_name_or_path, trust_remote_code=True)

     tokenizer = AutoTokenizer.from_pretrained(
         model_args.model_name_or_path, trust_remote_code=True
@@ -287,8 +279,6 @@ def train():
         query_nums=model.config.query_num,
         batch_vision=batch_vision,
         max_length=training_args.model_max_length,
-        video_max_slice_nums=training_args.video_max_slice_nums,
-        max_num_frames=training_args.max_num_frames,
     )

     training_args.gradient_checkpointing_kwargs={"use_reentrant":False}

@@ -5,14 +5,17 @@ NNODES=1
 NODE_RANK=0
 MASTER_ADDR=localhost
 MASTER_PORT=6001

-MODEL="openbmb/MiniCPM-V-2_6"
-# or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5
+MODEL="openbmb/MiniCPM-o-2_6"
+# or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6
 # ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
 # See the section for finetuning in README for more information.
 DATA="path/to/training_data"
 EVAL_DATA="path/to/test_data"
-LLM_TYPE="qwen2" # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm; if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE="llama3"
+# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm; if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE="llama3";
+# if use openbmb/MiniCPM-o-2_6 or openbmb/MiniCPM-V-2_6, please set LLM_TYPE=qwen
+LLM_TYPE="qwen"
 MODEL_MAX_Length=2048 # if conduct multi-images sft, please set MODEL_MAX_Length=4096

@@ -38,7 +41,7 @@ torchrun $DISTRIBUTED_ARGS finetune.py \
    --do_train \
    --do_eval \
    --tune_vision true \
-    --tune_llm true \
+    --tune_llm false \
    --model_max_length $MODEL_MAX_Length \
    --max_slice_nums 9 \
    --max_steps 10000 \
@@ -60,5 +63,5 @@ torchrun $DISTRIBUTED_ARGS finetune.py \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
-    --deepspeed ds_config_zero2.json \
+    --deepspeed ds_config_zero3.json \
    --report_to "tensorboard"

@@ -5,16 +5,16 @@ NNODES=1
 NODE_RANK=0
 MASTER_ADDR=localhost
 MASTER_PORT=6001

-MODEL="openbmb/MiniCPM-V-2_6" # or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5
+MODEL="openbmb/MiniCPM-o-2_6"
+# or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6
 # ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
 # See the section for finetuning in README for more information.
 DATA="path/to/training_data"
 EVAL_DATA="path/to/test_data"
-LLM_TYPE="qwen2"
-# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm
-# if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE=llama3
+# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm; if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE="llama3";
+# if use openbmb/MiniCPM-o-2_6 or openbmb/MiniCPM-V-2_6, please set LLM_TYPE=qwen
+LLM_TYPE="qwen"

 MODEL_MAX_Length=2048 # if conduct multi-images sft, please set MODEL_MAX_Length=4096

 DISTRIBUTED_ARGS="
@@ -24,6 +24,7 @@ DISTRIBUTED_ARGS="
     --master_addr $MASTER_ADDR \
     --master_port $MASTER_PORT
 "
 torchrun $DISTRIBUTED_ARGS finetune.py \
     --model_name_or_path $MODEL \
     --llm_type $LLM_TYPE \

@@ -1,7 +1,7 @@
 # MiniCPM-V Finetuning


-We offer the official scripts for easy finetuning of the pretrained **MiniCPM-V-2_6**, **MiniCPM-Llama3-V 2.5** and **MiniCPM-V 2.0** on downstream tasks. Our finetune scripts use transformers Trainer and DeepSpeed by default.
+We offer the official scripts for easy finetuning of the pretrained **MiniCPM-o-2_6**, **MiniCPM-V-2_6**, **MiniCPM-Llama3-V 2.5** and **MiniCPM-V 2.0** on downstream tasks. Our finetune scripts use transformers Trainer and DeepSpeed by default.

 ### Data preparation

@@ -20,30 +20,30 @@ If your input consists of a single image, you can use a single placeholder **\<i
 [
   {
     "id": "0",
     "image": "path/to/image_0.jpg",
     "conversations": [
       {
         "role": "user",
         "content": "<image>\nHow many desserts are on the white plate?"
       },
       {
         "role": "assistant",
         "content": "There are three desserts on the white plate."
       },
       {
         "role": "user",
         "content": "What type of desserts are they?"
       },
       {
         "role": "assistant",
         "content": "The desserts are cakes with bananas and pecans on top. They share similarities with donuts, but the presence of bananas and pecans differentiates them."
       },
       {
         "role": "user",
         "content": "What is the setting of the image?"
       },
       {
         "role": "assistant",
         "content": "The image is set on a table top with a plate containing the three desserts."
       }
     ]
   },
@@ -91,81 +91,16 @@ If the total token count exceeds `max_length`, truncation will be applied. For m
 ```
 </details>

-#### Single Video Example
-If your input consists of a single video, you can use a single placeholder **\<video\>** to indicate where the video should be inserted in the conversation.
-<details>
-  <summary>
-    <b>Single video example (vl_finetune_video.json) with 1 sample.</b>
-  </summary>
-
-```
-[
-  {
-    "id": "0",
-    "video": "path/to/video_0.mp4",
-    "conversations": [
-      {
-        "role": "user",
-        "content": "<video>\nHow many desserts are on the white plate?"
-      },
-      {
-        "role": "assistant",
-        "content": "There are three desserts on the white plate."
-      }
-    ]
-  }
-]
-```
-</details>
-
-#### Multiple Videos Example
-For inputs containing multiple videos, utilize a dictionary where each key represents a unique placeholder (e.g., **\<video_00\>**, **\<video_01\>**) with the corresponding video path as its value. These placeholders can then be used within the conversation to seamlessly insert videos at specific positions.
-
-Additionally, to optimize resource management, especially when dealing with large batches of videos during training or inference, consider reducing `video_max_slice_nums` and `max_num_frames`. To minimize the number of tokens used per video, you can set `video_max_slice_nums=1` and `max_num_frames=1`, resulting in a single video being represented by 64 tokens.
-
-If the total token count exceeds `max_length`, truncation will be applied. For multi-video supervised fine-tuning (SFT), it's recommended to set `MODEL_MAX_LENGTH=4096` in your script for better performance.
-
-<details>
-  <summary>
-    <b>Multiple videos example (vl_finetune_data.json) with 1 sample.</b>
-  </summary>
-
-```
-[
-  {
-    "id": "0",
-    "video": {
-      "<video_00>": "path/to/video_0.mp4",
-      "<video_01>": "path/to/video_1.avi",
-      "<video_02>": "path/to/video_2.mp4",
-      "<video_03>": "path/to/video_3.avi"
-    },
-    "conversations": [
-      {
-        "role": "user",
-        "content": "How to create such text-only videos using CapCut?\n<video_00>\n<image_01>\n<video_01>\n<video_02>\n"
-      },
-      {
-        "role": "assistant",
-        "content": "To create a text-only video as shown in the videos, follow these steps in CapCut..."
-      }
-    ]
-  }
-]
-```
-</details>


 ### Full-parameter finetuning

 Full-parameter finetuning requires updating all parameters of the LLM in the whole training process. Please specify the correct MODEL path, DATA path and LLM_TYPE in the shell scripts.

 ```shell
-MODEL="openbmb/MiniCPM-V-2_6" # or openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2
+MODEL="openbmb/MiniCPM-o-2_6" # or openbmb/MiniCPM-V-2_6, openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2
 DATA="path/to/training_data" # json file
 EVAL_DATA="path/to/test_data" # json file
-LLM_TYPE="qwen2" # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm; if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE="llama3"
+LLM_TYPE="qwen" # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm; if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE="llama3";
+# if use openbmb/MiniCPM-o-2_6 or openbmb/MiniCPM-V-2_6, please set LLM_TYPE=qwen
 ```

 To launch your training, run the following script:
@@ -188,7 +123,7 @@ After training, you could load the model with the path to the adapter. We advise
 ```
 from peft import PeftModel
 from transformers import AutoModel
-model_type = "openbmb/MiniCPM-V-2_6" # or openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2
+model_type = "openbmb/MiniCPM-o-2_6" # or openbmb/MiniCPM-V-2_6, openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2
 path_to_adapter = "path_to_your_fine_tuned_checkpoint"

 model = AutoModel.from_pretrained(

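The single-image data format described in the README section above can be sanity-checked with the standard library alone. The validation rules below (an `"image"` field that is a path string or a placeholder-to-path dict, and a user/assistant alternation in `"conversations"`) are an illustrative sketch, not an official validator shipped with the repo:

```python
# Build and validate a minimal training sample matching the README schema.
# The image path and the conversation texts are placeholders.
import json

sample = [
    {
        "id": "0",
        "image": "path/to/image_0.jpg",
        "conversations": [
            {"role": "user", "content": "<image>\nHow many desserts are on the white plate?"},
            {"role": "assistant", "content": "There are three desserts on the white plate."},
        ],
    }
]

def validate(data):
    for item in data:
        # single path (str) or multi-image {placeholder: path} dict
        assert isinstance(item["image"], (str, dict))
        roles = [turn["role"] for turn in item["conversations"]]
        # conversations start with the user and alternate user/assistant
        assert roles[0] == "user" and len(roles) % 2 == 0
    return True

# Round-trip through JSON to confirm the structure serializes cleanly.
print(validate(json.loads(json.dumps(sample))))
```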
finetune/requirements.txt (new file, 44 lines)
@@ -0,0 +1,44 @@
+packaging==23.2
+addict==2.4.0
+editdistance==0.6.2
+einops==0.7.0
+fairscale==0.4.0
+jsonlines==4.0.0
+markdown2==2.4.10
+matplotlib==3.7.4
+more_itertools==10.1.0
+nltk==3.8.1
+numpy==1.24.4
+opencv_python_headless==4.5.5.64
+openpyxl==3.1.2
+Pillow==10.1.0
+sacrebleu==2.3.2
+seaborn==0.13.0
+shortuuid==1.0.11
+spacy==3.7.2
+torch==2.2.0
+torchaudio==2.2.0
+torchvision==0.17.0
+timm==0.9.10
+tqdm==4.66.1
+protobuf==4.25.0
+typing_extensions==4.8.0
+uvicorn==0.24.0.post1
+#xformers==0.0.22.post7
+#flash_attn==2.3.4
+sentencepiece==0.1.99
+accelerate==0.30.1
+socksio==1.0.0
+gradio==4.41.0
+gradio_client
+http://thunlp.oss-cn-qingdao.aliyuncs.com/multi_modal/never_delete/modelscope_studio-0.4.0.9-py3-none-any.whl
+decord
+aiosignal
+tensorboard
+deepspeed==0.12.3
+transformers==4.44.2
+librosa==0.9.0
+soundfile==0.12.1
+vector-quantize-pytorch==1.18.5
+vocos==0.1.0
+moviepy

@@ -170,7 +170,7 @@ class CPMTrainer(Trainer):

         return (loss, logits, labels)

-    def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]], num_items_in_batch: int=None) -> torch.Tensor:
+    def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]]) -> torch.Tensor:
         """
         Perform a training step on a batch of inputs.

@@ -245,9 +245,6 @@ class CPMTrainer(Trainer):

         if self.tokenizer is not None:
             self.tokenizer.save_pretrained(output_dir)

-        if self.model.processor is not None:
-            self.model.processor.save_pretrained(output_dir)
-
         # Good practice: save your training arguments together with the trained model
         torch.save(self.args, os.path.join(output_dir, TRAINING_ARGS_NAME))

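For context on the removed `num_items_in_batch` parameter: recent transformers releases pass this extra argument into `Trainer.training_step`, while the version pinned by this PR (transformers==4.44.2) does not, so dropping it is consistent with the pin. A defaulted keyword keeps a custom trainer compatible with both call styles; the mixin below is a hedged stand-alone sketch of that signature pattern, not the project's actual trainer:

```python
# Sketch: a training_step signature that tolerates both the old call style
# (model, inputs) and the newer one that also passes num_items_in_batch.
class CompatTrainerMixin:
    def training_step(self, model, inputs, num_items_in_batch=None, **kwargs):
        # A real implementation would run the forward/backward pass here;
        # this placeholder just returns a dummy loss for illustration.
        return 0.0

t = CompatTrainerMixin()
t.training_step(None, {})                        # old Trainer call style
t.training_step(None, {}, num_items_in_batch=8)  # newer Trainer call style
```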
requirements_o2.6.txt (new file, 18 lines)
@@ -0,0 +1,18 @@
+Pillow==10.1.0
+torch==2.2.0
+torchaudio==2.2.0
+torchvision==0.17.0
+transformers==4.44.2
+sentencepiece==0.2.0
+vector-quantize-pytorch==1.18.5
+vocos==0.1.0
+accelerate==1.2.1
+timm==0.9.10
+soundfile==0.12.1
+librosa==0.9.0
+decord
+moviepy
+
+# for web
+fastapi
+uvicorn

935
web_demos/minicpm-o_2.6/model_server.py
Normal file
@@ -0,0 +1,935 @@
|
|||||||
|
import base64
|
||||||
|
import json
|
||||||
|
import asyncio
|
||||||
|
import numpy as np
|
||||||
|
import os, sys, io
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
import aiofiles
|
||||||
|
import librosa
|
||||||
|
import soundfile
|
||||||
|
import wave
|
||||||
|
from typing import Dict, List, Any, Optional
|
||||||
|
import argparse
|
||||||
|
import logging
|
||||||
|
import torch
|
||||||
|
from PIL import Image
|
||||||
|
from transformers import AutoModel, AutoTokenizer, AutoProcessor
|
||||||
|
import uvicorn
|
||||||
|
from fastapi import FastAPI, Header, Query, Request, HTTPException, WebSocket, WebSocketDisconnect
|
||||||
|
from fastapi.responses import JSONResponse, StreamingResponse
|
||||||
|
|
||||||
|
cur_path = os.path.split(os.path.realpath(__file__))[0]
|
||||||
|
sys.path.append(os.path.abspath(cur_path))
|
||||||
|
import vad_utils
|
||||||
|
|
||||||
|
def setup_logger():
|
||||||
|
logger = logging.getLogger("api_logger")
|
||||||
|
logger.setLevel(logging.DEBUG)
|
||||||
|
|
||||||
|
# Create formatter
|
||||||
|
formatter = logging.Formatter(
|
||||||
|
'%(asctime)s.%(msecs)03d-%(levelname)s-[%(filename)s:%(lineno)d] - %(message)s',
|
||||||
|
datefmt='%Y-%m-%d %H:%M:%S'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Create handlers for stdout and stderr
|
||||||
|
stdout_handler = logging.StreamHandler(sys.stdout)
|
||||||
|
stdout_handler.setLevel(logging.INFO) # INFO and DEBUG go to stdout
|
||||||
|
stdout_handler.setFormatter(formatter)
|
||||||
|
stdout_handler.addFilter(lambda record: record.levelno <= logging.INFO)
|
||||||
|
|
||||||
|
stderr_handler = logging.StreamHandler(sys.stderr)
|
||||||
|
stderr_handler.setLevel(logging.WARNING) # WARNING, ERROR, CRITICAL go to stderr
|
||||||
|
stderr_handler.setFormatter(formatter)
|
||||||
|
|
||||||
|
# Add handlers to logger
|
||||||
|
logger.addHandler(stdout_handler)
|
||||||
|
logger.addHandler(stderr_handler)
|
||||||
|
|
||||||
|
return logger
|
||||||
|
|
||||||
|
|
||||||
|
app = FastAPI()
|
||||||
|
logger = setup_logger()
|
||||||
|
|
||||||
|
ap = argparse.ArgumentParser()
|
||||||
|
ap.add_argument('--port', type=int , default=8088)
|
||||||
|
args = ap.parse_args()
|
||||||
|
|
||||||
|
|
||||||
|
class StreamManager:
|
||||||
|
def __init__(self):
|
||||||
|
self.uid = None
|
||||||
|
|
||||||
|
self.is_streaming_complete = threading.Event()
|
||||||
|
self.conversation_started = threading.Event()
|
||||||
|
self.last_request_time = None
|
||||||
|
self.last_stream_time = None
|
||||||
|
self.timeout = 900 # seconds timeout
|
||||||
|
self.stream_timeout = 3 # seconds no stream
|
||||||
|
self.num_stream = 0
|
||||||
|
self.stream_started = False
|
||||||
|
self.stop_response = False
|
||||||
|
|
||||||
|
# VAD settings
|
||||||
|
self.vad_options = vad_utils.VadOptions()
|
||||||
|
self.vad_sequence_length = 5
|
||||||
|
self.vad_sequence = []
|
||||||
|
self.audio_prefill = []
|
||||||
|
self.audio_input = []
|
||||||
|
self.image_prefill = None
|
||||||
|
self.audio_chunk = 200
|
||||||
|
|
||||||
|
# customized options
|
||||||
|
self.customized_audio = None
|
||||||
|
self.customized_options = None
|
||||||
|
|
||||||
|
# Omni model
|
||||||
|
self.target_dtype = torch.bfloat16
|
||||||
|
self.device='cuda:0'
|
||||||
|
|
||||||
|
self.minicpmo_model_path = "openbmb/MiniCPM-o-2_6"
|
||||||
|
self.model_version = "2.6"
|
||||||
|
with torch.no_grad():
|
||||||
|
self.minicpmo_model = AutoModel.from_pretrained(self.minicpmo_model_path, trust_remote_code=True, torch_dtype=self.target_dtype, attn_implementation='sdpa')
|
||||||
|
self.minicpmo_tokenizer = AutoTokenizer.from_pretrained(self.minicpmo_model_path, trust_remote_code=True)
|
||||||
|
self.minicpmo_model.init_tts()
|
||||||
|
# self.minicpmo_model.tts.float()
|
||||||
|
self.minicpmo_model.to(self.device).eval()
|
||||||
|
|
||||||
|
self.ref_path_video_default = "assets/ref_audios/video_default.wav"
|
||||||
|
self.ref_path_default = "assets/ref_audios/default.wav"
|
||||||
|
self.ref_path_female = "assets/ref_audios/female_example.wav"
|
||||||
|
self.ref_path_male = "assets/ref_audios/male_example.wav"
|
||||||
|
|
||||||
|
self.input_audio_id = 0
|
||||||
|
self.input_audio_vad_id = 0
|
||||||
|
self.input_image_id = 0
|
||||||
|
self.output_audio_id = 0
|
||||||
|
self.flag_decode = False
|
||||||
|
self.cnts = None
|
||||||
|
|
||||||
|
self.all_start_time = time.time()
|
||||||
|
self.session_id = 233
|
||||||
|
self.sys_prompt_flag = False
|
||||||
|
self.vad_time = 0
|
||||||
|
self.ls_time = 0
|
||||||
|
self.msg_type = 1
|
||||||
|
|
||||||
|
self.speaking_time_stamp = 0
|
||||||
|
self.cycle_wait_time = 12800/24000 + 0.15
|
||||||
|
self.extra_wait_time = 2.5
|
||||||
|
self.server_wait = True
|
||||||
|
|
||||||
|
self.past_session_id = 0
|
||||||
|
self.sys_prompt_init(0)
|
||||||
|
self.session_id += 1
|
||||||
|
|
||||||
|
|
||||||
|
def start_conversation(self):
|
||||||
|
logger.info(f"uid {self.uid}: new conversation started.")
|
||||||
|
self.conversation_started.set()
|
||||||
|
self.stop_response = False
|
||||||
|
|
||||||
|
def update_last_request_time(self):
|
||||||
|
self.last_request_time = time.time()
|
||||||
|
#logger.info(f"update last_request_time {self.last_request_time}")
|
||||||
|
|
||||||
|
def update_last_stream_time(self):
|
||||||
|
self.last_stream_time = time.time()
|
||||||
|
#logger.info(f"update last_stream_time {self.last_stream_time}")
|
||||||
|
|
||||||
|
def move_to_device(self, obj, device):
|
||||||
|
if isinstance(obj, torch.Tensor):
|
||||||
|
obj_ = obj.to(device)
|
||||||
|
if (obj_.dtype == torch.float) or (obj_.dtype == torch.half):
|
||||||
|
# cast to `torch.bfloat16`
|
||||||
|
obj_ = obj_.to(self.target_dtype)
|
||||||
|
return obj_
|
||||||
|
elif isinstance(obj, dict):
|
||||||
|
return {key: self.move_to_device(value, device) for key, value in obj.items()}
|
||||||
|
elif isinstance(obj, list):
|
||||||
|
return [self.move_to_device(item, device) for item in obj]
|
||||||
|
elif isinstance(obj, tuple):
|
||||||
|
return tuple(self.move_to_device(item, device) for item in obj)
|
||||||
|
elif isinstance(obj, set):
|
||||||
|
return {self.move_to_device(item, device) for item in obj}
|
||||||
|
else:
|
||||||
|
return obj
|
||||||
|
|
||||||
|
def reset(self):
|
||||||
|
logger.info("reset")
|
||||||
|
self.is_streaming_complete.clear()
|
||||||
|
self.conversation_started.clear()
|
||||||
|
self.last_request_time = None
|
||||||
|
self.last_stream_time = None
|
||||||
|
self.audio_buffer_raw = bytearray()
|
||||||
|
self.num_stream = 0
|
||||||
|
self.stream_started = False
|
||||||
|
self.stop_response = False
|
||||||
|
# self.customized_audio = None
|
||||||
|
# self.customized_options = None
|
||||||
|
# clear model
|
||||||
|
self.clear()
|
||||||
|
|
||||||
|
def merge_wav_files(self, input_bytes_list, output_file):
|
||||||
|
with wave.open(io.BytesIO(input_bytes_list[0]), 'rb') as wav:
|
||||||
|
params = wav.getparams()
|
||||||
|
n_channels, sampwidth, framerate, n_frames, comptype, compname = params
|
||||||
|
|
||||||
|
with wave.open(output_file, 'wb') as output_wav:
|
||||||
|
output_wav.setnchannels(n_channels)
|
||||||
|
output_wav.setsampwidth(sampwidth)
|
||||||
|
output_wav.setframerate(framerate)
|
||||||
|
output_wav.setcomptype(comptype, compname)
|
||||||
|
|
||||||
|
for wav_bytes in input_bytes_list:
|
||||||
|
with wave.open(io.BytesIO(wav_bytes), 'rb') as wav:
|
||||||
|
output_wav.writeframes(wav.readframes(wav.getnframes()))
|
||||||
|
|
||||||
|
|
||||||
|
def is_timed_out(self):
|
||||||
|
if self.last_request_time is not None:
|
||||||
|
return time.time() - self.last_request_time > self.timeout
|
||||||
|
return False
|
||||||
|
|
||||||
|
def no_active_stream(self):
|
||||||
|
if self.last_stream_time is not None and self.stream_started:
|
||||||
|
no_stream_duration = time.time() - self.last_stream_time
|
||||||
|
if no_stream_duration > self.stream_timeout:
|
||||||
|
#logger.info(f"no active stream for {no_stream_duration} secs.")
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
    def sys_prompt_init(self, msg_type):
        if self.past_session_id == self.session_id:
            return
        logger.info("### sys_prompt_init ###")

        logger.info(f'msg_type is {msg_type}')
        if msg_type <= 1:  # audio
            # "Clone the timbre of the audio prompt when generating speech."
            audio_voice_clone_prompt = "克隆音频提示中的音色以生成语音。"
            audio_assistant_prompt = "Your task is to be a helpful assistant using this voice pattern."
            ref_path = self.ref_path_default

            if self.customized_options is not None:
                audio_voice_clone_prompt = self.customized_options['voice_clone_prompt']
                audio_assistant_prompt = self.customized_options['assistant_prompt']
                if self.customized_options['use_audio_prompt'] == 1:
                    ref_path = self.ref_path_default
                elif self.customized_options['use_audio_prompt'] == 2:
                    ref_path = self.ref_path_female
                elif self.customized_options['use_audio_prompt'] == 3:
                    ref_path = self.ref_path_male

            audio_prompt, sr = librosa.load(ref_path, sr=16000, mono=True)
            sys_msg = {'role': 'user', 'content': [audio_voice_clone_prompt + "\n", audio_prompt, "\n" + audio_assistant_prompt]}
        elif msg_type == 2:  # video
            # "You are an AI assistant that accepts video, audio and text input and
            # outputs speech and text. Imitate the voice characteristics of the input audio."
            voice_clone_prompt = "你是一个AI助手。你能接受视频,音频和文本输入并输出语音和文本。模仿输入音频中的声音特征。"
            # "As the assistant, you will speak in this voice style."
            assistant_prompt = "作为助手,你将使用这种声音风格说话。"
            ref_path = self.ref_path_video_default

            if self.customized_options is not None:
                voice_clone_prompt = self.customized_options['voice_clone_prompt']
                assistant_prompt = self.customized_options['assistant_prompt']
                if self.customized_options['use_audio_prompt'] == 1:
                    ref_path = self.ref_path_default
                elif self.customized_options['use_audio_prompt'] == 2:
                    ref_path = self.ref_path_female
                elif self.customized_options['use_audio_prompt'] == 3:
                    ref_path = self.ref_path_male

            audio_prompt, sr = librosa.load(ref_path, sr=16000, mono=True)
            sys_msg = {'role': 'user', 'content': [voice_clone_prompt, audio_prompt, assistant_prompt]}
        # elif msg_type == 3:  # user start
        #     # "As the assistant, you will speak in this voice style."
        #     assistant_prompt = "作为助手,你将使用这种声音风格说话。"
        #     if self.customized_options is not None:
        #         assistant_prompt = self.customized_options['assistant_prompt']
        #     sys_msg = {'role': 'user', 'content': [assistant_prompt]}

        self.msg_type = msg_type
        msgs = [sys_msg]
        if self.customized_options is not None:
            if self.customized_options['use_audio_prompt'] > 0:
                self.minicpmo_model.streaming_prefill(
                    session_id=str(self.session_id),
                    msgs=msgs,
                    tokenizer=self.minicpmo_tokenizer,
                )
        if msg_type == 0:
            self.minicpmo_model.streaming_prefill(
                session_id=str(self.session_id),
                msgs=msgs,
                tokenizer=self.minicpmo_tokenizer,
            )

        self.savedir = os.path.join(f"./log_data/{args.port}/", str(time.time()))
        for subdir in ("", "input_audio_log", "input_audio_vad_log", "input_image_log",
                       "output_audio_log", "feedback_log", "input_audio"):
            os.makedirs(os.path.join(self.savedir, subdir), exist_ok=True)

        self.past_session_id = self.session_id
        self.audio_prefill = []
        self.audio_input = []

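The system message handed to `streaming_prefill` above is a single user turn whose content list interleaves prompt text with a raw waveform (a 16 kHz mono numpy array loaded via librosa). A sketch of that shape with a dummy waveform, built without the model:

```python
import numpy as np

# 1 second of silence standing in for the reference voice clip.
audio_prompt = np.zeros(16000, dtype=np.float32)

sys_msg = {
    'role': 'user',
    'content': [
        "克隆音频提示中的音色以生成语音。\n",   # voice-clone instruction (Chinese)
        audio_prompt,                            # raw waveform, not a file path
        "\nYour task is to be a helpful assistant using this voice pattern.",
    ],
}
msgs = [sys_msg]
```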
    def clear(self):
        try:
            self.flag_decode = False
            self.stream_started = False
            self.cnts = None
            self.vad_sequence = []
            self.audio_prefill = []
            self.audio_input = []
            self.image_prefill = None

            if self.minicpmo_model.llm_past_key_values[0][0].shape[2] > 8192:
                self.session_id += 1  # to clear all kv cache
                self.sys_prompt_flag = False

            self.vad_time = 0
            self.ls_time = 0
            self.msg_type = 1

        except Exception as e:
            raise ValueError(f"Clear error: {str(e)}")

    def process_message(self, message: Dict[str, Any]):
        try:
            # Process content items
            audio_data = None
            image_data = None
            for content_item in message["content"]:
                if content_item["type"] == "stop_response":
                    logger.info("process_message: received request to stop_response")
                    self.stop_response = True
                    return "stop"
                elif content_item["type"] == "input_audio":
                    audio_data = content_item["input_audio"]["data"]
                    audio_timestamp = content_item["input_audio"].get("timestamp", "")
                elif content_item["type"] == "image_data":
                    image_data = content_item["image_data"]["data"]
            if audio_data is None:
                return "empty audio"

            if self.conversation_started.is_set() and self.is_streaming_complete.is_set():
                logger.info("generation for the previous turn is still pending, skip stream message.")
                return "skip"

            if self.flag_decode:
                return "skip"

            try:
                audio_bytes = base64.b64decode(audio_data)

                image = None
                if image_data is not None:
                    if len(image_data) > 0:
                        image_bytes = base64.b64decode(image_data)
                        image_buffer = io.BytesIO(image_bytes)
                        image_buffer.seek(0)
                        image = Image.open(image_buffer)
                        # logger.info("read image")

                if self.sys_prompt_flag is False:
                    self.all_start_time = time.time()
                    self.sys_prompt_flag = True
                    if image_data is not None:
                        self.sys_prompt_init(2)
                    else:
                        self.sys_prompt_init(1)

                self.prefill(audio_bytes, image, False)

                self.vad_sequence.append(audio_bytes)
                if len(self.vad_sequence) < self.vad_sequence_length:
                    # logger.info('length of vad_sequence is {}, insufficient'.format(self.vad_sequence_length))
                    return "done"
                elif len(self.vad_sequence) > self.vad_sequence_length:
                    # logger.info('length of vad_sequence exceeds {}'.format(self.vad_sequence_length))
                    self.vad_sequence.pop(0)
                self.vad_check_audio_bytes(audio_bytes, image, 16000)

                return "done"

            except Exception as e:
                raise ValueError(f"Audio processing error: {str(e)}")

        except Exception as e:
            raise ValueError(f"Message processing error: {str(e)}")

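A sketch of the message shape `process_message` expects: content items typed `"input_audio"` (a base64-encoded WAV chunk) and optionally `"image_data"` (a base64-encoded image). The field names follow the handler above; the payload bytes here are dummies.

```python
import base64

chunk = base64.b64encode(b'\x00' * 640).decode('utf-8')   # fake audio bytes
message = {
    "role": "user",
    "content": [
        {"type": "input_audio", "input_audio": {"data": chunk, "timestamp": "0"}},
    ],
}
# The handler base64-decodes the data field back to raw bytes.
decoded = base64.b64decode(message["content"][0]["input_audio"]["data"])
```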
    def resample_audio(self, input_path, src_sr, tar_sr, output_path):
        audio_data, _ = librosa.load(input_path, sr=src_sr)
        audio_new = librosa.resample(audio_data, orig_sr=src_sr, target_sr=tar_sr)
        soundfile.write(output_path, audio_new, tar_sr)

    def calculate_rms(self, input_path, sr):
        audio_data, _ = librosa.load(input_path, sr=sr)
        return np.sqrt(np.mean(audio_data**2)) > 0.002

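The RMS energy gate in `calculate_rms` above, in standalone form: frames whose root-mean-square falls below 0.002 (the threshold hard-coded above) are treated as silence. The signals here are synthetic.

```python
import numpy as np

def is_loud_enough(audio: np.ndarray, threshold: float = 0.002) -> bool:
    # Root-mean-square amplitude compared against a fixed silence floor.
    return bool(np.sqrt(np.mean(audio ** 2)) > threshold)

silence = np.zeros(16000, dtype=np.float32)
# A 440 Hz tone at amplitude 0.1 has RMS ~= 0.1 / sqrt(2), well above the floor.
tone = (0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)).astype(np.float32)
```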
    def vad_check_audio_bytes(self, audio, image, sr):
        try:
            input_audio_vad_path = self.savedir + f"/input_audio_vad_log/vad_{self.input_audio_vad_id}.wav"
            self.input_audio_vad_id += 1
            self.merge_wav_files(self.vad_sequence, input_audio_vad_path)

            with open(input_audio_vad_path, "rb") as f:
                temp_audio = f.read()
            dur_vad, vad_audio_bytes, time_vad = vad_utils.run_vad(temp_audio, sr, self.vad_options)
            if self.customized_options is not None:
                vad_threshold = 1 - self.customized_options['vad_threshold']
            else:
                vad_threshold = 0.2

            if self.calculate_rms(input_audio_vad_path, sr) and dur_vad > 0.4:
                if not self.stream_started:
                    self.vad_time = time.time()
                    self.stream_started = True
            elif dur_vad < vad_threshold:
                if self.stream_started:
                    self.stream_started = False
                    if time.time() - self.vad_time >= 0.6:
                        self.prefill(audio, image, True)
                        self.is_streaming_complete.set()
                        # self.ls_time = time.time()

        except Exception as e:
            logger.error(f"VAD error: {e}")
            raise
        return

    def prefill(self, audio, image, is_end):
        if self.server_wait:
            now = time.time()
            await_time = self.speaking_time_stamp - now + self.extra_wait_time
            if await_time > 0:
                return False

        if self.flag_decode:
            return False

        if image is not None:
            self.image_prefill = image
        try:
            if not is_end:
                self.audio_prefill.append(audio)
                self.audio_input.append(audio)
            slice_nums = 1
            if is_end and self.customized_options is not None:
                if self.customized_options['hd_video']:
                    slice_nums = 6
                else:
                    return True
            if (len(self.audio_prefill) == (1000 / self.audio_chunk)) or (is_end and len(self.audio_prefill) > 0):
                time_prefill = time.time()
                input_audio_path = self.savedir + f"/input_audio_log/input_audio_{self.input_audio_id}.wav"
                self.merge_wav_files(self.audio_prefill, input_audio_path)
                with open(input_audio_path, "rb") as wav_io:
                    signal, sr = soundfile.read(wav_io, dtype='float32')
                soundfile.write(input_audio_path, signal, 16000)
                audio_np, sr = librosa.load(input_audio_path, sr=16000, mono=True)
                self.audio_prefill = []

                if len(audio_np) > 16000:
                    audio_np = audio_np[:16000]

                with torch.no_grad():
                    if self.image_prefill is not None:
                        input_image_path = self.savedir + f'/input_image_log/input_image_{self.input_audio_id}.png'
                        self.image_prefill.save(input_image_path, 'PNG')
                        self.image_prefill = self.image_prefill.convert("RGB")

                    cnts = None
                    if self.image_prefill is not None:
                        cnts = ["<unit>", self.image_prefill, audio_np]
                    else:
                        cnts = [audio_np]

                    if cnts is not None:
                        msg = {"role": "user", "content": cnts}
                        msgs = [msg]
                        res = self.minicpmo_model.streaming_prefill(
                            session_id=str(self.session_id),
                            msgs=msgs,
                            tokenizer=self.minicpmo_tokenizer,
                            max_slice_nums=slice_nums,
                        )

                self.input_audio_id += 1
            return True

        except Exception as e:
            logger.error(f"prefill error: {e}")
            import traceback
            traceback.print_exc()
            raise

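The batching condition above fires once the buffered chunks add up to one second of audio: with `audio_chunk` milliseconds per incoming chunk, `1000 / audio_chunk` chunks make a second. The arithmetic, with a hypothetical chunk size of 200 ms:

```python
audio_chunk = 200                        # ms per incoming chunk (assumed value)
chunks_per_second = 1000 / audio_chunk   # chunks needed to cover one second

buffered = [b''] * 5                     # five 200 ms chunks buffered so far
ready = len(buffered) == chunks_per_second
```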
    def generate_end(self):
        self.input_audio_id += 10
        self.output_audio_id += 10
        self.flag_decode = False
        self.reset()
        return

    async def generate(self):
        """Return audio bytes and response text (optional)."""
        if self.stop_response:
            self.generate_end()
            return

        self.flag_decode = True
        try:
            with torch.no_grad():
                logger.info("=== model gen start ===")
                time_gen = time.time()
                input_audio_path = self.savedir + f"/input_audio/all_input_audio_{self.input_audio_id}.wav"
                self.merge_wav_files(self.audio_input, input_audio_path)
                audio_stream = None
                try:
                    with open(input_audio_path, 'rb') as wav_file:
                        audio_stream = wav_file.read()
                except FileNotFoundError:
                    print(f"File {input_audio_path} not found.")
                if audio_stream is not None:
                    yield base64.b64encode(audio_stream).decode('utf-8'), "assistant:\n"

                print('=== gen start: ', time.time() - time_gen)
                first_time = True
                temp_time = time.time()
                temp_time1 = time.time()
                with torch.inference_mode():
                    if self.stop_response:
                        self.generate_end()
                        return
                    self.minicpmo_model.config.stream_input = True
                    msg = {"role": "user", "content": self.cnts}
                    msgs = [msg]
                    text = ''
                    self.speaking_time_stamp = time.time()
                    try:
                        for r in self.minicpmo_model.streaming_generate(
                            session_id=str(self.session_id),
                            tokenizer=self.minicpmo_tokenizer,
                            use_tts=True,
                            # enable_regenerate=True,
                        ):
                            if self.stop_response:
                                self.generate_end()
                                return
                            audio_np, sr, text = r

                            output_audio_path = self.savedir + f'/output_audio_log/output_audio_{self.output_audio_id}.wav'
                            self.output_audio_id += 1
                            soundfile.write(output_audio_path, audio_np, samplerate=sr)
                            audio_stream = None
                            try:
                                with open(output_audio_path, 'rb') as wav_file:
                                    audio_stream = wav_file.read()
                            except FileNotFoundError:
                                print(f"File {output_audio_path} not found.")
                                continue
                            temp_time1 = time.time()
                            print('text: ', text)
                            yield base64.b64encode(audio_stream).decode('utf-8'), text
                            self.speaking_time_stamp += self.cycle_wait_time
                    except Exception as e:
                        logger.error(f"Error happened during generation: {str(e)}")
                        yield None, '\n<end>'

        except Exception as e:
            logger.error(f"Exception occurred: {e}")
            import traceback
            traceback.print_exc()
            raise

        finally:
            logger.info(f"uid {self.uid}: generation finished!")
            self.generate_end()

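Each item yielded by `generate()` above is a `(base64_wav, text)` pair; a consumer decodes the first element back to raw WAV bytes. Sketched with a dummy payload standing in for a real model chunk:

```python
import base64

wav_bytes = b'RIFF' + b'\x00' * 8            # stand-in for a real WAV payload
encoded = base64.b64encode(wav_bytes).decode('utf-8')
text = "hello"                               # partial transcript for this chunk

# Client side: recover the raw audio bytes from the yielded pair.
decoded = base64.b64decode(encoded)
```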
    async def check_activity(self):
        while True:
            # Check for overall inactivity (30 minutes)
            if self.is_timed_out():
                self.reset()
            if self.no_active_stream() and not self.is_streaming_complete.is_set():
                self.is_streaming_complete.set()

            await asyncio.sleep(1)  # Check every second

    def upload_customized_audio(self, audio_data, audio_fmt):
        self.customized_audio = None
        try:
            if audio_data is not None and len(audio_data) > 0:
                # if audio_fmt == "mp3" or audio_fmt == "wav":
                audio_bytes = base64.b64decode(audio_data)
                fio = io.BytesIO(audio_bytes)
                fio.seek(0)
                audio_np, sr = librosa.load(fio, sr=16000, mono=True)
                if audio_np is not None and len(audio_np) > 1000:
                    output_audio_path = self.savedir + '/customized_audio.wav'
                    soundfile.write(output_audio_path, audio_np, sr)
                    self.customized_audio = output_audio_path
                    logger.info(f"processed customized {audio_fmt} audio")
                    print(audio_np.shape, type(audio_np), sr)
                else:
                    logger.info("empty customized audio, use default value instead.")
                    self.customized_audio = None
        except Exception as e:
            raise ValueError(f"Process customized audio error: {str(e)}")

    def update_customized_options(self, uid, options):
        self.customized_options = None
        if options is None:
            raise ValueError("Invalid None type for options, expected dict type")
        self.customized_options = options
        logger.info(f"uid: {uid} set customized_options to {options}")


stream_manager = StreamManager()


@app.on_event("startup")
async def startup_event():
    logger.info("Starting application and activity checker")
    asyncio.create_task(stream_manager.check_activity())

@app.on_event("shutdown")
async def shutdown_event():
    logger.info("Shutting down application")

@app.post("/stream")
@app.post("/api/v1/stream")
async def stream(request: Request, uid: Optional[str] = Header(None)):
    global stream_manager

    stream_manager.update_last_request_time()
    stream_manager.update_last_stream_time()

    if not uid:
        raise HTTPException(status_code=400, detail="Missing uid in headers")
    if stream_manager.uid is not None and stream_manager.uid != uid:
        logger.error(f"uid changed during stream: previous uid {stream_manager.uid}, new uid {uid}")
        raise HTTPException(status_code=400, detail="uid changed in stream")

    try:
        # Parse JSON request
        data = await request.json()

        # Validate basic structure
        if not isinstance(data, dict) or "messages" not in data:
            raise HTTPException(status_code=400, detail="Invalid request format")

        # Process messages
        reason = ""
        for message in data["messages"]:
            if not isinstance(message, dict) or "role" not in message or "content" not in message:
                raise HTTPException(status_code=400, detail="Invalid message format")
            reason = stream_manager.process_message(message)

        # Return response using uid from header
        response = {
            "id": uid,
            "choices": {
                "role": "assistant",
                "content": "success",
                "finish_reason": reason
            }
        }
        return JSONResponse(content=response, status_code=200)

    except json.JSONDecodeError:
        raise HTTPException(status_code=400, detail="Invalid JSON")
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

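A sketch of a request body for the `POST /stream` endpoint above: a `messages` list whose content items carry base64 audio chunks, while `uid` travels in the request headers. The payload bytes are dummies; field names follow the handler above.

```python
import base64
import json

payload = {
    "messages": [{
        "role": "user",
        "content": [{
            "type": "input_audio",
            "input_audio": {
                "data": base64.b64encode(b'\x00' * 320).decode('utf-8'),
                "timestamp": "0",
            },
        }],
    }]
}
body = json.dumps(payload)          # what the client would POST
parsed = json.loads(body)           # what request.json() sees server-side
```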
@app.websocket("/ws/stream")
@app.websocket("/ws/api/v1/stream")
async def websocket_stream(websocket: WebSocket,
                           uid: Optional[str] = Query(None)):
    global stream_manager

    if not uid:
        await websocket.close(code=400, reason="Missing uid in request")
        return

    # Accept the WebSocket connection
    await websocket.accept()

    # if stream_manager.uid is not None and stream_manager.uid != uid:
    #     logger.error(f"uid changed during stream: previous uid {stream_manager.uid}, new uid {uid}")
    #     await websocket.close(code=400, reason="Uid changed in stream.")
    #     return

    try:
        while True:
            # Continuously listen for incoming messages from the client
            data = await websocket.receive_text()

            # Parse JSON request
            try:
                request_data = json.loads(data)
            except json.JSONDecodeError:
                await websocket.send_json({"error": "Invalid JSON"})
                continue

            stream_manager.update_last_request_time()
            stream_manager.update_last_stream_time()

            if stream_manager.uid is not None and stream_manager.uid != uid:
                logger.error(f"uid changed during stream: previous uid {stream_manager.uid}, new uid {uid}")
                await websocket.send_json({"error": "UID changed in stream"})
                continue

            # Validate basic structure
            if not isinstance(request_data, dict) or "messages" not in request_data:
                await websocket.send_json({"error": "Invalid request format"})
                continue

            # Process messages
            try:
                reason = ""
                for message in request_data["messages"]:
                    if not isinstance(message, dict) or "role" not in message or "content" not in message:
                        await websocket.send_json({"error": "Invalid message format"})
                        continue
                    reason = stream_manager.process_message(message)

                # Respond with success message
                response = {
                    "id": uid,
                    "choices": {
                        "role": "assistant",
                        "content": "success",
                        "finish_reason": reason,
                    },
                }
                await websocket.send_json(response)
            except WebSocketDisconnect:
                # Handle WebSocket disconnection
                break
            except Exception as e:
                logger.error(f"process message error: {str(e)}")
                await websocket.close(code=1011, reason=f"Internal server error: {str(e)}")

    except WebSocketDisconnect:
        # Handle WebSocket disconnection
        return
    except Exception as e:
        logger.error(f"ws_stream error: {str(e)}")
        await websocket.close(code=1011, reason=f"Unexpected error: {str(e)}")


async def generate_sse_response(request: Request, uid: Optional[str] = Header(None)):
    global stream_manager
    print(f"uid: {uid}")
    try:
        # Wait for streaming to complete or timeout
        while not stream_manager.is_streaming_complete.is_set():
            # if stream_manager.is_timed_out():
            #     yield f"data: {json.dumps({'error': 'Stream timeout'})}\n\n"
            #     return
            # print(f"{uid} while not stream_manager.is_streaming_complete.is_set(), asyncio.sleep(0.1)")
            await asyncio.sleep(0.1)

        logger.info("streaming complete\n")
        # Generate response
        try:
            yield "event: message\n"
            async for audio, text in stream_manager.generate():
                if text == "stop":
                    break
                res = {
                    "id": stream_manager.uid,
                    "response_id": stream_manager.output_audio_id,
                    "choices": [
                        {
                            "role": "assistant",
                            "audio": audio,
                            "text": text,
                            "finish_reason": "processing"
                        }
                    ]
                }
                # logger.info("generate_sse_response yield response")
                yield f"data: {json.dumps(res)}\n\n"
                await asyncio.sleep(0)

        except Exception as e:
            logger.error(f"Error while generation: {str(e)}")
            yield f'data:{{"error": "{str(e)}"}}\n\n'
    except Exception as e:
        yield f'data:{{"error": "{str(e)}"}}\n\n'

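The generator above emits Server-Sent Events lines of the form `data: {json}\n\n`; a client can recover the payload like this (synthetic event, field names as in `generate_sse_response`):

```python
import json

line = 'data: {"id": "u1", "response_id": 3, "choices": [{"role": "assistant", "text": "hi"}]}\n\n'

# Strip the SSE "data: " prefix and trailing blank line, then parse the JSON.
event = json.loads(line[len("data: "):].strip())
text = event["choices"][0]["text"]
```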
@app.post("/completions")
@app.post("/api/v1/completions")
async def completions(request: Request, uid: Optional[str] = Header(None)):
    global stream_manager

    if not uid:
        raise HTTPException(status_code=400, detail="Missing uid in headers")

    try:
        # if stream_manager.uid is not None and stream_manager.uid != uid:
        if stream_manager.uid != uid:
            # stream_manager.stop_response = True
            # logger.info(f"uid changed, reset model: previous uid {stream_manager.uid}, new uid {uid}")
            stream_manager.session_id += 1
            stream_manager.sys_prompt_flag = False
            stream_manager.reset()

            # raise HTTPException(
            #     status_code=409,
            #     detail="User id changed, reset context."
            # )
        stream_manager.speaking_time_stamp = 0
        stream_manager.update_last_request_time()
        stream_manager.uid = uid
        stream_manager.start_conversation()

        data = await request.json()

        return StreamingResponse(
            generate_sse_response(request, uid),
            media_type="text/event-stream",
            headers={
                "Cache-Control": "no-cache",
                "Connection": "keep-alive",
                "Transfer-Encoding": "chunked"
            }
        )
    except asyncio.TimeoutError:
        raise HTTPException(
            status_code=503,
            detail="Server busy, please try again later"
        )
    except Exception as e:
        logger.error(f"Error processing request for user {uid}: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/stop")
@app.post("/api/v1/stop")
async def stop_response(request: Request, uid: Optional[str] = Header(None)):
    if not uid:
        raise HTTPException(status_code=400, detail="Missing uid in headers")

    global stream_manager
    # stream_manager.session_id += 1
    logger.info(f"uid {uid}: received stop_response")
    stream_manager.stop_response = True
    response = {
        "id": uid,
        "choices": {
            "role": "assistant",
            "content": "success",
            "finish_reason": "stop"
        }
    }
    return JSONResponse(content=response, status_code=200)

@app.post("/feedback")
@app.post("/api/v1/feedback")
async def feedback(request: Request, uid: Optional[str] = Header(None)):
    global stream_manager

    # Validate the 'uid' header
    if not uid:
        raise HTTPException(status_code=400, detail="Missing 'uid' header")

    try:
        data = await request.json()
        if "response_id" not in data or "rating" not in data:
            raise HTTPException(status_code=400, detail="Invalid request: must have response_id and rating")
        response_id = data.get("response_id", "")
        rating = data.get("rating", "")
        comment = data.get("comment", "")
        # Validate the rating
        if rating not in ["like", "dislike"]:
            raise HTTPException(status_code=400, detail=f"Invalid rating value: {rating}")

        # Define the log file path
        log_file_path = f"{stream_manager.savedir}/feedback_log/{response_id}.{rating}"
        # Write the feedback to the file asynchronously
        async with aiofiles.open(log_file_path, mode="a") as file:
            await file.write(f"model: {stream_manager.minicpmo_model_path}\nuid {uid}: {comment}\n")
        response = {
            "id": uid,
            "choices": {
                "role": "assistant",
                "content": "success",
                "finish_reason": "done"
            }
        }
        return JSONResponse(content=response, status_code=200)
    except Exception as e:
        logger.error(f"Error processing feedback for user {uid}: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/init_options")
@app.post("/api/v1/init_options")
async def init_options(request: Request, uid: Optional[str] = Header(None)):
    global stream_manager

    stream_manager.update_last_request_time()

    if not uid:
        raise HTTPException(status_code=400, detail="Missing uid in headers")
    try:
        # Parse JSON request
        data = await request.json()

        # Validate basic structure
        if not isinstance(data, dict) or "messages" not in data:
            raise HTTPException(status_code=400, detail="Invalid request format")

        messages = data.get("messages", [])
        for message in messages:
            if not isinstance(message, dict) or "role" not in message or "content" not in message:
                raise HTTPException(status_code=400, detail="Invalid message format")

            for content in message.get("content", []):
                if content["type"] == "input_audio":
                    audio_data = content["input_audio"].get("data", "")
                    audio_fmt = content["input_audio"].get("format", "")
                    stream_manager.upload_customized_audio(audio_data, audio_fmt)
                elif content["type"] == "options":
                    stream_manager.update_customized_options(uid, content["options"])
                else:
                    ctype = content["type"]
                    raise HTTPException(status_code=400, detail=f"Invalid content type: {ctype}")
        version = stream_manager.model_version
        print(version)
        response = {
            "id": uid,
            "choices": {
                "role": "assistant",
                "content": version,
                "finish_reason": "done"
            }
        }
        return JSONResponse(content=response, status_code=200)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"init options error: {str(e)}")


@app.get('/health')
@app.get('/api/v1/health')
async def health_check():
    return {"status": "OK"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=args.port)
BIN    web_demos/minicpm-o_2.6/silero_vad.onnx    Normal file

301    web_demos/minicpm-o_2.6/vad_utils.py    Normal file
@@ -0,0 +1,301 @@
import functools
import numpy as np
import librosa
import os
import time
import traceback

from typing import List, NamedTuple, Optional

class VadOptions(NamedTuple):
    """VAD options.

    Attributes:
      threshold: Speech threshold. Silero VAD outputs speech probabilities for each audio chunk;
        probabilities ABOVE this value are considered SPEECH. It is better to tune this
        parameter for each dataset separately, but "lazy" 0.5 is pretty good for most datasets.
      min_speech_duration_ms: Final speech chunks shorter than min_speech_duration_ms are thrown out.
      max_speech_duration_s: Maximum duration of speech chunks in seconds. Chunks longer
        than max_speech_duration_s will be split at the timestamp of the last silence that
        lasts more than 100 ms (if any), to prevent aggressive cutting. Otherwise, they will be
        split aggressively just before max_speech_duration_s.
      min_silence_duration_ms: At the end of each speech chunk, wait for min_silence_duration_ms
        before separating it.
      window_size_samples: Audio chunks of window_size_samples size are fed to the silero VAD model.
        WARNING! Silero VAD models were trained using 512, 1024, 1536 samples for a 16000 sample rate.
        Values other than these may affect model performance!
      speech_pad_ms: Final speech chunks are padded by speech_pad_ms on each side.
    """

    # threshold: float = 0.3  # rep 0.5
    # min_speech_duration_ms: int = 250
    # max_speech_duration_s: float = float("inf")
    # min_silence_duration_ms: int = 2000
    # window_size_samples: int = 1024
    # speech_pad_ms: int = 600  # rep 400

    threshold: float = 0.7  # gw: 0.3 # rep 0.5
    min_speech_duration_ms: int = 128  # original & gw: 250
    max_speech_duration_s: float = float("inf")
    min_silence_duration_ms: int = 500  # original & gw: 2000
    window_size_samples: int = 1024
    speech_pad_ms: int = 30  # gw: 600 # rep 400

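Because `VadOptions` is a `NamedTuple`, individual fields can be overridden per call while the class defaults above stay untouched. A trimmed, standalone copy of the class (only three of the fields) to illustrate:

```python
from typing import NamedTuple

class VadOptions(NamedTuple):
    # Trimmed copy of the VadOptions defaults defined above.
    threshold: float = 0.7
    min_speech_duration_ms: int = 128
    min_silence_duration_ms: int = 500

# Override one field; the rest keep their defaults.
opts = VadOptions(threshold=0.5)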
class SileroVADModel:
    def __init__(self, path):
        try:
            import onnxruntime
        except ImportError as e:
            raise RuntimeError(
                "Applying the VAD filter requires the onnxruntime package"
            ) from e

        opts = onnxruntime.SessionOptions()
        opts.inter_op_num_threads = 1
        opts.intra_op_num_threads = 1
        opts.log_severity_level = 4

        self.session = onnxruntime.InferenceSession(
            path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )

    def get_initial_state(self, batch_size: int):
        h = np.zeros((2, batch_size, 64), dtype=np.float32)
        c = np.zeros((2, batch_size, 64), dtype=np.float32)
        return h, c

    def __call__(self, x, state, sr: int):
        if len(x.shape) == 1:
            x = np.expand_dims(x, 0)
        if len(x.shape) > 2:
            raise ValueError(
                f"Too many dimensions for input audio chunk {len(x.shape)}"
            )
        if sr / x.shape[1] > 31.25:
            raise ValueError("Input audio chunk is too short")

        h, c = state

        ort_inputs = {
            "input": x,
            "h": h,
            "c": c,
            "sr": np.array(sr, dtype="int64"),
        }

        out, h, c = self.session.run(None, ort_inputs)
        state = (h, c)
        return out, state

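The call pattern above is stateful: the `(h, c)` pair returned by each call must be fed back into the next one. The sketch below shows that streaming loop with `DummyVAD`, a hypothetical stand-in for `SileroVADModel` (so it runs without onnxruntime) whose state shapes mirror `get_initial_state`:

```python
import numpy as np


class DummyVAD:
    # Stand-in for SileroVADModel; same state shapes, toy probability.
    def get_initial_state(self, batch_size):
        h = np.zeros((2, batch_size, 64), dtype=np.float32)
        c = np.zeros((2, batch_size, 64), dtype=np.float32)
        return h, c

    def __call__(self, x, state, sr):
        h, c = state
        # Toy "speech" score: 1.0 when the chunk has noticeable energy.
        prob = np.array([[float(np.abs(x).mean() > 0.1)]], dtype=np.float32)
        return prob, (h, c)


model = DummyVAD()
state = model.get_initial_state(batch_size=1)

# One silent window followed by one "loud" window.
audio = np.concatenate([np.zeros(1024), 0.5 * np.ones(1024)]).astype(np.float32)

probs = []
for start in range(0, len(audio), 1024):
    chunk = audio[start : start + 1024]
    prob, state = model(chunk, state, 16000)  # state is threaded through
    probs.append(float(prob))

print(probs)  # [0.0, 1.0]
```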


@functools.lru_cache
def get_vad_model():
    """Returns the VAD model instance."""
    path = os.path.join(os.path.dirname(__file__), "silero_vad.onnx")
    return SileroVADModel(path)


def get_speech_timestamps(
    audio: np.ndarray,
    vad_options: Optional[VadOptions] = None,
    **kwargs,
) -> List[dict]:
    """This method is used for splitting long audios into speech chunks using silero VAD.

    Args:
      audio: One dimensional float array.
      vad_options: Options for VAD processing.
      kwargs: VAD options passed as keyword arguments for backward compatibility.

    Returns:
      List of dicts containing begin and end samples of each speech chunk.
    """
    if vad_options is None:
        vad_options = VadOptions(**kwargs)

    threshold = vad_options.threshold
    min_speech_duration_ms = vad_options.min_speech_duration_ms
    max_speech_duration_s = vad_options.max_speech_duration_s
    min_silence_duration_ms = vad_options.min_silence_duration_ms
    window_size_samples = vad_options.window_size_samples
    speech_pad_ms = vad_options.speech_pad_ms

    if window_size_samples not in [512, 1024, 1536]:
        warnings.warn(
            "Unusual window_size_samples! Supported window_size_samples:\n"
            " - [512, 1024, 1536] for 16000 sampling_rate"
        )

    sampling_rate = 16000
    min_speech_samples = sampling_rate * min_speech_duration_ms / 1000  # segments shorter than this are dropped
    speech_pad_samples = sampling_rate * speech_pad_ms / 1000
    max_speech_samples = (
        sampling_rate * max_speech_duration_s
        - window_size_samples
        - 2 * speech_pad_samples
    )
    min_silence_samples = sampling_rate * min_silence_duration_ms / 1000  # a silence must last this long before the chunk is closed
    min_silence_samples_at_max_speech = sampling_rate * 98 / 1000  # 0.098 s; may need adjustment

    audio_length_samples = len(audio)

    model = get_vad_model()
    state = model.get_initial_state(batch_size=1)

    speech_probs = []
    for current_start_sample in range(0, audio_length_samples, window_size_samples):
        chunk = audio[current_start_sample : current_start_sample + window_size_samples]
        if len(chunk) < window_size_samples:
            chunk = np.pad(chunk, (0, int(window_size_samples - len(chunk))))
        speech_prob, state = model(chunk, state, sampling_rate)
        speech_probs.append(speech_prob)

    triggered = False
    speeches = []
    current_speech = {}
    neg_threshold = threshold - 0.15

    # to save potential segment end (and tolerate some silence)
    temp_end = 0
    # to save potential segment limits in case of maximum segment size reached
    prev_end = next_start = 0

    # Overview: scan the audio for contiguous speech. When silence starts, temp_end
    # records the candidate segment end; if speech resumes before the minimum silence
    # length, temp_end is reset. Silence boundaries are also tracked so that oversized
    # segments can be split (with max_speech_duration_s at infinity this split path is
    # effectively never taken).

    for i, speech_prob in enumerate(speech_probs):
        if (speech_prob >= threshold) and temp_end:
            temp_end = 0
            if next_start < prev_end:
                next_start = window_size_samples * i

        if (speech_prob >= threshold) and not triggered:
            triggered = True
            current_speech["start"] = window_size_samples * i
            continue

        if (
            triggered
            and (window_size_samples * i) - current_speech["start"] > max_speech_samples
        ):
            if prev_end:
                current_speech["end"] = prev_end
                speeches.append(current_speech)
                current_speech = {}
                # previously reached silence (< neg_thres) and is still not speech (< thres)
                if next_start < prev_end:
                    triggered = False
                else:
                    current_speech["start"] = next_start
                prev_end = next_start = temp_end = 0
            else:
                current_speech["end"] = window_size_samples * i
                speeches.append(current_speech)
                current_speech = {}
                prev_end = next_start = temp_end = 0
                triggered = False
                continue

        if (speech_prob < neg_threshold) and triggered:
            if not temp_end:
                temp_end = window_size_samples * i
            # condition to avoid cutting in very short silence
            if (window_size_samples * i) - temp_end > min_silence_samples_at_max_speech:
                prev_end = temp_end
            if (window_size_samples * i) - temp_end < min_silence_samples:
                continue
            else:
                current_speech["end"] = temp_end
                if (
                    current_speech["end"] - current_speech["start"]
                ) > min_speech_samples:
                    speeches.append(current_speech)
                current_speech = {}
                prev_end = next_start = temp_end = 0
                triggered = False
                continue

    if (
        current_speech
        and (audio_length_samples - current_speech["start"]) > min_speech_samples
    ):
        current_speech["end"] = audio_length_samples
        speeches.append(current_speech)

    # Pad each chunk by speech_pad_ms; when two chunks are closer than twice the
    # padding, the available silence between them is split evenly.
    for i, speech in enumerate(speeches):
        if i == 0:
            speech["start"] = int(max(0, speech["start"] - speech_pad_samples))
        if i != len(speeches) - 1:
            silence_duration = speeches[i + 1]["start"] - speech["end"]
            if silence_duration < 2 * speech_pad_samples:
                speech["end"] += int(silence_duration // 2)
                speeches[i + 1]["start"] = int(
                    max(0, speeches[i + 1]["start"] - silence_duration // 2)
                )
            else:
                speech["end"] = int(
                    min(audio_length_samples, speech["end"] + speech_pad_samples)
                )
                speeches[i + 1]["start"] = int(
                    max(0, speeches[i + 1]["start"] - speech_pad_samples)
                )
        else:
            speech["end"] = int(
                min(audio_length_samples, speech["end"] + speech_pad_samples)
            )
    return speeches

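The returned chunks are dicts of sample indices. A small helper like the one below (hypothetical, not part of this module) converts them to second-based timestamps for display or logging:

```python
def chunks_to_seconds(chunks, sampling_rate=16000):
    # Convert {"start": sample, "end": sample} dicts to seconds.
    return [
        {
            "start": round(c["start"] / sampling_rate, 3),
            "end": round(c["end"] / sampling_rate, 3),
        }
        for c in chunks
    ]


chunks = [{"start": 0, "end": 16000}, {"start": 32000, "end": 40000}]
print(chunks_to_seconds(chunks))
# [{'start': 0.0, 'end': 1.0}, {'start': 2.0, 'end': 2.5}]
```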
def collect_chunks(audio: np.ndarray, chunks: List[dict]) -> np.ndarray:
    """Collects and concatenates audio chunks."""
    if not chunks:
        return np.array([], dtype=np.float32)

    return np.concatenate([audio[chunk["start"] : chunk["end"]] for chunk in chunks])


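A usage sketch: the slices listed in the chunk dicts are concatenated and everything between them is dropped. The function is reproduced standalone here so the example runs without the module:

```python
import numpy as np
from typing import List


def collect_chunks(audio: np.ndarray, chunks: List[dict]) -> np.ndarray:
    # Same body as above: keep only the listed [start, end) slices.
    if not chunks:
        return np.array([], dtype=np.float32)
    return np.concatenate([audio[chunk["start"] : chunk["end"]] for chunk in chunks])


audio = np.arange(10, dtype=np.float32)
out = collect_chunks(audio, [{"start": 0, "end": 3}, {"start": 7, "end": 10}])
print(out.tolist())  # [0.0, 1.0, 2.0, 7.0, 8.0, 9.0]
```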
def run_vad(ori_audio, sr, vad_options=None):
    _st = time.time()
    try:
        audio = np.frombuffer(ori_audio, dtype=np.int16)
        audio = audio.astype(np.float32) / 32768.0
        sampling_rate = 16000
        if sr != sampling_rate:
            audio = librosa.resample(audio, orig_sr=sr, target_sr=sampling_rate)
        if vad_options is None:
            vad_options = VadOptions()

        # make sure get_speech_timestamps receives a VadOptions instance
        speech_chunks = get_speech_timestamps(audio, vad_options=vad_options)
        audio = collect_chunks(audio, speech_chunks)
        duration_after_vad = audio.shape[0] / sampling_rate

        if sr != sampling_rate:
            # resample back to the original sampling rate
            vad_audio = librosa.resample(audio, orig_sr=sampling_rate, target_sr=sr)
        else:
            vad_audio = audio
        # note: this rounding introduces a small quantization error
        vad_audio = np.round(vad_audio * 32768.0).astype(np.int16)

        vad_audio_bytes = vad_audio.tobytes()

        return duration_after_vad, vad_audio_bytes, round(time.time() - _st, 4)
    except Exception:
        msg = f"[asr vad error] audio_len: {len(ori_audio)/(sr*2):.3f} s, trace: {traceback.format_exc()}"
        print(msg)
        return -1, ori_audio, round(time.time() - _st, 4)

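`run_vad` treats its input and output as 16-bit little-endian PCM bytes. The sketch below isolates the bytes → float32 → bytes round trip used at the top and bottom of the function (without the VAD or resampling steps, which would make it lossy):

```python
import numpy as np

# A few int16 PCM samples serialized to raw bytes, as run_vad would receive them.
pcm = np.array([0, 16384, -16384, 32767], dtype=np.int16)
raw = pcm.tobytes()

# Decode: bytes -> int16 -> normalized float32 in [-1, 1).
audio = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

# Encode: float32 -> rounded int16, as run_vad does before tobytes().
restored = np.round(audio * 32768.0).astype(np.int16)

print(restored.tolist())  # [0, 16384, -16384, 32767]
```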
0
web_demos/minicpm-o_2.6/web_server/.env.development
Normal file
0
web_demos/minicpm-o_2.6/web_server/.env.production
Normal file
359
web_demos/minicpm-o_2.6/web_server/.eslintrc-auto-import.json
Normal file
@@ -0,0 +1,359 @@
{
    "globals": {
        "Component": true,
        "ComponentPublicInstance": true,
        "ComputedRef": true,
        "EffectScope": true,
        "ExtractDefaultPropTypes": true,
        "ExtractPropTypes": true,
        "ExtractPublicPropTypes": true,
        "InjectionKey": true,
        "LegalTypeEnum": true,
        "LoginTypeEnum": true,
        "PropType": true,
        "Ref": true,
        "VNode": true,
        "WritableComputedRef": true,
        "acceptHMRUpdate": true,
        "ajaxHeader": true,
        "asyncComputed": true,
        "authLogin": true,
        "autoResetRef": true,
        "computed": true,
        "computedAsync": true,
        "computedEager": true,
        "computedInject": true,
        "computedWithControl": true,
        "controlledComputed": true,
        "controlledRef": true,
        "createApp": true,
        "createEventHook": true,
        "createGlobalState": true,
        "createInjectionState": true,
        "createPinia": true,
        "createReactiveFn": true,
        "createReusableTemplate": true,
        "createSharedComposable": true,
        "createTemplatePromise": true,
        "createUnrefFn": true,
        "customRef": true,
        "debouncedRef": true,
        "debouncedWatch": true,
        "defineAsyncComponent": true,
        "defineComponent": true,
        "defineStore": true,
        "eagerComputed": true,
        "effectScope": true,
        "extendRef": true,
        "fetchSmsVerifyCode": true,
        "getActivePinia": true,
        "getCurrentInstance": true,
        "getCurrentScope": true,
        "getHomeInfo": true,
        "h": true,
        "ignorableWatch": true,
        "inject": true,
        "injectLocal": true,
        "isDefined": true,
        "isProxy": true,
        "isReactive": true,
        "isReadonly": true,
        "isRef": true,
        "loginSuccess": true,
        "makeDestructurable": true,
        "mapActions": true,
        "mapGetters": true,
        "mapState": true,
        "mapStores": true,
        "mapWritableState": true,
        "markRaw": true,
        "nextTick": true,
        "onActivated": true,
        "onBeforeMount": true,
        "onBeforeRouteLeave": true,
        "onBeforeRouteUpdate": true,
        "onBeforeUnmount": true,
        "onBeforeUpdate": true,
        "onClickOutside": true,
        "onDeactivated": true,
        "onErrorCaptured": true,
        "onKeyStroke": true,
        "onLongPress": true,
        "onMounted": true,
        "onRenderTracked": true,
        "onRenderTriggered": true,
        "onScopeDispose": true,
        "onServerPrefetch": true,
        "onStartTyping": true,
        "onUnmounted": true,
        "onUpdated": true,
        "pausableWatch": true,
        "provide": true,
        "provideLocal": true,
        "reactify": true,
        "reactifyObject": true,
        "reactive": true,
        "reactiveComputed": true,
        "reactiveOmit": true,
        "reactivePick": true,
        "readonly": true,
        "ref": true,
        "refAutoReset": true,
        "refDebounced": true,
        "refDefault": true,
        "refThrottled": true,
        "refWithControl": true,
        "resolveComponent": true,
        "resolveRef": true,
        "resolveUnref": true,
        "setActivePinia": true,
        "setMapStoreSuffix": true,
        "setupStore": true,
        "shallowReactive": true,
        "shallowReadonly": true,
        "shallowRef": true,
        "store": true,
        "storeToRefs": true,
        "submitFeedback": true,
        "syncRef": true,
        "syncRefs": true,
        "templateRef": true,
        "throttledRef": true,
        "throttledWatch": true,
        "toRaw": true,
        "toReactive": true,
        "toRef": true,
        "toRefs": true,
        "toValue": true,
        "triggerRef": true,
        "tryOnBeforeMount": true,
        "tryOnBeforeUnmount": true,
        "tryOnMounted": true,
        "tryOnScopeDispose": true,
        "tryOnUnmounted": true,
        "unref": true,
        "unrefElement": true,
        "until": true,
        "useActiveElement": true,
        "useAnimate": true,
        "useArrayDifference": true,
        "useArrayEvery": true,
        "useArrayFilter": true,
        "useArrayFind": true,
        "useArrayFindIndex": true,
        "useArrayFindLast": true,
        "useArrayIncludes": true,
        "useArrayJoin": true,
        "useArrayMap": true,
        "useArrayReduce": true,
        "useArraySome": true,
        "useArrayUnique": true,
        "useAsyncQueue": true,
        "useAsyncState": true,
        "useAttrs": true,
        "useBase64": true,
        "useBattery": true,
        "useBluetooth": true,
        "useBreakpoints": true,
        "useBroadcastChannel": true,
        "useBrowserLocation": true,
        "useCached": true,
        "useClearLocalCache": true,
        "useClipboard": true,
        "useClipboardItems": true,
        "useCloned": true,
        "useColorMode": true,
        "useConfirmDialog": true,
        "useCounter": true,
        "useCssModule": true,
        "useCssVar": true,
        "useCssVars": true,
        "useCurrentElement": true,
        "useCycleList": true,
        "useDark": true,
        "useDateFormat": true,
        "useDebounce": true,
        "useDebounceFn": true,
        "useDebouncedRefHistory": true,
        "useDeviceMotion": true,
        "useDeviceOrientation": true,
        "useDevicePixelRatio": true,
        "useDevicesList": true,
        "useDisplayMedia": true,
        "useDocumentVisibility": true,
        "useDraggable": true,
        "useDropZone": true,
        "useElementBounding": true,
        "useElementByPoint": true,
        "useElementHover": true,
        "useElementSize": true,
        "useElementVisibility": true,
        "useEventBus": true,
        "useEventListener": true,
        "useEventSource": true,
        "useEyeDropper": true,
        "useFavicon": true,
        "useFetch": true,
        "useFetchLogin": true,
        "useFileDialog": true,
        "useFileSystemAccess": true,
        "useFocus": true,
        "useFocusWithin": true,
        "useFps": true,
        "useFullscreen": true,
        "useGamepad": true,
        "useGeolocation": true,
        "useGetLocalCache": true,
        "useHttp": true,
        "useIdle": true,
        "useImage": true,
        "useInfiniteScroll": true,
        "useIntersectionObserver": true,
        "useInterval": true,
        "useIntervalFn": true,
        "useKeyModifier": true,
        "useLastChanged": true,
        "useLegal": true,
        "useLink": true,
        "useLocalStorage": true,
        "useLogin": true,
        "useMagicKeys": true,
        "useManualRefHistory": true,
        "useMediaControls": true,
        "useMediaQuery": true,
        "useMemoize": true,
        "useMemory": true,
        "useMounted": true,
        "useMouse": true,
        "useMouseInElement": true,
        "useMousePressed": true,
        "useMutationObserver": true,
        "useNavigatorLanguage": true,
        "useNetwork": true,
        "useNow": true,
        "useObjectUrl": true,
        "useOffsetPagination": true,
        "useOnline": true,
        "usePageLeave": true,
        "useParallax": true,
        "useParentElement": true,
        "usePerformanceObserver": true,
        "usePermission": true,
        "usePointer": true,
        "usePointerLock": true,
        "usePointerSwipe": true,
        "usePreferredColorScheme": true,
        "usePreferredContrast": true,
        "usePreferredDark": true,
        "usePreferredLanguages": true,
        "usePreferredReducedMotion": true,
        "usePrevious": true,
        "useRafFn": true,
        "useRefHistory": true,
        "useResizeObserver": true,
        "useRoute": true,
        "useRouter": true,
        "useScreenOrientation": true,
        "useScreenSafeArea": true,
        "useScriptTag": true,
        "useScroll": true,
        "useScrollLock": true,
        "useSessionStorage": true,
        "useSetLocalCache": true,
        "useShare": true,
        "useSlots": true,
        "useSorted": true,
        "useSpeechRecognition": true,
        "useSpeechSynthesis": true,
        "useStepper": true,
        "useStorage": true,
        "useStorageAsync": true,
        "useStyleTag": true,
        "useSupported": true,
        "useSwipe": true,
        "useTemplateRefsList": true,
        "useTextDirection": true,
        "useTextSelection": true,
        "useTextareaAutosize": true,
        "useThrottle": true,
        "useThrottleFn": true,
        "useThrottledRefHistory": true,
        "useTimeAgo": true,
        "useTimeout": true,
        "useTimeoutFn": true,
        "useTimeoutPoll": true,
        "useTimestamp": true,
        "useTitle": true,
        "useToNumber": true,
        "useToString": true,
        "useToggle": true,
        "useTransition": true,
        "useUrlSearchParams": true,
        "useUserMedia": true,
        "useUserStore": true,
        "useUserStoreWithOut": true,
        "useVModel": true,
        "useVModels": true,
        "useVibrate": true,
        "useVirtualList": true,
        "useWakeLock": true,
        "useWebNotification": true,
        "useWebSocket": true,
        "useWebWorker": true,
        "useWebWorkerFn": true,
        "useWindowFocus": true,
        "useWindowScroll": true,
        "useWindowSize": true,
        "watch": true,
        "watchArray": true,
        "watchAtMost": true,
        "watchDebounced": true,
        "watchDeep": true,
        "watchEffect": true,
        "watchIgnorable": true,
        "watchImmediate": true,
        "watchOnce": true,
        "watchPausable": true,
        "watchPostEffect": true,
        "watchSyncEffect": true,
        "watchThrottled": true,
        "watchTriggerable": true,
        "watchWithFilter": true,
        "whenever": true,
        "ElMessage": true,
        "ElLoading": true,
        "deleteHistoryBatch": true,
        "deleteHistoryItem": true,
        "getHistory": true,
        "createConv": true,
        "fetchHistoryList": true,
        "stopChat": true,
        "useChatStore": true,
        "useChatStoreWithOut": true,
        "useChatExchangeStore": true,
        "useChatExchangeStoreWithOut": true,
        "useExchangeStore": true,
        "useExchangeStoreWithOut": true,
        "delMessage": true,
        "sendRating": true,
        "getInitialActions": true,
        "sendFeedback": true,
        "md": true,
        "useMarkdown": true,
        "connectService": true,
        "sendMessage": true,
        "Audio": true,
        "SoundRecording": true,
        "getVolume": true,
        "ElMessageBox": true,
        "encodeWav": true,
        "encodeWAV": true,
        "stopMessage": true,
        "TaskQueue": true,
        "getNewUserId": true,
        "setNewUserId": true,
        "uploadFile": true,
        "feedback": true,
        "uploadConfig": true
    }
}
26
web_demos/minicpm-o_2.6/web_server/.eslintrc.cjs
Normal file
@@ -0,0 +1,26 @@
/* eslint-env node */
require('@rushstack/eslint-patch/modern-module-resolution');

module.exports = {
    root: true,
    extends: [
        'plugin:vue/vue3-essential',
        'eslint:recommended',
        '@vue/eslint-config-prettier/skip-formatting',
        './.eslintrc-auto-import.json',
    ],
    parserOptions: {
        ecmaVersion: 'latest',
    },
    rules: {
        'no-console': process.env.NODE_ENV === 'production' ? 'off' : 'warn',
        'no-debugger': process.env.NODE_ENV === 'production' ? 'error' : 'warn',
        'no-var': process.env.NODE_ENV === 'production' ? 'off' : 'warn',
        'no-undef': process.env.NODE_ENV === 'production' ? 'error' : 'warn',
        'vue/multi-word-component-names': 'off', // do not enforce multi-word component names
        'no-empty': 0, // allow empty blocks
        'vue/no-unused-components': 'warn',
        'no-unused-vars': 'warn',
        'prettier/prettier': 'off', // do not report prettier formatting violations as eslint errors
    },
};
32
web_demos/minicpm-o_2.6/web_server/.gitignore
vendored
Normal file
@@ -0,0 +1,32 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
.DS_Store
dist
dist-ssr
coverage
*.local

/cypress/videos/
/cypress/screenshots/

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

*.tsbuildinfo
.VSCodeCounter
.history
10
web_demos/minicpm-o_2.6/web_server/.husky/pre-push
Executable file
@@ -0,0 +1,10 @@
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

echo "---format start---"
pnpm run format
echo "---format end---"

echo "---eslint start---"
pnpm run lint
echo "---eslint end---"
19
web_demos/minicpm-o_2.6/web_server/.prettierrc.json
Normal file
@@ -0,0 +1,19 @@
{
    "$schema": "https://json.schemastore.org/prettierrc",
    "semi": true,
    "trailingComma": "none",
    "singleQuote": true,
    "printWidth": 120,
    "tabWidth": 4,
    "useTabs": false,
    "quoteProps": "as-needed",
    "bracketSpacing": true,
    "jsxBracketSameLine": false,
    "arrowParens": "avoid",
    "endOfLine": "auto",
    "htmlWhitespaceSensitivity": "css",
    "cssDeclarationSortOrder": "alphabetical",
    "tableContentIndentation": "align",
    "vueIndentScriptAndStyle": true,
    "proseWrap": "preserve"
}
3
web_demos/minicpm-o_2.6/web_server/.vscode/extensions.json
vendored
Normal file
@@ -0,0 +1,3 @@
{
    "recommendations": ["Vue.volar", "dbaeumer.vscode-eslint", "esbenp.prettier-vscode"]
}
21
web_demos/minicpm-o_2.6/web_server/Dockerfile
Normal file
@@ -0,0 +1,21 @@
# Build stage: start from a Node base image and name the stage build-stage
FROM modelbest-registry-vpc.cn-beijing.cr.aliyuncs.com/modelbest/playground:20.10.0 as build-stage
# Use /build as the workdir to keep the app isolated from system files
WORKDIR /build
COPY . /build

# Install dependencies inside the container
RUN npm config set registry https://registry.npmmirror.com/
# alternative registry: https://registry.npm.taobao.org
RUN npm i pnpm -g
RUN pnpm config set registry https://registry.npmmirror.com/
RUN pnpm install

# Build the production bundle
RUN pnpm run build

# production stage
FROM modelbest-registry-vpc.cn-beijing.cr.aliyuncs.com/modelbest/playground:alpine as production-stage
COPY --from=build-stage /build/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/
EXPOSE 3000
74
web_demos/minicpm-o_2.6/web_server/README.md
Normal file
@@ -0,0 +1,74 @@
## Language

- [English](#english)
- [中文](#中文)

---

# English

## Important

This project depends on Node and PNPM. If they are not installed, please install them first.

## Project Setup

```sh
pnpm install
```

## Compile and Hot-Reload for Development

```sh
pnpm run dev
```

## Compile and Minify for Production

```sh
pnpm run build
```

### Tips

If you want to use your own backend in the development environment, modify the proxy object in <font color="red">vite.config.js</font> located in the root directory.

### Recommended IDE Setup

[VSCode](https://code.visualstudio.com/)

---

# 中文

## 重要

这个项目依赖于node、pnpm环境,如果你的PC上没有,请先安装。

## 安装依赖

```sh
pnpm install
```

## 运行在本地开发模式下(可热更新)

```sh
pnpm run dev
```

## 编译代码(用于生产环境)

```sh
pnpm run build
```

### 注意

如果你想在本地开发模式下运行项目,并且调用自己的后端服务,请修改项目根目录下的<font color="red">vite.config.js</font>文件中的proxy配置。

### 推荐IDE

[VSCode](https://code.visualstudio.com/)

31
web_demos/minicpm-o_2.6/web_server/components.d.ts
vendored
Normal file
@@ -0,0 +1,31 @@
/* eslint-disable */
/* prettier-ignore */
// @ts-nocheck
// Generated by unplugin-vue-components
// Read more: https://github.com/vuejs/core/pull/3399
export {}

declare module 'vue' {
    export interface GlobalComponents {
        ElButton: typeof import('element-plus/es')['ElButton']
        ElCheckbox: typeof import('element-plus/es')['ElCheckbox']
        ElCheckboxGroup: typeof import('element-plus/es')['ElCheckboxGroup']
        ElDialog: typeof import('element-plus/es')['ElDialog']
        ElDropdown: typeof import('element-plus/es')['ElDropdown']
        ElDropdownItem: typeof import('element-plus/es')['ElDropdownItem']
        ElDropdownMenu: typeof import('element-plus/es')['ElDropdownMenu']
        ElForm: typeof import('element-plus/es')['ElForm']
        ElFormItem: typeof import('element-plus/es')['ElFormItem']
        ElIcon: typeof import('element-plus/es')['ElIcon']
        ElInput: typeof import('element-plus/es')['ElInput']
        ElTooltip: typeof import('element-plus/es')['ElTooltip']
        Lottie: typeof import('./src/components/Lottie/index.vue')['default']
        RouterLink: typeof import('vue-router')['RouterLink']
        RouterView: typeof import('vue-router')['RouterView']
        SiderMenu: typeof import('./src/components/SiderMenu/index.vue')['default']
        Toast: typeof import('./src/components/Toast/index.vue')['default']
    }
    export interface ComponentCustomProperties {
        vInfiniteScroll: typeof import('element-plus/es')['ElInfiniteScroll']
    }
}
13
web_demos/minicpm-o_2.6/web_server/index.html
Normal file
@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <link rel="icon" href="/favicon.svg" />
        <meta name="viewport" content="viewport-fit=cover,maximum-scale=1" />
        <title>MiniCPM-omni</title>
    </head>
    <body>
        <div id="app"></div>
        <script type="module" src="/src/main.js"></script>
    </body>
</html>
110
web_demos/minicpm-o_2.6/web_server/nginx.conf
Normal file
@@ -0,0 +1,110 @@
user root;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {

    ##
    # Basic Settings
    ##

    client_max_body_size 20M;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##
    server {
        # listen 8080;
        server_name localhost;

        add_header Access-Control-Allow-Origin *;
        add_header Access-Control-Allow-Headers X-Requested-With;
        add_header Access-Control-Allow-Methods GET,POST,OPTIONS;

        # Backend API requests
        location /api/v1 {
            proxy_pass http://127.0.0.1:32550;
            proxy_set_header Host $host;
            proxy_set_header Connection "";
            chunked_transfer_encoding off;
            proxy_set_header X-Accel-Buffering off; # set the X-Accel-Buffering header here
            add_header X-Accel-Buffering off; # expose the X-Accel-Buffering header in the response
            proxy_http_version 1.1;
            # disable nginx caching
            proxy_buffering off;
            proxy_cache off;
            # disable nginx's default buffering behavior
            sendfile off;
            tcp_nodelay on;
        }
        location /ws {
            proxy_pass http://127.0.0.1:32550;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
        }
        location / {
            root /usr/share/nginx/html;

            index index.html index.htm;
            try_files $uri $uri/ /index.html;
        }

        location @router {
            rewrite ^.*$ /index.html last;
        }

        location =/robots.txt {
            index robots.txt;
        }

    }
}
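For orientation, the routing this config sets up can be sketched as a small function: `/api/v1` and `/ws` are proxied to the backend on port 32550 with buffering disabled so streamed chunks reach the browser immediately, while everything else is served from `/usr/share/nginx/html` with an SPA fallback to `index.html`. This is a hypothetical sketch — `routeRequest` and its return shape are illustrative, not part of the repo.

```javascript
// Hypothetical sketch of the request routing the nginx config above implements.
function routeRequest(path) {
  if (path.startsWith('/api/v1') || path.startsWith('/ws')) {
    // Proxied to the model server; proxy_buffering is off so streamed
    // responses are flushed to the client as they arrive.
    return { upstream: 'http://127.0.0.1:32550', path, buffered: false };
  }
  // Static files, with the SPA fallback from `try_files $uri $uri/ /index.html`.
  return { upstream: 'static:/usr/share/nginx/html', path, fallback: '/index.html' };
}

console.log(routeRequest('/api/v1/stream').upstream); // http://127.0.0.1:32550
console.log(routeRequest('/chat').fallback);          // /index.html
```
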
45
web_demos/minicpm-o_2.6/web_server/package.json
Normal file
@@ -0,0 +1,45 @@
{
  "name": "web",
  "version": "0.0.0",
  "private": true,
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview",
    "lint": "eslint . --ext .vue,.js,.jsx,.cjs,.mjs --fix --ignore-path .gitignore",
    "format": "prettier --write src/",
    "prepare": "husky install"
  },
  "dependencies": {
    "@element-plus/icons-vue": "^2.3.1",
    "@microsoft/fetch-event-source": "^2.0.1",
    "@ricky0123/vad-web": "^0.0.22",
    "@vueuse/core": "^11.0.3",
    "axios": "^1.7.7",
    "clipboard": "^2.0.11",
    "el-table-infinite-scroll": "^3.0.6",
    "element-plus": "^2.8.1",
    "pinia": "^2.1.7",
    "unplugin-icons": "^0.19.3",
    "vue": "^3.4.29",
    "vue-i18n": "^11.0.1",
    "vue-router": "^4.3.3"
  },
  "devDependencies": {
    "@iconify-json/fluent": "^1.2.1",
    "@iconify-json/material-symbols": "^1.2.1",
    "@rushstack/eslint-patch": "^1.8.0",
    "@vitejs/plugin-vue": "^5.0.5",
    "@vue/eslint-config-prettier": "^9.0.0",
    "eslint": "^8.57.0",
    "eslint-plugin-vue": "^9.23.0",
    "husky": "^9.1.5",
    "less": "^4.2.0",
    "prettier": "^3.2.5",
    "unplugin-auto-import": "^0.18.2",
    "unplugin-vue-components": "^0.27.4",
    "vite": "^5.3.1",
    "vite-plugin-vue-devtools": "^7.3.1"
  }
}
3743
web_demos/minicpm-o_2.6/web_server/pnpm-lock.yaml
generated
Normal file
BIN
web_demos/minicpm-o_2.6/web_server/public/favicon.ico
Normal file
After Width: | Height: | Size: 4.2 KiB |
9
web_demos/minicpm-o_2.6/web_server/public/favicon.svg
Normal file
@@ -0,0 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg width="39px" height="40px" viewBox="0 0 39 40" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <title>形状结合</title>
    <g id="封面/目录" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
        <g id="编组-9" transform="translate(-573, -4)" fill="#EF1C2F" fill-rule="nonzero">
            <path d="M576.881892,21.235765 L580.450462,24.8041433 L580.38268,24.87313 C577.235834,28.1237009 577.267944,33.3099197 580.479012,36.5209876 C583.7129,39.7548751 588.950111,39.7644621 592.195876,36.5497487 L595.764033,40.1177144 L595.635716,40.2441837 C590.410115,45.3030383 582.072776,45.2514173 576.910679,40.0893208 L576.755816,39.9319282 C571.756877,34.7682174 571.748077,26.5660415 576.729414,21.3916323 L576.881892,21.235765 Z M592.417879,13.3160236 L604.512414,4 L599.920819,17.5607789 L607.492343,16.6473294 L602.663827,23.0830718 L611.570065,25.829445 L602.638402,29.682258 L606.245418,35.3702608 L600.389683,35.3702553 L597.78469,37.9753136 L594.216265,34.4068885 L597.546837,31.0764355 L595.209819,27.390919 L597.0017,26.6178775 L594.322938,25.7918752 L596.362671,23.0730359 L592.57191,23.5303217 L594.387004,18.1691054 L590.921987,20.8381842 L588.636635,16.8638388 L585.916869,19.3631275 L585.910577,19.36078 L582.540401,22.7310252 L578.972081,19.1627052 L581.472017,16.6628227 L581.47204,12.2468077 L584.806996,13.5296032 L589.867048,8.87978168 L592.417879,13.3160236 Z" id="形状结合"></path>
        </g>
    </g>
</svg>
After Width: | Height: | Size: 1.5 KiB |
BIN
web_demos/minicpm-o_2.6/web_server/public/silero_vad_legacy.onnx
Normal file
7
web_demos/minicpm-o_2.6/web_server/src/App.vue
Normal file
@@ -0,0 +1,7 @@
<template>
  <RouterView />
</template>

<script setup></script>

<style lang="less" scoped></style>
21
web_demos/minicpm-o_2.6/web_server/src/apis/index.js
Normal file
@@ -0,0 +1,21 @@
// Periodically send messages to the streaming endpoint
export const sendMessage = data => {
  return useHttp.post('/api/v1/stream', data);
};
// Skip/stop the current response
export const stopMessage = () => {
  return useHttp.post('/api/v1/stop');
};
// Upload a reference voice audio file
export const uploadFile = data => {
  return useHttp.post('/api/v1/upload_audio', data);
};
// Submit user feedback
export const feedback = data => {
  return useHttp.post('/api/v1/feedback', data);
};
// Upload session configuration
export const uploadConfig = data => {
  return useHttp.post('/api/v1/init_options', data);
  // return useHttp.post('/api/v1/upload_audio', data);
};
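A minimal sketch of how these helpers are called. In the real app `useHttp` is auto-imported (the repo uses unplugin-auto-import); here it is stubbed so the sketch is self-contained, and the stub's response shape is an assumption rather than the real client's.

```javascript
// Stub of the auto-imported `useHttp` client (assumption: a thin axios-like
// wrapper whose post() resolves with the request details).
const useHttp = {
  post: (url, data) => Promise.resolve({ url, data })
};

// Same shape as the helpers above, inlined so the example runs standalone.
const sendMessage = data => useHttp.post('/api/v1/stream', data);
const stopMessage = () => useHttp.post('/api/v1/stop');

sendMessage({ text: 'hello' }).then(res => {
  console.log(res.url); // /api/v1/stream
  return stopMessage();
});
```
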
After Width: | Height: | Size: 221 B |
BIN
web_demos/minicpm-o_2.6/web_server/src/assets/images/cai.png
Normal file
After Width: | Height: | Size: 284 B |
After Width: | Height: | Size: 1.5 KiB |
BIN
web_demos/minicpm-o_2.6/web_server/src/assets/images/logo.png
Normal file
After Width: | Height: | Size: 6.2 KiB |
After Width: | Height: | Size: 1.6 KiB |
BIN
web_demos/minicpm-o_2.6/web_server/src/assets/images/voice.png
Normal file
After Width: | Height: | Size: 2.0 KiB |
After Width: | Height: | Size: 391 B |
BIN
web_demos/minicpm-o_2.6/web_server/src/assets/images/zan.png
Normal file
After Width: | Height: | Size: 279 B |
@@ -0,0 +1 @@
<svg data-v-d2e47025="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1024 1024"><path fill="currentColor" d="M600.704 64a32 32 0 0 1 30.464 22.208l35.2 109.376c14.784 7.232 28.928 15.36 42.432 24.512l112.384-24.192a32 32 0 0 1 34.432 15.36L944.32 364.8a32 32 0 0 1-4.032 37.504l-77.12 85.12a357.12 357.12 0 0 1 0 49.024l77.12 85.248a32 32 0 0 1 4.032 37.504l-88.704 153.6a32 32 0 0 1-34.432 15.296L708.8 803.904c-13.44 9.088-27.648 17.28-42.368 24.512l-35.264 109.376A32 32 0 0 1 600.704 960H423.296a32 32 0 0 1-30.464-22.208L357.696 828.48a351.616 351.616 0 0 1-42.56-24.64l-112.32 24.256a32 32 0 0 1-34.432-15.36L79.68 659.2a32 32 0 0 1 4.032-37.504l77.12-85.248a357.12 357.12 0 0 1 0-48.896l-77.12-85.248A32 32 0 0 1 79.68 364.8l88.704-153.6a32 32 0 0 1 34.432-15.296l112.32 24.256c13.568-9.152 27.776-17.408 42.56-24.64l35.2-109.312A32 32 0 0 1 423.232 64H600.64zm-23.424 64H446.72l-36.352 113.088-24.512 11.968a294.113 294.113 0 0 0-34.816 20.096l-22.656 15.36-116.224-25.088-65.28 113.152 79.68 88.192-1.92 27.136a293.12 293.12 0 0 0 0 40.192l1.92 27.136-79.808 88.192 65.344 113.152 116.224-25.024 22.656 15.296a294.113 294.113 0 0 0 34.816 20.096l24.512 11.968L446.72 896h130.688l36.48-113.152 24.448-11.904a288.282 288.282 0 0 0 34.752-20.096l22.592-15.296 116.288 25.024 65.28-113.152-79.744-88.192 1.92-27.136a293.12 293.12 0 0 0 0-40.256l-1.92-27.136 79.808-88.128-65.344-113.152-116.288 24.96-22.592-15.232a287.616 287.616 0 0 0-34.752-20.096l-24.448-11.904L577.344 128zM512 320a192 192 0 1 1 0 384 192 192 0 0 1 0-384m0 64a128 128 0 1 0 0 256 128 128 0 0 0 0-256"></path></svg>
After Width: | Height: | Size: 1.6 KiB |
@@ -0,0 +1 @@
<svg data-v-d2e47025="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1024 1024"><path fill="currentColor" d="M832 384H576V128H192v768h640zm-26.496-64L640 154.496V320zM160 64h480l256 256v608a32 32 0 0 1-32 32H160a32 32 0 0 1-32-32V96a32 32 0 0 1 32-32m160 448h384v64H320zm0-192h160v64H320zm0 384h384v64H320z"></path></svg>
After Width: | Height: | Size: 324 B |
@@ -0,0 +1,5 @@
<svg width="20" height="20" viewBox="0 0 20 20" fill="none" xmlns="http://www.w3.org/2000/svg">
<g id="Icon/Utility Icon/line/error">
<path id="Union" fill-rule="evenodd" clip-rule="evenodd" d="M9.99997 20C4.48608 20 0 15.5139 0 10C0 4.48607 4.48606 0 9.99997 0C15.5139 0 19.9999 4.48609 19.9999 10C19.9999 15.5139 15.5139 20 9.99997 20ZM9.99997 1.875C5.52001 1.875 1.875 5.52002 1.875 10C1.875 14.48 5.52001 18.125 9.99997 18.125C14.4799 18.125 18.125 14.48 18.125 10C18.125 5.52002 14.4799 1.875 9.99997 1.875ZM13.7878 7.53784L11.3257 9.99999L13.7878 12.4621C14.154 12.8283 14.154 13.4216 13.7878 13.7878C13.6047 13.9709 13.3655 14.0625 13.125 14.0625C12.8845 14.0625 12.6452 13.9709 12.4621 13.7878L9.99998 11.3257L7.53784 13.7878C7.35473 13.9709 7.11548 14.0625 6.875 14.0625C6.63451 14.0625 6.39526 13.9709 6.21216 13.7878C5.84595 13.4216 5.84595 12.8283 6.21216 12.4621L8.6743 9.99999L6.21216 7.53784C5.84595 7.17163 5.84595 6.57837 6.21216 6.21216C6.57836 5.84595 7.17163 5.84595 7.53784 6.21216L10 8.67431L12.4621 6.21216C12.8283 5.84595 13.4216 5.84595 13.7878 6.21216C14.154 6.57837 14.154 7.17163 13.7878 7.53784Z" fill="#E72B00"/>
</g>
</svg>
After Width: | Height: | Size: 1.1 KiB |
@@ -0,0 +1,29 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg viewBox="0 0 2199 258" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <title>编组 5</title>
    <defs>
        <linearGradient x1="45.9111958%" y1="57.6904311%" x2="4.78458419e-14%" y2="70.534914%" id="linearGradient-1">
            <stop stop-color="#373ED8" offset="0%"></stop>
            <stop stop-color="#497DFF" offset="100%"></stop>
        </linearGradient>
        <path d="M1812.80909,215.823442 L1812.80909,252.015442 L1952.00909,252.015442 L1952.00909,211.995442 L1870.22909,211.995442 L1930.08509,134.391442 C1937.27682,125.111446 1942.72882,116.063446 1946.44109,107.247442 C1950.15309,98.4314389 1952.00909,88.8034425 1952.00909,78.3634425 L1952.00909,72.4474425 C1952.00909,60.6154425 1949.16709,49.8274425 1943.48309,40.0834425 C1937.79935,30.3394425 1929.67935,22.5674425 1919.12309,16.7674425 C1908.56709,10.9674354 1896.32909,8.06744248 1882.40909,8.06744248 C1868.02509,8.06744248 1855.49709,10.8514425 1844.82509,16.4194425 C1834.15309,21.9874425 1825.97509,29.6434425 1820.29109,39.3874425 C1814.6071,49.1314425 1811.76509,60.2674425 1811.76509,72.7954425 L1811.76509,81.1474425 L1855.96109,81.1474425 L1855.96109,75.5794425 C1855.96109,66.5314425 1858.16509,59.4554496 1862.57309,54.3514425 C1866.98109,49.2474354 1873.01309,46.6954425 1880.66909,46.6954425 C1888.32509,46.6954425 1894.41509,49.1314425 1898.93909,54.0034425 C1903.46309,58.8754425 1905.72509,65.4874425 1905.72509,73.8394425 L1905.72509,78.7114425 C1905.72509,89.3834389 1901.31709,100.635442 1892.50109,112.467442 L1812.80909,215.823442 Z M1976.89309,202.599442 L1976.89309,252.015442 L2025.26509,252.015442 L2025.26509,202.599442 L1976.89309,202.599442 Z M2069.81109,237.051442 C2082.91909,249.579442 2101.07309,255.843442 2124.27309,255.843442 C2146.54509,255.843442 2164.46709,249.463444 2178.03909,236.703442 C2191.61109,223.943441 2198.39709,206.195446 2198.39709,183.459442 L2198.39709,172.323442 C2198.39709,151.675439 2192.77109,135.377446 2181.51909,123.429442 C2170.26709,111.481439 2155.24509,105.507442 2136.45309,105.507442 C2129.95709,105.507442 2124.15709,106.667446 2119.05309,108.987442 L2168.81709,11.8954425 L2120.79309,11.8954425 L2065.80909,118.731442 C2060.70509,128.939446 2056.81909,138.335446 2054.15109,146.919442 C2051.48309,155.503439 2050.14909,164.551444 2050.14909,174.063442 L2050.14909,184.851442 C2050.14909,207.123442 2056.70309,224.523442 
2069.81109,237.051442 Z M2145.15309,208.863442 C2140.04909,214.431442 2133.08909,217.215442 2124.27309,217.215442 C2115.45709,217.215442 2108.55509,214.431442 2103.56709,208.863442 C2098.57909,203.295442 2096.08509,195.639442 2096.08509,185.895442 L2096.08509,174.411442 C2096.08509,164.667448 2098.57909,157.06945 2103.56709,151.617442 C2108.55507,146.165446 2115.45707,143.439442 2124.27309,143.439442 C2133.08909,143.439442 2140.04909,146.165446 2145.15309,151.617442 C2150.25709,157.069439 2152.80909,164.783441 2152.80909,174.759442 L2152.80909,185.547442 C2152.80909,195.523444 2150.25709,203.295442 2145.15309,208.863442 Z" id="path-2"></path>
    </defs>
    <g id="页面-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
        <g id="画板备份-14" transform="translate(-1928, -1764)" fill-rule="nonzero">
            <g id="编组-5" transform="translate(1928, 1764.9846)">
                <path d="M760.177408,6.08426104 C780.959767,6.08426104 798.639412,11.1994393 813.310847,21.4099653 L814.826937,22.4868416 C827.871266,31.9421726 838.44267,44.5805385 846.551989,60.4441981 L846.854209,61.0497535 L805.164914,79.6080391 L804.700192,78.6693484 C800.249247,69.9367001 794.263119,63.1340103 786.753168,58.3199387 C778.149995,52.8050845 768.233984,50.0506369 757.083763,50.0506369 C744.833865,50.0506369 733.831738,53.5163071 724.177049,60.4281868 C714.581392,67.2978043 707.134706,76.9187061 701.846243,89.2224762 C696.604371,101.417853 693.991732,115.174193 693.991732,130.467076 C693.991732,146.161685 696.600155,160.26846 701.833896,172.765351 C707.114285,185.373628 714.549996,195.251726 724.136357,202.332561 C733.797042,209.468294 744.814803,213.049067 757.083763,213.049067 C767.641468,213.049067 777.212416,210.227573 785.712813,204.59744 L787.029919,203.697896 C793.997044,198.793482 800.045118,192.1801 805.176059,183.885785 L805.493229,183.359111 L847.470223,202.046016 L847.190902,202.601562 C838.534575,219.439902 826.824502,232.527091 812.037303,241.920218 C796.196991,251.982303 778.007922,257.015442 757.393128,257.015442 C735.537395,257.015442 716.157915,251.671101 699.179392,240.984619 C682.183395,230.287139 668.940168,215.394755 659.416171,196.246509 C649.861213,177.036014 645.075526,155.122604 645.075526,130.467076 C645.075526,106.236416 649.907085,84.6957154 659.55475,65.8023698 C669.170256,46.9720039 682.655972,32.3375052 700.053275,21.8391328 C717.453143,11.3392121 737.472005,6.08426104 760.177408,6.08426104 Z M472.804069,70.4320631 C490.215347,70.4320631 503.551514,75.9687387 513.08592,87.0440588 L513.922858,88.0433234 C522.993334,99.1752071 527.579887,114.927363 527.579887,135.416907 L527.579887,252.681061 L482.065262,252.681061 L482.066689,147.48212 C482.066689,137.230753 479.444996,128.966722 474.122797,122.834623 C468.710698,116.598944 461.272898,113.470346 452.076652,113.470346 C441.497963,113.470346 432.82714,116.858283 426.285393,123.629565 
L425.517793,124.451905 C419.503248,131.122279 416.518055,139.98993 416.518055,150.885129 L416.517248,252.681061 L371.003248,252.681061 L371.003248,74.7611551 L416.517248,74.7611551 L416.518055,98.8935909 L423.358335,98.8935909 L424.345965,97.2602023 C429.414322,88.8779197 436.064565,82.3247602 444.331449,77.5591448 C452.566403,72.8119363 462.037327,70.4320631 472.804069,70.4320631 Z M605.481416,74.7611551 L605.481416,252.681061 L559.9708,252.681061 L559.9708,74.7611551 L605.481416,74.7611551 Z M335.78549,74.7611551 L335.78549,252.681061 L290.27149,252.681061 L290.27149,74.7611551 L335.78549,74.7611551 Z M0,10.4147082 L43.0533273,10.4147082 L122.24903,130.758127 L127.754059,130.758127 L206.947055,10.4147082 L250.000382,10.4147082 L250.000382,252.681061 L204.178374,252.681061 L204.180526,104.498777 L197.139835,104.498777 L143.308009,184.313596 L106.66868,184.313596 L52.5323692,105.736235 L45.5131981,105.736235 L45.5106162,252.681061 L0,252.681061 L0,10.4147082 Z M961.869908,10.4147082 C981.192886,10.4147082 997.923497,13.7715036 1012.08946,20.4554447 C1026.1438,27.0867173 1036.87643,36.5393059 1044.3624,48.8517555 C1051.8649,61.1913978 1055.62564,75.6900619 1055.62564,92.415251 C1055.62564,109.151297 1051.96203,123.607707 1044.65738,135.847946 C1037.36788,148.062782 1026.98507,157.46144 1013.43892,164.086199 C999.800233,170.756213 983.652345,174.105774 964.963553,174.105774 L922.598939,174.105774 L922.596926,252.681061 L875.228112,252.681061 L875.228112,10.4147082 L961.869908,10.4147082 Z M953.826433,52.8349168 L922.598939,52.8349168 L922.598939,131.686221 L953.826433,131.686221 C969.953126,131.686221 982.80491,128.309905 992.310812,121.456813 
C1002.06659,114.423581 1007.0188,104.634314 1007.0188,92.415251 C1007.0188,80.2034681 1002.0737,70.3707714 992.331646,63.2341476 C982.822362,56.2680445 969.961757,52.8349168 953.826433,52.8349168 Z M335.78549,0 L335.78549,45.5106162 L290.27149,45.5106162 L290.27149,0 L335.78549,0 Z M605.119253,0 L605.119253,45.5106162 
L559.605252,45.5106162 L559.605252,0 L605.119253,0 Z" id="形状" fill="#111111"></path>
                <g id="M-V" transform="translate(1084.9431, 11.7574)" fill="#000111">
                    <polygon id="路径" points="44.394 239.184 0 239.184 0 0 41.676 0 119.894 123.216 121.706 123.216 200.226 0 241.902 0 241.902 239.184 197.508 239.184 197.508 85.466 195.696 85.466 137.41 176.368 104.492 176.368 46.206 86.372 44.394 86.372"></polygon>
                    <polygon id="路径" points="274.216 96.942 374.48 96.942 374.48 138.014 274.216 138.014"></polygon>
                </g>
                <g id="o" transform="translate(1501.3431, 42.9174)" fill="#000111">
                    <path d="M95.4,213.12 C75.96,213.12 59.04,208.8 44.64,200.16 C30.24,191.52 19.2,179.16 11.52,163.08 C3.84,147 0,128.16 0,106.56 C0,84.96 3.84,66.12 11.52,50.04 C19.2,33.96 30.24,21.6 44.64,12.96 C59.04,4.32 75.96,0 95.4,0 C114.84,0 131.76,4.32 146.16,12.96 C160.56,21.6 171.66,33.96 179.46,50.04 C187.26,66.12 191.16,84.96 191.16,106.56 C191.16,128.16 187.26,147 179.46,163.08 C171.66,179.16 160.56,191.52 146.16,200.16 C131.76,208.8 114.84,213.12 95.4,213.12 Z M95.4,169.92 C110.52,169.92 122.46,164.22 131.22,152.82 C139.98,141.42 144.36,126 144.36,106.56 C144.36,86.88 139.98,71.4 131.22,60.12 C122.46,48.84 110.52,43.2 95.4,43.2 C80.52,43.2 68.7,48.84 59.94,60.12 C51.18,71.4 46.8,86.88 46.8,106.56 C46.8,126 51.18,141.42 59.94,152.82 C68.7,164.22 80.52,169.92 95.4,169.92 Z" id="形状"></path>
                </g>
                <g id="形状结合">
                    <use fill="#000111" xlink:href="#path-2"></use>
                    <use fill="url(#linearGradient-1)" xlink:href="#path-2"></use>
                </g>
            </g>
        </g>
    </g>
</svg>
After Width: | Height: | Size: 8.9 KiB |
@@ -0,0 +1,5 @@
<svg width="18" height="18" viewBox="0 0 18 18" fill="none" xmlns="http://www.w3.org/2000/svg">
<g id="Pause">
<path id="Vector" fill-rule="evenodd" clip-rule="evenodd" d="M4.875 2.2522H7.125C7.5375 2.2522 7.875 2.5897 7.875 3.0022V15.0022C7.875 15.4147 7.5375 15.7522 7.125 15.7522H4.875C4.4625 15.7522 4.125 15.4147 4.125 15.0022V3.0022C4.125 2.5897 4.4625 2.2522 4.875 2.2522ZM10.875 2.2522H13.125C13.5375 2.2522 13.875 2.5897 13.875 3.0022V15.0022C13.875 15.4147 13.5375 15.7522 13.125 15.7522H10.875C10.4625 15.7522 10.125 15.4147 10.125 15.0022V3.0022C10.125 2.5897 10.4625 2.2522 10.875 2.2522Z" fill="currentColor" />
</g>
</svg>
After Width: | Height: | Size: 638 B |
@@ -0,0 +1,10 @@
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none">
<g clip-path="url(#clip0_7781_19663)">
<path d="M21.7786 18.4946C22.7599 18.6754 23.7615 17.9845 23.7955 16.9048C23.827 15.9053 23.7672 14.6519 23.4533 13.4613C23.141 12.2768 22.5546 11.0725 21.4695 10.2892C20.3647 9.49176 18.7205 8.97497 17.1207 8.64947C15.4984 8.31938 13.8179 8.16607 12.5642 8.15054C10.9332 8.13034 8.67094 8.26243 6.60622 8.68941C5.57392 8.90289 4.56701 9.19489 3.70489 9.59193C2.85192 9.98474 2.07652 10.5096 1.58739 11.2247C0.257894 13.1683 0.172116 15.4886 0.325588 16.9453C0.436943 18.0022 1.45742 18.5535 2.36025 18.353C3.07081 18.1951 3.71743 18.0593 4.36845 17.9225C5.30139 17.7265 6.24339 17.5286 7.3955 17.2614C7.46587 17.2451 7.53161 17.2194 7.59169 17.1859C7.85982 17.0768 8.05173 16.8168 8.05917 16.509C8.09666 14.957 8.40578 14.0228 8.95698 13.4586C9.50108 12.9017 10.4369 12.5484 12.1227 12.5476C13.8976 12.5468 14.8691 12.862 15.4225 13.3997C15.9698 13.9314 16.2828 14.8523 16.2836 16.5335C16.2836 16.5634 16.2854 16.5928 16.2888 16.6217C16.279 16.6521 16.2711 16.6836 16.2651 16.7159C16.19 17.1233 16.4594 17.5144 16.8668 17.5894L21.7786 18.4946Z" fill="currentColor" />
</g>
<defs>
<clipPath id="clip0_7781_19663">
<rect width="24" height="24" fill="white"/>
</clipPath>
</defs>
</svg>
After Width: | Height: | Size: 1.3 KiB |
@@ -0,0 +1 @@
<svg t="1736675176012" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="4244" xmlns:xlink="http://www.w3.org/1999/xlink"><path d="M512 106.667A405.333 405.333 0 1 1 106.667 512 405.333 405.333 0 0 1 512 106.667m0-64A469.333 469.333 0 1 0 981.333 512 469.333 469.333 0 0 0 512 42.667z" p-id="4245"></path><path d="M501.333 664.533a32 32 0 1 0 32 32 32 32 0 0 0-32-32z m-0.426-27.093a32 32 0 0 1-32-32c0-80.213 50.56-111.787 91.306-136.96 32-19.84 51.84-33.28 59.094-60.16a85.333 85.333 0 0 0-12.587-69.547 91.52 91.52 0 0 0-76.8-29.226 123.52 123.52 0 0 0-92.16 29.866 82.56 82.56 0 0 0-21.333 52.907 32 32 0 1 1-64 2.56 144 144 0 0 1 39.466-99.84c31.574-32.853 78.08-49.493 138.24-49.493 70.827 0 108.587 29.44 128 54.186a149.333 149.333 0 0 1 23.894 125.014c-14.08 52.693-54.614 77.866-87.04 98.133-40.32 24.747-61.654 39.68-61.654 82.56a32 32 0 0 1-32.426 32z" p-id="4246"></path></svg>
After Width: | Height: | Size: 931 B |
@@ -0,0 +1,3 @@
<svg data-v-d2e47025="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1024 1024">
<path fill="currentColor" d="M771.776 794.88A384 384 0 0 1 128 512h64a320 320 0 0 0 555.712 216.448H654.72a32 32 0 1 1 0-64h149.056a32 32 0 0 1 32 32v148.928a32 32 0 1 1-64 0v-50.56zM276.288 295.616h92.992a32 32 0 0 1 0 64H220.16a32 32 0 0 1-32-32V178.56a32 32 0 0 1 64 0v50.56A384 384 0 0 1 896.128 512h-64a320 320 0 0 0-555.776-216.384z"></path>
</svg>
After Width: | Height: | Size: 442 B |
@@ -0,0 +1,7 @@
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<g id="下载">
<rect width="24" height="24" rx="7" fill="#EAEFFF"/>
<path id="Vector" d="M12.2816 16.1003C11.9134 16.1003 11.615 15.8019 11.615 15.4337V6.2513L8.28168 9.50983C8.11137 9.6765 7.86502 9.73978 7.63559 9.67571C7.40617 9.61165 7.22831 9.42989 7.16894 9.1989C7.10956 8.96817 7.17805 8.72313 7.34836 8.55646L11.8163 4.18987C12.0785 3.93363 12.4985 3.93727 12.7563 4.19794L17.088 8.56323C17.3399 8.82599 17.3344 9.24187 17.0763 9.49811C16.818 9.7541 16.4018 9.75592 16.1414 9.50202L12.9483 6.28489V15.4337C12.9483 15.8019 12.6498 16.1003 12.2816 16.1003Z" fill="#424EC5"/>
<path id="Vector_2" d="M4.66666 13.6001C5.03488 13.6001 5.33331 13.8985 5.33331 14.2668V17.4667C5.33331 17.8349 5.63174 18.1334 5.99997 18.1334H17.9998C18.368 18.1334 18.6664 17.8349 18.6664 17.4667V14.2668C18.6664 13.8985 18.9648 13.6001 19.3331 13.6001C19.7013 13.6001 19.9997 13.8985 19.9997 14.2668V17.4667C19.9997 18.5714 19.1044 19.4667 17.9998 19.4667H5.99997C4.8953 19.4667 4 18.5714 4 17.4667V14.2668C4 13.8985 4.29843 13.6001 4.66666 13.6001Z" fill="#424EC5"/>
</g>
</svg>
After Width: | Height: | Size: 1.2 KiB |
41
web_demos/minicpm-o_2.6/web_server/src/assets/svg/voice.svg
Normal file
@@ -0,0 +1,41 @@
<svg xmlns="http://www.w3.org/2000/svg" width="195" height="45" viewBox="0 0 195 45" fill="none">
<rect x="16" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="11" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="6" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="0.907227" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="71" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="91" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="111" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="131" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="66" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="86" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="106" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="126" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="61" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="81" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="101" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="121" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="56" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="76" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="96" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="116" y="18" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="21" y="13.3407" width="3.14815" height="18.3186" rx="1.57407" fill="#F3F3F3"/>
<rect width="3.14815" height="18.3186" rx="1.57407" transform="matrix(-1 0 0 1 54 13.3849)" fill="#F3F3F3"/>
<rect x="26" y="8.45581" width="3.14815" height="28.0885" rx="1.57407" fill="#F3F3F3"/>
<rect width="3.14815" height="28.0885" rx="1.57407" transform="matrix(-1 0 0 1 49 8.5)" fill="#F3F3F3"/>
<rect x="31" y="9.9823" width="3.14815" height="25.0354" rx="1.57407" fill="#F3F3F3"/>
<rect width="3.14815" height="25.0354" rx="1.57407" transform="matrix(-1 0 0 1 44 10.0265)" fill="#F3F3F3"/>
<rect x="36" y="5.09729" width="3.14815" height="34.8053" rx="1.57407" fill="#F3F3F3"/>
<rect x="151" y="15.4779" width="3.14815" height="14.0442" rx="1.57407" fill="#F3F3F3"/>
<rect x="156" y="5.70801" width="3.14815" height="33.5841" rx="1.57407" fill="#F3F3F3"/>
<rect x="161" y="7.53979" width="3.14815" height="29.9204" rx="1.57407" fill="#F3F3F3"/>
<rect x="166" y="15.4779" width="3.14815" height="14.0442" rx="1.57407" fill="#F3F3F3"/>
<rect x="171" y="10.8982" width="3.14815" height="23.2035" rx="1.57407" fill="#F3F3F3"/>
<rect width="3.14815" height="29.9204" rx="1.57407" transform="matrix(-1 0 0 1 149 7.5)" fill="#F3F3F3"/>
<rect width="3.14815" height="14.0442" rx="1.57407" transform="matrix(-1 0 0 1 144 15.4381)" fill="#F3F3F3"/>
<rect width="3.14815" height="23.2035" rx="1.57407" transform="matrix(-1 0 0 1 139 10.8584)" fill="#F3F3F3"/>
<rect x="176" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="181" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="186" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
<rect x="191" y="18.2257" width="3.14815" height="8.54867" rx="1.57407" fill="#F3F3F3"/>
</svg>
After Width: | Height: | Size: 3.6 KiB |
@@ -0,0 +1,5 @@
<svg width="20" height="20" viewBox="0 0 20 20" fill="none" xmlns="http://www.w3.org/2000/svg">
<g id="Icon/Utility Icon/line/warning">
<path id="Union" fill-rule="evenodd" clip-rule="evenodd" d="M11.8265 2.57765L19.6724 15.1369C20.0876 15.8014 20.1095 16.6395 19.7298 17.3248C19.3513 18.0101 18.6285 18.4352 17.8459 18.4352H2.15406C1.37142 18.4352 0.648615 18.0101 0.270115 17.3248C-0.109605 16.6395 -0.0876187 15.8014 0.327499 15.1369L8.17341 2.57765C8.569 1.94364 9.25275 1.56494 9.99998 1.56494C10.7472 1.56494 11.431 1.94364 11.8265 2.57765ZM17.8459 16.5589C17.9887 16.5589 18.0608 16.4685 18.0901 16.4148C18.1194 16.361 18.1585 16.2535 18.0828 16.1314L10.2369 3.57211C10.1661 3.4585 10.0574 3.44141 10 3.44141C9.94262 3.44141 9.83395 3.4585 9.76313 3.57211L1.91722 16.1314C1.84151 16.2535 1.88058 16.361 1.90988 16.4148C1.93918 16.4685 2.01122 16.5589 2.15407 16.5589H17.8459ZM9.99995 12.1893C10.5176 12.1893 10.9377 11.769 10.9377 11.2511V7.69991C10.9377 7.18195 10.5176 6.76172 9.99995 6.76172C9.48226 6.76172 9.06225 7.18195 9.06225 7.69991V11.2511C9.06225 11.7691 9.48226 12.1893 9.99995 12.1893ZM9.99996 15.6293C9.30946 15.6293 8.7497 15.0692 8.7497 14.3784C8.7497 13.6875 9.30946 13.1274 9.99996 13.1274C10.6905 13.1274 11.2502 13.6875 11.2502 14.3784C11.2502 15.0692 10.6905 15.6293 9.99996 15.6293Z" fill="#F9AC2A"/>
</g>
</svg>
After Width: | Height: | Size: 1.3 KiB |
@@ -0,0 +1,3 @@
<template>
  <div class="call-header"></div>
</template>
@@ -0,0 +1,82 @@
<template>
  <div class="time">
    <div class="time-minute">{{ minute || '00' }}</div>
    <div class="time-colon">:</div>
    <div class="time-second">{{ second || '00' }}</div>
  </div>
</template>

<script setup>
import { limitTime, tipsRemainingTime } from '@/enums';

const start = defineModel();

const emits = defineEmits(['timeUp']);

const remainingTime = ref();
const minute = ref();
const second = ref();
const timeInterval = ref(null);

const startCount = () => {
  remainingTime.value = limitTime;
  updateCountDown();
  timeInterval.value = setInterval(() => {
    updateCountDown();
  }, 1000);
};
const updateCountDown = () => {
  let minutes = Math.floor(remainingTime.value / 60);
  let seconds = remainingTime.value % 60;

  // Format minutes and seconds so they are always two digits
  minute.value = minutes < 10 ? '0' + minutes : minutes;
  second.value = seconds < 10 ? '0' + seconds : seconds;

  // Warn the user when tipsRemainingTime seconds remain
  if (remainingTime.value === tipsRemainingTime) {
    ElMessage({
      type: 'warning',
      message: `This call will disconnect in ${tipsRemainingTime} seconds.`,
      duration: 3000,
      customClass: 'time-warning'
    });
  }
  // Keep the countdown from going negative
  if (remainingTime.value > 0) {
    remainingTime.value--;
  } else {
    clearInterval(timeInterval.value);
    emits('timeUp');
  }
};
watch(
  () => start.value,
  newVal => {
    timeInterval.value && clearInterval(timeInterval.value);
    if (newVal) {
      startCount();
    }
  },
  { immediate: true }
);
</script>
<style lang="less" scoped>
.time {
  display: flex;
  align-items: center;
  .time-minute,
  .time-second {
    width: 26px;
    height: 26px;
    display: flex;
    justify-content: center;
    align-items: center;
    border-radius: 3.848px;
    background: rgba(47, 47, 47, 0.5);
  }
  .time-colon {
    margin: 0 3px;
  }
}
</style>
@@ -0,0 +1,23 @@
<template>
  <div class="delay-tips">
    <span>当前发生延迟,目前延迟{{ delayTimestamp }}ms,积压{{ delayCount * 200 }}ms未发</span>
  </div>
</template>
<script setup>
defineProps({
  delayTimestamp: {
    type: Number,
    default: 0
  },
  delayCount: {
    type: Number,
    default: 0
  }
});
</script>
<style lang="less" scoped>
.delay-tips {
  font-size: 12px;
  color: #dc3545;
}
</style>
@@ -0,0 +1,36 @@
<template>
  <div class="extra-info">
    <div class="model-version" v-if="modelVersion">模型版本: {{ modelVersion }}</div>
    <div class="web-version">前端版本: {{ webVersion }}</div>
  </div>
</template>

<script setup>
defineProps({
  modelVersion: {
    type: String,
    default: ''
  },
  webVersion: {
    type: String,
    default: ''
  }
});
</script>

<style lang="less" scoped>
.extra-info {
  position: fixed;
  top: 62px;
  left: 4vw;
  display: flex;
  .model-version,
  .web-version {
    font-size: 12px;
    color: red;
  }
  .model-version {
    margin-right: 16px;
  }
}
</style>
@@ -0,0 +1,67 @@
<template>
  <div class="ideas">
    <div class="ideas-title">
      <img src="@/assets/images/ideas-icon.png" />
      <span>Conversation ideas</span>
    </div>
    <div class="ideas-content">
      <div class="ideas-content-item" v-for="(item, index) in ideasList" :key="index">{{ item }}</div>
    </div>
  </div>
</template>

<script setup>
defineProps({
  ideasList: {
    type: Array,
    default: () => []
  }
});
</script>

<style lang="less" scoped>
.ideas {
  margin-top: 16px;
  box-shadow: 0 0 0 0.5px #e0e0e0;
  border-radius: 12px;
  padding: 18px 28px;
  &-title {
    font-size: 20px;
    font-weight: 500;
    margin-bottom: 20px;
    display: flex;
    align-items: center;
    img {
      width: 24px;
      height: 24px;
      margin-right: 10px;
    }
    span {
      color: #171717;
      font-family: PingFang SC;
      font-size: 16px;
      font-style: normal;
      font-weight: 500;
      line-height: normal;
    }
  }
  &-content {
    display: grid;
    grid-template-columns: repeat(3, 1fr);
    gap: 8px;
    &-item {
      display: flex;
      align-items: center;
      border-radius: 10px;
      background: #eaefff;
      padding: 10px 24px;
      color: #7579eb;
      font-family: PingFang SC;
      font-size: 14px;
      font-style: normal;
      font-weight: 400;
      line-height: normal;
    }
  }
}
</style>
@@ -0,0 +1,110 @@
<template>
  <div class="like-box">
    <div class="like-btn" @click="selectFeedbackStatus('like')">
      <img v-if="feedbackStatus === '' || feedbackStatus === 'dislike'" src="@/assets/images/zan.png" />
      <img v-else src="@/assets/images/zan-active.png" />
    </div>
    <div class="dislike-btn" @click="selectFeedbackStatus('dislike')">
      <img v-if="feedbackStatus === '' || feedbackStatus === 'like'" src="@/assets/images/cai.png" />
      <img v-else src="@/assets/images/cai-active.png" />
    </div>
  </div>
  <el-dialog
    v-model="dialogVisible"
    :title="t('feedbackDialogTitle')"
    width="400"
    :align-center="true"
    @close="cancelFeedback"
  >
    <el-input type="textarea" :rows="4" v-model="comment" />
    <div class="operate-btn">
      <el-button type="primary" :loading="submitLoading" @click="submitFeedback">确定</el-button>
      <el-button @click="cancelFeedback">取消</el-button>
    </div>
  </el-dialog>
</template>
<script setup>
import { feedback } from '@/apis';
import { useI18n } from 'vue-i18n';

const { t } = useI18n();
const feedbackStatus = defineModel('feedbackStatus');
const curResponseId = defineModel('curResponseId');
const dialogVisible = ref(false);
const comment = ref('');
const submitLoading = ref(false);
const selectFeedbackStatus = val => {
  if (!curResponseId.value) {
    return;
  }
  feedbackStatus.value = val;
  dialogVisible.value = true;
};
// Submit feedback
const submitFeedback = async () => {
  submitLoading.value = true;
  const { code, message } = await feedback({
    response_id: curResponseId.value,
    rating: feedbackStatus.value,
    comment: comment.value
  });
  submitLoading.value = false;
  if (code !== 0) {
    ElMessage({
      type: 'error',
      message: message,
      duration: 3000,
      customClass: 'system-error'
    });
    return;
  }
  ElMessage.success('反馈成功');
  dialogVisible.value = false;
  setTimeout(() => {
    feedbackStatus.value = '';
  }, 2000);
};
const cancelFeedback = () => {
  dialogVisible.value = false;
  feedbackStatus.value = '';
};
</script>
<style lang="less" scoped>
.like-box {
  display: flex;
  margin: 0 16px;
  .like-btn,
  .dislike-btn {
    width: 26px;
    height: 26px;
    background: #f3f3f3;
    display: flex;
    align-items: center;
    justify-content: center;
    border-radius: 8px;
    cursor: pointer;
    &:hover {
      background: #d1d1d1;
    }
    img {
      width: 16px;
      height: 16px;
    }
  }
  .dislike-btn {
    margin-left: 16px;
  }
}
.operate-btn {
  margin-top: 20px;
  display: flex;
  justify-content: flex-end;
  .el-button--primary {
    background: #647fff;
    border-color: #647fff;
    &:hover {
      border-color: #647fff;
    }
  }
}
</style>
@@ -0,0 +1,404 @@
<template>
  <div class="user-config">
    <div class="user-config-title">模型配置</div>
    <div class="config-item">
      <div class="config-item-label">语音打断:</div>
      <div class="config-item-content">
        <el-switch
          v-model="configData.canStopByVoice"
          inline-prompt
          active-text="是"
          inactive-text="否"
          size="small"
          :disabled="isCalling"
        />
      </div>
    </div>
    <div class="config-item">
      <div class="config-item-label">视频画质:</div>
      <div class="config-item-content">
        <el-radio-group v-model="configData.videoQuality" :disabled="isCalling">
          <el-radio :value="true">高清</el-radio>
          <el-radio :value="false">低清</el-radio>
        </el-radio-group>
      </div>
    </div>
    <div class="config-item">
      <div class="config-item-label">VAD阈值:</div>
      <div class="config-item-content vad-slider">
        <el-slider
          v-model="configData.vadThreshold"
          :min="0.5"
          :max="1"
          :step="0.1"
          size="small"
          :disabled="isCalling"
        />
      </div>
    </div>
    <!-- <div class="timbre-model">
      <div class="timbre-model-label">音色人物:</div>
      <div class="timbre-model-content">
        <el-select
          v-model="configData.timbreId"
          style="width: 100%"
          @change="handleChangePeople"
          clearable
          placeholder="请选择"
        >
          <el-option v-for="item in peopleList" :key="item.id" :value="item.id" :label="item.name">
            {{ item.name }}
          </el-option>
        </el-select>
      </div>
    </div> -->
    <div class="prompt-item">
      <div class="prompt-item-label">Assistant_prompt:</div>
      <div class="prompt-item-content">
        <el-input
          type="textarea"
          :rows="3"
          v-model="configData.assistantPrompt"
          resize="none"
          :disabled="isCalling"
        />
      </div>
    </div>
    <div class="config-item">
      <div class="config-item-label">使用语音prompt:</div>
      <div class="config-item-content">
        <el-switch
          v-model="configData.useAudioPrompt"
          inline-prompt
          active-text="是"
          inactive-text="否"
          size="small"
          :disabled="isCalling"
          @change="handleSelectUseAudioPrompt"
        />
      </div>
    </div>
    <div class="voice-prompt-box">
      <div class="prompt-item" v-if="configData.useAudioPrompt">
        <div class="prompt-item-label">Voice_clone_prompt:</div>
        <div class="prompt-item-content">
          <el-input
            type="textarea"
            :rows="8"
            v-model="configData.voiceClonePrompt"
            resize="none"
            :disabled="isCalling"
          />
        </div>
      </div>

      <div class="timbre-config" v-if="configData.useAudioPrompt">
        <div class="timbre-config-label">音色选择:</div>
        <div class="timbre-config-content">
          <el-checkbox-group v-model="configData.timbre" @change="handleSelectTimbre" :disabled="isCalling">
            <el-checkbox :value="1" label="Default Audio"></el-checkbox>
            <el-upload
              v-model:file-list="fileList"
              action=""
              :multiple="false"
              :on-change="handleChangeFile"
              :auto-upload="false"
              :show-file-list="false"
              :disabled="isCalling"
              accept="audio/*"
            >
              <el-checkbox :value="2">
                <!-- <span>Customization: Upload Audio</span> -->
                <span>Customization</span>
                <SvgIcon name="upload" className="checkbox-icon" />
              </el-checkbox>
            </el-upload>
          </el-checkbox-group>
        </div>
      </div>
      <div class="file-content" v-if="fileName">
        <SvgIcon name="document" class="document-icon" />
        <span class="file-name">{{ fileName }}</span>
      </div>
    </div>
  </div>
</template>

<script setup>
const isCalling = defineModel('isCalling');
const type = defineModel('type');

let defaultVoiceClonePrompt =
  '你是一个AI助手。你能接受视频,音频和文本输入并输出语音和文本。模仿输入音频中的声音特征。';
let defaultAssistantPrompt = '作为助手,你将使用这种声音风格说话。';

const fileList = ref([]);
const fileName = ref('');

const configData = ref({
  canStopByVoice: false,
  videoQuality: false,
  useAudioPrompt: true,
  vadThreshold: 0.8,
  voiceClonePrompt: defaultVoiceClonePrompt,
  assistantPrompt: defaultAssistantPrompt,
  timbre: [1],
  audioFormat: 'mp3',
  base64Str: '',
  timbreId: ''
});

const peopleList = [
  {
    id: 1,
    name: 'Trump',
    voiceClonePrompt: '',
    assistantPrompt: ''
  },
  {
    id: 2,
    name: '说相声',
    voiceClonePrompt: '克隆音频提示中的音色以生成语音',
    assistantPrompt: '请角色扮演这段音频,请以相声演员的口吻说话'
  },
  {
    id: 3,
    name: '默认',
    voiceClonePrompt: defaultVoiceClonePrompt,
    assistantPrompt: defaultAssistantPrompt
  }
];
watch(
  () => type.value,
  val => {
    if (val === 'video') {
      defaultVoiceClonePrompt =
        '你是一个AI助手。你能接受视频,音频和文本输入并输出语音和文本。模仿输入音频中的声音特征。';
      defaultAssistantPrompt = '作为助手,你将使用这种声音风格说话。';
    } else {
      defaultVoiceClonePrompt = '克隆音频提示中的音色以生成语音。';
      defaultAssistantPrompt = 'Your task is to be a helpful assistant using this voice pattern.';
    }
    configData.value.voiceClonePrompt = defaultVoiceClonePrompt;
    configData.value.assistantPrompt = defaultAssistantPrompt;
  },
  { immediate: true }
);
onMounted(() => {
  handleSetStorage();
});
const handleSelectTimbre = e => {
  if (e.length > 1) {
    const val = e[e.length - 1];
    configData.value.timbre = [val];
    // Default timbre: reset any uploaded custom audio
    if (val === 1) {
      configData.value.audioFormat = 'mp3';
      configData.value.base64Str = '';
      fileList.value = [];
      fileName.value = '';
    }
  }
};
const handleChangeFile = file => {
  if (isAudio(file) && sizeNotExceed(file)) {
    fileList.value = [file];
    fileName.value = file.name;
    configData.value.timbre = [2];
    handleUpload();
  } else {
    ElMessage.error('Please upload an audio file no larger than 10MB');
  }
};
const isAudio = file => {
  return file.raw.type.includes('audio');
};
const sizeNotExceed = file => {
  return file.size / 1024 / 1024 <= 10;
};
const handleUpload = async () => {
  const file = fileList.value[0].raw;
  if (file) {
    const reader = new FileReader();
    reader.onload = e => {
      const base64String = e.target.result.split(',')[1];
      // Use the last segment so filenames containing dots still yield the extension
      configData.value.audioFormat = file.name.split('.').pop();
      configData.value.base64Str = base64String;
    };
    reader.readAsDataURL(file);
  }
};
const handleSelectUseAudioPrompt = val => {
  if (val) {
    configData.value.voiceClonePrompt = defaultVoiceClonePrompt;
    configData.value.assistantPrompt = defaultAssistantPrompt;
  }
};
// Persist the config to localStorage whenever it changes
watch(configData.value, () => {
  handleSetStorage();
});
const handleSetStorage = () => {
  const { timbre, canStopByVoice, ...others } = configData.value;
  const defaultConfigData = {
    canStopByVoice,
    ...others
  };
  localStorage.setItem('configData', JSON.stringify(defaultConfigData));
  localStorage.setItem('canStopByVoice', canStopByVoice);
};
const handleChangePeople = val => {
  if (!val) {
    return;
  }
  const index = peopleList.findIndex(item => item.id === val);
  configData.value.voiceClonePrompt = peopleList[index].voiceClonePrompt;
  configData.value.assistantPrompt = peopleList[index].assistantPrompt;
  configData.value.timbre = [1];
};
</script>
<style lang="less">
.user-config {
  &-title {
    height: 61px;
    padding: 18px 18px 0;
    color: rgba(23, 23, 23, 0.9);
    font-family: PingFang SC;
    font-size: 16px;
    font-style: normal;
    font-weight: 500;
    line-height: normal;
  }
  .config-item {
    display: flex;
    align-items: center;
    width: 100%;
    padding: 0 0 0 18px;
    margin-bottom: 12px;
    &-label {
      width: 120px;
      flex-shrink: 0;
    }
    &-content {
      flex: 1;
      margin-left: 16px;
      .el-radio-group {
        .el-radio {
          width: 50px;
        }
      }
    }
    &-content.vad-slider {
      width: 80%;
      padding-left: 7px;
      margin-right: 20px;
      .el-slider__button {
        width: 14px;
        height: 14px;
      }
    }
  }
  .timbre-config {
    padding: 0 0 0 18px;
    &-label {
      margin-bottom: 12px;
    }
    &-content {
      display: flex;
      align-items: center;
      .el-checkbox-group {
        display: flex;
        flex-wrap: wrap;
        flex: 1;
        > .el-checkbox {
          margin-right: 12px;
        }
      }
      .el-checkbox {
        padding: 8px 16px;
        border-radius: 10px;
        background: #eaefff;
        margin-bottom: 12px;
        height: 40px;
        .el-checkbox__input {
          .el-checkbox__inner {
            border: 1px solid #4dc100;
          }
        }
        .el-checkbox__input.is-checked {
          .el-checkbox__inner {
            background: #4dc100;
          }
        }
        .el-checkbox__input.is-checked.is-disabled {
          .el-checkbox__inner::after {
            border-color: #ffffff;
          }
        }
      }
      .el-checkbox__label {
        color: #7579eb !important;
        font-family: PingFang SC;
        font-size: 16px;
        font-style: normal;
        font-weight: 400;
        line-height: normal;
        display: flex;
        align-items: center;
        .checkbox-icon {
          margin-left: 4px;
        }
      }
      .el-checkbox + .el-checkbox {
        margin-left: 12px;
      }
    }
  }
  .prompt-item {
    // padding: 0 0 0 18px;
    margin-bottom: 12px;
    &-label {
      // margin-bottom: 16px;
    }
  }
  .file-content {
    padding: 0 0 0 18px;
    font-size: 14px;
    display: flex;
    align-items: center;
    .document-icon {
      width: 16px;
      height: 16px;
      margin-right: 4px;
    }
    .file-name {
      flex: 1;
      overflow: hidden;
      white-space: nowrap;
      text-overflow: ellipsis;
    }
  }
  .timbre-model {
    padding: 0 0 0 18px;
    margin-bottom: 12px;
    display: flex;
    align-items: center;
    &-label {
      width: 120px;
      flex-shrink: 0;
    }
    &-content {
      flex: 1;
      margin-left: 16px;
    }
  }
  .voice-prompt-box {
    border: 1px solid #eaefff;
    margin-left: 18px;
    padding: 12px;
    width: 50%;
  }
}
</style>
@@ -0,0 +1,456 @@
<template>
  <div :class="`user-config ${t('modelConfigTitle') === '模型配置' ? '' : 'en-user-config'}`">
    <div class="user-config-title">{{ t('modelConfigTitle') }}</div>
    <div class="config-item">
      <div class="config-item-label">
        <span>{{ t('audioInterruptionBtn') }}</span>
        <el-tooltip class="box-item" effect="dark" :content="t('audioInterruptionTips')" placement="top">
          <SvgIcon name="question" class="question-icon" />
        </el-tooltip>:
      </div>
      <div class="config-item-content">
        <el-switch
          v-model="configData.canStopByVoice"
          inline-prompt
          :active-text="t('yes')"
          :inactive-text="t('no')"
          size="small"
          :disabled="isCalling"
        />
      </div>
    </div>
    <div class="config-item" v-if="type === 'video'">
      <div class="config-item-label">
        <span>{{ t('videoQualityBtn') }}</span>
        <el-tooltip class="box-item" effect="dark" :content="t('videoQualityTips')" placement="top">
          <SvgIcon name="question" class="question-icon" />
        </el-tooltip>:
      </div>
      <div class="config-item-content">
        <el-switch
          v-model="configData.videoQuality"
          inline-prompt
          :active-text="t('yes')"
          :inactive-text="t('no')"
          size="small"
          :disabled="isCalling"
        />
      </div>
    </div>
    <div class="config-item">
      <div class="config-item-label">
        <span>{{ t('vadThresholdBtn') }}</span>
        <el-tooltip class="box-item" effect="dark" :content="t('vadThresholdTips')" placement="top">
          <SvgIcon name="question" class="question-icon" />
        </el-tooltip>:
      </div>
      <div class="config-item-content vad-slider">
        <el-slider
          v-model="configData.vadThreshold"
          :min="0.5"
          :max="1"
          :step="0.1"
          size="small"
          :disabled="isCalling"
        />
      </div>
    </div>
    <div class="prompt-item" v-if="type === 'voice'">
      <div class="prompt-item-label">
        <span>{{ t('assistantPromptBtn') }}</span>
        <el-tooltip class="box-item" effect="dark" :content="t('assistantPromptTips')" placement="top">
          <SvgIcon name="question" class="question-icon" />
        </el-tooltip>:
      </div>
      <div class="prompt-item-content">
        <el-input
          type="textarea"
          :rows="3"
          v-model="configData.assistantPrompt"
          resize="none"
          :disabled="isCalling"
        />
      </div>
    </div>
    <!-- <div class="config-item">
      <div class="config-item-label">{{ t('useVoicePromptBtn') }}:</div>
      <div class="config-item-content">
        <el-switch
          v-model="configData.useAudioPrompt"
          inline-prompt
          :active-text="t('yes')"
          :inactive-text="t('no')"
          size="small"
          :disabled="isCalling"
          @change="handleSelectUseAudioPrompt"
        />
      </div>
    </div> -->
    <div class="timbre-model">
      <div class="timbre-model-label">
        <span>{{ t('toneColorOptions') }}</span>
        <el-tooltip class="box-item" effect="dark" :content="t('toneColorOptionsTips')" placement="top">
          <SvgIcon name="question" class="question-icon" />
        </el-tooltip>:
      </div>
      <div class="timbre-model-content">
        <el-select
          v-model="configData.useAudioPrompt"
          style="width: 100%"
          @change="handleChangePeople"
          placeholder="请选择"
          :disabled="isCalling"
        >
          <el-option :value="0" :label="t('nullOption')">{{ t('nullOption') }}</el-option>
          <el-option :value="1" :label="t('defaultOption')">{{ t('defaultOption') }}</el-option>
          <el-option :value="2" :label="t('femaleOption')">{{ t('femaleOption') }}</el-option>
          <el-option :value="3" :label="t('maleOption')">{{ t('maleOption') }}</el-option>
        </el-select>
      </div>
    </div>
    <!-- <div class="prompt-item">
      <div class="prompt-item-label">
        <span>{{ t('voiceClonePromptInput') }}</span>
        <el-tooltip class="box-item" effect="dark" :content="t('voiceClonePromptTips')" placement="top">
          <SvgIcon name="question" class="question-icon" />
        </el-tooltip>:
      </div>
      <div class="prompt-item-content">
        <el-input
          type="textarea"
          :rows="3"
          v-model="configData.voiceClonePrompt"
          resize="none"
          :disabled="true"
        />
      </div>
    </div> -->
    <!-- <div class="timbre-config" v-if="configData.useAudioPrompt">
      <div class="timbre-config-label">{{ t('audioChoiceBtn') }}:</div>
      <div class="timbre-config-content">
        <el-checkbox-group v-model="configData.timbre" @change="handleSelectTimbre" :disabled="isCalling">
          <el-checkbox :value="1" :label="t('defaultAudioBtn')"></el-checkbox>
          <el-upload
            v-model:file-list="fileList"
            action=""
            :multiple="false"
            :on-change="handleChangeFile"
            :auto-upload="false"
            :show-file-list="false"
            :disabled="isCalling"
            accept="audio/*"
          >
            <el-checkbox :value="2">
              <span>{{ t('customizationBtn') }}</span>
              <SvgIcon name="upload" className="checkbox-icon" />
            </el-checkbox>
          </el-upload>
        </el-checkbox-group>
      </div>
    </div>
    <div class="file-content" v-if="fileName">
      <SvgIcon name="document" class="document-icon" />
      <span class="file-name">{{ fileName }}</span>
    </div> -->
  </div>
</template>

<script setup>
import { useI18n } from 'vue-i18n';

const isCalling = defineModel('isCalling');
const type = defineModel('type');

const { t, locale } = useI18n();

let defaultVoiceClonePrompt =
  '你是一个AI助手。你能接受视频,音频和文本输入并输出语音和文本。模仿输入音频中的声音特征。';
let defaultAssistantPrompt = '';

const fileList = ref([]);
const fileName = ref('');

const configData = ref({
  canStopByVoice: false,
  videoQuality: false,
  useAudioPrompt: 1,
  vadThreshold: 0.8,
  voiceClonePrompt: defaultVoiceClonePrompt,
  assistantPrompt: defaultAssistantPrompt,
  timbre: [1],
  audioFormat: 'mp3',
  base64Str: ''
});

// let peopleList = [];
// watch(
//   () => type.value,
//   val => {
//     if (val === 'video') {
//       defaultVoiceClonePrompt =
//         '你是一个AI助手。你能接受视频,音频和文本输入并输出语音和文本。模仿输入音频中的声音特征。';
//       defaultAssistantPrompt = '作为助手,你将使用这种声音风格说话。';
//     } else {
//       defaultVoiceClonePrompt = '克隆音频提示中的音色以生成语音。';
//       defaultAssistantPrompt = 'Your task is to be a helpful assistant using this voice pattern.';
//     }
//     configData.value.voiceClonePrompt = defaultVoiceClonePrompt;
//     configData.value.assistantPrompt = defaultAssistantPrompt;
//   },
//   { immediate: true }
// );
watch(
  locale,
  (newLocale, oldLocale) => {
    console.log(`Language switched from ${oldLocale} to ${newLocale}`);
    // The video and voice branches were identical, so only the locale matters here
    if (newLocale === 'zh') {
      defaultAssistantPrompt = '作为助手,你将使用这种声音风格说话。';
    } else {
      defaultAssistantPrompt = 'As an assistant, you will speak using this voice style.';
    }
    configData.value.assistantPrompt = defaultAssistantPrompt;
  },
  { immediate: true }
);
onMounted(() => {
  handleSetStorage();
});
const handleSelectTimbre = e => {
  if (e.length > 1) {
    const val = e[e.length - 1];
    configData.value.timbre = [val];
    // Default timbre: reset any uploaded custom audio
    if (val === 1) {
      configData.value.audioFormat = 'mp3';
      configData.value.base64Str = '';
      fileList.value = [];
      fileName.value = '';
    }
  }
};
const handleChangeFile = file => {
  if (isAudio(file) && sizeNotExceed(file)) {
    fileList.value = [file];
    fileName.value = file.name;
    configData.value.timbre = [2];
    handleUpload();
  } else {
    ElMessage.error('Please upload an audio file no larger than 10MB');
  }
};
const isAudio = file => {
  return file.raw.type.includes('audio');
};
const sizeNotExceed = file => {
  return file.size / 1024 / 1024 <= 10;
};
const handleUpload = async () => {
  const file = fileList.value[0].raw;
  if (file) {
    const reader = new FileReader();
    reader.onload = e => {
      const base64String = e.target.result.split(',')[1];
      // Use the last segment so filenames containing dots still yield the extension
      configData.value.audioFormat = file.name.split('.').pop();
      configData.value.base64Str = base64String;
    };
    reader.readAsDataURL(file);
  }
};
const handleSelectUseAudioPrompt = val => {
  if (val) {
    configData.value.voiceClonePrompt = defaultVoiceClonePrompt;
    configData.value.assistantPrompt = defaultAssistantPrompt;
  }
};
// Persist the config to localStorage whenever it changes
watch(configData.value, () => {
  handleSetStorage();
});
const handleSetStorage = () => {
  const { timbre, canStopByVoice, ...others } = configData.value;
  const defaultConfigData = {
    canStopByVoice,
    ...others
  };
  localStorage.setItem('configData', JSON.stringify(defaultConfigData));
  localStorage.setItem('canStopByVoice', canStopByVoice);
};
const handleChangePeople = val => {
  console.log('val: ', val);
  // const index = peopleList.findIndex(item => item.id === val);
  configData.value.voiceClonePrompt = defaultVoiceClonePrompt;
  configData.value.assistantPrompt = defaultAssistantPrompt;
  configData.value.timbre = [1];
};
</script>
<style lang="less" scoped>
|
||||||
|
.user-config {
|
||||||
|
&-title {
|
||||||
|
height: 61px;
|
||||||
|
padding: 18px 18px 0;
|
||||||
|
color: rgba(23, 23, 23, 0.9);
|
||||||
|
font-family: PingFang SC;
|
||||||
|
font-size: 16px;
|
||||||
|
font-style: normal;
|
||||||
|
font-weight: 500;
|
||||||
|
line-height: normal;
|
||||||
|
}
|
||||||
|
.config-item {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
width: 100%;
|
||||||
|
padding: 0 0 0 18px;
|
||||||
|
margin-bottom: 20px;
|
||||||
|
&-label {
|
||||||
|
width: 120px;
|
||||||
|
flex-shrink: 0;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
}
|
||||||
|
&-content {
|
||||||
|
flex: 1;
|
||||||
|
margin-left: 16px;
|
||||||
|
.el-radio-group {
|
||||||
|
.el-radio {
|
||||||
|
width: 50px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
&-content.vad-slider {
|
||||||
|
width: 80%;
|
||||||
|
padding-left: 7px;
|
||||||
|
margin-right: 20px;
|
||||||
|
.el-slider__button {
|
||||||
|
width: 14px;
|
||||||
|
height: 14px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.timbre-config {
|
||||||
|
padding: 0 0 0 18px;
|
||||||
|
&-label {
|
||||||
|
margin-bottom: 20px;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
}
|
||||||
|
&-content {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
.el-checkbox-group {
|
||||||
|
display: flex;
|
||||||
|
flex-wrap: wrap;
|
||||||
|
flex: 1;
|
||||||
|
> .el-checkbox {
|
||||||
|
margin-right: 12px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.el-checkbox {
|
||||||
|
padding: 8px 16px;
|
||||||
|
border-radius: 10px;
|
||||||
|
background: #eaefff;
|
||||||
|
margin-bottom: 12px;
|
||||||
|
height: 40px;
|
||||||
|
.el-checkbox__input {
|
||||||
|
.el-checkbox__inner {
|
||||||
|
border: 1px solid #4dc100;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.el-checkbox__input.is-checked {
|
||||||
|
.el-checkbox__inner {
|
||||||
|
background: #4dc100;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.el-checkbox__input.is-checked.is-disabled {
|
||||||
|
.el-checkbox__inner::after {
|
||||||
|
border-color: #ffffff;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.el-checkbox__label {
|
||||||
|
color: #7579eb !important;
|
||||||
|
font-family: PingFang SC;
|
||||||
|
font-size: 16px;
|
||||||
|
font-style: normal;
|
||||||
|
font-weight: 400;
|
||||||
|
line-height: normal;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
.checkbox-icon {
|
||||||
|
margin-left: 4px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.el-checkbox + .el-checkbox {
|
||||||
|
margin-left: 12px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.prompt-item {
|
||||||
|
padding: 0 0 0 18px;
|
||||||
|
margin-bottom: 20px;
|
||||||
|
&-label {
|
||||||
|
// margin-bottom: 16px;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.file-content {
|
||||||
|
padding: 0 0 0 18px;
|
||||||
|
font-size: 14px;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
.document-icon {
|
||||||
|
width: 16px;
|
||||||
|
height: 16px;
|
||||||
|
margin-right: 4px;
|
||||||
|
}
|
||||||
|
.file-name {
|
||||||
|
flex: 1;
|
||||||
|
overflow: hidden;
|
||||||
|
white-space: nowrap;
|
||||||
|
text-overflow: ellipsis;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.timbre-model {
|
||||||
|
padding: 0 0 0 18px;
|
||||||
|
margin-bottom: 20px;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
&-label {
|
||||||
|
width: 120px;
|
||||||
|
flex-shrink: 0;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
}
|
||||||
|
&-content {
|
||||||
|
flex: 1;
|
||||||
|
margin-left: 16px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.en-user-config {
|
||||||
|
.config-item-label {
|
||||||
|
width: 160px;
|
||||||
|
}
|
||||||
|
.timbre-model-label {
|
||||||
|
width: 160px;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
.question-icon {
|
||||||
|
width: 14px;
|
||||||
|
height: 14px;
|
||||||
|
cursor: pointer;
|
||||||
|
margin-left: 6px;
|
||||||
|
}
|
||||||
|
</style>
|
||||||
|
<style lang="less">
|
||||||
|
.el-switch--small .el-switch__core {
|
||||||
|
min-width: 50px;
|
||||||
|
}
|
||||||
|
.el-popper.is-dark {
|
||||||
|
max-width: 300px;
|
||||||
|
}
|
||||||
|
</style>
|
||||||
@@ -0,0 +1,91 @@
<template>
  <div class="output-area">
    <div
      :class="`output-area-item ${item.type === 'USER' ? 'user-item' : 'bot-item'}`"
      :key="index"
      v-for="(item, index) in outputData"
    >
      <div v-if="item.type === 'USER'" class="user-input">
        <audio v-if="item.audio" :src="item.audio" controls></audio>
      </div>
      <div v-else class="bot-output">
        <div class="output-item">{{ item.text }}</div>
        <audio v-if="item.audio" :src="item.audio" controls></audio>
      </div>
    </div>
  </div>
</template>

<script setup>
const props = defineProps({
  outputData: {
    type: Array,
    default: () => []
  },
  containerClass: {
    type: String,
    default: ''
  }
});

// Keep the container scrolled to the bottom as new messages arrive
watch(
  () => props.outputData,
  newVal => {
    nextTick(() => {
      if (newVal && props.containerClass) {
        let dom = document.querySelector(`.${props.containerClass}`);
        if (dom) {
          dom.scrollTop = dom.scrollHeight;
        }
      }
    });
  },
  { deep: true }
);
</script>

<style lang="less" scoped>
.output-area {
  display: flex;
  flex-direction: column;
  &-item {
    width: fit-content;
  }
  &-item + &-item {
    margin-top: 16px;
  }
  &-item.user-item {
    align-self: flex-end;
  }
  &-item.bot-item {
    align-self: flex-start;
    width: 100%;
    .bot-output {
      width: 100%;
      display: flex;
      flex-direction: column;
      .output-item {
        padding: 8px 24px;
        border-radius: 10px;
        color: #202224;
        background: #f3f3f3;
        max-width: 90%;
        width: fit-content;
        font-family: PingFang SC;
        font-size: 16px;
        font-style: normal;
        font-weight: 400;
        line-height: normal;
        word-break: break-all;
        word-wrap: break-word;
        white-space: pre-wrap;
        display: inline-block;
      }
      .output-item + audio {
        margin-top: 16px;
      }
    }
  }
}
</style>
@@ -0,0 +1,122 @@
<template>
  <div class="select-timbre">
    <el-checkbox-group v-model="timbre" @change="handleSelectTimbre" :disabled="disabled">
      <el-checkbox :value="1" label="Default Audio"></el-checkbox>
      <!-- <el-upload
        v-model:file-list="fileList"
        action=""
        :multiple="false"
        :on-change="handleChangeFile"
        :auto-upload="false"
        :show-file-list="false"
        :disabled="disabled"
        accept="audio/*"
      >
        <el-checkbox :value="2">
          <span>Customization: Upload Audio</span>
          <SvgIcon name="upload" className="checkbox-icon" />
        </el-checkbox>
      </el-upload> -->
    </el-checkbox-group>
  </div>
</template>

<script setup>
const timbre = defineModel('timbre');
const audioData = defineModel('audioData');
const disabled = defineModel('disabled');
const fileList = ref([]);

const handleSelectTimbre = e => {
  if (e.length > 1) {
    const val = e[e.length - 1];
    timbre.value = [val];
    // Default timbre
    if (val === 1) {
      audioData.value = {
        base64Str: '',
        type: 'mp3'
      };
    }
  }
};

const handleChangeFile = file => {
  if (isAudio(file) && sizeNotExceed(file)) {
    fileList.value = [file];
    timbre.value = [2];
    handleUpload();
  } else {
    ElMessage.error('Please upload an audio file no larger than 1MB');
  }
};

const isAudio = file => {
  return file.name.endsWith('.mp3') || file.name.endsWith('.wav');
};

const sizeNotExceed = file => {
  return file.size / 1024 / 1024 <= 1;
};

const handleUpload = async () => {
  const file = fileList.value[0].raw;
  if (file) {
    const reader = new FileReader();
    reader.onload = e => {
      const base64String = e.target.result.split(',')[1];
      audioData.value = {
        base64Str: base64String,
        // Use the last extension segment so file names containing dots are handled correctly
        type: file.name.split('.').pop()
      };
    };
    reader.readAsDataURL(file);
  }
};
</script>

<style lang="less">
.select-timbre {
  display: flex;
  align-items: center;
  .el-checkbox-group {
    display: flex;
    > .el-checkbox {
      margin-right: 12px;
    }
  }
  .el-checkbox {
    padding: 8px 16px;
    border-radius: 10px;
    background: #eaefff;
    margin-right: 0;
    height: 40px;
    .el-checkbox__input {
      .el-checkbox__inner {
        border: 1px solid #4dc100;
      }
    }
    .el-checkbox__input.is-checked {
      .el-checkbox__inner {
        background: #4dc100;
      }
    }
    .el-checkbox__input.is-checked.is-disabled {
      .el-checkbox__inner::after {
        border-color: #ffffff;
      }
    }
  }
  .el-checkbox__label {
    color: #7579eb !important;
    font-family: PingFang SC;
    font-size: 16px;
    font-style: normal;
    font-weight: 400;
    line-height: normal;
    display: flex;
    align-items: center;
    .checkbox-icon {
      margin-left: 4px;
    }
  }
  .el-checkbox + .el-checkbox {
    margin-left: 12px;
  }
}
</style>
@@ -0,0 +1,67 @@
<template>
  <div :class="`skip-btn ${disabled ? 'disabled-btn' : ''}`">
    <div class="pause-icon">
      <SvgIcon name="pause" className="pause-svg" />
    </div>
    <span class="btn-text">{{ t('skipMessageBtn') }}</span>
  </div>
</template>

<script setup>
import { useI18n } from 'vue-i18n';

const { t } = useI18n();
defineProps({
  disabled: {
    type: Boolean,
    default: false
  }
});
</script>

<style lang="less">
.skip-btn {
  flex-shrink: 0;
  display: flex;
  align-items: center;
  padding: 8px 14px 8px 10px;
  border-radius: 90px;
  background: #5865f2;
  cursor: pointer;
  user-select: none;
  .pause-icon {
    display: flex;
    justify-content: center;
    align-items: center;
    width: 32px;
    height: 32px;
    background: #ffffff;
    border-radius: 50%;
    margin-right: 8px;
    .pause-svg {
      width: 18px;
      height: 18px;
      color: #5865f2;
    }
  }
  .btn-text {
    color: #fff;
    font-family: PingFang SC;
    font-size: 16px;
    font-style: normal;
    font-weight: 400;
    line-height: normal;
  }
}
.disabled-btn {
  cursor: not-allowed;
  background: #f3f3f3;
  .pause-icon {
    background: #d1d1d1;
    .pause-svg {
      color: #ffffff;
    }
  }
  .btn-text {
    color: #d1d1d1;
  }
}
</style>
@@ -0,0 +1,39 @@
<template>
  <svg :class="iconClass" v-html="content"></svg>
</template>

<script setup>
const props = defineProps({
  name: {
    type: String,
    required: true
  },
  className: {
    type: String,
    default: ''
  }
});

const content = ref('');

const iconClass = computed(() => ['svg-icon', props.className]);

onMounted(() => {
  import(`@/assets/svg/${props.name}.svg`)
    .then(module => {
      fetch(module.default)
        .then(response => response.text())
        .then(svg => {
          content.value = svg;
        });
    })
    .catch(error => {
      console.error(`Error loading SVG icon: ${props.name}`, error);
    });
});
</script>

<style lang="less" scoped>
.svg-icon {
  width: 24px;
  height: 24px;
}
</style>
@@ -0,0 +1,138 @@
<template>
  <div class="bars" id="bars" :style="boxStyle">
    <!-- Waveform bars -->
    <div class="bar" v-for="(item, index) in defaultList" :key="index" :style="itemAttr(item)"></div>
  </div>
</template>

<script setup>
const props = defineProps({
  analyser: {
    type: Object
  },
  dataArray: {
    type: [Array, Uint8Array]
  },
  isCalling: {
    type: Boolean,
    default: false
  },
  isPlaying: {
    type: Boolean,
    default: false
  },
  // Container style (height)
  boxStyle: {
    type: Object,
    default: () => {
      return {
        height: '80px'
      };
    }
  },
  // Bar style (width)
  itemStyle: {
    type: Object,
    default: () => {
      return {
        width: '6px',
        margin: '0 2px',
        borderRadius: '5px'
      };
    }
  },
  configList: {
    type: Array,
    default: () => []
  }
});
const animationFrameId = ref();
const defaultList = ref([]);
const bgColor = ref('#4c5cf8');
const itemAttr = computed(() => item => {
  return {
    height: item + 'px',
    ...props.itemStyle
  };
});

watch(
  () => props.dataArray,
  newVal => {
    if (newVal && props.isCalling) {
      console.log('draw');
      drawBars();
    } else {
      console.log('stop');
      stopDraw();
    }
  }
);

watch(
  () => props.configList,
  newVal => {
    if (newVal.length > 0) {
      defaultList.value = newVal;
    }
  },
  { immediate: true }
);

watch(
  () => props.isPlaying,
  newVal => {
    if (newVal) {
      // Green while the model is speaking
      bgColor.value = '#4dc100';
    } else {
      // Blue otherwise
      bgColor.value = '#4c5cf8';
    }
  }
);

function drawBars() {
  const bars = document.querySelectorAll('.bar');
  if (bars.length === 0) {
    cancelAnimationFrame(animationFrameId.value);
    return;
  }

  const maxHeight = document.querySelector('.bars').clientHeight; // Maximum height is the container height

  const averageVolume = props.dataArray.reduce((sum, value) => sum + value, 0) / props.dataArray.length;
  const normalizedVolume = props.isPlaying ? Math.random() : averageVolume / 128; // Normalize the volume data to roughly 0..1

  bars.forEach((bar, index) => {
    const minHeight = defaultList.value[index];
    const randomFactor = Math.random() * 1.5 + 0.5; // Random jitter factor
    const newHeight = Math.min(
      maxHeight,
      minHeight + (maxHeight - minHeight) * normalizedVolume * randomFactor
    ); // Scale the bar height with the volume
    bar.style.height = `${newHeight}px`; // Apply the new height
    bar.style.backgroundColor = bgColor.value;
  });

  animationFrameId.value = requestAnimationFrame(drawBars);
}

const stopDraw = () => {
  if (animationFrameId.value) {
    cancelAnimationFrame(animationFrameId.value);
  }
};
</script>

<style lang="less" scoped>
.bars {
  display: flex;
  justify-content: center;
  align-items: center;
}
.bar {
  // width: 6px;
  // margin: 0 2px;
  background-color: #4c5cf8;
  transition:
    height 0.1s,
    background-color 0.1s;
  border-radius: 5px; /* Rounded corners */
}
</style>
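The normalization step inside `drawBars` is easy to check on its own. The sketch below re-implements just that arithmetic with made-up frequency-bin values (the real component reads them from a `Uint8Array` filled by an `AnalyserNode`):

```javascript
// Minimal sketch of the normalization used in drawBars above:
// frequency-bin values (0..255) are averaged, then divided by 128
// so typical speech levels land roughly in the 0..1 range.
const dataArray = [0, 64, 128, 192]; // hypothetical bin values
const averageVolume = dataArray.reduce((sum, value) => sum + value, 0) / dataArray.length;
const normalizedVolume = averageVolume / 128;
console.log(averageVolume, normalizedVolume); // 96 0.75
```

Values above 128 on average would push `normalizedVolume` past 1, which is why the bar height is clamped with `Math.min(maxHeight, ...)` in the component.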
@@ -0,0 +1,8 @@
/**
 * Configure and register global directives
 */
import ElTableInfiniteScroll from 'el-table-infinite-scroll';

export function setupGlobDirectives(app) {
  app.use(ElTableInfiniteScroll);
}
web_demos/minicpm-o_2.6/web_server/src/enums/index.js (new file, 18 lines)
@@ -0,0 +1,18 @@
export const voiceIdeasList = ['TBD', 'TBD', 'TBD'];
export const videoIdeasList = ['TBD', 'TBD', 'TBD'];
export const limitTime = 10 * 60; // Limit a single session to at most 10 minutes
export const tipsRemainingTime = 30; // Remind the user when 30 seconds remain
// Initial audio waveform (bar heights)
export const voiceConfigList = [
  16, 16, 16, 16, 36, 58, 50, 70, 50, 58, 36, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 46, 28,
  60, 28, 68, 60, 28, 46, 16, 16, 16, 16, 16, 16, 16, 16, 36, 58, 50, 70, 50, 58, 36, 16, 16, 16, 16, 16, 16, 16, 16,
  16, 16, 16, 16, 16, 16, 16, 16, 46, 28, 60, 28, 68, 60, 28, 46, 16, 16, 16, 16
];
// Initial waveform for the audio track in video mode
export const videoConfigList = [
  8, 8, 8, 8, 18, 28, 26, 36, 26, 28, 18, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 24, 14, 30, 14, 34, 30, 14,
  24, 8, 8, 8, 8, 8, 8, 8, 8, 18, 28, 26, 36, 26, 28, 18, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 24, 14, 30,
  14, 34, 30, 14, 24, 8, 8, 8, 8, 8, 8, 8, 8, 18, 28, 26, 36, 26, 28, 18, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
  8, 24, 14, 30, 14, 34, 30, 14, 24, 8, 8, 8, 8
];
export const showIdeasList = false;
web_demos/minicpm-o_2.6/web_server/src/hooks/useHttp.js (new file, 61 lines)
@@ -0,0 +1,61 @@
import axios from 'axios';
import { setNewUserId, getNewUserId } from './useRandomId';

// Create an axios instance with default options
const service = axios.create({
  baseURL: '/',
  timeout: 30000,
  responseType: 'json'
});

// Request interceptor
service.interceptors.request.use(config => {
  if (config.url.includes('stream')) {
    config.timeout = 3000;
  }
  if (window.location.search) {
    config.url += window.location.search;
  }
  Object.assign(config.headers, ajaxHeader());
  return config;
});

// Response interceptor
service.interceptors.response.use(
  response => {
    let res = response.data;
    if (response?.status === 200) {
      return Promise.resolve({
        code: 0,
        message: '',
        data: res
      });
    }
    return Promise.resolve({ code: -1, message: '网络异常,请稍后再试', data: null });
  },
  error => {
    const res = { code: -1, message: error?.response?.data?.detail || '网络异常,请稍后再试', data: null };
    return Promise.resolve(res);
  }
);

export const ajaxHeader = () => {
  if (!localStorage.getItem('uid')) {
    setNewUserId();
  }
  return {
    'Content-Type': 'application/json;charset=UTF-8',
    Accept: 'application/json',
    service: 'minicpmo-server',
    uid: getNewUserId()
  };
};

export default {
  get(url, params, config = {}) {
    return service.get(url, { params, ...config });
  },
  post(url, data, config = {}) {
    return service.post(url, data, { ...config });
  }
};
web_demos/minicpm-o_2.6/web_server/src/hooks/useQueue.js (new file, 95 lines)
@@ -0,0 +1,95 @@
export class TaskQueue {
  constructor() {
    this.tasks = [];
    this.isRunning = false;
    this.isPaused = false;
    this.currentTask = null;
  }

  // Add a task to the queue
  addTask(task) {
    this.tasks.push(task);
    if (!this.isRunning) {
      this.start();
    }
  }

  // Remove a task
  removeTask(taskToRemove) {
    this.tasks = this.tasks.filter(task => task !== taskToRemove);
  }

  // Clear the queue
  clearQueue() {
    this.tasks = [];
  }

  // Pause execution
  pause() {
    this.isPaused = true;
  }

  // Resume execution
  resume() {
    if (this.isPaused) {
      this.isPaused = false;
      if (!this.isRunning) {
        this.start();
      }
    }
  }

  // Internal runner: executes tasks one at a time until paused or empty
  async start() {
    this.isRunning = true;
    while (this.tasks.length > 0 && !this.isPaused) {
      this.currentTask = this.tasks.shift();
      await this.currentTask();

      // Stop if paused or the queue has been cleared
      if (this.isPaused || this.tasks.length === 0) {
        this.isRunning = false;
        break;
      }
    }
    this.isRunning = false;
  }
}

// Example task factory
function exampleTask(id) {
  return () =>
    new Promise(resolve => {
      console.log(`Executing task ${id}`);
      setTimeout(() => {
        console.log(`Task ${id} completed`);
        resolve();
      }, 1000); // each task takes 1 second
    });
}

// Usage example
const queue = new TaskQueue();

// Add tasks to the queue
for (let i = 1; i <= 5; i++) {
  queue.addTask(exampleTask(i));
}

// Pause the queue after 2.5 seconds
setTimeout(() => {
  console.log('Pausing queue...');
  queue.pause();
}, 2500);

// Resume the queue after 4.5 seconds
setTimeout(() => {
  console.log('Resuming queue...');
  queue.resume();
}, 4500);

// Clear the queue after 3 seconds
setTimeout(() => {
  console.log('Clearing queue...');
  queue.clearQueue();
}, 3000);
@@ -0,0 +1,9 @@
const uid = 'uid';
export const setNewUserId = () => {
  const randomId = Math.random().toString(36).slice(2).toUpperCase();
  localStorage.setItem(uid, randomId);
  return randomId;
};
export const getNewUserId = () => {
  return localStorage.getItem(uid);
};
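The id generation above boils down to one expression; as a standalone sketch (no localStorage involved):

```javascript
// Sketch of the uid format used above: the base-36 digits of a random
// float, with the leading "0." stripped and uppercased, e.g. "G7K2M9X1QZ4".
const randomId = Math.random().toString(36).slice(2).toUpperCase();
console.log(randomId);
```

The result is a short alphanumeric string; it is not a UUID and offers no uniqueness guarantee, which is acceptable here since it only tags demo sessions.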
web_demos/minicpm-o_2.6/web_server/src/hooks/useVoice.js (new file, 38 lines)
@@ -0,0 +1,38 @@
const writeString = (view, offset, string) => {
  for (let i = 0; i < string.length; i++) {
    view.setUint8(offset + i, string.charCodeAt(i));
  }
};
const floatTo16BitPCM = (output, offset, input) => {
  for (let i = 0; i < input.length; i++, offset += 2) {
    const s = Math.max(-1, Math.min(1, input[i]));
    output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
};
// Encode an audio buffer as a WAV file: a 44-byte RIFF header followed by 16-bit PCM data
export const encodeWAV = (samples, sampleRate) => {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);
  const numChannels = 1;
  const bitsPerSample = 16;

  /* WAV header */
  writeString(view, 0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true);
  writeString(view, 8, 'WAVE');
  writeString(view, 12, 'fmt ');
  view.setUint32(16, 16, true);
  view.setUint16(20, 1, true);
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, (sampleRate * numChannels * bitsPerSample) / 8, true);
  view.setUint16(32, (numChannels * bitsPerSample) / 8, true);
  view.setUint16(34, bitsPerSample, true);
  writeString(view, 36, 'data');
  view.setUint32(40, samples.length * 2, true);

  /* PCM data */
  floatTo16BitPCM(view, 44, samples);

  return new Blob([view], { type: 'audio/wav' });
};
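The header arithmetic in `encodeWAV` can be sanity-checked in isolation. This sketch writes only the two size fields; the 16 kHz rate and sample count are made-up illustrative values:

```javascript
// The RIFF chunk size at offset 4 is the file size minus the 8-byte
// "RIFF" + size preamble; for n 16-bit mono samples the file is 44 + 2n bytes.
const n = 16000; // e.g. one second of audio at 16 kHz
const sampleRate = 16000;
const view = new DataView(new ArrayBuffer(44 + n * 2));
view.setUint32(4, 36 + n * 2, true); // RIFF chunk size
view.setUint32(28, (sampleRate * 1 * 16) / 8, true); // byte rate = sampleRate * channels * bitsPerSample / 8
console.log(view.byteLength, view.getUint32(4, true), view.getUint32(28, true)); // 32044 32036 32000
```

The `true` flag on every `setUint32`/`setUint16` call selects little-endian byte order, which the RIFF/WAVE format requires.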
web_demos/minicpm-o_2.6/web_server/src/i18n/en.json (new file, 36 lines)
@@ -0,0 +1,36 @@
{
  "menuTabVideo": "Realtime Video Call",
  "menuTabAudio": "Realtime Voice Call",
  "menuTabChatbot": "Chatbot",
  "videoCallBtn": "Call MiniCPM-omni",
  "audioCallBtn": "Call MiniCPM-omni",
  "hangUpBtn": "Hang Up",
  "notReadyBtn": "Not ready yet, please wait",
  "skipMessageBtn": "Skip this message",
  "feedbackDialogTitle": "Feedback issue",
  "modelConfigTitle": "Model Config",
  "audioInterruptionBtn": "Speech Interruption",
  "audioInterruptionTips": "When the \"voice interruption\" mode is enabled, users can interrupt the model while it is speaking. The model will immediately terminate the previous round of generation and respond to the user's latest question.",
  "yes": "Yes",
  "no": "No",
  "videoQualityBtn": "HD Mode",
  "videoQualityTips": "When the \"high resolution\" mode is enabled, the model performs high-resolution encoding on the last frame, allowing it to see finer details.",
  "high": "High",
  "low": "Low",
  "vadThresholdBtn": "VAD Threshold",
  "vadThresholdTips": "The VAD threshold indicates how long the sound needs to be silent before triggering inference. If the VAD threshold is too low, it may trigger accidentally during speech pauses, while if it's too high, it will result in slower initial response.",
  "assistantPromptBtn": "Task Prompt",
  "assistantPromptTips": "Model task instructions are used to support different task objectives.",
  "useVoicePromptBtn": "Tone Color Prompt",
  "voiceClonePromptInput": "Tone Color Prompt",
  "voiceClonePromptTips": "Tone Color Prompt tips",
  "audioChoiceBtn": "Audio Choice",
  "defaultAudioBtn": "Default Audio",
  "customizationBtn": "Customization: Upload Audio",
  "toneColorOptions": "Voice Options",
  "toneColorOptionsTips": "We have provided a selection of sample tone colors, and you also have the option to choose \"none\" and instruct the model to create a new tone color.",
  "nullOption": "Null",
  "defaultOption": "Female 1 (Default)",
  "femaleOption": "Female 2",
  "maleOption": "Male 1"
}
web_demos/minicpm-o_2.6/web_server/src/i18n/zh.json (new file, 36 lines)
@@ -0,0 +1,36 @@
{
  "menuTabVideo": "实时视频通话",
  "menuTabAudio": "实时语音通话",
  "menuTabChatbot": "聊天机器人",
  "videoCallBtn": "视频通话",
  "audioCallBtn": "语音通话",
  "hangUpBtn": "挂断",
  "notReadyBtn": "服务繁忙,请稍后",
  "skipMessageBtn": "跳过当前对话",
  "feedbackDialogTitle": "请输入反馈意见",
  "modelConfigTitle": "模型配置",
  "audioInterruptionBtn": "语音打断",
  "audioInterruptionTips": "开启\"语音打断\"功能,支持在模型说话时打断模型,模型会立刻结束上一轮的生成,并支持用户最新的问题。",
  "yes": "是",
  "no": "否",
  "videoQualityBtn": "高清模式",
  "videoQualityTips": "开启高清模式,模型会在最后一帧对图片进行高清编码,可以使得模型看得清更细节的部分。",
  "high": "高清",
  "low": "低清",
  "vadThresholdBtn": "VAD阈值",
  "vadThresholdTips": "vad阈值表示声音静音多久才开始触发推理,vad阈值过低会导致说话气口误触,过高会导致首响更慢。",
  "assistantPromptBtn": "任务指令",
  "assistantPromptTips": "模型的任务指令,用于支持不同的任务目标",
  "useVoicePromptBtn": "音色指令",
  "voiceClonePromptInput": "音色指令",
  "voiceClonePromptTips": "我们的模型具有端到端的音色克隆能力,提供一段 5-7 秒的音频,模型在一定程度上可以用这种音色来说话。但基于法律考虑,我们的demo并不开启这个能力的试用。社区可以参照我们的开源代码自行适配。",
  "audioChoiceBtn": "音色选择",
  "defaultAudioBtn": "默认音色",
  "customizationBtn": "自定义:上传音频",
  "toneColorOptions": "语音选项",
  "toneColorOptionsTips": "我们提供了一些示例音色,也可以选择“无”并通过指令让模型创建音色。",
  "nullOption": "无",
  "defaultOption": "女一号(默认)",
  "femaleOption": "女二号",
  "maleOption": "男一号"
}
web_demos/minicpm-o_2.6/web_server/src/main.js (new file, 40 lines)
@@ -0,0 +1,40 @@
import './styles/main.css';

import { router, setupRouter } from '@/router';
import { setupRouterGuard } from '@/router/guard';
import SvgIcon from '@/components/SvgIcon/index.vue';
import { createI18n } from 'vue-i18n';

import App from './App.vue';
import en from './i18n/en.json';
import zh from './i18n/zh.json';

const savedLanguage = localStorage.getItem('language') || 'zh';

const i18n = createI18n({
  locale: savedLanguage, // default language
  messages: {
    en,
    zh
  }
});

const app = createApp(App);

// Configure routing
setupRouter(app);

// Router guard
setupRouterGuard(router);

// Register global directives
// setupGlobDirectives(app);

app.component('SvgIcon', SvgIcon);

app.use(i18n);

app.mount('#app');
@@ -0,0 +1,5 @@
import { createStateGuard } from './stateGuard';

export function setupRouterGuard(router) {
  createStateGuard(router);
}
@@ -0,0 +1 @@
export function createStateGuard() {}
web_demos/minicpm-o_2.6/web_server/src/router/index.js (new file, 16 lines)
@@ -0,0 +1,16 @@
import { createRouter, createWebHistory } from 'vue-router';
import { basicRoutes } from './menu';

// Create a router instance for the Vue application
export const router = createRouter({
  // Use HTML5 history mode
  history: createWebHistory(import.meta.env.BASE_URL),
  // Route list
  routes: basicRoutes
});

// Configure the router
export function setupRouter(app) {
  app.use(router);
}
web_demos/minicpm-o_2.6/web_server/src/router/menu/index.js (new file, 10 lines)
@@ -0,0 +1,10 @@
export const basicRoutes = [
  {
    path: '/',
    component: () => import('@/views/home/index.vue')
  },
  {
    path: '/:port',
    component: () => import('@/views/home/index.vue')
  }
];