mirror of
https://github.com/OpenBMB/MiniCPM-V.git
synced 2026-02-05 18:29:18 +08:00
update readme
@@ -43,7 +43,7 @@
 
 - 🕹 **Real-time Multimodal Interaction.**
 
-We combine the OmniLMM-12B and GPT-3.5 into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. While still primary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video edition**.
+We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. While still primary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video edition**.
 
 ### Evaluation
 
@@ -159,8 +159,11 @@
 </p>
 </table>
 
+
+We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal interactive assistant**. Video frames are described in text using OmniLMM-12B, and ChatGPT 3.5 (text-only) is employed to generate response according to the descriptions and user prompts. The demo video is a raw recording without edition.
+
 <div align="center" >
-<video controls src="https://github.com/OpenBMB/OmniLMM/assets/157115220/c1fd3562-1ab1-4534-8139-79e9137b5398" type="video/mp4" />
+<video controls src="https://github.com/OpenBMB/OmniLMM/assets/157115220/c1fd3562-1ab1-4534-8139-79e9137b5398" type="video/mp4" width=80%/>
 </div>
 
 ## OmniLMM-3B
@@ -256,7 +259,7 @@
 
 ### Examples
 
-OmniLLM-3B is the first LMM deloyed on end devices. The demo video is the raw screen recording without edition.
+OmniLLM-3B is the first LMM deloyed on end devices. The demo video is the raw screen recording on a OnePlus 9R without edition.
 
 <table align="center" >
 <p align="center" >
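
The paragraph added in the second hunk describes a two-stage pipeline: video frames are first described in text, and a text-only model then answers from those descriptions plus the user prompt. The following is a minimal sketch of that data flow only, not the project's implementation. All names here (`build_prompt`, `respond`, `fake_describe`, `fake_llm`) are hypothetical, the prompt layout is an assumption, and the two stand-in functions replace the real OmniLMM-12B and GPT-3.5 calls, which this commit does not show.

```python
from typing import Callable, List, Sequence


def build_prompt(frame_descriptions: List[str], user_prompt: str) -> str:
    """Merge per-frame text descriptions and the user's request into one text prompt.

    The layout is an assumption made for illustration.
    """
    lines = ["Scene descriptions (oldest first):"]
    lines += [f"{i + 1}. {d}" for i, d in enumerate(frame_descriptions)]
    lines.append(f"User: {user_prompt}")
    return "\n".join(lines)


def respond(
    frames: Sequence[object],
    user_prompt: str,
    describe_frame: Callable[[object], str],
    generate_reply: Callable[[str], str],
) -> str:
    """Stage 1: describe each frame in text. Stage 2: answer from text only."""
    descriptions = [describe_frame(frame) for frame in frames]
    return generate_reply(build_prompt(descriptions, user_prompt))


# Hypothetical stand-ins so the sketch runs without any model.
def fake_describe(frame: object) -> str:
    # Stands in for a vision-language model that captions a frame.
    return f"a frame showing {frame}"


def fake_llm(prompt: str) -> str:
    # Stands in for a text-only chat model; it never sees the frames themselves.
    return f"(reply based on {prompt.count('frame showing')} descriptions)"


if __name__ == "__main__":
    print(respond(["a cup", "a hand"], "What changed?", fake_describe, fake_llm))
```

Passing the two models in as callables keeps the sketch independent of any particular model API. In a real setup, `describe_frame` and `generate_reply` would wrap whatever inference calls are actually in use.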