From 8dd6397b8d605d881738a92c26a87539ea6766dd Mon Sep 17 00:00:00 2001
From: yaoyuanTHU
Date: Thu, 1 Feb 2024 15:16:09 +0800
Subject: [PATCH] update readme
---
README.md | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 22b6fbb..1cacd77 100644
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@
- 🕹 **Real-time Multimodal Interaction.**
- We combine the OmniLMM-12B and GPT-3.5 into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. While still primary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video edition**.
+ We combine OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. While the results are still preliminary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video editing**.
### Evaluation
@@ -159,8 +159,11 @@
+
+We combine OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal interactive assistant**. OmniLMM-12B describes the video frames in text, and GPT-3.5 (text-only) generates responses based on these descriptions and the user prompts. The demo video is a raw, unedited recording.
+
-
+
## OmniLMM-3B
@@ -256,7 +259,7 @@
### Examples
-OmniLLM-3B is the first LMM deloyed on end devices. The demo video is the raw screen recording without edition.
+OmniLMM-3B is the first LMM deployed on end devices. The demo video is a raw, unedited screen recording on a OnePlus 9R.