diff --git a/README.md b/README.md index 7078c61..3adb0e6 100644 --- a/README.md +++ b/README.md @@ -23,19 +23,29 @@ [中文文档](./README_zh.md) ## Contents +- [Contents](#contents) - [OmniLMM-12B](#omnilmm-12b) + - [Evaluation](#evaluation) + - [Examples](#examples) - [OmniLMM-3B](#omnilmm-3b) + - [Evaluation](#evaluation-1) + - [Examples](#examples-1) - [Demo](#demo) - [Install](#install) - [Inference](#inference) -- [Model Zoo](#model-zoo) + - [Model Zoo](#model-zoo) + - [Multi-turn Conversation](#multi-turn-conversation) +- [✅ TODO](#-todo) +- [Model License](#model-license) +- [Statement](#statement) +- [🏫 Institutions](#-institutions) ## OmniLMM-12B **OmniLMM-12B** is the most capable version. The model is built based on EVA02-5B and Zephyr-7B-β, connected with a perceiver resampler layer, and trained on multimodal data in a curriculum fashion. The model has three notable features: - 🔥 **Strong Performance.** - OmniLMM-12B achieves **leading performance** among models with comparable sizes, surpassing established LMMs on multiple benchmarks (including MME, MMBench, SEED-Bench, etc). The model endows **rich multimodal world knowledge**. + OmniLMM-12B achieves **leading performance** among models of comparable size, surpassing established LMMs on multiple benchmarks (including MME, MMBench, SEED-Bench, etc.). - 🏆 **Trustworthy Behavior.** @@ -43,9 +53,15 @@ - 🕹 **Real-time Multimodal Interaction.** - We combine the OmniLMM-12B and GPT-3.5 into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. While still primary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video edition**. + We combine OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. 
While still preliminary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video editing**. + ### Evaluation +
+ +
+
+Click to view results on MME, MMBench, MMMU, MMHal-Bench, Object HalBench, SEED-Bench, LLaVA Bench W, MathVista. @@ -150,17 +166,21 @@
†: Proprietary models +
### Examples

- +

+ +We combine OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal interactive assistant**. Video frames are described in text using OmniLMM-12B, and GPT-3.5 (text-only) is employed to generate responses based on the descriptions and user prompts. The demo video is a raw recording without editing. +
-
## OmniLMM-3B @@ -256,7 +276,7 @@ ### Examples -OmniLLM-3B is the first LMM deloyed on end devices. The demo video is the raw screen recording without edition. +We deploy OmniLMM-3B on end devices. The demo video is the raw screen recording on a OnePlus 9R without editing.

@@ -294,8 +314,8 @@ pip install -r requirements.txt ### Model Zoo | Model | Description | Download Link | |:----------------------|:-------------------|:---------------:| -| OmniLMM-12B | The most capable version with strong performance. | [🤗](https://huggingface.co/openbmb/OmniLMM-12B)    | -| OmniLMM-3B | The efficient version for end device deployment. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    | +| OmniLMM-12B | The most capable version with strong performance. | [🤗](https://huggingface.co/openbmb/OmniLMM-12B)    [](https://modelscope.cn/models/OpenBMB/OmniLMM-12B/files) | +| OmniLMM-3B | The efficient version for end device deployment. | [🤗](https://huggingface.co/openbmb/MiniCPM-V)    [](https://modelscope.cn/models/OpenBMB/MiniCPM-V/files) | ### Multi-turn Conversation diff --git a/assets/eval_radar.png b/assets/eval_radar.png new file mode 100644 index 0000000..18a76f8 Binary files /dev/null and b/assets/eval_radar.png differ diff --git a/assets/omnilmm-12b-examples_2.pdf b/assets/omnilmm-12b-examples_2.pdf new file mode 100644 index 0000000..366b4e0 Binary files /dev/null and b/assets/omnilmm-12b-examples_2.pdf differ diff --git a/assets/omnilmm-12b-examples_2.png b/assets/omnilmm-12b-examples_2.png new file mode 100644 index 0000000..349a06a Binary files /dev/null and b/assets/omnilmm-12b-examples_2.png differ diff --git a/assets/omnilmm-12b-examples_2_00.jpg b/assets/omnilmm-12b-examples_2_00.jpg new file mode 100644 index 0000000..2b52a9d Binary files /dev/null and b/assets/omnilmm-12b-examples_2_00.jpg differ diff --git a/assets/omnilmm-12b-examples_3.png b/assets/omnilmm-12b-examples_3.png new file mode 100644 index 0000000..54e212a Binary files /dev/null and b/assets/omnilmm-12b-examples_3.png differ
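The real-time assistant pipeline this patch describes (OmniLMM-12B captions the camera frames; a text-only GPT-3.5 then answers the user from those captions) can be sketched roughly as follows. This is a minimal illustration only: the function names, prompt format, and stub models are hypothetical placeholders, not the repository's actual API.

```python
# Hypothetical sketch of the real-time assistant loop: OmniLMM-12B
# captions video frames, and a text-only chat model answers the user
# from those captions. All names here are illustrative placeholders.
from typing import Callable, List


def describe_frame(frame: str) -> str:
    """Stand-in for an OmniLMM-12B captioning call on one video frame."""
    return f"[caption] {frame}"  # a real system would run the LMM here


def build_chat_prompt(descriptions: List[str], user_query: str) -> str:
    """Fold per-frame captions and the user's question into one text
    prompt for the text-only chat model."""
    context = "\n".join(f"- {d}" for d in descriptions)
    return (
        "You are watching a live camera feed. Recent frames:\n"
        f"{context}\n"
        f"User: {user_query}\n"
        "Assistant:"
    )


def assistant_step(frames: List[str],
                   user_query: str,
                   chat_model: Callable[[str], str]) -> str:
    """One turn of the loop: caption the frames, then ask the chat model."""
    descriptions = [describe_frame(f) for f in frames]
    return chat_model(build_chat_prompt(descriptions, user_query))


# Stub chat model for illustration; a deployment would call GPT-3.5 here.
reply = assistant_step(
    ["a cat lying on a sofa"], "What do you see?",
    chat_model=lambda prompt: "(stub reply to) " + prompt.splitlines()[1],
)
print(reply)
```

In a real deployment, `describe_frame` would invoke OmniLMM-12B on each sampled frame and `chat_model` would be a GPT-3.5 API call; the sketch only shows how a text-only model is bridged to the visual stream.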