diff --git a/README.md b/README.md
index 5b7f610..7ae68c8 100644
--- a/README.md
+++ b/README.md
@@ -23,12 +23,22 @@
 [中文文档](./README_zh.md)

 ## Contents
+- [Contents](#contents)
 - [OmniLMM-12B](#omnilmm-12b)
+  - [Evaluation](#evaluation)
+  - [Examples](#examples)
 - [OmniLMM-3B](#omnilmm-3b)
+  - [Evaluation](#evaluation-1)
+  - [Examples](#examples-1)
 - [Demo](#demo)
 - [Install](#install)
 - [Inference](#inference)
-- [Model Zoo](#model-zoo)
+  - [Model Zoo](#model-zoo)
+  - [Multi-turn Conversation](#multi-turn-conversation)
+- [✅ TODO](#-todo)
+- [Model License](#model-license)
+- [Statement](#statement)
+- [🏫 Institutions](#-institutions)

 ## OmniLMM-12B
 **OmniLMM-12B** is the most capable version. The model is built based on EVA02-5B and Zephyr-7B-β, connected with a perceiver resampler layer, and trained on multimodal data in a curriculum fashion. The model has three notable features:
@@ -181,79 +191,118 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
 OmniLMM-3B is **the first edge-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from our ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
+
+### Evaluation
+<details>
+<summary>Click to view results on MME, MMBench, MMMU, MMHal-Bench, Object HalBench, SeedBench-I, MathVista and LLaVA Bench W.</summary>
+
-| Model | Size | MME | MMB dev (en) | MMB dev (zh) | MMMU val | CMMMU val |
-|:-|:-:|:-:|:-:|:-:|:-:|:-:|
-| LLaVA-Phi | 3B | 1335 | 59.8 | - | - | - |
-| MobileVLM | 3B | 1289 | 59.6 | - | - | - |
-| Imp-v1 | 3B | 1434 | 66.5 | - | - | - |
-| Qwen-VL-Chat | 9.6B | 1487 | 60.6 | 56.7 | 35.9 | 30.7 |
-| CogVLM | 17.4B | 1438 | 63.7 | 53.8 | 32.1 | - |
-| OmniLMM-3B | 3B | 1452 | 67.3 | 61.9 | 34.7 | 32.1 |
+| Model | Size | MME | MMB dev (en) | MMMU val | MMHal-Bench | Object HalBench | SeedBench-I | MathVista | LLaVA Bench W |
+|:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+| GPT-4V† | - | 1409 | 75.1 | 56.8 | 3.53 / 70.8 | 86.4 / 92.7 | 71.6 | 47.8 | 93.1 |
+| Qwen-VL-Plus† | - | 1681 | 66.2 | 45.2 | - | - | 65.7 | 36.0 | 73.7 |
+| Yi-VL 6B | 6.7B | - | 68.2 | 39.1 | - | - | 66.1 | 28.0 | 39.9 |
+| Qwen-VL-Chat | 9.6B | 1488 | 60.6 | 35.9 | 2.93 / 59.4 | 56.2 / 80.0 | 64.8 | 33.8 | 67.7 |
+| CogVLM | 17.4B | 1438 | 63.7 | 32.1 | 2.68 / 52.1 | 73.6 / 87.4 | 68.8 | 34.7 | 73.9 |
+| LLaVA 1.5 | 13.6B | 1531 | 68.2 | 36.4 | 2.71 / 51.0 | 53.7 / 77.4 | 68.1 | 26.4 | 64.6 |
+| OmniLMM-12B | 11.6B | 1637 | 71.6 | 40.7 | 3.45 / 68.8 | 90.3 / 95.5 | 71.1 | 34.9 | 72.0 |
+
+†: Proprietary models
+
+</details>
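
The OmniLMM-12B description earlier in this diff mentions a perceiver resampler connecting the EVA02-5B vision encoder to the language model. As a rough sketch of that idea only (this is not the project's actual code): a small fixed set of query vectors cross-attends over a variable-length sequence of visual features, compressing them into a fixed number of tokens for the LLM. The single attention head, weight-free projections, and all dimensions below are simplifying assumptions:

```python
import numpy as np

def perceiver_resampler(visual_feats, latents):
    """Toy resampling step: `latents` (L, d) are fixed learned queries that
    cross-attend over `visual_feats` (N, d), returning a fixed (L, d) output
    regardless of how many visual features N the encoder produced."""
    d = latents.shape[-1]
    scores = latents @ visual_feats.T / np.sqrt(d)      # (L, N) attention logits
    scores -= scores.max(axis=-1, keepdims=True)        # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # rows sum to 1
    return weights @ visual_feats                       # (L, d) resampled tokens

rng = np.random.default_rng(0)
feats = rng.normal(size=(257, 64))    # e.g. ViT patch features (N varies per image)
latents = rng.normal(size=(16, 64))   # 16 queries; learned in a real model, random here
out = perceiver_resampler(feats, latents)
print(out.shape)  # (16, 64)
```

In the real architecture each resampler block also carries learned query/key/value projections, multiple heads, and a feed-forward layer; the point of the sketch is only the fixed-queries-over-variable-features shape contract.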