diff --git a/README.md b/README.md
index 5b7f610..7ae68c8 100644
--- a/README.md
+++ b/README.md
@@ -23,12 +23,22 @@
 [中文文档](./README_zh.md)
 
 ## Contents
+- [Contents](#contents)
 - [OmniLMM-12B](#omnilmm-12b)
+  - [Evaluation](#evaluation)
+  - [Examples](#examples)
 - [OmniLMM-3B](#omnilmm-3b)
+  - [Evaluation](#evaluation-1)
+  - [Examples](#examples-1)
 - [Demo](#demo)
 - [Install](#install)
 - [Inference](#inference)
-- [Model Zoo](#model-zoo)
+  - [Model Zoo](#model-zoo)
+  - [Multi-turn Conversation](#multi-turn-conversation)
+- [✅ TODO](#-todo)
+- [Model License](#model-license)
+- [Statement](#statement)
+- [🏫 Institutions](#-institutions)
 
 ## OmniLMM-12B
 **OmniLMM-12B** is the most capable version. The model is built based on EVA02-5B and Zephyr-7B-β, connected with a perceiver resampler layer, and trained on multimodal data in a curriculum fashion. The model has three notable features:
@@ -181,79 +191,118 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
 OmniLMM-3B is **the first edge-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from our ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
+
 ### Evaluation
-| Model | Size | MME | MMB dev (en) | MMB dev (zh) | MMMU val | CMMMU val |
-|---|---|---|---|---|---|---|
-| LLaVA-Phi | 3B | 1335 | 59.8 | - | - | - |
-| MobileVLM | 3B | 1289 | 59.6 | - | - | - |
-| Imp-v1 | 3B | 1434 | 66.5 | - | - | - |
-| Qwen-VL-Chat | 9.6B | 1487 | 60.6 | 56.7 | 35.9 | 30.7 |
-| CogVLM | 17.4B | 1438 | 63.7 | 53.8 | 32.1 | - |
-| OmniLMM-3B | 3B | 1452 | 67.3 | 61.9 | 34.7 | 32.1 |
+| Model | Size | MME | MMB dev (en) | MMMU val | MMHal-Bench | Object HalBench | SeedBench-I | MathVista | LLaVA Bench W |
+|---|---|---|---|---|---|---|---|---|---|
+| GPT-4V† | - | 1409 | 75.1 | 56.8 | 3.53 / 70.8 | 86.4 / 92.7 | 71.6 | 47.8 | 93.1 |
+| Qwen-VL-Plus† | - | 1681 | 66.2 | 45.2 | - | - | 65.7 | 36.0 | 73.7 |
+| Yi-VL 6B | 6.7B | - | 68.2 | 39.1 | - | - | 66.1 | 28.0 | 39.9 |
+| Qwen-VL-Chat | 9.6B | 1488 | 60.6 | 35.9 | 2.93 / 59.4 | 56.2 / 80.0 | 64.8 | 33.8 | 67.7 |
+| CogVLM | 17.4B | 1438 | 63.7 | 32.1 | 2.68 / 52.1 | 73.6 / 87.4 | 68.8 | 34.7 | 73.9 |
+| LLaVA 1.5 | 13.6B | 1531 | 68.2 | 36.4 | 2.71 / 51.0 | 53.7 / 77.4 | 68.1 | 26.4 | 64.6 |
+| OmniLMM-12B | 11.6B | 1637 | 71.6 | 40.7 | 3.45 / 68.8 | 90.3 / 95.5 | 71.1 | 34.9 | 72.0 |
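The diff above describes OmniLMM-12B as connecting EVA02-5B and Zephyr-7B-β through a perceiver resampler layer. For readers unfamiliar with that connector, below is a minimal sketch of the general perceiver-resampler idea: a fixed set of learned latent queries cross-attends to a variable number of vision tokens, so the language model always receives a constant-length visual prefix. All dimensions, names, and layer choices here are hypothetical illustrations, not the actual OmniLMM implementation.

```python
# Hypothetical sketch of a perceiver-resampler connector; not the OmniLMM code.
import torch
import torch.nn as nn


class PerceiverResampler(nn.Module):
    """Compress N vision tokens into a fixed number of latent tokens
    via cross-attention (latents = queries, vision tokens = keys/values)."""

    def __init__(self, vision_dim=1024, llm_dim=4096, num_latents=64, num_heads=8):
        super().__init__()
        # Learned latent queries, shared across all images.
        self.latents = nn.Parameter(torch.randn(num_latents, llm_dim) * 0.02)
        # Project vision features into the LLM's hidden width.
        self.proj_in = nn.Linear(vision_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(llm_dim)

    def forward(self, vision_feats):                      # (B, N_patches, vision_dim)
        b = vision_feats.size(0)
        kv = self.proj_in(vision_feats)                   # (B, N_patches, llm_dim)
        q = self.latents.unsqueeze(0).expand(b, -1, -1)   # (B, num_latents, llm_dim)
        out, _ = self.attn(q, kv, kv)                     # latents attend to image tokens
        return self.norm(out + q)                         # (B, num_latents, llm_dim)


# Example: 256 ViT patch tokens per image compressed to 64 LLM-width tokens.
feats = torch.randn(2, 256, 1024)
tokens = PerceiverResampler()(feats)
print(tokens.shape)  # torch.Size([2, 64, 4096])
```

Whatever the exact layer stack, the design point is the same: inference cost on the language-model side becomes independent of input image resolution, since the visual prefix length is fixed by `num_latents`.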