diff --git a/README.md b/README.md
index 5164074..148a650 100644
--- a/README.md
+++ b/README.md
@@ -209,6 +209,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  Model
  Size
+ Visual Tokens
  MME
  MMB dev (en)
  MMB dev (zh)
@@ -220,6 +221,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  LLaVA-Phi
  3B
+ 576
  1335
  59.8
  -
@@ -229,6 +231,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  MobileVLM
  3B
+ 144
  1289
  59.6
  -
@@ -238,6 +241,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  Imp-v1
  3B
+ 576
  1434
  66.5
  -
@@ -245,8 +249,9 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  -
- Qwen-VL-Chat
+ Qwen-VL-Chat
  9.6B
+ 256
  1487
  60.6
  56.7
@@ -256,6 +261,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  CogVLM
  17.4B
+ 1225
  1438
  63.7
  53.8
@@ -265,6 +271,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
  OmniLMM-3B
  3B
+ 64
  1452
  67.3
  61.9
diff --git a/README_zh.md b/README_zh.md
index b2395ac..32de530 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -214,6 +214,7 @@
  Model
  Size
+ Visual Tokens
  MME
  MMB dev (en)
  MMB dev (zh)
@@ -225,6 +226,7 @@
  LLaVA-Phi
  3B
+ 576
  1335
  59.8
  -
@@ -234,6 +236,7 @@
  MobileVLM
  3B
+ 144
  1289
  59.6
  -
@@ -243,6 +246,7 @@
  Imp-v1
  3B
+ 576
  1434
  66.5
  -
@@ -250,8 +254,9 @@
  -
- Qwen-VL-Chat
+ Qwen-VL-Chat
  9.6B
+ 256
  1487
  60.6
  56.7
@@ -261,6 +266,7 @@
  CogVLM
  17.4B
+ 1225
  1438
  63.7
  53.8
@@ -270,6 +276,7 @@
  OmniLMM-3B
  3B
+ 64
  1452
  67.3
  61.9