Update README.md
@@ -49,7 +49,7 @@
 - 🏆 **Trustworthy Behavior.**

-  LMMs are known for suffering from hallucination, often generating text that is not factually grounded in images (e.g., faithfully describing non-existing objects in images). OmniLMM-12B is **the first state-of-the-art open-source LMM aligned via multimodal RLHF for trustworthy behavior** (using our recent [RLHF-V](https://rlhf-v.github.io/) technique). It **ranks #1** among open-source models on [MMHal-Bench](https://huggingface.co/datasets/Shengcao1006/MMHal-Bench), and **outperforms GPT-4V** on [Object HalBench](https://arxiv.org/abs/2312.00849).
+  LMMs are known for suffering from hallucination, often generating text that is not factually grounded in images (e.g., faithfully describing non-existing objects in images). OmniLMM-12B is **the first state-of-the-art open-source LMM aligned via multimodal RLHF for trustworthy behavior** (using the recent [RLHF-V](https://rlhf-v.github.io/) technique). It **ranks #1** among open-source models on [MMHal-Bench](https://huggingface.co/datasets/Shengcao1006/MMHal-Bench), and **outperforms GPT-4V** on [Object HalBench](https://arxiv.org/abs/2312.00849).

 - 🕹 **Real-time Multimodal Interaction.**

@@ -196,7 +196,7 @@ We combine the OmniLMM-12B and GPT-3.5 (text-only) into a **real-time multimodal
 - 🙌 **Bilingual Support.**

-  OmniLMM-3B is **the first edge-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from our ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
+  OmniLMM-3B is **the first edge-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from the ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).

 ### Evaluation
