From 71ab0b4de5e05da3950d526a2d70c1d02b4e434c Mon Sep 17 00:00:00 2001
From: fdyuandong
Date: Tue, 6 May 2025 16:21:21 +0800
Subject: [PATCH] fix: Update README

---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1e4bebb..cf1fefe 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,13 @@
 [![Apache License](https://img.shields.io/badge/📃-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)
 [![ModelScope](https://img.shields.io/badge/%20ModelScope%20-Space-blue)](https://www.modelscope.cn/studios/Damo_XR_Lab/LAM-A2E)
 
-#### This project leverages audio input to generate ARKit blendshapes-driven facial expressions in ⚡real-time⚡, powering ultra-realistic 3D avatars generated by [LAM](https://github.com/aigc3d/LAM).
+## Description
+#### This project leverages audio input to generate ARKit blendshape-driven facial expressions in ⚡real-time⚡, powering ultra-realistic 3D avatars generated by [LAM](https://github.com/aigc3d/LAM).
+To enable ARKit-driven animation of the LAM model, we adapted ARKit blendshapes to align with FLAME's facial topology through manual customization. The LAM-A2E network follows an encoder-decoder architecture, as shown below. We adopt the state-of-the-art pre-trained speech model Wav2Vec as the audio encoder. The features extracted from the raw audio waveform are combined with style features and fed into the decoder, which outputs stylized blendshape coefficients.
+
+<div align="center">
+Architecture +
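As a rough illustration of the encoder-decoder pipeline described above, the PyTorch sketch below wires a pre-trained Wav2Vec encoder to a small decoder that fuses audio and style features and predicts per-frame ARKit blendshape coefficients. This is a minimal sketch under stated assumptions, not the project's actual code: the class name, the `facebook/wav2vec2-base-960h` checkpoint, the embedding-based style representation, and the MLP decoder head are all illustrative choices.

```python
# Illustrative sketch only: module names, dimensions, and the decoder design
# are assumptions for this example, not the LAM-A2E implementation.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model  # pre-trained speech encoder


class AudioToBlendshapes(nn.Module):
    """Encoder-decoder: raw audio + style features -> ARKit blendshape coefficients."""

    def __init__(self, num_blendshapes: int = 52, num_styles: int = 8, hidden_dim: int = 256):
        super().__init__()
        # Audio encoder: pre-trained Wav2Vec extracts frame-level speech features.
        self.audio_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        feat_dim = self.audio_encoder.config.hidden_size  # 768 for the base model
        # Style features: a learned embedding per speaking style (assumed representation).
        self.style_embedding = nn.Embedding(num_styles, hidden_dim)
        # Decoder: fuses audio and style features, outputs stylized blendshape coefficients.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_blendshapes),
            nn.Sigmoid(),  # ARKit blendshape weights lie in [0, 1]
        )

    def forward(self, waveform: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz; style_id: (batch,)
        audio_feats = self.audio_encoder(waveform).last_hidden_state  # (batch, frames, feat_dim)
        style = self.style_embedding(style_id)                        # (batch, hidden_dim)
        style = style.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        fused = torch.cat([audio_feats, style], dim=-1)
        return self.decoder(fused)  # (batch, frames, num_blendshapes)


# Example: 1 second of 16 kHz audio with style index 0 -> per-frame blendshape weights.
model = AudioToBlendshapes()
coeffs = model(torch.randn(1, 16000), torch.tensor([0]))
```

The actual LAM-A2E decoder is presumably a temporal model rather than a per-frame MLP; the sketch only shows how the extracted audio features and style features can be combined before decoding into blendshape coefficients.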
## Demo