qianyu chen
2024-08-14 18:42:37 +08:00
committed by GitHub
parent 2c6a96f148
commit 6acf99fddf


@@ -55,7 +55,9 @@ If your input consists of a single image, you can use a single placeholder **\<i
#### Multiple Images Example
For inputs containing multiple images, use a dictionary in which each key is a unique placeholder (e.g., **\<image_00\>**, **\<image_01\>**) and each value is the corresponding image path. These placeholders can then be used within the conversation to insert images at specific positions.
Additionally, to optimize resource management, especially when dealing with large batches of images during training or inference, consider reducing `max_slice_nums`. If you are performing multi-image supervised fine-tuning (SFT), it's recommended to set `MODEL_MAX_LENGTH=4096` in your script for better performance.
Additionally, to optimize resource management, especially when dealing with large batches of images during training or inference, consider reducing `max_slice_nums`. For example, when an image has a maximum resolution of 1344x1344, setting `slice=9` will occupy approximately 640 tokens, while `slice=2` will occupy around 192 tokens. If the total token count of a sample exceeds `max_length`, the sample is truncated.
If you are performing multi-image supervised fine-tuning (SFT), it's recommended to set `MODEL_MAX_LENGTH=4096` in your script for better performance.
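
The token figures above can be turned into a rough budget check before training. The sketch below is only an illustration: the helper names are hypothetical, and the formula `(slices + 1) * 64` is an assumption fitted to the two figures quoted above (640 tokens for `slice=9`, 192 tokens for `slice=2`), not a documented rule. It also ignores any placeholder or separator tokens the tokenizer may add.

```python
# Hypothetical helpers for estimating whether a multi-image sample fits in the
# context window. TOKENS_PER_VIEW = 64 is an assumption inferred from the two
# figures in the text: (9 + 1) * 64 = 640 and (2 + 1) * 64 = 192.
TOKENS_PER_VIEW = 64


def estimate_image_tokens(num_slices: int) -> int:
    """Approximate tokens for one image split into `num_slices` slices.

    Assumes one extra view per image in addition to the slices, which is
    what makes the formula match the quoted numbers.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return (num_slices + 1) * TOKENS_PER_VIEW


def fits_in_context(num_images: int, num_slices: int,
                    text_tokens: int, max_length: int = 4096) -> bool:
    """Return True if the estimated total stays within `max_length`."""
    total = num_images * estimate_image_tokens(num_slices) + text_tokens
    return total <= max_length


if __name__ == "__main__":
    for slices in (9, 2):
        print(f"slice={slices}: ~{estimate_image_tokens(slices)} tokens per image")
    # Six images at slice=9 plus 200 text tokens: does this fit in 4096?
    print(fits_in_context(num_images=6, num_slices=9, text_tokens=200))
```

Under these assumptions, a sample with many high-resolution images can exceed `MODEL_MAX_LENGTH=4096` quickly, which is why lowering `max_slice_nums` helps avoid truncation.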
<details>