# Best Practice with LLaMA-Factory

## Contents

- [Supported Models](#supported-models)
- [LLaMA-Factory Installation](#llama-factory-installation)
- [Dataset Preparation](#dataset-preparation)
  - [Image Dataset](#image-dataset)
  - [Video Dataset](#video-dataset)
  - [Audio Dataset](#audio-dataset)
- [LoRA Fine-Tuning](#lora-fine-tuning)
- [Full-Parameter Fine-Tuning](#full-parameter-fine-tuning)
- [Inference](#inference)

## Supported Models

* [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
* [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)

## LLaMA-Factory Installation

You can install LLaMA-Factory with the commands below.

```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed,minicpm_v]"
mkdir configs  # we will keep all YAML files here
```

## Dataset Preparation

Refer to [data/dataset_info.json](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/dataset_info.json) to register your own dataset. Here we use the existing demo datasets `mllm_demo`, `mllm_video_demo`, and `mllm_audio_demo` as examples.
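The demo datasets are already registered in `data/dataset_info.json`. For a custom dataset with the same shape, the registration entry would look roughly like the sketch below; `my_mllm_data` and its file name are placeholders, and the exact keys should be checked against the entries shipped with your LLaMA-Factory version.

```json
"my_mllm_data": {
  "file_name": "my_mllm_data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "images": "images"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
}
```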
### Image Dataset

Refer to the image SFT demo data: [data/mllm_demo.json](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/mllm_demo.json)

data/mllm_demo.json

```json
[
  {
    "messages": [
      {
        "content": "Who are they?",
        "role": "user"
      },
      {
        "content": "They're Kane and Gretzka from Bayern Munich.",
        "role": "assistant"
      },
      {
        "content": "What are they doing?",
        "role": "user"
      },
      {
        "content": "They are celebrating on the soccer field.",
        "role": "assistant"
      }
    ],
    "images": [
      "mllm_demo_data/1.jpg"
    ]
  },
  {
    "messages": [
      {
        "content": "Who is he?",
        "role": "user"
      },
      {
        "content": "He's Thomas Muller from Bayern Munich.",
        "role": "assistant"
      },
      {
        "content": "Why is he on the ground?",
        "role": "user"
      },
      {
        "content": "Because he's sliding on his knees to celebrate.",
        "role": "assistant"
      }
    ],
    "images": [
      "mllm_demo_data/2.jpg"
    ]
  },
  {
    "messages": [
      {
        "content": "Please describe this image",
        "role": "user"
      },
      {
        "content": "Chinese astronaut Gui Haichao is giving a speech.",
        "role": "assistant"
      },
      {
        "content": "What has he accomplished?",
        "role": "user"
      },
      {
        "content": "He was appointed to be a payload specialist on Shenzhou 16 mission in June 2022, thus becoming the first Chinese civilian of Group 3 in space on 30 May 2023. He is responsible for the on-orbit operation of space science experimental payloads.",
        "role": "assistant"
      }
    ],
    "images": [
      "mllm_demo_data/3.jpg"
    ]
  }
]
```
### Video Dataset

Refer to the video SFT demo data: [data/mllm_video_demo.json](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/mllm_video_demo.json)

Each sample in `mllm_video_demo.json` uses the same `messages` structure as the image demo, with a `videos` field listing video file paths instead of `images`.
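A minimal entry in that shape might look like the sketch below. The file path and dialogue text are illustrative placeholders, not the actual demo contents; depending on your LLaMA-Factory version, the user turn may also carry a `<video>` placeholder token.

```json
[
  {
    "messages": [
      {
        "content": "What is happening in this video?",
        "role": "user"
      },
      {
        "content": "A short description of the clip.",
        "role": "assistant"
      }
    ],
    "videos": [
      "mllm_demo_data/1.mp4"
    ]
  }
]
```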
### Audio Dataset

Refer to the audio SFT demo data: [data/mllm_audio_demo.json](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/mllm_audio_demo.json)

Each sample again follows the `messages` structure above, with an `audios` field listing audio file paths.
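As a sketch (placeholder path and text; the real demo may also include an `<audio>` placeholder token in the user turn):

```json
[
  {
    "messages": [
      {
        "content": "What's that sound?",
        "role": "user"
      },
      {
        "content": "A short description of the audio clip.",
        "role": "assistant"
      }
    ],
    "audios": [
      "mllm_demo_data/1.mp3"
    ]
  }
]
```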
## LoRA Fine-Tuning

We can run LoRA SFT with a single command:

```shell
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train configs/minicpmo_2_6_lora_sft.yaml
```
configs/minicpmo_2_6_lora_sft.yaml

```yaml
### model
model_name_or_path: openbmb/MiniCPM-o-2_6  # or openbmb/MiniCPM-V-2_6
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj

### dataset
dataset: mllm_demo  # mllm_demo, mllm_video_demo or mllm_audio_demo
template: minicpm_v
cutoff_len: 3072
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/minicpmo_2_6/lora/sft
logging_steps: 1
save_steps: 100
plot_loss: true
overwrite_output_dir: true
save_total_limit: 10

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 20.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
save_only_model: true

### eval
do_eval: false
```
### LoRA Model Export

Export the merged LoRA model with one command:

```shell
llamafactory-cli export configs/minicpmo_2_6_lora_export.yaml
```
configs/minicpmo_2_6_lora_export.yaml

```yaml
### model
model_name_or_path: openbmb/MiniCPM-o-2_6  # or openbmb/MiniCPM-V-2_6
adapter_name_or_path: saves/minicpmo_2_6/lora/sft
template: minicpm_v
finetuning_type: lora
trust_remote_code: true

### export
export_dir: models/minicpmo_2_6_lora_sft
export_size: 2
export_device: cpu
export_legacy_format: false
```
## Full-Parameter Fine-Tuning

We can run full-parameter SFT with a single command:

```shell
llamafactory-cli train configs/minicpmo_2_6_full_sft.yaml
```
configs/minicpmo_2_6_full_sft.yaml

```yaml
### model
model_name_or_path: openbmb/MiniCPM-o-2_6  # or openbmb/MiniCPM-V-2_6
trust_remote_code: true
freeze_vision_tower: true
print_param_status: true
flash_attn: fa2

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: configs/deepspeed/ds_z2_config.json

### dataset
dataset: mllm_demo  # or mllm_video_demo
template: minicpm_v
cutoff_len: 3072
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/minicpmo_2_6/full/sft
logging_steps: 1
save_steps: 100
plot_loss: true
overwrite_output_dir: true
save_total_limit: 10

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 20.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
save_only_model: true

### eval
do_eval: false
```
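The config above references `configs/deepspeed/ds_z2_config.json`, which you need to provide yourself. A minimal ZeRO-2 config, along the lines of the `ds_z2_config.json` shipped under LLaMA-Factory's `examples/deepspeed/`, would look roughly like this (the "auto" values are resolved by the trainer; adjust bucket sizes to your hardware):

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto"
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
    "round_robin_gradients": true
  }
}
```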
## Inference

### Web UI ChatBox

Refer to the [LLaMA-Factory doc](https://github.com/hiyouga/LLaMA-Factory/tree/main/examples#inferring-lora-fine-tuned-models) for more inference options. For example, we can launch a web chat with one command:

```shell
CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat configs/minicpmo_2_6_infer.yaml
```
configs/minicpmo_2_6_infer.yaml

```yaml
model_name_or_path: saves/minicpmo_2_6/full/sft
template: minicpm_v
infer_backend: huggingface
trust_remote_code: true
```
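The config above points at the full-parameter checkpoint. If you followed the LoRA path instead, you can either set `model_name_or_path` to the exported directory `models/minicpmo_2_6_lora_sft`, or load the base model together with the un-merged adapter, roughly as sketched below (paths are the ones used earlier in this guide):

```yaml
model_name_or_path: openbmb/MiniCPM-o-2_6
adapter_name_or_path: saves/minicpmo_2_6/lora/sft  # LoRA adapter from the SFT step
template: minicpm_v
infer_backend: huggingface
trust_remote_code: true
```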
### Official Code

You can also run inference with the official MiniCPM-o/MiniCPM-V chat code:
```python
# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "saves/minicpmo_2_6/full/sft"

# attn_implementation can be 'sdpa' or 'flash_attention_2'; 'eager' is not supported
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open('data/mllm_demo_data/1.jpg').convert('RGB')
question = 'Who are they?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)
```
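The chat interface accepts the running message list, so a multi-turn conversation can be continued by appending the previous answer and a new user turn, as in the sketch below (the follow-up question is just an example):

```python
# append the model's answer and a follow-up user turn, then chat again
msgs.append({'role': 'assistant', 'content': [res]})
msgs.append({'role': 'user', 'content': ['What are they doing?']})

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)
```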