Experiment outputs are stored in ./output.
pip install -r requirements.txt
- Download the dataset from BaiduYun (vj2d) and place it in the 'feat' directory.
AutoVMR is trained in three stages: SFT, reward model training, and PPO training. Follow the instructions below to train the AutoVMR-7B model.
- Download the CLIP and Vicuna v1.5 weights and place them in the 'checkpoints' directory.
- Download the SFT checkpoint (901c) and place it in 'checkpoints/sft_ckp/'.
- Download the reward model checkpoint (fcl1) and place it in 'checkpoints/reward_model_ckp/'.
- Download the PPO checkpoint (m0hw) and place it in 'checkpoints/ppo_ckp/'.
-- checkpoints
    -- clip
    -- ppo_ckp
    -- reward_model_ckp
        -- v_uniform
    -- sft_ckp
        -- vtimellm-vicuna-v1-5-7b-stage1
        -- vtimellm-vicuna-v1-5-7b-stage2
        -- vtimellm-vicuna-v1-5-7b-stage_sft
    -- vicuna-7b-v1.5
-- feat
-- scripts
    -- sft.sh
    -- reward_model.sh
    -- ppo.sh
    -- inference.sh
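Before launching training, it can help to confirm that the downloads landed where the layout above expects them. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper name, and it only checks the top-level entries named in the download steps.

```shell
#!/usr/bin/env bash
# Sketch: report which expected entries are missing under a repository root.
# check_layout is a hypothetical helper, not a script shipped with AutoVMR.
check_layout() {
  local root="${1:-.}"
  local missing=0
  local entry
  for entry in \
    checkpoints/clip \
    checkpoints/vicuna-7b-v1.5 \
    checkpoints/sft_ckp \
    checkpoints/reward_model_ckp \
    checkpoints/ppo_ckp \
    feat \
    scripts
  do
    # -e accepts either a file or a directory at this path
    if [ ! -e "$root/$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  # Return 0 when everything is present, 1 otherwise
  return "$missing"
}
```

Running `check_layout .` from the repository root prints one line per missing entry and returns a non-zero status if any entry is absent.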
bash scripts/ppo.sh
bash scripts/val.sh
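The three training stages (SFT, reward model training, PPO) could be chained with a small wrapper that stops at the first failing stage. This is only a sketch: `run_stages` is a hypothetical helper, and it assumes that `sft.sh`, `reward_model.sh`, and `ppo.sh` can each be run without arguments from the repository root.

```shell
#!/usr/bin/env bash
# Sketch: run the three training stages in order, stopping at the first failure.
# run_stages is a hypothetical helper; it assumes each stage script needs no arguments.
run_stages() {
  local script_dir="${1:-scripts}"
  local stage
  for stage in sft reward_model ppo; do
    echo "running stage: $stage"
    if ! bash "$script_dir/$stage.sh"; then
      echo "stage failed: $stage" >&2
      return 1
    fi
  done
}
```

Calling `run_stages` with no argument uses the `scripts` directory. Later stages are skipped when an earlier one fails, since each stage depends on the previous stage's checkpoint.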