This folder contains the implementation of InternVL-Chat.
See INSTALLATION.md
In addition, using this codebase requires executing the following steps:
- Install other requirements:

pip install --upgrade pip  # enable PEP 660 support
pip install -e .
| model name | type | download | size |
| --- | --- | --- | --- |
| InternViT-6B-448px-V1-2 | ViT | 🤗 HF link | 11.1 GB |
| Nous-Hermes-2-Yi-34B | LLM | 🤗 HF link | 65.0 GB |
Please download the above model weights and place them in the `pretrained/` folder.
cd pretrained/
# pip install -U huggingface_hub
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternViT-6B-448px-V1-2 --local-dir intern_vit_6b_448px_v1_2
huggingface-cli download --resume-download --local-dir-use-symlinks False NousResearch/Nous-Hermes-2-Yi-34B --local-dir Nous-Hermes-2-Yi-34B
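If you prefer to download the checkpoints from Python instead of the CLI, the same weights can be fetched with `huggingface_hub.snapshot_download`; this is a minimal sketch equivalent to the two commands above:

```python
from huggingface_hub import snapshot_download

# Download both checkpoints into the pretrained/ folder used below.
snapshot_download(repo_id="OpenGVLab/InternViT-6B-448px-V1-2",
                  local_dir="pretrained/intern_vit_6b_448px_v1_2")
snapshot_download(repo_id="NousResearch/Nous-Hermes-2-Yi-34B",
                  local_dir="pretrained/Nous-Hermes-2-Yi-34B")
```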
The directory structure is:
pretrained
├── intern_vit_6b_448px_v1_2/
└── Nous-Hermes-2-Yi-34B/
Inspired by LLaVA-NeXT, we adopted a data-efficient SFT strategy to train InternVL-Chat-V1.2, using approximately 1.2 million visual instruction tuning samples in total, all of which are fully open-source. At a high level, we build upon ShareGPT-4V and additionally integrate LLaVA-ZH, DVQA, ChartQA, AI2D, DocVQA, GeoQA+, and SynthDoG-EN. Most of the data remains consistent with LLaVA-NeXT.
First, download the annotation files and place them in the `playground/` folder.
Second, download all the images we used.
- AI2D: ai2d-all
- ChartQA: ChartQA Dataset
- COCO: train2017
- DocVQA: train, val, test
- DVQA: images
- GQA: images
- LLaVA-Pretrain: images
- OCR-VQA: download script. We save all files as `.jpg` (a conversion sketch is shown after this list).
- SAM: We only use 000000~000050.tar for now. You can quickly download 9K images from here.
- TextVQA: trainvalimages
- SynthDoG-EN: We only use 00000~00004 parquet files for now, with a total of 30K images. We provide the converted images.
- VisualGenome: part1, part2
- WebData: images. Only for academic usage.
- GeoQA+: GeoQA+
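Regarding the OCR-VQA note above, the downloaded images come in mixed formats and are re-saved as `.jpg`. A minimal sketch of such a conversion with Pillow (the folder path is an assumption based on the tree below; the official download script may already handle this):

```python
from pathlib import Path
from PIL import Image

image_dir = Path("playground/data/ocr_vqa/images")  # assumed location, see the tree below
for path in image_dir.iterdir():
    if path.suffix.lower() in {".png", ".gif"}:
        # Re-save as JPEG; convert("RGB") drops alpha/palette modes that JPEG cannot store.
        Image.open(path).convert("RGB").save(path.with_suffix(".jpg"), "JPEG")
```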
Then, organize the data as follows in `playground/data`:
playground/
├── sharegpt4v_instruct_gpt4-vision_cap100k.jsonl
├── llava_instruct_150k_zh.jsonl
├── sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.jsonl
├── dvqa_train_200k.jsonl
├── chartqa_train_18k.jsonl
├── ai2d_train_12k.jsonl
├── docvqa_train_10k.jsonl
├── geoqa+.jsonl
├── synthdog_en.jsonl
├── data
│ ├── ai2d
│ │ └── images
│ ├── chartqa
│ │ ├── test
│ │ ├── train
│ │ └── val
│ ├── coco
│ │ └── train2017
│ ├── docvqa
│ │ ├── test
│ │ ├── train
│ │ └── val
│ ├── dvqa
│ │ └── images
│ ├── gqa
│ │ └── images
│ ├── llava
│ │ └── llava_pretrain
│ │ └── images
│ ├── ocr_vqa
│ │ └── images
│ ├── sam
│ │ └── images
│ ├── share_textvqa
│ │ └── images
│ ├── synthdog-en
│ │ └── images
│ ├── textvqa
│ │ └── train_images
│ ├── vg
│ │ ├── VG_100K
│ │ └── VG_100K_2
│ ├── web-celebrity
│ │ └── images
│ ├── web-landmark
│ │ └── images
│ ├── wikiart
│ │ └── images
│ ├── geoqa+
│ │ └── images
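After arranging the folders, a quick sanity check like the following (a hypothetical helper, not part of the released code) can verify that every image directory from the tree above is present:

```python
from pathlib import Path

# Image folders expected under playground/data, taken from the tree above.
EXPECTED = [
    "ai2d/images", "chartqa/train", "chartqa/val", "chartqa/test",
    "coco/train2017", "docvqa/train", "docvqa/val", "docvqa/test",
    "dvqa/images", "gqa/images", "llava/llava_pretrain/images",
    "ocr_vqa/images", "sam/images", "share_textvqa/images",
    "synthdog-en/images", "textvqa/train_images",
    "vg/VG_100K", "vg/VG_100K_2", "web-celebrity/images",
    "web-landmark/images", "wikiart/images", "geoqa+/images",
]

root = Path("playground/data")
missing = [d for d in EXPECTED if not (root / d).is_dir()]
print("missing folders:", missing if missing else "none")
```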
We provide Slurm scripts for multi-node, multi-GPU training. You can use either 32 or 64 GPUs to train this model; with 64 GPUs, training takes approximately 18 hours.
- If you encounter an OOM error, you can decrease `PER_DEVICE_BATCH_SIZE`; for example, set `PER_DEVICE_BATCH_SIZE=4`.
# using 32 GPUs
PARTITION='your partition' GPUS=32 PER_DEVICE_BATCH_SIZE=8 sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh
# using 64 GPUs
PARTITION='your partition' GPUS=64 PER_DEVICE_BATCH_SIZE=8 sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh
The hyperparameters used for finetuning are listed in the following table.
| Hyperparameter | Trainable Param | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | --- | --- | --- | --- | --- | --- |
| InternVL-Chat-V1.2 | 40B | 512 | 1e-5 | 1 | 2048 | 0.05 |
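For reference, the global batch size of 512 relates to the launch settings above as GPUs × `PER_DEVICE_BATCH_SIZE` × gradient-accumulation steps. The sketch below only illustrates this arithmetic; whether the training script adjusts accumulation automatically is an assumption, so check the shell script if in doubt:

```python
GLOBAL_BATCH_SIZE = 512  # from the table above

def grad_accum_steps(gpus: int, per_device_batch_size: int) -> int:
    # Gradient-accumulation steps needed to reach the global batch size,
    # assuming pure data parallelism (an assumption, not taken from the script).
    return GLOBAL_BATCH_SIZE // (gpus * per_device_batch_size)

print(grad_accum_steps(64, 8))  # -> 1: 64 GPUs x 8 per device already gives 512
print(grad_accum_steps(32, 8))  # -> 2: 32 GPUs x 8 per device gives 256, accumulate twice
print(grad_accum_steps(32, 4))  # -> 4: if PER_DEVICE_BATCH_SIZE is lowered to avoid OOM
```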
MultiModal Benchmark
| model | MME | MMB (dev/test) | MMB-CN (dev/test) | POPE | MMVP | MathVista |
| --- | --- | --- | --- | --- | --- | --- |
| InternVL-Chat-V1.1 | 1672.3 / 341.1 | 76.6 / 75.4 | 71.5 / 70.1 | 87.2 | 44.7 | 34.5 |
| InternVL-Chat-V1.2 | 1672.1 / 509.3 | 81.4 / 82.2 | 79.5 / 81.2 | 88.0 | 56.7 | 47.7 |
| model | MMMU (val/test) | CMMMU (val/test) | Tiny LVLM | LLaVA-Bench | MM-Vet |
| --- | --- | --- | --- | --- | --- |
| InternVL-Chat-V1.1 | 39.1 / 35.3 | 34.8 / 34.0 | 344.5 | 76.3 | 45.0 |
| InternVL-Chat-V1.2 | 51.6 / 46.2 | TODO | 350.3 | - | 48.9 |
Visual Question Answering
| model | VQAv2 (test) | OKVQA (val) | TextVQA (val) | VizWiz (val/test) | AI2D (test) | GQA (test) | SQA (test) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| InternVL-Chat-V1.1 | 80.9 | 64.2 | 65.8 | 58.3 / 57.3 | 70.2 | 62.4 | 91.2 |
| InternVL-Chat-V1.2 | - | 62.5 | 69.7 | 61.9 / 60.0 | 71.6 | 64.0 | 83.3 |
Image Captioning
| model | COCO (test) | Flickr30K (test) | NoCaps (val) |
| --- | --- | --- | --- |
| InternVL-Chat-V1.1 | 141.8* | 84.3 | 120.4 |
| InternVL-Chat-V1.2 | 113.9 | 92.4 | 112.5 |
| model | QLLaMA | LLM | res | COCO | Flickr | NoCaps | VQAv2 | GQA | VizWiz | TextVQA | MME | POPE | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InternVL-Chat | ✔️ | frozen V-7B | 224 | 141.4 | 89.7 | 120.5 | 72.3 | 57.7 | 44.5 | 42.1 | 1298.5 | 85.2 | TODO |
| InternVL-Chat | ✔️ | frozen V-13B | 224 | 142.4 | 89.9 | 123.1 | 71.7 | 59.5 | 54.0 | 49.1 | 1317.2 | 85.4 | TODO |
| InternVL-Chat | ✔️ | V-13B | 336 | 146.2 | 92.2 | 126.2 | 81.2 | 66.6 | 58.5 | 61.5 | 1586.4 | 87.6 | TODO |
Please prepare the data according to the following directory structure.
Directory Structure
data
├── flickr30k
│ ├── flickr30k_test_karpathy.json
│ └── Images/
├── coco
│ ├── annotations
│ │ ├── coco_karpathy_test_gt.json
│ │ ├── coco_karpathy_test.json
│ │ └── ...
│ ├── train2014/
│ ├── val2014/
│ └── test2015/
├── nocaps
│ ├── nocaps_val_4500_captions.json
│ └── images/
├── vqav2
│ ├── v2_mscoco_train2014_annotations.json
│ ├── v2_mscoco_train2014_complementary_pairs.json
│ ├── v2_mscoco_val2014_annotations.json
│ ├── v2_OpenEnded_mscoco_test2015_questions.json
│ ├── v2_OpenEnded_mscoco_test-dev2015_questions.json
│ ├── v2_OpenEnded_mscoco_train2014_questions.json
│ ├── v2_OpenEnded_mscoco_val2014_questions.json
│ ├── vqav2_testdev.jsonl
│ ├── vqav2_train.jsonl
│ ├── vqav2_val.jsonl
│ ├── train2014/ -> ../coco/train2014/
│ ├── val2014/ -> ../coco/val2014/
│ └── test2015/ -> ../coco/test2015/
├── okvqa
│ ├── mscoco_train2014_annotations.json
│ ├── mscoco_val2014_annotations.json
│ ├── OpenEnded_mscoco_train2014_questions.json
│ ├── OpenEnded_mscoco_val2014_questions.json
│ ├── okvqa_train.jsonl
│ ├── okvqa_val.jsonl
│ ├── train2014/ -> ../coco/train2014/
│ └── val2014/ -> ../coco/val2014/
├── textvqa
│ ├── textvqa_train_annotations.json
│ ├── textvqa_train.jsonl
│ ├── textvqa_train_questions.json
│ ├── textvqa_val_annotations.json
│ ├── textvqa_val.jsonl
│ ├── textvqa_val_questions.json
│ ├── textvqa_val_llava.jsonl
│ └── train_images/
├── vizwiz
│ ├── vizwiz_test.jsonl
│ ├── vizwiz_train_annotations.json
│ ├── vizwiz_train.jsonl
│ ├── vizwiz_train_questions.json
│ ├── vizwiz_val_annotations.json
│ ├── vizwiz_val.jsonl
│ ├── vizwiz_val_questions.json
│ ├── test/
│ ├── train/
│ ├── val/
│ └── annotations/
├── docvqa
│ ├── test.jsonl
│ ├── train.jsonl
│ ├── val.jsonl
│ ├── test/
│ ├── train/
│ └── val/
├── chartqa
│ ├── ChartQA Dataset/
│ │ ├── train/
│ │ ├── test/
│ │ ├── val/
│ ├── test_augmented.jsonl
│ ├── test_human.jsonl
│ ├── train_augmented.jsonl
│ └── train_human.jsonl
├── gqa
│ ├── images/
│ ├── eval.py
│ ├── challenge_all_questions.json
│ ├── challenge_balanced_questions.json
│ ├── llava_gqa_testdev_balanced_qwen_format.jsonl
│ ├── submission_all_questions.json
│ ├── test_all_questions.json
│ ├── test_balanced.jsonl
│ ├── test_balanced_questions.json
│ ├── testdev_all_questions.json
│ ├── testdev_balanced_all_questions.json
│ ├── testdev_balanced_questions.json
│ ├── train_all_questions/
│ ├── train_balanced.jsonl
│ ├── train_balanced_questions.json
│ ├── val_all_questions.json
│ └── val_balanced_questions.json
├── ocrvqa
│ ├── images/
│ ├── ocrvqa_test.jsonl
│ ├── ocrvqa_train.jsonl
│ └── ocrvqa_val.jsonl
├── ai2diagram
│ ├── ai2d/
│ ├── test.jsonl
│ └── train.jsonl
├── scienceqa
│ ├── images/
│ ├── problems.json
│ └── scienceqa_test_img.jsonl
├── refcoco
│ ├── refcocog_test.jsonl
│ ├── refcocog_val.jsonl
│ ├── refcoco_testA.jsonl
│ ├── refcoco+_testA.jsonl
│ ├── refcoco_testB.jsonl
│ ├── refcoco+_testB.jsonl
│ ├── refcoco_val.jsonl
│ └── refcoco+_val.jsonl
├── mme
│ ├── MME_Benchmark_release/
│ └── images/
├── pope
│ ├── coco/
│ │ ├── coco_pope_adversarial.json
│ │ ├── coco_pope_popular.json
│ │ └── coco_pope_random.json
│ ├── val2014/ -> ../coco/val2014/
│ └── llava_pope_test.jsonl
├── tiny_lvlm
│ └── updated_datasets
│ ├── Object_Hallucination
│ ├── ...
│ └── Visual_Reasoning
├── mmbench
│ ├── mmbench_dev_20230712.tsv
│ ├── mmbench_dev_cn_20231003.tsv
│ ├── mmbench_dev_en_20231003.tsv
│ ├── mmbench_test_cn_20231003.tsv
│ └── mmbench_test_en_20231003.tsv
├── llava-bench-in-the-wild
│ ├── answers_gpt4.jsonl
│ ├── ...
│ └── images/
├── mmmu
│ ├── Accounting/
│ ├── ...
│ └── Sociology
├── mm-vet
│ └── images/
├── MMVP
│ ├── MMVP Images/
│ ├── Questions.csv
│ └── Questions.xlsx
├── MMVP_VLM
│ ├── MLLM_VLM Images/
│ └── Questions.csv
├── MathVista
│ ├── annot_testmini.json
│ └── AI4Math___math_vista/
├── SEED
│ ├── SEED-Bench.json
│ ├── SEED-Bench-image/
│ └── SEED-Bench-video-image-1/
COCO images are used in VQAv2/OK-VQA/RefCOCO/RefCOCO+/RefCOCOg. Make sure you have already downloaded COCO images before evaluating on these benchmarks.
Data Preparation
mkdir -p data/coco && cd data/coco
# download coco images
wget http://images.cocodataset.org/zips/train2014.zip && unzip train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip && unzip val2014.zip
wget http://images.cocodataset.org/zips/test2015.zip && unzip test2015.zip
mkdir -p annotations && cd annotations/
# download converted annotation files
wget https://github.com/OpenGVLab/InternVL/releases/download/data/coco_karpathy_test.json
wget https://github.com/OpenGVLab/InternVL/releases/download/data/coco_karpathy_test_gt.json
cd ../../../
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> caption-coco
Data Preparation
mkdir -p data/flickr30k && cd data/flickr30k
# download images from https://bryanplummer.com/Flickr30kEntities/
# karpathy split annotations can be downloaded from https://cs.stanford.edu/people/karpathy/deepimagesent/
# download converted files
wget https://github.com/OpenGVLab/InternVL/releases/download/data/flickr30k_test_karpathy.json
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> caption-flickr30k
Data Preparation
mkdir -p data/nocaps && cd data/nocaps
# download images from https://nocaps.org/download
# original annotations can be downloaded from https://nocaps.s3.amazonaws.com/nocaps_val_4500_captions.json
wget https://nocaps.s3.amazonaws.com/nocaps_val_4500_captions.json
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> caption-nocaps
Data Preparation
mkdir -p data/vqav2 && cd data/vqav2
# make sure you have downloaded COCO images
ln -s ../coco/train2014 ./
ln -s ../coco/val2014 ./
ln -s ../coco/test2015 ./
# download questions and annotations
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip && unzip v2_Annotations_Train_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Train_mscoco.zip && unzip v2_Questions_Train_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Val_mscoco.zip && unzip v2_Annotations_Val_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Val_mscoco.zip && unzip v2_Questions_Val_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Test_mscoco.zip && unzip v2_Questions_Test_mscoco.zip
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vqav2/vqav2_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vqav2/vqav2_val.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vqav2/vqav2_testdev.jsonl
cd ../..
Evaluation
# VQAv2-val
GPUS=8 sh evaluate.sh <checkpoint> vqa-vqav2-val
# VQAv2-testdev
GPUS=8 sh evaluate.sh <checkpoint> vqa-vqav2-testdev
For the testdev set, submit the results to the evaluation server.
Data Preparation
mkdir -p data/okvqa && cd data/okvqa
# make sure you have downloaded COCO images
ln -s ../coco/train2014 ./
ln -s ../coco/val2014 ./
# download annotations and questions
wget https://okvqa.allenai.org/static/data/mscoco_train2014_annotations.json.zip && unzip mscoco_train2014_annotations.json.zip
wget https://okvqa.allenai.org/static/data/OpenEnded_mscoco_train2014_questions.json.zip && unzip OpenEnded_mscoco_train2014_questions.json.zip
wget https://okvqa.allenai.org/static/data/mscoco_val2014_annotations.json.zip && unzip mscoco_val2014_annotations.json.zip
wget https://okvqa.allenai.org/static/data/OpenEnded_mscoco_val2014_questions.json.zip && unzip OpenEnded_mscoco_val2014_questions.json.zip
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/okvqa/okvqa_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/okvqa/okvqa_val.jsonl
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> vqa-okvqa-val
Data Preparation
mkdir -p data/textvqa && cd data/textvqa
# download images
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip && unzip train_val_images.zip
# download annotations and questions
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_train_annotations.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_train_questions.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_val_annotations.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_val_questions.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_val.jsonl
wget https://github.com/OpenGVLab/InternVL/releases/download/data/textvqa_val_llava.jsonl
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> vqa-textvqa-val
Data Preparation
mkdir -p data/vizwiz && cd data/vizwiz
# download images
wget https://vizwiz.cs.colorado.edu/VizWiz_final/images/train.zip && unzip train.zip
wget https://vizwiz.cs.colorado.edu/VizWiz_final/images/val.zip && unzip val.zip
wget https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip && unzip test.zip
# download annotations
wget https://vizwiz.cs.colorado.edu/VizWiz_final/vqa_data/Annotations.zip && unzip Annotations.zip
# download converted files
# train
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_train_annotations.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_train_questions.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_train.jsonl
# val
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_val_annotations.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_val_questions.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_val.jsonl
# test
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vizwiz/vizwiz_test.jsonl
cd ../..
Evaluation
# VizWiz val
GPUS=8 sh evaluate.sh <checkpoint> vqa-vizwiz-val
# VizWiz test
GPUS=8 sh evaluate.sh <checkpoint> vqa-vizwiz-test
For the test set, submit the results to the evaluation server.
Data Preparation
mkdir -p data/docvqa && cd data/docvqa
# download images and annotations from https://www.docvqa.org/datasets
# download converted files
# train
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/docvqa/train.jsonl
# val
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/docvqa/val.jsonl
# test
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/docvqa/test.jsonl
cd ../..
Evaluation
# DocVQA-val
GPUS=8 sh evaluate.sh <checkpoint> vqa-docvqa-val
# DocVQA-test
GPUS=8 sh evaluate.sh <checkpoint> vqa-docvqa-test
For the test set, submit the results to the evaluation server.
Data Preparation
mkdir -p data/chartqa && cd data/chartqa
# download images from https://drive.google.com/file/d/1Lm_w6zeET1Hyl_9ks6w5nEsgpoyPHalV/view
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/chartqa/train_human.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/chartqa/train_augmented.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/chartqa/test_human.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/chartqa/test_augmented.jsonl
cd ../..
Evaluation
# ChartQA-test-human
GPUS=8 sh evaluate.sh <checkpoint> vqa-chartqa-test-human
# ChartQA-test-augmented
GPUS=8 sh evaluate.sh <checkpoint> vqa-chartqa-test-augmented
Data Preparation
mkdir -p data/gqa && cd data/gqa
# download images
wget https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip
unzip images.zip
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/gqa/testdev_balanced.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/gqa/train_balanced.jsonl
wget https://github.com/OpenGVLab/InternVL/releases/download/data/llava_gqa_testdev_balanced_qwen_format.jsonl
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> vqa-gqa-testdev
Data Preparation
mkdir -p data/ocrvqa && cd data/ocrvqa
# download images by following instructions at https://ocr-vqa.github.io/kvqa_ProjectFiles/README.txt
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/ocrvqa/ocrvqa_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/ocrvqa/ocrvqa_val.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/ocrvqa/ocrvqa_test.jsonl
cd ../..
Evaluation
# OCRVQA-val
GPUS=8 sh evaluate.sh <checkpoint> vqa-ocrvqa-val
# OCRVQA-test
GPUS=8 sh evaluate.sh <checkpoint> vqa-ocrvqa-test
Data Preparation
mkdir -p data/ai2diagram && cd data/ai2diagram
# download images
wget https://ai2-public-datasets.s3.amazonaws.com/diagrams/ai2d-all.zip
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/ai2diagram/train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/ai2diagram/test.jsonl
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> vqa-ai2d-test
Data Preparation
mkdir -p data/scienceqa/images && cd data/scienceqa/images
# download images
wget https://scienceqa.s3.us-west-1.amazonaws.com/images/test.zip && unzip test.zip
cd ..
# download original questions
wget https://raw.githubusercontent.com/lupantech/ScienceQA/main/data/scienceqa/problems.json
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/scienceqa/scienceqa_test_img.jsonl
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> scienceqa
Data Preparation
mkdir -p data/refcoco && cd data/refcoco
# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcoco/refcoco_val.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcoco/refcoco_testA.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcoco/refcoco_testB.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcoco%2B/refcoco%2B_val.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcoco%2B/refcoco%2B_testA.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcoco%2B/refcoco%2B_testB.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcocog/refcocog_val.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/refcocog/refcocog_test.jsonl
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> refcoco
Data Preparation
mkdir -p data/mme && cd data/mme
# 1. Download MME images and eval_tool from the [MME repo](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/blob/Evaluation/README.md)
# 2. Rearrange images by executing `python get_images.py`
python get_images.py
cd ../..
Evaluation
# single GPU testing
CUDA_VISIBLE_DEVICES=0 sh evaluate.sh <checkpoint> mme
Data Preparation
mkdir -p data/mmbench && cd data/mmbench
# download csv files of mmbench
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_20230712.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_cn_20231003.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_en_20231003.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_test_cn_20231003.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_test_en_20231003.tsv
cd ../..
Evaluation
# mmbench_dev_20230712
GPUS=8 sh evaluate.sh <checkpoint> mmbench-dev-en
# mmbench_dev_cn_20231003
GPUS=8 sh evaluate.sh <checkpoint> mmbench-dev-cn
# mmbench_test_en_20231003
GPUS=8 sh evaluate.sh <checkpoint> mmbench-test-en
# mmbench_test_cn_20231003
GPUS=8 sh evaluate.sh <checkpoint> mmbench-test-cn
Then, submit the results to the evaluation server.
Data Preparation
mkdir -p data/pope && cd data/pope
# make sure you have downloaded COCO images
ln -s ../coco/val2014 ./
wget https://github.com/OpenGVLab/InternVL/releases/download/data/llava_pope_test.jsonl
# download `coco` from POPE
mkdir -p coco && cd coco
wget https://github.com/AoiDragon/POPE/raw/e3e39262c85a6a83f26cf5094022a782cb0df58d/output/coco/coco_pope_adversarial.json
wget https://github.com/AoiDragon/POPE/raw/e3e39262c85a6a83f26cf5094022a782cb0df58d/output/coco/coco_pope_popular.json
wget https://github.com/AoiDragon/POPE/raw/e3e39262c85a6a83f26cf5094022a782cb0df58d/output/coco/coco_pope_random.json
cd ../../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> pope
Data Preparation
The evaluation code will automatically download the dataset from Hugging Face.
Evaluation
# dev set
GPUS=8 sh evaluate.sh <checkpoint> mmmu-dev
# val set
GPUS=8 sh evaluate.sh <checkpoint> mmmu-val
# test set
GPUS=8 sh evaluate.sh <checkpoint> mmmu-test
For the test set, submit the results to the evaluation server.
Data Preparation
mkdir -p data/tiny_lvlm && cd data/tiny_lvlm
# download dataset from https://github.com/OpenGVLab/Multi-Modality-Arena/tree/main/tiny_lvlm_evaluation
# i.e., download `updated_datasets.tar.gz` from https://drive.google.com/file/d/1PuFC612XzOmKwzRldtBb1CFZnIjiR7we/view
tar -xzvf updated_datasets.tar.gz
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> tiny_lvlm
Data Preparation
cd data/
# download dataset from https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild
git clone https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild
cd llava-bench-in-the-wild/
rm -rf images && mkdir -p images && cd images
# download all 24 images
wget https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild/resolve/main/images/001.jpg
# ...
wget https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild/resolve/main/images/024.jpg
cd ../../../
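As an alternative to the 24 individual `wget` calls above, the images can be fetched programmatically with `huggingface_hub` (a minimal sketch that assumes the dataset keeps the `images/001.jpg`–`images/024.jpg` layout shown above):

```python
from huggingface_hub import hf_hub_download

# Download images/001.jpg ... images/024.jpg from the dataset repo.
for i in range(1, 25):
    hf_hub_download(
        repo_id="liuhaotian/llava-bench-in-the-wild",
        repo_type="dataset",
        filename=f"images/{i:03d}.jpg",
        local_dir="data/llava-bench-in-the-wild",
    )
```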
Evaluation
# single GPU testing
export OPENAI_API_KEY='your_gpt4_key'
CUDA_VISIBLE_DEVICES=0 sh evaluate.sh <checkpoint> llava-bench
Data Preparation
mkdir -p data/mm-vet && cd data/mm-vet
wget https://github.com/yuweihao/MM-Vet/releases/download/v1/mm-vet.zip
unzip mm-vet.zip
cd ../..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> mmvet
Data Preparation
cd data
git lfs install
git clone https://huggingface.co/datasets/MMVP/MMVP
git clone https://huggingface.co/datasets/MMVP/MMVP_VLM
cd ..
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> mmvp
Data Preparation
mkdir -p data/MathVista && cd data/MathVista
# Execute the following python code
# from datasets import load_dataset
# dataset = load_dataset("AI4Math/MathVista")
# dataset.save_to_disk('./MathVista')
wget https://huggingface.co/datasets/AI4Math/MathVista/raw/main/annot_testmini.json
cd ../..
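The commented Python above downloads MathVista from Hugging Face with the `datasets` library; in runnable form (execute it from inside `data/MathVista`, matching the directory used above):

```python
from datasets import load_dataset

# Downloads AI4Math/MathVista from Hugging Face and saves a local copy.
dataset = load_dataset("AI4Math/MathVista")
dataset.save_to_disk("./MathVista")
```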
Evaluation
# testmini set
GPUS=8 sh evaluate.sh <checkpoint> mathvista-testmini
# test set
GPUS=8 sh evaluate.sh <checkpoint> mathvista-test
Data Preparation
- Follow the official Data Preparation instructions for SEED-Bench-1 to download the images and the videos. Put the images under `./playground/data/eval/seed_bench/SEED-Bench-image`.
- Extract the middle frame from each downloaded video and put the frames under `./playground/data/eval/seed_bench/SEED-Bench-video-image`. We provide our script `extract_video_frames.py`, modified from the official one (see the sketch below).
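For reference, the middle frame of a video can be extracted with OpenCV. This is only an illustrative sketch with hypothetical file names; `extract_video_frames.py` in this repository remains the recommended script:

```python
import cv2

def save_middle_frame(video_path: str, out_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Seek to the middle frame, decode it, and write it to disk.
    cap.set(cv2.CAP_PROP_POS_FRAMES, max(n_frames // 2, 0))
    ok, frame = cap.read()
    if ok:
        cv2.imwrite(out_path, frame)
    cap.release()

# Hypothetical input/output paths for illustration only.
save_middle_frame("videos/example.mp4", "SEED-Bench-video-image/example.png")
```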
Evaluation
GPUS=8 sh evaluate.sh <checkpoint> seed