Reimplementation of Self-Supervised Vision Transformers (DINO v2) with Hugging Face 🤗


  • PyTorch implementation and pretrained models for DINO v2 in remote sensing.
  • See the official papers and GitHub repository for details. [arXiv #1] [arXiv #2] [GitHub]

Training

This project uses the DeepSpeed launcher for multi-GPU training:

deepspeed --include localhost:0,1,2,3... vit_train.py
deepspeed --include localhost:0,1,2,3... convvit_train.py

Training Dataset for Remote Sensing

| Dataset name | # of samples | Dataset paper |
|---|---|---|
| Million-AID | 990,666 | Link |
| SkyScript | 5,181,068 | Link |
| Total | 6,171,734 | |

Pretrained Model on Huggingface

| Model | Epoch | Total Params | Student Backbone Params | Student DINO Head Params | Student iBOT Head Params | Weight & Config | Logs |
|---|---|---|---|---|---|---|---|
| ViT-S/16-e25 | 25 | 132M | 21M | 22M | 22M | Link | logs |
| ViT-S/16-e100 | 25 | 132M | 21M | 22M | 22M | Link | logs |
| ViT-B/16-e25 | 25 | 264M | 88M | 21M | 21M | Link | logs |
| ConvViT-S-e25(DINOv1) | 25 | 88.5M | 22.2M | 22M | x | Link | logs |

Evaluation

The pretrained DINOv2 models are evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% is used as the test set. k-NN evaluation uses k = 20. The evaluation datasets are listed in the table below. The split data lists are stored in linprob_data_lists.

| Dataset Name | Dataset Paper |
|---|---|
| RESISC | Remote Sensing Image Scene Classification: Benchmark and State of the Art |
| Optimal 31 | Scene Classification With Recurrent Attention of VHR Remote Sensing Images |
| MLRSNet | MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding |
| WHU-RS19 | |
| EuroSAT | EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification |
| UC Merced | Bag-of-visual-words and spatial extensions for land-use classification |
| Cv-BrCT | AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification |
| AiRound | AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification |
| RSI-CB128 | RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data |
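
The 90/10 random split described above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the lists in linprob_data_lists: the function name `split_train_test`, the fixed seed, and the sample file names are hypothetical, and the actual format of the list files is not specified here.

```python
import random


def split_train_test(samples, train_fraction=0.9, seed=0):
    """Randomly split `samples` into a training part and a test part.

    `seed` is an assumption added for reproducibility; the source only
    states that the split is random.
    """
    rng = random.Random(seed)
    shuffled = list(samples)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names, used only to exercise the function.
images = [f"image_{i:03d}.jpg" for i in range(100)]
train_items, test_items = split_train_test(images)
print(len(train_items), len(test_items))
```

Fixing the seed keeps the split identical across runs, which matters when several models are compared on the same test set.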

Linear Probing Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/linprob.py --model-path {model_registry} \
                              --data-root {data_root} \
                              --train-text {train_textfile} \
                              --test-text {test_textfile}

| Model | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | Cv-BrCT | AiRound | RSI-CB128 |
|---|---|---|---|---|---|---|---|---|---|
| ViT-S/16-e25 | 94.381 | 96.237 | 96.642 | 99.811 | 98.037 | 99.048 | 77.613 | 78.644 | 99.593 |
| ViT-S/16-e100 | 94.381 | 95.161 | 96.349 | 100.00 | 97.704 | 99.048 | 76.910 | 79.407 | 99.539 |
| ViT-B/16-e25 | 95.460 | 98.925 | 97.301 | 100.00 | 97.889 | 98.571 | 79.058 | 80.339 | 99.675 |
| ConvViT-S-e25(DINOv1) | 94.476 | 93.548 | 95.919 | 99.065 | 96.778 | 98.095 | 77.695 | 81.949 | 99.295 |
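
For readers unfamiliar with linear probing, the idea is to freeze the backbone, extract one feature vector per image, and train only a linear classifier on those features. The sketch below illustrates this with a dependency-free softmax classifier trained by full-batch gradient descent. It does not reproduce evaluation/linprob.py: the function names, the hyperparameters, and the toy 2-D "features" are all assumptions made for illustration.

```python
import math


def _logits(weights, bias, x):
    """Linear scores, one per class: w_c . x + b_c."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def train_linear_probe(features, labels, num_classes, epochs=500, lr=0.1):
    """Fit a softmax classifier on frozen features (hypothetical helper)."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = _logits(weights, bias, x)
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: softmax - one_hot
                err = exps[c] / total - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            bias[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, bias


def predict(weights, bias, x):
    """Return the index of the highest-scoring class."""
    logits = _logits(weights, bias, x)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy stand-ins for backbone features; real features would come from the model.
toy_features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
toy_labels = [0, 0, 0, 1, 1, 1]
probe_weights, probe_bias = train_linear_probe(toy_features, toy_labels, num_classes=2)
print(predict(probe_weights, probe_bias, [0.0, 0.0]),
      predict(probe_weights, probe_bias, [4.0, 4.0]))
```

In practice the classifier would usually be a single linear layer trained with a deep learning framework on GPU; the pure-Python loop above is only meant to make the computation explicit.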

KNN Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/knn.py --model-path {model_registry} \
                          --data-root {data_root} \
                          --train-text {train_textfile} \
                          --test-text {test_textfile}

| Model | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | Cv-BrCT | AiRound | RSI-CB128 |
|---|---|---|---|---|---|---|---|---|---|
| ViT-S/16-e25 | 93.365 | 89.785 | 96.981 | 97.196 | 95.741 | 87.143 | 76.208 | 77.881 | 98.943 |
| ViT-S/16-e100 | 93.746 | 94.624 | 97.081 | 97.196 | 96.222 | 86.667 | 75.960 | 76.695 | 98.808 |
| ViT-B/16-e25 | 94.286 | 90.323 | 97.328 | 100.00 | 95.704 | 87.143 | 76.456 | 77.373 | 99.106 |
| ConvViT-S-e25(DINOv1) | 92.508 | 91.935 | 95.947 | 98.131 | 94.074 | 90.000 | 75.630 | 76.271 | 98.374 |
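
As a rough illustration of k-NN evaluation on frozen features, the sketch below classifies a query by majority vote among its k most similar training features, using cosine similarity. It is not the code in evaluation/knn.py: the function names are hypothetical, and the choice of cosine similarity with an unweighted vote is an assumption (the actual script may, for example, weight votes by similarity). The default k = 20 matches the setting stated in the Evaluation section.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def knn_predict(train_feats, train_labels, query, k=20):
    """Predict a label by majority vote among the k most similar features."""
    q = l2_normalize(query)
    scored = []
    for feat, label in zip(train_feats, train_labels):
        f = l2_normalize(feat)
        cosine = sum(a * b for a, b in zip(q, f))
        scored.append((cosine, label))
    # Keep the k highest cosine similarities (fewer if the set is smaller).
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]


def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=20):
    """Fraction of test features whose predicted label is correct."""
    hits = sum(
        knn_predict(train_feats, train_labels, feat, k) == label
        for feat, label in zip(test_feats, test_labels)
    )
    return hits / len(test_labels)


# Toy stand-ins for backbone features; k is reduced because the set is tiny.
toy_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_train_labels = [0, 0, 1, 1]
print(knn_predict(toy_train, toy_train_labels, [1.0, 0.2], k=3))
```

Because k-NN needs no training beyond feature extraction, it gives a quick check of feature quality that complements linear probing.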

Property Analysis