Reimplementation of Self-Supervised Vision Transformers for DINO v2 with Hugging Face 🤗
PyTorch implementation and pretrained models for DINO v2 in remote sensing.
See the official papers and the GitHub repository for details.
[arXiv #1] [arXiv #2] [Github]
This project uses the DeepSpeed launcher for multi-GPU training:
deepspeed --include localhost:0,1,2,3... vit_train.py
deepspeed --include localhost:0,1,2,3... convvit_train.py
Training Dataset for Remote Sensing

| Dataset name | # of samples | Dataset Paper |
| --- | --- | --- |
| Million-AID | 990,666 | Link |
| SkyScript | 5,181,068 | Link |
| Total | 6,171,734 | |
Pretrained Model on Huggingface

| Model | Epoch | Total Params | Student Backbone Params | Student DINO Head Params | Student iBOT Head Params | Weight & Config | Logs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViT-S/16-e25 | 25 | 132M | 21M | 22M | 22M | Link | logs |
| ViT-S/16-e100 | 25 | 132M | 21M | 22M | 22M | Link | logs |
| ViT-B/16-e25 | 25 | 264M | 88M | 21M | 21M | Link | logs |
| ConvViT-S-e25(DINOv1) | 25 | 88.5M | 22.2M | 22M | x | Link | logs |

The models are evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% is used as the test set. k-NN evaluation uses k = 20. The evaluation datasets are listed in the result tables below. The split lists are stored in linprob_data_lists.
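
The 90/10 random split described above can be sketched as follows. This is a minimal illustration, not the script used to produce linprob_data_lists: the function name `split_train_test`, the fixed seed, and the `(path, label)` sample format are all assumptions made for the example.

```python
import random


def split_train_test(samples, train_fraction=0.9, seed=0):
    """Shuffle samples reproducibly and split them into train/test lists."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical sample list: (relative image path, class index) pairs.
samples = [(f"class_{i % 5}/img_{i:04d}.jpg", i % 5) for i in range(100)]
train, test = split_train_test(samples)
print(len(train), len(test))
```

Fixing the seed keeps the train and test lists identical across runs, which matters when the same split is reused for both linear probing and k-NN.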
Linear Probing Evaluation
# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt
python3 evaluation/linprob.py --model-path {model_registry} \
--data-root {data_root} \
--train-text {train_textfile} \
--test-text {test_textfile}

| Model | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | Cv-BrCT | AiRound | RSI-CB128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViT-S/16-e25 | 94.381 | 96.237 | 96.642 | 99.811 | 98.037 | 99.048 | 77.613 | 78.644 | 99.593 |
| ViT-S/16-e100 | 94.381 | 95.161 | 96.349 | 100.00 | 97.704 | 99.048 | 76.910 | 79.407 | 99.539 |
| ViT-B/16-e25 | 95.460 | 98.925 | 97.301 | 100.00 | 97.889 | 98.571 | 79.058 | 80.339 | 99.675 |
| ConvViT-S-e25(DINOv1) | 94.476 | 93.548 | 95.919 | 99.065 | 96.778 | 98.095 | 77.695 | 81.949 | 99.295 |

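
For readers unfamiliar with linear probing: the backbone stays frozen and only a linear classifier is trained on its features. The sketch below illustrates the idea with NumPy on synthetic features. It is not the code in evaluation/linprob.py; the function names, the plain gradient-descent optimiser, the hyperparameters, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes, epochs=200, lr=0.1):
    """Fit a softmax linear classifier on fixed features with gradient descent."""
    n, dim = features.shape
    weights = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / n                 # gradient of mean cross-entropy
        weights -= lr * features.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def accuracy(features, labels, weights, bias):
    """Top-1 accuracy in percent."""
    preds = (features @ weights + bias).argmax(axis=1)
    return 100.0 * (preds == labels).mean()


# Synthetic stand-in for frozen backbone features (assumption, not real data).
rng = np.random.default_rng(0)
num_classes, dim = 3, 16
centers = rng.normal(size=(num_classes, dim)) * 4.0
labels = rng.integers(0, num_classes, size=300)
features = centers[labels] + rng.normal(size=(300, dim))

# 90/10 split, mirroring the evaluation protocol.
train_x, test_x = features[:270], features[270:]
train_y, test_y = labels[:270], labels[270:]

weights, bias = train_linear_probe(train_x, train_y, num_classes)
probe_acc = accuracy(test_x, test_y, weights, bias)
print(f"linear probe accuracy: {probe_acc:.3f}")
```

In practice the features would come from the pretrained student or teacher backbone, extracted once with gradients disabled.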
k-NN Evaluation
# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt
python3 evaluation/knn.py --model-path {model_registry} \
--data-root {data_root} \
--train-text {train_textfile} \
--test-text {test_textfile}

| Model | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | Cv-BrCT | AiRound | RSI-CB128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViT-S/16-e25 | 93.365 | 89.785 | 96.981 | 97.196 | 95.741 | 87.143 | 76.208 | 77.881 | 98.943 |
| ViT-S/16-e100 | 93.746 | 94.624 | 97.081 | 97.196 | 96.222 | 86.667 | 75.960 | 76.695 | 98.808 |
| ViT-B/16-e25 | 94.286 | 90.323 | 97.328 | 100.00 | 95.704 | 87.143 | 76.456 | 77.373 | 99.106 |
| ConvViT-S-e25(DINOv1) | 92.508 | 91.935 | 95.947 | 98.131 | 94.074 | 90.000 | 75.630 | 76.271 | 98.374 |
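
As an illustration of k-NN evaluation with k = 20, the sketch below classifies each test feature by a majority vote over its 20 most similar training features. It is not the code in evaluation/knn.py: cosine similarity, the unweighted vote, the function name `knn_predict`, and the synthetic data are assumptions, and the actual script may weight neighbours differently.

```python
import numpy as np


def knn_predict(train_feats, train_labels, test_feats, num_classes, k=20):
    """Predict labels by majority vote over the k most cosine-similar neighbours."""
    # L2-normalise so that a dot product equals cosine similarity.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                    # shape: (n_test, n_train)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k nearest neighbours
    neighbour_labels = train_labels[topk]    # shape: (n_test, k)
    votes = np.zeros((test.shape[0], num_classes), dtype=np.int64)
    for c in range(num_classes):
        votes[:, c] = (neighbour_labels == c).sum(axis=1)
    return votes.argmax(axis=1)


# Synthetic stand-in for frozen backbone features (assumption, not real data).
rng = np.random.default_rng(0)
num_classes, dim = 3, 16
centers = rng.normal(size=(num_classes, dim)) * 4.0
labels = rng.integers(0, num_classes, size=300)
features = centers[labels] + rng.normal(size=(300, dim))

# 90/10 split, mirroring the evaluation protocol.
train_x, test_x = features[:270], features[270:]
train_y, test_y = labels[:270], labels[270:]

preds = knn_predict(train_x, train_y, test_x, num_classes, k=20)
knn_acc = 100.0 * (preds == test_y).mean()
print(f"k-NN accuracy: {knn_acc:.3f}")
```

Unlike linear probing, k-NN needs no training step, so it gives a quick check of feature quality that does not depend on optimiser settings.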