Reimplementation of Self-Supervised Vision Transformers for DINO v2 with Huggingface 🤗

PyTorch implementation and pretrained models for DINO v2 in remote sensing.
See the official papers and GitHub repository for details. [arXiv #1] [arXiv #2] [Github]

Training

This project uses the DeepSpeed interface for multi-GPU training:

deepspeed --include localhost:0,1,2,3... vit_train.py

deepspeed --include localhost:0,1,2,3... convvit_train.py

Training Dataset for Remote Sensing

Dataset Name	# of Samples	Dataset Paper
Million-AID	990,666	Link
SkyScript	5,181,068	Link
Total	6,171,734

Pretrained Model on Huggingface

Model	Epoch	Total Params	Student Backbone Params	Student DINO Head Params	Student iBOT Head Params	Weight & Config	Logs
ViT-S/16-e25	25	132M	21M	22M	22M	Link	logs
ViT-S/16-e100	100	132M	21M	22M	22M	Link	logs
ViT-B/16-e25	25	264M	88M	21M	21M	Link	logs
ConvViT-S-e25(DINOv1)	25	88.5M	22.2M	22M	x	Link	logs

Evaluation

The models are evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% is used as the test set. k-NN evaluation uses k=20. The evaluation datasets are listed in the table below. The split lists are stored in linprob_data_lists.

Dataset Name	Dataset Paper
`RESISC`	Remote Sensing Image Scene Classification: Benchmark and State of the Art
`Optimal 31`	Scene Classification With Recurrent Attention of VHR Remote Sensing Images
`MLRSNet`	MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding
`WHU-RS19`
`EuroSAT`	EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
`UC Merced`	Bag-of-visual-words and spatial extensions for land-use classification
`Cv-BrCT`	AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification
`AiRound`	AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification
`RSI-CB128`	RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data
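The 90/10 random split described above can be sketched as follows. This is a minimal illustration, not the repository's split script: the function name `split_train_test`, the fixed `seed`, and the sample names are hypothetical, and the actual format of the files in linprob_data_lists may differ.

```python
import random


def split_train_test(samples, train_ratio=0.9, seed=0):
    """Randomly split `samples` into (train, test) lists.

    `seed` is an illustrative assumption that makes the split reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_ratio))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Hypothetical sample entries; the real list files may store other fields.
    samples = [f"image_{i:04d}.jpg" for i in range(100)]
    train, test = split_train_test(samples)
    print(len(train), len(test))
```

Fixing the seed keeps the train and test lists stable across runs, so that results from different models stay comparable.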

Linear Probing Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/linprob.py --model-path {model_registry} \
                              --data-root {data_root} \
                              --train-text {train_textfile} \
                              --test-text {test_textfile}

Model	RESISC	Optimal 31	MLRSNet	WHU-RS19	EuroSAT	UC Merced	Cv-BrCT	AiRound	RSI-CB128
ViT-S/16-e25	94.381	96.237	96.642	99.811	98.037	99.048	77.613	78.644	99.593
ViT-S/16-e100	94.381	95.161	96.349	100.00	97.704	99.048	76.910	79.407	99.539
ViT-B/16-e25	95.460	98.925	97.301	100.00	97.889	98.571	79.058	80.339	99.675
ConvViT-S-e25(DINOv1)	94.476	93.548	95.919	99.065	96.778	98.095	77.695	81.949	99.295
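Linear probing trains only a linear classifier on top of frozen backbone features. The sketch below illustrates the idea with NumPy on synthetic features; it is not the code in evaluation/linprob.py. The function names, learning rate, and epoch count are assumptions chosen for the example.

```python
import numpy as np


def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=200):
    """Fit a softmax linear classifier on fixed features with gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def accuracy(feats, labels, W, b):
    """Top-1 accuracy in percent."""
    preds = (feats @ W + b).argmax(axis=1)
    return float((preds == labels).mean() * 100.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for frozen backbone features: two separable clusters.
    labels = rng.integers(0, 2, size=200)
    feats = rng.normal(size=(200, 8)) + 2.0 * (2 * labels[:, None] - 1)
    W, b = train_linear_probe(feats[:180], labels[:180], num_classes=2)
    print(f"test accuracy: {accuracy(feats[180:], labels[180:], W, b):.1f}%")
```

In practice the features would come from the pretrained backbone with its weights frozen, so only `W` and `b` are learned.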

KNN Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/knn.py --model-path {model_registry} \
                          --data-root {data_root} \
                          --train-text {train_textfile} \
                          --test-text {test_textfile}

Model	RESISC	Optimal 31	MLRSNet	WHU-RS19	EuroSAT	UC Merced	Cv-BrCT	AiRound	RSI-CB128
ViT-S/16-e25	93.365	89.785	96.981	97.196	95.741	87.143	76.208	77.881	98.943
ViT-S/16-e100	93.746	94.624	97.081	97.196	96.222	86.667	75.960	76.695	98.808
ViT-B/16-e25	94.286	90.323	97.328	100.00	95.704	87.143	76.456	77.373	99.106
ConvViT-S-e25(DINOv1)	92.508	91.935	95.947	98.131	94.074	90.000	75.630	76.271	98.374
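k-NN evaluation with k=20 can be sketched as below. This is an assumption-laden illustration rather than the logic of evaluation/knn.py: it uses cosine similarity on L2-normalized features and an unweighted majority vote, whereas the actual script may weight votes by similarity. The function name `knn_predict` is hypothetical.

```python
import numpy as np


def knn_predict(train_feats, train_labels, test_feats, k=20):
    """Majority-vote k-NN using cosine similarity on L2-normalized features."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                    # (n_test, n_train) cosine similarities
    k = min(k, train.shape[0])
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar samples
    neighbor_labels = train_labels[topk]     # (n_test, k)
    num_classes = int(train_labels.max()) + 1
    votes = np.stack(
        [np.bincount(row, minlength=num_classes) for row in neighbor_labels]
    )
    return votes.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for backbone features: two separable clusters.
    labels = rng.integers(0, 2, size=200)
    feats = rng.normal(size=(200, 8)) + 2.0 * (2 * labels[:, None] - 1)
    preds = knn_predict(feats[:180], labels[:180], feats[180:], k=20)
    print(f"k-NN accuracy: {(preds == labels[180:]).mean() * 100:.1f}%")
```

Unlike linear probing, this requires no training: the quality of the result depends only on how well the frozen features cluster by class.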

Property Analysis

Feature Mapping - feature_mapping.ipynb
Sparse Feature Matching - vit-feature-matching.ipynb
Image Retrieval - index_search.ipynb
