# NKD and tf-NKD

Paper: [Rethinking Knowledge Distillation via Cross-Entropy](https://arxiv.org/abs/2208.10139)

*(Figure: NKD / tf-NKD architecture overview)*
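For reference, NKD splits the soft distillation loss into a target term and a *normalized* non-target term, so that the non-target probabilities of teacher and student each sum to 1 before being matched. Below is a minimal PyTorch sketch of such a loss under that decomposition; the function name, argument names, and default values (`nkd_loss`, `gamma`, `temp`) are illustrative and may differ from the distiller implementation in this repository.

```python
import torch.nn.functional as F


def nkd_loss(logit_s, logit_t, target, gamma=1.0, temp=1.0):
    """logit_s / logit_t: (N, C) student / teacher logits; target: (N,) labels.

    gamma and temp defaults are illustrative; see the paper / configs for the
    values actually used.
    """
    N, C = logit_s.shape
    mask = F.one_hot(target, C).bool()                 # True at the ground-truth class

    # Target term: weight the student's log-probability of the true class
    # by the teacher's probability for that class (soft cross-entropy).
    log_s = F.log_softmax(logit_s, dim=1)
    t_prob = F.softmax(logit_t, dim=1).detach()
    loss_target = -(t_prob[mask] * log_s[mask]).mean()

    # Non-target term: restrict both distributions to the non-target classes
    # (equivalent to renormalizing them to sum to 1), then match them with a
    # temperature-scaled cross-entropy.
    logit_s_nt = logit_s[~mask].view(N, C - 1)
    logit_t_nt = logit_t[~mask].view(N, C - 1)
    log_s_hat = F.log_softmax(logit_s_nt / temp, dim=1)
    t_hat = F.softmax(logit_t_nt / temp, dim=1).detach()
    loss_non_target = -(t_hat * log_s_hat).sum(dim=1).mean()

    return loss_target + gamma * temp ** 2 * loss_non_target
```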

## Train

```bash
# single GPU
python tools/train.py configs/distillers/imagenet/res18_sd_img.py

# multi GPU
bash tools/dist_train.sh configs/distillers/imagenet/res34_distill_res18_img.py 8
```

## Transfer

```bash
# Transfer the distillation checkpoint into an mmcls model
python pth_transfer.py --dis_path $dis_ckpt --output_path $new_mmcls_ckpt
```
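The transfer step exports the student weights from the distillation checkpoint into a plain mmcls checkpoint. The snippet below is a hypothetical sketch of that kind of conversion, assuming the distiller stores the student branch under a `student.` key prefix; the prefix and file paths are illustrative, and `pth_transfer.py` in this repository is the authoritative implementation.

```python
import torch

# Example paths; substitute your own checkpoint locations.
dis_ckpt = torch.load('work_dirs/nkd_res18.pth', map_location='cpu')
state = dis_ckpt.get('state_dict', dis_ckpt)

# Keep only keys from the student branch and strip the (assumed) prefix.
prefix = 'student.'
student_state = {k[len(prefix):]: v for k, v in state.items() if k.startswith(prefix)}

torch.save({'state_dict': student_state}, 'resnet18_nkd_mmcls.pth')
```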

## Test

```bash
# single GPU
python tools/test.py configs/resnet/resnet18_8xb32_in1k.py $new_mmcls_ckpt --metrics accuracy

# multi GPU
bash tools/dist_test.sh configs/resnet/resnet18_8xb32_in1k.py $new_mmcls_ckpt 8 --metrics accuracy
```

## Results

### NKD

| Model | Teacher | Baseline (Top-1 Acc) | +NKD (Top-1 Acc) | dis_config | weight |
| --- | --- | --- | --- | --- | --- |
| ResNet18 | ResNet34 | 69.90 | 71.96 (+2.06) | config | baidu / one drive |
| MobileNet | ResNet50 | 69.21 | 72.58 (+3.37) | config | baidu / one drive |

### tf-NKD

| Model | Baseline (Top-1 Acc) | +tf-NKD (Top-1 Acc) | dis_config |
| --- | --- | --- | --- |
| MobileNet | 69.21 | 70.04 (+0.83) | config |
| MobileNetV2 | 71.86 | 72.08 (+0.22) | config |
| ShuffleNetV2 | 69.55 | 69.93 (+0.38) | config |
| ResNet18 | 69.90 | 70.76 (+0.86) | config |
| ResNet50 | 76.55 | 76.93 (+0.38) | config |
| Swin-Tiny | 81.18 | 81.48 (+0.30) | config |

## Citation

```
@article{yang2022rethinking,
  title={Rethinking Knowledge Distillation via Cross-Entropy},
  author={Yang, Zhendong and Li, Zhe and Gong, Yuan and Zhang, Tianke and Lao, Shanshan and Yuan, Chun and Li, Yu},
  journal={arXiv preprint arXiv:2208.10139},
  year={2022}
}
```