[FEATURE] ViT with average pooling like ResNet #1555
Replies: 4 comments 3 replies
-
@CharlesLeeeee there are already some there, but please use discussions for these sorts of questions.
-
Most of these use global pooling, but they use relative position embeddings: https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/vision_transformer_relpos.py They don't really train much differently than class-token models, though. Most of the newer weights for these were trained with a Swin-style recipe, which isn't far from DeiT. DeiT should not blow up; it's possible you didn't adapt the LR to your batch size or didn't enable grad clipping. I have hparams somewhere in a cloud bucket but have to find them.
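The head difference under discussion can be sketched in a few lines of NumPy. This is an illustration of the two pooling choices on a hypothetical encoder output, not code from the repo: a class-token head reads the first token, while a ResNet-style global-average-pooling head drops the class token and averages the patch tokens.

```python
import numpy as np

# Hypothetical ViT encoder output: batch of 2, 1 class token + 196 patch
# tokens (14x14 patches at 224px / patch 16), embedding dim 768.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 197, 768))

# Class-token head: classify from the first token only.
cls_feat = tokens[:, 0]                # shape (2, 768)

# Global-average-pooling head: average the patch tokens instead.
gap_feat = tokens[:, 1:].mean(axis=1)  # shape (2, 768)

print(cls_feat.shape, gap_feat.shape)  # (2, 768) (2, 768)
```

Either feature then feeds the same linear classifier; only the pooling differs.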
-
So far I have been able to train a DeiT model with average pooling successfully by increasing warmup-epochs from 5 to 20. However, when using a res-post-norm structure with avg pooling, it keeps getting NaNs. Are you using repeated-aug to get these results? I know that the Swin models don't use repeated-aug.
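The warmup change above amounts to stretching the linear-warmup phase of the usual warmup-then-cosine schedule. A minimal sketch of that schedule shape (all numbers illustrative, not the actual hparams from the thread):

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20,
                total_epochs=300, min_lr=1e-5):
    """Linear warmup followed by cosine decay.

    Raising warmup_epochs (e.g. from 5 to 20, as in the thread) keeps the
    LR small for longer early in training, which often helps stability.
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

With warmup_epochs=20 the LR reaches its peak at epoch 19 and decays from epoch 20 onward.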
-
hparams from the rpn b16 attached; it was trained on a TPU v4-8, so it's a 4x256 global batch size, but it's using
-
Is there any attempt at training ViT models with global average pooling, like ResNet models? If yes, what are the exact hyperparameters used to get the best performance when training on ImageNet-1k from scratch?
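For reference, a hedged sketch of how such a run might be launched with the repo's train.py; this is a config fragment, not the hparams asked about, and flag values here are placeholders. Recent timm versions let you switch the head via --model-kwargs (global_pool=avg, class_token=False); check your installed version's train.py --help before relying on any of these flags.

```shell
# Illustrative only -- LR, batch size, and epochs are NOT the tuned hparams.
python train.py /path/to/imagenet \
  --model vit_base_patch16_224 \
  --model-kwargs global_pool=avg class_token=False \
  -b 256 --lr 1e-3 --warmup-epochs 20 --epochs 300 \
  --clip-grad 1.0
```

Grad clipping (--clip-grad) and a longer warmup are the two stability knobs mentioned earlier in the thread.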