Merge branch 'master' into lionel-lig-5994-update-the-byol-examples
liopeer authored Feb 4, 2025
2 parents 0a2f0cc + 061223a commit b2d41b7
Showing 4 changed files with 38 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@
lightning_logs/
**/__MACOSX
datasets/
dist/
docs/source/sg_execution_times.rst
docs/source/tutorials/package/*
docs/source/tutorials/platform/*
docs/source/tutorials_source/platform/data
2 changes: 1 addition & 1 deletion docs/source/examples/aim.rst
@@ -9,7 +9,7 @@
architecture. It learns image representations by predicting pixel values for image
patches based on previous patches in the image. This is similar to the next word prediction
task in natural language processing. AIM demonstrates that it is possible to train
large-scale vision models using an autoregressive objective. The model is split into
and encoder and a decoder part. The encoder generates features for image patches and
an encoder and a decoder part. The encoder generates features for image patches and
the decoder predicts pixel values based on the features.
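
As a rough illustration of that autoregressive objective (not the actual AIM implementation), the sketch below embeds flattened image patches, runs them through a causally masked transformer encoder, and regresses the pixel values of each following patch. All module choices and sizes here are made up for the example.

.. code-block:: python

    import torch
    from torch import nn

    # Illustrative sizes; the real model uses much larger ViT backbones.
    patch_size, dim, num_patches = 16, 256, 196

    patch_embed = nn.Linear(3 * patch_size * patch_size, dim)  # encoder input projection
    encoder_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
    decoder = nn.Linear(dim, 3 * patch_size * patch_size)  # predicts raw pixel values

    patches = torch.randn(2, num_patches, 3 * patch_size * patch_size)  # flattened image patches
    causal_mask = nn.Transformer.generate_square_subsequent_mask(num_patches)

    features = encoder(patch_embed(patches), mask=causal_mask)  # each patch only attends to previous ones
    predictions = decoder(features)

    # Autoregressive objective: features of patch i predict the pixels of patch i + 1.
    loss = nn.functional.mse_loss(predictions[:, :-1], patches[:, 1:])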

Reference:
29 changes: 28 additions & 1 deletion docs/source/examples/nnclr.rst
@@ -3,7 +3,34 @@
NNCLR
=====

Example implementation of the NNCLR architecture.
NNCLR is a self-supervised framework for visual representation learning that builds upon contrastive methods. It shares similarities with SimCLR, such as using two augmented views of the same image, projection and prediction heads, and a contrastive loss. However, it introduces key modifications:

1. Nearest Neighbor Replacement: Instead of directly comparing two augmented views of the same sample, NNCLR replaces each sample with its nearest neighbor in a support set (or memory bank). This increases semantic variation in the learned representations.
2. Symmetric Loss: The contrastive loss is made symmetric to improve training stability.
3. Architectural Adjustments: NNCLR employs different sizes for projection and prediction head layers compared to SimCLR.
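
The nearest-neighbor replacement in point 1 can be sketched in a few lines of PyTorch: each projected embedding is swapped for its most similar entry in a support set before the loss is computed. The function name, tensor shapes, and the ``support_set`` argument below are illustrative, not part of any library API.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def nearest_neighbor(z: torch.Tensor, support_set: torch.Tensor) -> torch.Tensor:
        """Replace each embedding in z by its nearest neighbor from the support set.

        z: (batch_size, dim) projections of the current batch.
        support_set: (queue_size, dim) embeddings stored in the memory bank.
        """
        z_norm = F.normalize(z, dim=1)
        support_norm = F.normalize(support_set, dim=1)
        similarity = z_norm @ support_norm.t()  # (batch_size, queue_size) cosine similarities
        index = similarity.argmax(dim=1)        # index of the closest support embedding
        return support_set[index]               # (batch_size, dim) nearest neighbors

The symmetric loss in point 2 then simply averages the contrastive loss over both directions, e.g. ``0.5 * (loss(nearest_neighbor(z0, bank), p1) + loss(nearest_neighbor(z1, bank), p0))``.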

These improvements result in significantly better performance across multiple self-supervised learning benchmarks. Compared to SimCLR and other self-supervised methods, NNCLR achieves:

- Higher ImageNet linear evaluation accuracy.
- Improved semi-supervised learning results.
- Superior performance on transfer learning tasks, outperforming BYOL, SimCLR, and even supervised ImageNet pretraining in 8 out of 12 benchmarked cases.

Key Components
--------------

- **Data Augmentations**: NNCLR applies the same transformations as SimCLR, including random cropping, resizing, color jittering, and Gaussian blur, to create diverse views of the same image.
- **Backbone**: A convolutional neural network (typically ResNet) encodes augmented images into feature representations.
- **Projection Head**: A multilayer perceptron (MLP) maps features into a contrastive space, improving representation learning.
- **Memory Bank**: NNCLR maintains a first-in, first-out (FIFO) memory bank, storing past feature representations. Older features are gradually discarded, ensuring a large and diverse set approximating the full dataset.
- **Nearest Neighbor Sampling**: Each feature representation is replaced by its nearest neighbor from the memory bank, introducing additional semantic variation beyond standard augmentations.
- **Contrastive Loss**: NNCLR employs normalized temperature-scaled cross-entropy loss (NT-Xent), encouraging alignment between positive pairs and separation from negative pairs.
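
Putting these components together, a single training step could look roughly like the following sketch. It assumes lightly's ``NNCLRProjectionHead``, ``NNCLRPredictionHead``, ``NNMemoryBankModule``, and ``NTXentLoss``; the backbone, feature dimensions, memory bank size, and dummy inputs are illustrative and not the definitive configuration.

.. code-block:: python

    import torch
    import torchvision
    from torch import nn

    from lightly.loss import NTXentLoss
    from lightly.models.modules import (
        NNCLRPredictionHead,
        NNCLRProjectionHead,
        NNMemoryBankModule,
    )

    class NNCLR(nn.Module):
        def __init__(self, backbone: nn.Module) -> None:
            super().__init__()
            self.backbone = backbone
            self.projection_head = NNCLRProjectionHead(512, 512, 128)
            self.prediction_head = NNCLRPredictionHead(128, 512, 128)

        def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
            features = self.backbone(x).flatten(start_dim=1)
            z = self.projection_head(features)  # projection used for nearest-neighbor lookup
            p = self.prediction_head(z)         # prediction compared against the neighbor
            return z.detach(), p

    resnet = torchvision.models.resnet18()
    backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the classification head
    model = NNCLR(backbone)

    memory_bank = NNMemoryBankModule(size=4096)  # FIFO support set of past projections
    criterion = NTXentLoss(temperature=0.1)

    # x0, x1 are two augmented views of the same images, e.g. from a SimCLR-style transform.
    x0, x1 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
    z0, p0 = model(x0)
    z1, p1 = model(x1)
    nn0 = memory_bank(z0, update=False)  # nearest neighbors of the first view
    nn1 = memory_bank(z1, update=True)   # second view also updates the bank
    loss = 0.5 * (criterion(nn0, p1) + criterion(nn1, p0))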

Good to Know
----------------

- **Optimized for CNNs**: NNCLR is specifically designed for convolutional neural networks (CNNs), particularly ResNet. It is not recommended for transformer-based architectures.
- **Augmentation Robustness**: Compared to SimCLR, NNCLR is less dependent on strong augmentations since nearest neighbor sampling introduces natural semantic variation. However, performance still benefits from well-chosen augmentations and larger batch sizes.


Reference:
`With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations, 2021 <https://arxiv.org/abs/2104.14548>`_
8 changes: 8 additions & 0 deletions tests/models/test_ModelUtils.py
@@ -138,6 +138,14 @@ def test__mask_reduce(
    out = _mask_reduce(feature_map, mask, num_cls=3)
    assert (out == expected_result).all()

def test_singular_mask(self) -> None:
    b, c, h, w = 4, 16, 4, 4
    proj = torch.randn((b, c, h, w))
    mask = torch.zeros((b, h, w), dtype=torch.int64)
    pooled_global = torch.mean(proj, dim=(2, 3)).unsqueeze(-1)  # (b, c, 1=num_cls)
    pooled_mask = pool_masked(proj, mask, num_cls=1)  # (b, c, 1=num_cls)
    assert torch.allclose(pooled_global, pooled_mask)


def has_grad(model: nn.Module) -> bool:
"""Helper method to check if a model has `requires_grad` set to True"""
