Lionel lig 5994: Update the BYOL examples (#1794)
* update byol example

* add good to know section

* typos

* Update docs/source/examples/byol.rst

Co-authored-by: stegmuel <[email protected]>

* Update docs/source/examples/byol.rst

Co-authored-by: stegmuel <[email protected]>

* Update docs/source/examples/byol.rst

Co-authored-by: stegmuel <[email protected]>

* add DINO hint

---------

Co-authored-by: stegmuel <[email protected]>
liopeer and stegmuel authored Feb 4, 2025
1 parent 061223a commit 22464e3
Showing 1 changed file with 29 additions and 2 deletions: docs/source/examples/byol.rst
@@ -3,10 +3,37 @@
BYOL
====

Example implementation of the BYOL architecture.
BYOL (Bootstrap Your Own Latent) [0]_ is a self-supervised learning framework for visual
representation learning without negative samples. Unlike contrastive learning methods
such as MoCo [1]_ and SimCLR [2]_, which compare positive and negative pairs, BYOL uses
two neural networks – an "online" and a "target" network – where the online network is
trained to predict the target network's representation of the same image under different
augmentations, resulting in an iterative bootstrapping of the latent representations.
The target's weights are updated as an exponential moving average
(EMA) of the online network's weights, and the authors show that this is sufficient to prevent
collapse to trivial solutions. They also show that, due to the absence
of negative samples, BYOL is less sensitive to the batch size during training, and it
achieves state-of-the-art performance on several semi-supervised and transfer learning benchmarks.
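The EMA update of the target network described above can be sketched in PyTorch as follows. The single linear layer and the fixed ``tau`` are illustrative stand-ins only; the paper uses ResNet backbones with projection heads and anneals ``tau`` toward 1 over training.

```python
import copy

import torch
from torch import nn

# Illustrative stand-in for the real online network (backbone + heads).
online = nn.Linear(4, 4)

# The target network starts as a copy of the online network and
# receives no gradients; it is only updated via the EMA below.
target = copy.deepcopy(online)
for param in target.parameters():
    param.requires_grad = False


@torch.no_grad()
def update_target(online: nn.Module, target: nn.Module, tau: float = 0.99) -> None:
    """EMA update: target <- tau * target + (1 - tau) * online."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)


# Called once per training step, after the optimizer updates the online network.
update_target(online, target)
```

With ``tau`` close to 1 the target changes slowly, which is what stabilizes the bootstrapping and prevents the trivial collapsed solution.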

Key Components
--------------

- **Data Augmentations**: BYOL [0]_ uses the same augmentations as SimCLR [2]_, namely random resized crop, random horizontal flip, color distortions, Gaussian blur and solarization. The color distortion consists of a random sequence of brightness, contrast, saturation, and hue adjustments and an optional grayscale conversion. However, the hyperparameters for the augmentations are different from SimCLR [2]_.
- **Backbone**: BYOL [0]_ uses ResNet-type convolutional backbones as the online and target networks. They do not evaluate the performance of other architectures.
- **Projection & Prediction Head**: A projection head is used to map the output of the backbone to a lower-dimensional space. For this, the target network once again relies on an EMA of the online network. A notable architectural choice is the use of an additional prediction head, a secondary MLP appended only to the online network's projection head.
- **Loss Function**: BYOL [0]_ uses a negative cosine similarity loss between the representations of the online network's prediction output and the target network's projection output.
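The loss from the last bullet can be sketched as below; ``byol_loss`` is a hypothetical helper name, and the full training objective in the paper symmetrizes this loss over both augmented views.

```python
import torch
import torch.nn.functional as F


def byol_loss(online_pred: torch.Tensor, target_proj: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between L2-normalized representations.

    Written here as 2 - 2 * cos(p, z), which equals the squared Euclidean
    distance between the normalized vectors used in the paper.
    """
    p = F.normalize(online_pred, dim=-1)
    # detach(): the target network receives no gradient from the loss.
    z = F.normalize(target_proj.detach(), dim=-1)
    return 2.0 - 2.0 * (p * z).sum(dim=-1).mean()
```

The loss is 0 when prediction and projection point in the same direction and reaches its maximum of 4 when they are antiparallel.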

Good to Know
-------------

- **Backbone Networks**: BYOL is specifically optimized for convolutional neural networks, with a focus on ResNet architectures. We do not recommend using it with transformer-based models and instead suggest using :doc:`DINO <dino>` [3]_.


.. [0] `Bootstrap your own latent: A new approach to self-supervised Learning, 2020 <https://arxiv.org/abs/2006.07733>`_
.. [1] `Momentum Contrast for Unsupervised Visual Representation Learning, 2019 <https://arxiv.org/abs/1911.05722>`_
.. [2] `A Simple Framework for Contrastive Learning of Visual Representations, 2020 <https://arxiv.org/abs/2002.05709>`_
.. [3] `Emerging Properties in Self-Supervised Vision Transformers, 2021 <https://arxiv.org/abs/2104.14294>`_
.. tabs::
