Implementation of a Masked Autoencoder for representation learning #8152
Conversation
Hi @Lucas-rbnt, thanks for the effort on this follow-up PR. @atbenmurray could you please re-review the content here?
@Lucas-rbnt @atbenmurray I shall do so
Thanks for the PR.
In the official masked autoencoder implementation, noise is first generated and then sorted twice using torch.argsort. This rearranges the tokens and identifies which ones are retained, ultimately selecting only a subset of the shuffled indices.
In our implementation, we use torch.multinomial to generate mask indices, followed by simple boolean indexing to manage the sub-selection of patches for encoding and the reordering with mask tokens in the decoder.
As you mentioned here, have you verified whether there is a big difference between the two implementations? Does it have any impact on the final performance? Thanks.
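For context, here is a minimal sketch of the two sampling strategies under discussion (shapes and names are illustrative, not taken from either codebase); both select a uniformly random subset of token indices:

```python
import torch

num_tokens, mask_ratio = 16, 0.75
num_keep = int(num_tokens * (1 - mask_ratio))

# Official-style: sort random noise and keep the first part of the shuffle.
noise = torch.rand(num_tokens)
ids_shuffle = torch.argsort(noise)        # a random permutation of token indices
ids_restore = torch.argsort(ids_shuffle)  # inverse permutation, used to unshuffle later
ids_keep = ids_shuffle[:num_keep]         # indices of the visible tokens

# This PR (as described above): draw mask indices directly, then use boolean indexing.
mask_idx = torch.multinomial(torch.ones(num_tokens), num_tokens - num_keep)
mask = torch.zeros(num_tokens, dtype=torch.bool)
mask[mask_idx] = True                     # True = masked, False = visible
visible_idx = (~mask).nonzero(as_tuple=True)[0]
```

Since an argsort of uniform noise yields a uniform random permutation, and multinomial with equal weights draws without replacement, the two masked sets should in principle be statistically equivalent.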
I think this is fine now, though the comments should be addressed and the conflict resolved; then we can trigger the blossom tests. Thanks!
/build
I'm good with this and look forward to an example notebook in Tutorials demonstrating its use!
This follows a previous PR (#7598).
In the previous PR, the official implementation was under a non-compatible license, so this is a clean-sheet implementation I developed. The code is fairly straightforward, involving a transformer encoder and decoder. The primary changes are in how masks are selected and how patches are organized as they pass through the model.
In the official masked autoencoder implementation, noise is first generated and then sorted twice using `torch.argsort`. This rearranges the tokens and identifies which ones are retained, ultimately selecting only a subset of the shuffled indices. In our implementation, we use `torch.multinomial` to generate mask indices, followed by simple boolean indexing to manage the sub-selection of patches for encoding and the reordering with mask tokens in the decoder.

Let me know if you need a detailed, line-by-line explanation of the new code, including how it works and how it differs from the previous version.
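To make that concrete, here is a rough sketch of the boolean-indexing flow, simplified to a single mask shared across the batch; names and shapes are illustrative, not the exact code in this PR:

```python
import torch

batch, num_tokens, dim = 2, 16, 32
x = torch.randn(batch, num_tokens, dim)   # patch embeddings
mask_token = torch.zeros(1, 1, dim)       # a learnable parameter in the real model

mask_idx = torch.multinomial(torch.ones(num_tokens), int(num_tokens * 0.75))
mask = torch.zeros(num_tokens, dtype=torch.bool)
mask[mask_idx] = True                     # True = masked

# The encoder only sees the visible patches.
visible = x[:, ~mask, :]                  # (batch, num_visible, dim)

# The decoder input restores the original order: encoded tokens where the
# patch was visible, mask tokens everywhere else.
decoder_in = mask_token.expand(batch, num_tokens, dim).clone()
decoder_in[:, ~mask, :] = visible
```

Boolean indexing preserves the original token order, which is why no inverse-permutation (`ids_restore`) bookkeeping is needed here.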
Description
Implementation of the Masked Autoencoder as described in the paper Masked Autoencoders Are Scalable Vision Learners by He et al.
Its effectiveness for medical tasks has already been demonstrated in the paper Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation.
The PR contains the architecture and associated unit tests.
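For anyone who wants to experiment, a hypothetical usage sketch follows; the class name `MaskedAutoEncoderViT` and every constructor argument here are assumptions for illustration and may not match the merged API:

```python
import torch
from monai.networks.nets import MaskedAutoEncoderViT  # name assumed, see lead-in

# Hypothetical configuration for single-channel 96^3 volumes with 16^3 patches.
model = MaskedAutoEncoderViT(
    in_channels=1,
    img_size=(96, 96, 96),
    patch_size=(16, 16, 16),
    masking_ratio=0.75,  # assumed parameter name
)
volume = torch.randn(1, 1, 96, 96, 96)
prediction, mask = model(volume)  # per-token reconstruction and its mask
```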
Note: The output includes the prediction, a tensor of size ($BS$, $N_{tokens}$, $D$), and the associated mask of size ($BS$, $N_{tokens}$). The mask is used to apply the loss only to masked patches, but I'm not sure it's the “best” output format; what do you think?
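As one way of consuming that output format, the reconstruction loss can be restricted to masked patches along these lines (a sketch assuming `mask` is 1 for masked tokens and the target is the patchified input):

```python
import torch

bs, n_tokens, d = 2, 216, 4096                     # illustrative shapes
prediction = torch.randn(bs, n_tokens, d)          # model output
target = torch.randn(bs, n_tokens, d)              # patchified input in practice
mask = (torch.rand(bs, n_tokens) < 0.75).float()   # 1 = masked, 0 = visible

per_token = ((prediction - target) ** 2).mean(dim=-1)  # per-token MSE
loss = (per_token * mask).sum() / mask.sum()           # average over masked tokens only
```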
Types of changes
- Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- Documentation updated, tested by running the `make html` command in the `docs/` folder.