Embedding receptive field #67

Answered by srmsoumya
brunosan asked this question in Q&A

@brunosan your understanding of how the encoder side of the MAE (masked autoencoder) works is spot on.

In our modified architecture, we are making two changes for now:

  1. We are splitting the image spatially and channel-wise.
  2. We are adding time and lat/lon information as learnable embeddings, which give the model additional context about when and where a certain feature appears (see the sketch after this list).
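
To make both changes concrete, here is a minimal PyTorch sketch of spatial + channel-wise patchification combined with learnable time and lat/lon embeddings. All names, sizes, and the exact metadata encoding are hypothetical illustrations under assumed defaults, not the actual Clay implementation:

```python
import torch
import torch.nn as nn

class PatchAndMetadataEmbed(nn.Module):
    """Sketch only: spatial + channel-wise patchify, plus learnable
    time and lat/lon embeddings (hypothetical, not the Clay code)."""

    def __init__(self, img_size=256, patch_size=32, bands=13, dim=768):
        super().__init__()
        # Each band is embedded on its own: the channel-wise split means one
        # token per (band, spatial patch) instead of mixing all bands together.
        self.proj = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
        n_tokens = bands * (img_size // patch_size) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, n_tokens, dim))
        # Learnable projections of normalized metadata into embedding space,
        # telling the model *when* and *where* a scene was captured.
        self.time_embed = nn.Linear(2, dim)    # e.g. normalized (week, hour)
        self.latlon_embed = nn.Linear(2, dim)  # e.g. normalized (lat, lon)

    def forward(self, x, time, latlon):
        # x: (B, C, H, W) image; time, latlon: (B, 2) normalized scalars
        B, C, H, W = x.shape
        tokens = self.proj(x.reshape(B * C, 1, H, W))  # (B*C, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)     # (B*C, N_spatial, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1]) + self.pos_embed
        # Broadcast the metadata embeddings across every patch token.
        meta = (self.time_embed(time) + self.latlon_embed(latlon)).unsqueeze(1)
        return tokens + meta                           # (B, C * N_spatial, dim)
```

With the assumed defaults above, a 256×256 image with 13 bands yields 13 × 64 = 832 tokens, one per band per spatial patch.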

Transformers have a concept called the `cls` token, which is used to capture a generic vector representation of the input space (EO imagery in our case). This idea is borrowed from the BERT paper and is commonly used in Vision Transformers. We can choose to use the embeddings from the `cls` token, which represents what the…
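
For context, here is a minimal sketch of how a `cls` token is typically prepended in a ViT-style encoder and its output row used as the pooled scene embedding. The class name and layer choices are assumptions for illustration, not the Clay code:

```python
import torch
import torch.nn as nn

class TinyViTEncoder(nn.Module):
    """Minimal sketch of cls-token pooling in a ViT-style encoder
    (hypothetical, for illustration only)."""

    def __init__(self, dim=768, depth=4, heads=8):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens):
        # tokens: (B, N, dim) patch + metadata embeddings
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)  # (B, 1, dim)
        out = self.encoder(torch.cat([cls, tokens], dim=1))   # (B, 1+N, dim)
        return out[:, 0]  # the cls row: one vector summarizing the scene
```

Taking `out[:, 0]` gives a single vector per image, which is what you would hand to a downstream classifier or a similarity search index.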

This discussion was converted from issue #62 on December 04, 2023 18:04.