
Benchmark on encoder representation as comparison #1

Open · MarvinT opened this issue Oct 11, 2018 · 8 comments
MarvinT commented Oct 11, 2018

It would be nice to run an MLP on the encoder representation, to compare the representation learned by the unsupervised encoder against the full CPC model representation.

davidtellez commented Oct 14, 2018

I don't fully understand your question: there is just one encoder, trained via CPC, that encodes the patches. What do you mean by "the unsupervised encoder" and "the full CPC model representation"? Let me summarize what I did just to clarify:

  1. Trained the CPC model to distinguish between number sequences (this step trains the encoder that lives within the CPC model).
  2. Once the CPC model was trained, I read the encoder network only (discarding the rest of the CPC model).
  3. I took the encoder, froze its weights and added an MLP on top of it (see here). Then I trained this encoder+MLP to distinguish numbers. Because it achieved 90%, I concluded that the encoder learned useful features to describe the numbers.
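
To make step 3 concrete, here is a minimal Keras sketch of freezing the encoder and adding an MLP head on top. The model path, input shape, layer sizes and variable names are my assumptions for illustration, not the repo's actual code:

```python
# Sketch only: frozen CPC encoder + trainable MLP classifier (assumed names/shapes).
from keras.layers import Dense, Input
from keras.models import Model, load_model

# Load the encoder extracted from the trained CPC model (hypothetical path).
encoder = load_model('models/encoder_model.h5')
encoder.trainable = False                     # freeze the CPC-trained weights

x = Input(shape=(28, 28, 1))                  # patch shape is an assumption
features = encoder(x)                         # fixed, unsupervised representation
h = Dense(128, activation='relu')(features)   # small trainable MLP head
y = Dense(10, activation='softmax')(h)        # 10 digit classes

classifier = Model(x, y)
classifier.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
# classifier.fit(x_train, y_train, ...) now only updates the MLP weights.
```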

I hope this clarifies your question a bit; please reply if you meant something else. Thanks for dropping by!

MarvinT commented Oct 14, 2018

Sorry, I misremembered the paper for some reason. I thought the network_encoder, or g_enc in the paper, was pretrained as a VAE, not that the whole network was trained end to end. I guess I'm interested in comparing the encoder network against the features learned by a VAE of similar architecture.

@davidtellez

I see, no problem. In the original paper, they compare CPC with other methods, though not with a VAE. I have some code for a VAE from another project, so I might run the experiment you mention if I get some free time. I'll keep you posted.

@N-Kingsley

I want to ask two questions.

  1. Does this code not apply equations (2) and (3)?
  2. The calculation of the 'loss' does not seem to be the NCE loss described in section 2.2 of the paper?

@davidtellez

Let's focus on equation (3) in section 2.2, the score f_k(x_{t+k}, c_t) = exp(z_{t+k}^T W_k c_t). It describes how to measure the quality of the predictions, and this is what happens:

  • The context c_t is mapped to a predicted image embedding using a linear layer with parameters W_k: W_k.c_t. Let's call the resulting vector p_{t+k}. This happens in my code here. Note that this can be any function of your choice, but a linear layer is used for simplicity.
  • The prediction p_{t+k} is compared with the vector embedding of the real image z_{t+k}. This comparison is done via dot product in the formula (that's why z_{t+k} is transposed). This operation produces a high value if both vectors are "similar" and a low value if they are "not similar". Because we get a similarity score for each k, I average the score across the temporal dimension. This happens in my code here.
  • An exponential operation is applied to the previous similarity score. I use a sigmoid to limit the values of the score to the [0, 1] range here.
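
As a purely illustrative sketch of those three steps (the variable names and shapes are my assumptions, not the code from this repo):

```python
import numpy as np

def similarity_score(c_t, z_future, W):
    """Illustrative only. c_t: (code_size,) context vector; z_future: (steps, code_size)
    encoder embeddings of the real future patches; W: (steps, code_size, code_size),
    one linear map W_k per predicted step."""
    # 1) Linear prediction p_{t+k} = W_k . c_t for every step k
    preds = np.einsum('kij,j->ki', W, c_t)               # (steps, code_size)
    # 2) Dot product with the real embeddings, averaged over the steps k
    score = np.mean(np.sum(preds * z_future, axis=-1))   # scalar similarity
    # 3) Sigmoid squashes the score into the [0, 1] range
    return 1.0 / (1.0 + np.exp(-score))
```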

At this point, we can measure the semantic similarity between our predictions and the actual data. Our data contains two kinds of sequences, i.e. two labels: positive labels correspond to sorted sequences and negative labels to non-sorted sequences.

For positive labels (sorted sequences), we want CPC to produce high similarity scores, in our case a 1. For negative labels (non-sorted sequences), we want low similarity scores, in our case a 0.

As they propose in the paper in section 2.3, all we need to do to train CPC is apply binary cross-entropy loss between the similarity scores and the labels, done here.
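
In other words, the loss reduces to standard binary cross-entropy between the sigmoid similarity score and the sequence label; in Keras that is simply loss='binary_crossentropy' at compile time. A sketch with assumed names, not the exact line from the repo:

```python
import numpy as np

def binary_cross_entropy(y_true, score, eps=1e-7):
    """y_true is 1 for sorted (positive) sequences and 0 for non-sorted (negative) ones;
    `score` is the sigmoid similarity computed above."""
    score = np.clip(score, eps, 1.0 - eps)  # avoid log(0)
    return -(y_true * np.log(score) + (1.0 - y_true) * np.log(1.0 - score))
```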

I hope this helps you understand my implementation. Beware that this is my own interpretation of the paper, which may or may not be completely correct.

@N-Kingsley

Oh, you explained it so clearly that I fully understand now. Thank you very much for your help.

@N-Kingsley

By the way, is equation (4) in section 2.3 the ‘binary_crossentropy’ in your code?

@babbu3682

Why did you use binary_crossentropy?
