
Discrepancy in Logit Scales between clip_logits and cache_logits #28

Open
Aikoin opened this issue Jul 25, 2024 · 2 comments

Comments


Aikoin commented Jul 25, 2024

Hello,

I have a question regarding the scaling and balance between CLIP logits and cache logits in the tip-adapter implementation. Specifically, I'm looking at the following code:

    clip_logits = 100. * val_features @ clip_weights

Here, val_features and clip_weights are L2-normalized, so each entry of val_features @ clip_weights is a cosine similarity in [-1, 1], and the resulting clip_logits has a range of [-100, 100] due to the 100x scaling factor.
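To make the scale concrete, here is a toy example (random tensors with made-up shapes, purely for illustration):

    import torch

    # toy shapes, not the actual data: 8 images, 512-dim features, 10 classes
    val_features = torch.nn.functional.normalize(torch.randn(8, 512), dim=-1)
    clip_weights = torch.nn.functional.normalize(torch.randn(512, 10), dim=0)

    clip_logits = 100. * val_features @ clip_weights
    # each entry is 100 * (a cosine similarity), hence bounded by [-100, 100]
    print(clip_logits.abs().max())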

    affinity = val_features @ cache_keys
    cache_logits = ((-1) * (beta - beta * affinity)).exp() @ cache_values

val_features and cache_keys are also L2-normalized. The affinity values range from [-1, 1].
The expression - (beta - beta * affinity) leads to a range of [-2*beta, 0], which is then exponentiated, yielding values in the range (0, 1].
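A quick numeric check of this range (the value of beta here is arbitrary and only illustrative; in the repo it is a tuned hyperparameter):

    import torch

    beta = 5.5                                  # illustrative value only
    affinity = torch.linspace(-1, 1, steps=5)   # cosine similarities spanning [-1, 1]
    weights = ((-1) * (beta - beta * affinity)).exp()
    # the exponent spans [-2*beta, 0], so the weights span (0, 1]
    print(weights)  # roughly [1.7e-05, ..., 1.0]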

The primary concern is that clip_logits and cache_logits are not on the same scale: clip_logits ranges over [-100, 100], while the adapter's exponentiated affinity term A lies mostly in (0, 1]. This discrepancy might affect how effectively the two logits are fused in tip_logits = clip_logits + cache_logits * alpha.

Given that alpha is typically a single-digit number, I'm wondering whether this difference in scale is intended, or whether additional scaling or normalization is needed to align these logits more effectively. Any insights or suggestions would be greatly appreciated.
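For concreteness, the kind of extra scaling I have in mind would look something like this (purely illustrative; the constant and the max-normalization are my own assumptions, not anything from the repo):

    # continuing from the snippets above (affinity, cache_values, clip_logits, alpha)
    cache_logits = ((-1) * (beta - beta * affinity)).exp() @ cache_values

    # hypothetical rescaling so the cache term lives on roughly the same scale as clip_logits
    cache_logits = 100. * cache_logits / cache_logits.max()
    tip_logits = clip_logits + cache_logits * alpha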

Thank you for your time and assistance!

@caoziyang1997

Can you give me the ImageNet link? I do not know how to download it. Thank you!


erjui commented Nov 28, 2024

I'm also curious how the logit-scale gap between clip_logits and cache_logits is mitigated.
For the dataset I'm working with, the actual values of clip_logits span around 15-20, whereas the values of cache_logits lie between 0 and 1, as @Aikoin argued theoretically, which means the scales of the two logits do in fact differ.

Thanks in advance! 🙇
