Run TIP-adapter on text2img retrieval instead #8
Comments
Thanks for your interest!
@ZrrSkywalker thanks for the reply. Yes, I worked around it by computing the affinity between the query text and the cache values, then retrieving the images associated with the matching one-hot labels in the dataset. It works neatly. I have another question, though: could the cache model be built from an unbalanced number of shots per class? In real-world use, the number of exemplar images available for training often varies from class to class. How would we build the cache-key matrix when K differs across classes?
That is quite an insightful question. I tried some datasets with varying K for different categories. Generally, a larger K leads to higher classification accuracy for the corresponding category. This can also be used to tackle some long-tail issues, e.g., setting a larger K for the sample-insufficient categories to balance the learned network.
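A minimal sketch of how a cache with a different K per class might be assembled, in plain Python so it runs without dependencies. It assumes the usual Tip-Adapter layout (one L2-normalised key per shot, one one-hot value per shot) and the exponential affinity exp(-beta * (1 - cosine)). The function names, the toy 2-D features, and `beta=1.0` are illustrative assumptions, not code from the repository.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_cache(shots_per_class):
    """Build cache keys and one-hot values from a ragged list of shots.

    shots_per_class[c] is a list of feature vectors for class c; the
    number of vectors (K) may differ from class to class.
    """
    num_classes = len(shots_per_class)
    keys, values = [], []
    for class_id, feats in enumerate(shots_per_class):
        for feat in feats:
            keys.append(l2_normalize(feat))
            one_hot = [0.0] * num_classes
            one_hot[class_id] = 1.0
            values.append(one_hot)
    return keys, values  # both have sum(K_c) rows

def cache_logits(query, keys, values, beta=1.0):
    """Sum the affinity-weighted one-hot values over all cached shots."""
    q = l2_normalize(query)
    logits = [0.0] * len(values[0])
    for key, one_hot in zip(keys, values):
        cos = sum(a * b for a, b in zip(q, key))
        affinity = math.exp(-beta * (1.0 - cos))
        for c, v in enumerate(one_hot):
            logits[c] += affinity * v
    return logits

# Toy example: class 0 has three shots, class 1 has one.
shots = [[[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]], [[0.0, 1.0]]]
keys, values = build_cache(shots)
logits = cache_logits([1.0, 0.0], keys, values)
```

Because the logits are sums over shots, a class with more cached shots accumulates more affinity. Whether to divide by the per-class shot count is a design choice this sketch leaves open.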
Hi, thanks for the amazing work on adapters for CLIP. Currently the framework computes the affinities between the test query image and the cache keys, and then obtains the corresponding few-shot labels. This works well. I would like your advice on how I can extend this to text2img retrieval, where I query with a text search term and use the cache key-value adapter to return the corresponding images. Would it be as simple as matching the text embedding of the query against the cache VALUES (instead of the keys), since they contain the ground-truth labels for few-shot learning?
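One way the text-to-values idea could look, as a rough sketch rather than anything from the repository: the query text is scored against one text embedding per class, and each cached image inherits the score of the class its one-hot value points to. The embeddings are assumed to come from a text encoder such as CLIP's. `retrieve_images`, the toy 2-D vectors, and the file names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def retrieve_images(query_emb, class_text_embs, cache_values, image_ids, top_k=3):
    """Rank cached images by the query's similarity to their class text."""
    q = l2_normalize(query_emb)
    # Cosine similarity between the query text and each class text.
    class_scores = [
        sum(a * b for a, b in zip(q, l2_normalize(t))) for t in class_text_embs
    ]
    # The one-hot value of each cached image selects its class score.
    image_scores = [
        sum(v * s for v, s in zip(one_hot, class_scores))
        for one_hot in cache_values
    ]
    ranked = sorted(range(len(image_ids)), key=lambda i: image_scores[i], reverse=True)
    return [image_ids[i] for i in ranked[:top_k]]

# Toy example: two classes, three cached images.
class_text_embs = [[1.0, 0.0], [0.0, 1.0]]
cache_values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
image_ids = ["cat_0.jpg", "cat_1.jpg", "dog_0.jpg"]
result = retrieve_images([0.9, 0.1], class_text_embs, cache_values, image_ids, top_k=2)
```

Images of the same class tie under this scheme, so ordering within a class would need a second signal, for example the similarity between the query text and the image's cache key.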