What are the best content and position/anchor query pairs for DETR decoder? #226

Open
smartbarbarian opened this issue Mar 7, 2023 · 3 comments
@smartbarbarian

DN-DETR employs static queries, while DINO uses mixed query selection. Later, Mask DINO reverted to the pure query selection of Deformable DETR. Within this family of DETR architectures, is there any further research or explanation on which content and position (anchor) query pairs should be used in the decoder?

@FengLi-ust
Collaborator

For detection, learnable content queries may work better. Mask DINO mainly focuses on segmentation, which is closely tied to the content queries, so we use selected content queries there.
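
For readers following the thread, the three initialization schemes mentioned above can be sketched roughly as follows. This is a toy, framework-free illustration and not code from the DINO or Mask DINO repositories: the function names (`top_k_indices`, `init_queries`), the plain-list data layout, and the toy numbers are all assumptions made for this example. Real implementations also pass the selected encoder features through learned projection layers, which is omitted here.

```python
def top_k_indices(scores, k):
    """Indices of the k highest-scoring encoder tokens, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def init_queries(mode, enc_feats, enc_boxes, enc_scores,
                 learned_content, learned_anchors):
    """Return (content_queries, anchor_queries) for one image.

    mode = "static": both parts are learned embeddings (no query selection).
    mode = "pure":   both parts come from the top-k encoder tokens.
    mode = "mixed":  anchors come from the top-k encoder tokens,
                     content queries stay learned.
    """
    k = len(learned_content)
    if mode == "static":
        return list(learned_content), list(learned_anchors)

    top = top_k_indices(enc_scores, k)
    anchors = [enc_boxes[i] for i in top]
    if mode == "pure":
        return [enc_feats[i] for i in top], anchors
    if mode == "mixed":
        return list(learned_content), anchors
    raise ValueError(f"unknown mode: {mode}")


# Toy encoder output: 4 tokens, 2-dim features, (cx, cy, w, h) box proposals.
enc_feats = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
enc_boxes = [[0.1, 0.1, 0.2, 0.2], [0.3, 0.3, 0.2, 0.2],
             [0.5, 0.5, 0.4, 0.4], [0.7, 0.7, 0.1, 0.1]]
enc_scores = [0.1, 0.9, 0.5, 0.7]

# Toy learned embeddings for 2 decoder queries.
learned_content = [[9.0, 9.0], [8.0, 8.0]]
learned_anchors = [[0.5, 0.5, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]

for mode in ("static", "pure", "mixed"):
    content, anchors = init_queries(mode, enc_feats, enc_boxes, enc_scores,
                                    learned_content, learned_anchors)
    print(mode, content, anchors)
```

In this sketch the only difference between "pure" and "mixed" is where the content half of each pair comes from, which is the choice the reply above attributes to the detection versus segmentation focus.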

@smartbarbarian
Author

May I ask whether you have done any follow-up research on this topic? For example, content and anchor queries taken from the encoder could be combined with learnable embeddings in a variety of ways.

@smartbarbarian
Author

In DAB-DETR, an elaborate anchor-query design is used in both self-attention and cross-attention. In DINO, however, you dropped that design and only compared no query selection, pure query selection, and mixed query selection. Could you please explain why this change was made?


2 participants