What are the best content and position/anchor query pairs for DETR decoder? #226

smartbarbarian · 2023-03-07T06:16:13Z

DN-DETR uses static queries, while DINO uses mixed query selection. Later, Mask DINO reverted to the pure query selection of Deformable DETR. Within this family of DETR architectures, is there any further research or explanation on which pairing of content queries and position/anchor queries should be used in the decoder?
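For readers following the thread, the three schemes named above can be contrasted in a minimal sketch. This is not code from any of the repositories discussed: the function `init_decoder_queries` and all argument names are hypothetical, NumPy stands in for the actual deep-learning framework, and the learned projections that real implementations apply to the selected encoder features and proposals are omitted.

```python
import numpy as np

def init_decoder_queries(mode, enc_feats, enc_scores, enc_boxes,
                         learned_content, learned_anchors):
    """Return (content_queries, anchor_queries) for the first decoder layer.

    Hypothetical helper for illustration only.
    enc_feats:       (N, D) encoder output tokens
    enc_scores:      (N,)   per-token objectness score
    enc_boxes:       (N, 4) per-token box proposal
    learned_content: (K, D) learnable content embeddings
    learned_anchors: (K, 4) learnable anchor boxes
    """
    num_queries = learned_content.shape[0]

    if mode == "static":
        # Both halves are learned parameters, independent of the image.
        return learned_content, learned_anchors

    # Query selection: keep the K highest-scoring encoder tokens.
    topk = np.argsort(-enc_scores)[:num_queries]
    anchors = enc_boxes[topk]

    if mode == "mixed":
        # Anchors come from the encoder, content stays learnable.
        return learned_content, anchors
    if mode == "pure":
        # Both anchors and content come from the selected encoder tokens.
        return enc_feats[topk], anchors

    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, the only difference between the modes is where each half of the query pair comes from: learned parameters, the top-scoring encoder tokens, or one of each.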

FengLi-ust · 2023-03-23T03:26:42Z

For detection, learnable content queries may work better. Mask DINO focuses mainly on segmentation, which depends heavily on the content queries, so there we use selected content queries.

smartbarbarian · 2023-04-10T12:46:20Z

May I ask if you have any follow-up research on the topic? For example, content and anchor queries from the encoder, along with some learnable embeddings, can be integrated in a variety of ways.

smartbarbarian · 2023-04-11T07:21:27Z

In DAB-DETR, an elaborate anchor-query design is used in both self-attention and cross-attention. In DINO, however, you dropped that design and compared only no, pure, and mixed query selection. Could you please explain why this change was made?

rentainhe assigned FengLi-ust Mar 8, 2023
