
Questions about the hw-modulated attention in DAB-DETR #193

Open · Artificial-Inability opened this issue Feb 1, 2023 · 4 comments

@Artificial-Inability

I have two questions about the hw-modulated attention equation (Eq.(6) in DAB-DETR):

  1. Why use 1/wq and 1/hq instead of wq and hq? Does that mean an anchor with a larger width will result in a narrower attention map in the x direction?
  2. DAB-DETR already updates the 4D anchor in each decoder layer using the embedding of the last layer through an MLP, so why do we still need wref and href, which are also generated from the embedding of the last layer through an MLP? Is that necessary?
@SlongLiu (Contributor) commented Feb 1, 2023

  1. 1/wq makes sure that the attention maps have a similar shape to the anchor boxes. For example, a large w results in a flattened (spread-out) attention map in the x direction under the 1/wq formulation. We provide some visualizations in our paper. (A numeric sketch of this effect follows after this list.)

  2. href and wref are designed to have the same dimension as hq and wq. This helps the final performance.
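A minimal numeric sketch of that intuition (illustrative only, not the DAB-DETR implementation; the standard sinusoidal PE, the 1-D position grid, and taking a softmax over x are assumptions made for this demo): scaling the positional similarity by wref/wq lowers all of the x-scores when wq is large, so after the softmax the attention map spreads over a wider range of x.

```python
import numpy as np

# Illustrative sketch only (not the DAB-DETR code): a 1-D positional
# attention map modulated by wref / wq, as in the x-term of Eq.(6).
# A larger wq scales the positional scores down, so after the softmax the
# attention spreads over a wider range of x positions.

def sinusoidal_pe(pos, dim=64, temperature=10000.0):
    """Standard sinusoidal positional encoding for scalar positions."""
    pos = np.asarray(pos, dtype=np.float64)
    i = np.arange(dim // 2)
    freq = temperature ** (2 * i / dim)
    angles = pos[..., None] / freq
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def x_attention(x_ref, w_q, w_ref=1.0, num_pos=100, dim=64):
    """Softmax over x of PE(x)·PE(x_ref) · w_ref / w_q (the x-term of Eq.(6))."""
    xs = np.arange(num_pos)
    scores = sinusoidal_pe(xs, dim) @ sinusoidal_pe([x_ref], dim).ravel()
    scores = scores * (w_ref / w_q) / np.sqrt(dim)   # the 1/wq modulation
    e = np.exp(scores - scores.max())
    return xs, e / e.sum()

for w_q in (0.5, 1.0, 3.0):
    xs, attn = x_attention(x_ref=50, w_q=w_q)
    mean = (attn * xs).sum()
    spread = np.sqrt((attn * (xs - mean) ** 2).sum())  # std of the attention map
    print(f"wq={w_q:3.1f}  peak={attn.max():.4f}  spread(std)={spread:5.2f}")
```

Running this for wq ∈ {0.5, 1, 3} should show the peak value dropping and the spread (std) of the map growing as wq increases, i.e. wider boxes get wider attention along x.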

@Artificial-Inability (Author)

> 1. 1/wq makes sure that the attention maps have a similar shape to the anchor boxes. For example, a large w results in a flattened (spread-out) attention map in the x direction under the 1/wq formulation. We provide some visualizations in our paper.

Could you give a more detailed explanation of how this works? My understanding of the "H=1, W=3" in Figure 6 of the DAB-DETR paper is that "href/hq = 1, wref/wq = 3", in which case a larger wq would lead to a smaller W. If I misunderstood something, what is the definition of H and W in Figure 6? Thanks.

@SlongLiu (Contributor) commented Feb 2, 2023

> > 1. 1/wq makes sure that the attention maps have a similar shape to the anchor boxes. For example, a large w results in a flattened (spread-out) attention map in the x direction under the 1/wq formulation. We provide some visualizations in our paper.
>
> Could you give a more detailed explanation of how this works? My understanding of the "H=1, W=3" in Figure 6 of the DAB-DETR paper is that "href/hq = 1, wref/wq = 3", in which case a larger wq would lead to a smaller W. If I misunderstood something, what is the definition of H and W in Figure 6? Thanks.

The results in Fig. 6 are examples. "H=1, W=3" means hq = 1, wq = 3. We assume href and wref are 1.
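Spelling that out in the thread's notation (a worked reading of Eq.(6) under the stated assumption href = wref = 1; D is the embedding dimension):

```latex
\begin{align*}
\mathrm{ModulateAttn}\big((x,y),(x_{\mathrm{ref}},y_{\mathrm{ref}})\big)
  &= \Big(\mathrm{PE}(x)\cdot\mathrm{PE}(x_{\mathrm{ref}})\,\tfrac{w_{\mathrm{ref}}}{w_q}
        + \mathrm{PE}(y)\cdot\mathrm{PE}(y_{\mathrm{ref}})\,\tfrac{h_{\mathrm{ref}}}{h_q}\Big)\Big/\sqrt{D} \\
  &= \Big(\tfrac{1}{3}\,\mathrm{PE}(x)\cdot\mathrm{PE}(x_{\mathrm{ref}})
        + \mathrm{PE}(y)\cdot\mathrm{PE}(y_{\mathrm{ref}})\Big)\Big/\sqrt{D}
     \qquad\text{for } h_q = 1,\ w_q = 3,\ h_{\mathrm{ref}} = w_{\mathrm{ref}} = 1 .
\end{align*}
```

So for this query the x-term is down-weighted by a factor of 3 relative to the y-term, and the resulting positional attention is stretched more along x than along y, matching the wide, flat "H=1, W=3" pattern in Figure 6.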

@Artificial-Inability (Author)

> > > 1. 1/wq makes sure that the attention maps have a similar shape to the anchor boxes. For example, a large w results in a flattened (spread-out) attention map in the x direction under the 1/wq formulation. We provide some visualizations in our paper.
> >
> > Could you give a more detailed explanation of how this works? My understanding of the "H=1, W=3" in Figure 6 of the DAB-DETR paper is that "href/hq = 1, wref/wq = 3", in which case a larger wq would lead to a smaller W. If I misunderstood something, what is the definition of H and W in Figure 6? Thanks.
>
> The results in Fig. 6 are examples. "H=1, W=3" means hq = 1, wq = 3. We assume href and wref are 1.

I still can't understand this phenomenon theoretically. If the original value of the attention map at a fixed point is calculated as PE(x)·PE(xref)·wref/wq + ..., then when we increase wq to wq' = 3·wq, the new value should decrease, which would result in a narrower attention map. Could you explain theoretically why a larger wq leads to a wider attention map under this formulation? Thanks.
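For what it's worth, one way to reconcile the two readings with a worked ratio (this assumes, as the discussion above does, that the modulated term enters the attention logits before the softmax; the positions x1, x2 and the shorthand s(x) are introduced here only for illustration):

```latex
\begin{align*}
\text{Let } s(x) &= \mathrm{PE}(x)\cdot\mathrm{PE}(x_{\mathrm{ref}})\,w_{\mathrm{ref}}\big/\sqrt{D}. \\
\frac{\text{attention at } x_1}{\text{attention at } x_2}
  &= \exp\!\Big(\frac{s(x_1)-s(x_2)}{w_q}\Big)
  \quad\longrightarrow\quad
  \exp\!\Big(\frac{s(x_1)-s(x_2)}{3\,w_q}\Big)
  \quad\text{when } w_q \to 3\,w_q .
\end{align*}
```

Every unnormalized score does decrease, as noted above, but after the softmax only these ratios matter: tripling wq shrinks the gap between the peak and its surroundings, so the normalized map becomes flatter and therefore wider along x rather than narrower.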
