Hello,

First of all, thank you for your excellent work on visualizing attention maps for DiT (Diffusion Transformer). I am currently extending your approach to visualize attention maps for video-based DiT models.
While going through the source code, I encountered the following snippet:
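(I'm paraphrasing the two lines from the pipeline setup here from memory; the exact variable and argument names in the repo may differ, but the structure is:)

```python
pipe.transformer = register_cross_attention_hook(pipe.transformer, hook_function, 'attn')
pipe.transformer = replace_call_method_for_sd3(pipe.transformer)
```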
I understand that the `register_cross_attention_hook` function registers a hook that captures the attention map during the forward pass. However, I am confused about the necessity of the second line, `replace_call_method_for_sd3`.
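For context, my mental model of the hook registration is the standard PyTorch forward-hook pattern, roughly like this toy sketch (the module names and hook body are placeholders I wrote for illustration, not the repo's actual code):

```python
import torch
import torch.nn as nn

attn_maps = {}

def save_attn_hook(name):
    def hook(module, inputs, output):
        # Stash this module's output; in the real code the stored tensor
        # would be the attention probabilities for one attention layer.
        attn_maps[name] = output.detach().cpu()
    return hook

class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(8, 8)  # stand-in for an attention module

    def forward(self, x):
        return self.attn(x)

model = ToyBlock()
for name, module in model.named_modules():
    if 'attn' in name:  # pick out attention submodules by name
        module.register_forward_hook(save_attn_hook(name))

_ = model(torch.randn(1, 8))  # after this call, attn_maps['attn'] is populated
```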
From my understanding, the second line replaces the `forward` method of `SD3Transformer2DModel` and its submodules. However, the code does not seem to define a custom forward computation for `SD3Transformer2DModel`, and the original attention computation already appears sufficient.
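To make my reading concrete, here is a minimal sketch of the method-rebinding pattern I assume `replace_call_method_for_sd3` uses; the wrapper body is a placeholder for illustration, not the repo's implementation:

```python
import types
import torch
import torch.nn as nn

def replace_forward(module):
    """Rebind `forward` on a module so that extra state (e.g. height,
    width, timestep) could be threaded through the call."""
    original_forward = module.forward

    def wrapped_forward(self, *args, **kwargs):
        # A custom forward could capture inputs or inject kwargs here;
        # in this sketch it simply delegates to the original forward.
        return original_forward(*args, **kwargs)

    module.forward = types.MethodType(wrapped_forward, module)
    return module

# Toy usage: the wrapped layer behaves exactly like the original.
layer = replace_forward(nn.Linear(4, 4))
out = layer(torch.randn(1, 4))
```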
Could you please explain:

1. Why is `replace_call_method_for_sd3` necessary in this context?
2. If the forward process is not altered, what specific purpose does this replacement serve?
Any clarification or suggestions on this would be greatly appreciated. Thank you again for your work and support!
Best regards,