Implemented the paper "Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing" from scratch.
- Diffusion models make it easy to manipulate image content for malicious purposes
- The goal is to protect images from such editing
Existing approaches:
- Encoder/decoder attack
- Semantic Attack (Distraction is All You Need)
- Image Immunization (DiffVax)
The immunization mechanism is based on attacking the cross-attention layers of the denoising U-Net:
- Creating a mask by averaging the cross-attention maps corresponding to a token, where the token represents the object being immunized (see the mask sketch after this list)
- Applying the mask to the image
- Two nested loops: an outer loop over attack epochs and an inner loop over diffusion timesteps
- Calculating the loss as the L1 norm of the averaged attention responses across diffusion steps (see the loss sketch below)
- Estimating the perturbation with projected gradient descent (PGD) on the immunized image (see the PGD sketch below)
- Applying the estimated perturbation to the image at each attack step
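A minimal sketch of the mask step, assuming the cross-attention probabilities have already been collected from the U-Net (e.g. via forward hooks); the function name `build_token_mask`, the layer set, and the normalization/threshold are our assumptions, not details from the paper:

```python
import torch
import torch.nn.functional as F

def build_token_mask(attn_maps, token_idx, image_size, threshold=0.5):
    """Average cross-attention maps for one text token into a binary mask.

    attn_maps: list of tensors of shape (heads, h*w, n_text_tokens),
        one per cross-attention layer of the denoising U-Net.
    token_idx: position of the token that describes the immunized object.
    image_size: (H, W) of the input image; maps are upsampled to it.
    """
    per_layer = []
    for a in attn_maps:
        _, hw, _ = a.shape
        side = int(hw ** 0.5)                       # maps are square at each U-Net level
        m = a[:, :, token_idx].mean(0)              # average over attention heads
        m = m.reshape(1, 1, side, side)
        m = F.interpolate(m, size=image_size, mode="bilinear", align_corners=False)
        per_layer.append(m)
    avg = torch.stack(per_layer).mean(0)            # average over layers
    avg = (avg - avg.min()) / (avg.max() - avg.min() + 1e-8)
    return (avg > threshold).float()                # binary mask over the object
```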
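The loss over the sampled diffusion timesteps could then look like this; using the mean inside the L1 norm is our choice for scale invariance, since the paper does not specify the normalization:

```python
def attention_loss(attn_responses):
    """Sum over sampled diffusion timesteps of the (averaged) L1 norm of the
    attention responses for the immunized token. Minimizing this suppresses
    the U-Net's attention to the protected object."""
    return sum(r.abs().mean() for r in attn_responses)
```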
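A single PGD update on the perturbation might look as follows; `alpha` and `eps` are placeholder values, and the signed step with L-inf projection is standard PGD rather than something the paper prescribes:

```python
import torch

def pgd_step(image, delta, loss, alpha=2 / 255, eps=8 / 255):
    """One PGD update: signed gradient step on delta, projection onto the
    L-inf ball of radius eps, then re-clamping so image + delta stays valid."""
    grad = torch.autograd.grad(loss, delta)[0]
    with torch.no_grad():
        delta -= alpha * grad.sign()                      # descend on the attention loss
        delta.clamp_(-eps, eps)                           # project onto the L-inf ball
        delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels in [0, 1]
    return delta
```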
- The paper provides neither code nor implementation details, which makes it hard to reproduce
- The method is quite slow: immunizing a single image takes 20-30 minutes on an A100 GPU in Colab
- It uses 15 GB of GPU memory instead of the 12 GB reported in the paper, which leaves room for optimization (or points to a difference in implementation)
- Not zeroing the deltas at each diffusion step and merely clipping them seems neither intuitive nor robust
- In further experiments, try introducing a loss term on the perturbation itself (see the sketch below)
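One way to act on that idea (our suggestion, not something from the paper) is to regularize the perturbation in the objective instead of relying on clipping alone, e.g. with an L2 penalty weighted by a hypothetical `lambda_delta`, reusing `attention_loss` from the sketch above:

```python
def total_loss(attn_responses, delta, lambda_delta=0.1):
    # attention suppression plus an L2 penalty on the perturbation, so that
    # delta's magnitude is controlled by the objective, not only by clipping
    return attention_loss(attn_responses) + lambda_delta * delta.pow(2).mean()
```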