Implemented the paper "Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing" from scratch.
- Diffusion models make it easy to manipulate image content for malicious purposes
- The goal is to protect images from such editing
Existing approaches:
- Encoder/decoder attack
- Semantic Attack (Distraction is All You Need)
- Image Immunization (DiffVax)
The immunization mechanism is based on attacking the cross-attention layers of the denoising U-Net:
- Creating a mask by averaging the cross-attention maps corresponding to a token, where the token represents the object being immunized (see the mask sketch after this list)
- Applying the mask to the image
- Two nested loops: an outer loop over attack epochs and an inner loop over diffusion timesteps
- Calculating the loss as the L1 norm of the averaged attention responses across diffusion steps (see the loss sketch below)
- Estimating the perturbation with projected gradient descent (PGD) on the immunized image (see the PGD sketch below)
- Applying the estimated perturbation to the image at each attack step
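A minimal sketch of the mask step, assuming the cross-attention probabilities have already been collected from the U-Net (e.g. via forward hooks); the function name `build_token_mask`, the layer set, and the normalization/threshold are our assumptions, not details from the paper:

```python
import torch
import torch.nn.functional as F

def build_token_mask(attn_maps, token_idx, image_size, threshold=0.5):
    """Average cross-attention maps for one text token into a binary mask.

    attn_maps: list of tensors of shape (heads, h*w, n_text_tokens),
        one per cross-attention layer of the denoising U-Net.
    token_idx: position of the token that describes the immunized object.
    image_size: (H, W) of the input image; maps are upsampled to it.
    """
    per_layer = []
    for a in attn_maps:
        _, hw, _ = a.shape
        side = int(hw ** 0.5)                       # maps are square at each U-Net level
        m = a[:, :, token_idx].mean(0)              # average over attention heads
        m = m.reshape(1, 1, side, side)
        m = F.interpolate(m, size=image_size, mode="bilinear", align_corners=False)
        per_layer.append(m)
    avg = torch.stack(per_layer).mean(0)            # average over layers
    avg = (avg - avg.min()) / (avg.max() - avg.min() + 1e-8)
    return (avg > threshold).float()                # binary mask over the object
```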
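The loss over the sampled diffusion timesteps could then look like this; using the mean inside the L1 norm is our choice for scale invariance, since the paper does not specify the normalization:

```python
def attention_loss(attn_responses):
    """Sum over sampled diffusion timesteps of the (averaged) L1 norm of the
    attention responses for the immunized token. Minimizing this suppresses
    the U-Net's attention to the protected object."""
    return sum(r.abs().mean() for r in attn_responses)
```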
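A single PGD update on the perturbation might look as follows; `alpha` and `eps` are placeholder values, and the signed step with L-inf projection is standard PGD rather than something the paper prescribes:

```python
import torch

def pgd_step(image, delta, loss, alpha=2 / 255, eps=8 / 255):
    """One PGD update: signed gradient step on delta, projection onto the
    L-inf ball of radius eps, then re-clamping so image + delta stays valid."""
    grad = torch.autograd.grad(loss, delta)[0]
    with torch.no_grad():
        delta -= alpha * grad.sign()                      # descend on the attention loss
        delta.clamp_(-eps, eps)                           # project onto the L-inf ball
        delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels in [0, 1]
    return delta
```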
- The paper provides neither code nor implementation details, which makes it hard to reproduce
- The method is quite slow: immunizing a single image takes 20-30 minutes on an A100 GPU in Colab
- It uses 15 GB of GPU memory instead of the 12 GB reported in the paper, which leaves room for optimization (or points to a difference in implementation)
- Not zeroing the deltas at each diffusion step and merely clipping them seems neither intuitive nor robust
- In further experiments, try introducing a loss term on the perturbation itself (see the sketch below)
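One way to act on that idea (our suggestion, not something from the paper) is to regularize the perturbation in the objective instead of relying on clipping alone, e.g. with an L2 penalty weighted by a hypothetical `lambda_delta`, reusing `attention_loss` from the sketch above:

```python
def total_loss(attn_responses, delta, lambda_delta=0.1):
    # attention suppression plus an L2 penalty on the perturbation, so that
    # delta's magnitude is controlled by the objective, not only by clipping
    return attention_loss(attn_responses) + lambda_delta * delta.pow(2).mean()
```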