Hi, very interesting work! Congrats!
I am particularly interested in the inpainting pipeline you report in this paper: MAM-E: Mammographic Synthetic Image Generation with Diffusion Models.
From what is reported, and from Figure 11, it seems that during training the example image containing the lesion is encoded with the VAE along with its masked counterpart, and the two latents are then stacked with a resized version of the mask to serve as input to the UNet. I have some doubts about the effect this design choice has at inference time.
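To make my reading concrete, here is a minimal sketch of how I understand the training input is assembled. All names, shapes, and the `vae_encode` stand-in are my own assumptions, not taken from your code; please correct me if I have misread the figure:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes and names -- not from the MAM-E code.
B, H, W = 1, 512, 512
lesion_image = torch.rand(B, 3, H, W)           # ground-truth mammogram with the lesion
mask = (torch.rand(B, 1, H, W) > 0.95).float()  # binary lesion mask (1 inside the lesion)

def vae_encode(img: torch.Tensor) -> torch.Tensor:
    """Stand-in for the frozen SD VAE encoder (4 latent channels, 8x downsampling)."""
    return torch.randn(img.shape[0], 4, img.shape[-2] // 8, img.shape[-1] // 8)

# In the actual training loop these first four channels would presumably be the
# noised version of this latent, but they are still derived from the lesion image.
image_latents = vae_encode(lesion_image)                 # (B, 4, H/8, W/8)
masked_latents = vae_encode(lesion_image * (1 - mask))   # lesion region blanked out
mask_latent = F.interpolate(mask, size=image_latents.shape[-2:])  # mask at latent resolution

# 4 + 1 + 4 = 9 input channels, as in the standard SD inpainting UNet
unet_input = torch.cat([image_latents, mask_latent, masked_latents], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])
```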
If the actual ground-truth image we are aiming to generate is provided as an input to the UNet during training, then at inference time there will be a "domain shift" between what the UNet expects in its first channels and what it actually receives. The model has been trained to generate an image with a lesion from a lesion mask AND the image with the lesion itself. In contrast, at inference time you are providing an image that DOES NOT contain the lesion, plus a target lesion mask, and expecting to get back an image inpainted with a lesion.
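Continuing the hypothetical sketch above, the mismatch I am describing is that the first latent block is built from different distributions at train and test time:

```python
# Training: the first latent block is derived from the lesion image itself.
train_input = torch.cat(
    [vae_encode(lesion_image), mask_latent, vae_encode(lesion_image * (1 - mask))],
    dim=1,
)

# Inference: the first latent block now comes from a lesion-FREE image,
# a conditioning distribution the UNet never saw during training.
healthy_image = torch.rand(B, 3, H, W)  # hypothetical lesion-free input
infer_input = torch.cat(
    [vae_encode(healthy_image), mask_latent, vae_encode(healthy_image * (1 - mask))],
    dim=1,
)
```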
I wonder whether this is something you considered. I imagine that removing the actual lesion image from the first channels during training could improve performance. Did you experiment with this?
One additional question: Figure 11 shows that it is the SD UNet doing the inpainting. Is this UNet pretrained on plain text-to-mammography generation, or trained from scratch on the inpainting objective?
Best,
Pedro