Image Neural Field Diffusion Models

Official implementation of the paper:

Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang, Eli Shechtman, Xiaolong Wang, Michael Gharbi
CVPR 2024 (Highlight)

Contact [email protected] for any issues about the code.

Environment

conda create -n infd python=3.8 -y
conda activate infd
pip install -r requirements.txt

Training

The example below trains on FFHQ-1024 with 8 GPUs.

Download the FFHQ dataset (images1024x1024.zip). Unzip it and place the image folder at load/ffhq/ffhq_1024.
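
Before launching training, it can help to confirm that the unzipped folder landed at the expected path. The following is a minimal sketch and not part of the repository: the helper name `count_images` and the set of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical helper, not part of the infd repository.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}  # assumed extensions


def count_images(root):
    """Count image files under `root`, recursing into subfolders."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {root}")
    return sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


if __name__ == "__main__":
    # Path taken from the instructions above.
    try:
        print(count_images("load/ffhq/ffhq_1024"), "images found")
    except FileNotFoundError as err:
        print(err)
```

Run it from the repository root so the relative path resolves.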

To visualize training with wandb, fill in wandb.yaml and append -w to the run commands.

To train for the FFHQ-6K-Mix setting, append -mix6000 to the yaml config names.

1. Autoencoding stage

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc-per-node=8 run.py --cfg cfgs/ae_ffhq.yaml

2. Latent diffusion stage

First resize the images for faster loading:

python resize_images.py --input load/ffhq/ffhq_1024 --output load/ffhq/ffhq_lanczos256

Then run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc-per-node=8 run.py --cfg cfgs/dm_ffhq.yaml

Custom datasets

To train on a custom dataset, use ae_custom.yaml and dm_custom.yaml as the --cfg files and replace root_path in those configs with the path to your image folder.
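
As one way to apply that change without editing by hand, the sketch below rewrites every `root_path` entry in a config using plain text substitution. It assumes each entry sits on its own line in the form `root_path: <value>`; the actual layout of the custom configs is not shown here, so inspect the result before training. The function names `set_root_path` and `update_config` are hypothetical.

```python
import re
from pathlib import Path

# Matches "<indent>root_path: <anything>" on a single line (assumed layout).
_ROOT_PATH_RE = re.compile(r"^([ \t]*root_path)[ \t]*:.*$", flags=re.MULTILINE)


def set_root_path(cfg_text, image_dir):
    """Return (new_text, n_replaced) with every root_path value set to image_dir."""
    # A function replacement avoids backslash-escape handling in image_dir.
    return _ROOT_PATH_RE.subn(lambda m: m.group(1) + ": " + image_dir, cfg_text)


def update_config(cfg_file, image_dir):
    """Rewrite cfg_file in place and return how many entries were changed."""
    path = Path(cfg_file)
    new_text, n = set_root_path(path.read_text(), image_dir)
    path.write_text(new_text)
    return n
```

A return value of 0 from `update_config` means no line matched the assumed layout, in which case the config should be edited manually.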

Pretrained models

| Model | Download | pFID 256/1K | pFID 512/1K | pFID 1K/1K |
|---|---|---|---|---|
| FFHQ | Google Drive | 7.91 | 6.73 | 5.71 |
| FFHQ 6K-Mix | Google Drive | 10.99 | 7.52 | 6.80 |

Evaluation

1. Generate samples

Sampling can run on a single GPU or on multiple GPUs. For example, with 2 GPUs:

CUDA_VISIBLE_DEVICES=0,1 python gen_samples.py --model save/dm_ffhq/last-model.pth --n-samples 50000 --batch-size 32 -o save/gen_samples --output-sizes 1024
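
To adapt the command to a different set of GPUs, the parts that change are the device list and, optionally, the sample count and batch size. The sketch below assembles the same invocation from Python. It uses only the flags shown above and passes a single value to `--output-sizes`, as in the example; the wrapper functions `build_gen_command` and `gpu_env` are hypothetical conveniences, not part of the repository.

```python
import os
import shlex


def build_gen_command(model, out_dir, n_samples=50000, batch_size=32, output_size=1024):
    """Build the gen_samples.py argument list using the flags shown above."""
    return [
        "python", "gen_samples.py",
        "--model", model,
        "--n-samples", str(n_samples),
        "--batch-size", str(batch_size),
        "-o", out_dir,
        "--output-sizes", str(output_size),
    ]


def gpu_env(gpu_ids):
    """Copy the current environment and restrict the visible GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


if __name__ == "__main__":
    cmd = build_gen_command("save/dm_ffhq/last-model.pth", "save/gen_samples")
    print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

To launch it, pass the list to `subprocess.run(cmd, env=gpu_env([0, 1]), check=True)` from the repository root.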

By default it uses the sampler defined in the model (200 DDIM steps, eta=1, following LDM).
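
For context on the eta parameter: in DDIM sampling, eta scales the noise injected at each step, with eta=0 giving deterministic sampling and eta=1 matching the DDPM posterior variance. The sketch below computes that per-step noise level from the cumulative alpha products, following the formula in the DDIM paper. The linear beta schedule, the 1000 training steps, and the evenly spaced 200-step subsequence are illustrative assumptions, not values read from this repository.

```python
import math


def linear_alphas_cumprod(n_train=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    out, prod = [], 1.0
    for t in range(n_train):
        beta = beta_start + (beta_end - beta_start) * t / (n_train - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddim_sigma(a_t, a_prev, eta):
    """DDIM noise scale: eta * sqrt((1 - a_prev)/(1 - a_t)) * sqrt(1 - a_t/a_prev)."""
    return eta * math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * math.sqrt(1.0 - a_t / a_prev)


def ddim_sigmas(alphas_cumprod, n_steps=200, eta=1.0):
    """Sigmas along an evenly spaced subsequence of the training timesteps."""
    stride = len(alphas_cumprod) // n_steps
    steps = list(range(0, len(alphas_cumprod), stride))[:n_steps]
    sigmas = []
    for i, t in enumerate(steps):
        a_t = alphas_cumprod[t]
        # Before the first sampled step there is no noise, so a_prev = 1.
        a_prev = alphas_cumprod[steps[i - 1]] if i > 0 else 1.0
        sigmas.append(ddim_sigma(a_t, a_prev, eta))
    return sigmas
```

Calling `ddim_sigmas(linear_alphas_cumprod(), 200, 0.0)` returns all zeros, which is the deterministic case; with eta=1 every step after the first injects noise.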

2. Evaluate patch FID

CUDA_VISIBLE_DEVICES=0 python eval_pfid.py --input1 load/ffhq/ffhq_1024 --input2 save/gen_samples/1024

Citation

@inproceedings{chen2024image,
  title={Image Neural Field Diffusion Models},
  author={Chen, Yinbo and Wang, Oliver and Zhang, Richard and Shechtman, Eli and Wang, Xiaolong and Gharbi, Michael},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8007--8017},
  year={2024}
}
