Latent Video Diffusion Model

The Latent Video Diffusion Model (LVDM) is responsible for animating the latent representation of a heart generated by the Latent Image Diffusion Model (LIDM). The LVDM is trained on the VAE-encoded videos. We condition it on an encoded frame and an ejection fraction score, and train it to reconstruct the video corresponding to that frame and ejection fraction.

During inference, it can animate any heart, real or synthetic, by conditioning on the latent representation of the heart and the desired ejection fraction.
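For intuition, here is a minimal sketch of the conditioning setup during training: a single encoded frame and an ejection fraction score guide the denoising of the full video latent. All names, shapes and the denoise_step stand-in below are illustrative assumptions, not the project's actual training code.

# Illustrative sketch only: shapes and names are assumptions, not the project's API.
import torch

B, C, T, H, W = 2, 4, 64, 14, 14             # hypothetical latent video shape (batch, channels, frames, h, w)

video_latents = torch.randn(B, C, T, H, W)   # VAE-encoded video (training target)
cond_frame    = video_latents[:, :, 0]       # a single encoded frame used as conditioning
lvef_score    = torch.tensor([55.0, 70.0])   # ejection fraction per sample

# A typical diffusion training step: add noise to the video latents and ask the
# model to predict that noise, given the conditioning frame and the EF score.
timesteps = torch.randint(0, 1000, (B,))
noise     = torch.randn_like(video_latents)
noisy     = video_latents + noise            # placeholder for a proper noise schedule

def denoise_step(noisy, timesteps, cond_frame, lvef_score):
    """Hypothetical stand-in for the LVDM UNet forward pass."""
    return torch.zeros_like(noisy)

pred = denoise_step(noisy, timesteps, cond_frame, lvef_score)
loss = torch.nn.functional.mse_loss(pred, noise)
print(loss.item())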

1. Activate the environment

First, activate the echosyn environment.

conda activate echosyn

2. Data preparation

Follow the instructions in the Data preparation guide to prepare the data for training. For this step, you need the VAE-encoded videos.
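As a quick sanity check before training, you can inspect the encoded latents. The snippet below assumes the latents are stored as .pt tensors under a hypothetical data/latents/dynamic directory; adjust the path, extension and loading call to whatever your Data preparation output actually contains.

# Quick sanity check of the VAE-encoded videos.
# Assumption: latents are stored as .pt tensors under data/latents/dynamic;
# adjust the path, extension and loading call to your Data preparation output.
from pathlib import Path
import torch

latent_dir = Path("data/latents/dynamic")   # hypothetical location
files = sorted(latent_dir.glob("*.pt"))
print(f"Found {len(files)} encoded videos")

if files:
    latent = torch.load(files[0], map_location="cpu")
    print("Example latent shape:", tuple(latent.shape))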

3. Train the LVDM

Once the environment is set up and the data is ready, you can train the LVDM with the following command:

python echosyn/lvdm/train.py --config echosyn/lvdm/configs/default.yaml

or this one, for multi-gpu training:

accelerate launch \
    --num_processes 8 \
    --multi_gpu \
    --mixed_precision fp16 \
    echosyn/lvdm/train.py \
    --config echosyn/lvdm/configs/default.yaml

Note that we train a single model for all datasets. The model is conditioned on the frames, so the same model can generate videos for any dataset.

4. Sample from the LVDM

Once the LVDM is trained, you can sample from it with the following command:

python echosyn/lvdm/sample.py \
    --config echosyn/lvdm/configs/default.yaml \
    --unet experiments/lvdm/checkpoint-500000/unet_ema \
    --vae models/vae \
    --conditioning samples/lidm_dynamic/privacy_compliant_latents \
    --output samples/lvdm_dynamic \
    --num_samples 2048 \
    --batch_size 8 \
    --num_steps 64 \
    --min_lvef 10 \
    --max_lvef 90 \
    --save_as avi,jpg \
    --frames 192

This will generate 2048 videos of 192 frames each, conditioned on the synthetic, privacy-compliant latent representations of the heart and on uniformly sampled ejection fraction scores. The videos will be saved in the samples/lvdm_dynamic/avi directory.
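Before moving on to evaluation, you can spot-check the outputs. The sketch below only assumes the output layout described above (an avi and a jpg folder under samples/lvdm_dynamic) and uses OpenCV, which is an assumed extra dependency.

# Spot-check the generated samples (requires opencv-python, an assumed extra dependency).
from pathlib import Path
import cv2

avi_dir = Path("samples/lvdm_dynamic/avi")
jpg_dir = Path("samples/lvdm_dynamic/jpg")

videos = sorted(avi_dir.glob("*.avi"))
print(f"{len(videos)} AVI files (expected 2048)")

if videos:
    cap = cv2.VideoCapture(str(videos[0]))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    print(f"First video has {n_frames} frames (expected 192)")

# The jpg output is what the evaluation scripts consume.
print(f"{len(list(jpg_dir.iterdir()))} entries in the jpg output")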

5. Evaluate the LVDM

To evaluate the LVDMs, we use the FID, FVD16, FVD128 and IS scores. To do so, we need to generate 2048 videos: 192 frames each for the dynamic dataset, or 128 frames each for the two pediatric datasets. The outputs MUST be in jpg format. The samples are compared to the real samples, which are prepared in the Data preparation step.

Then, to evaluate the synthetic videos, run the following commands:

cd external/stylegan-v

python src/scripts/calc_metrics_for_dataset.py \
    --real_data_path ../../data/reference/dynamic \
    --fake_data_path ../../samples/lvdm_dynamic/jpg \
    --mirror 0 --gpus 1 --resolution 112 \
    --metrics fvd2048_16f,fvd2048_128f,fid50k_full,is50k >> "../../samples/lvdm_dynamic/metrics.txt"

python src/scripts/calc_metrics_for_dataset.py \
    --real_data_path ../../data/reference/ped_a4c \
    --fake_data_path ../../samples/lvdm_ped_a4c/jpg \
    --mirror 0 --gpus 1 --resolution 112 \
    --metrics fvd2048_16f,fvd2048_128f,fid50k_full,is50k >> "../../samples/lvdm_ped_a4c/metrics.txt"

python src/scripts/calc_metrics_for_dataset.py \
    --real_data_path ../../data/reference/ped_psax \
    --fake_data_path ../../samples/lvdm_ped_psax/jpg \
    --mirror 0 --gpus 1 --resolution 112 \
    --metrics fvd2048_16f,fvd2048_128f,fid50k_full,is50k >> "../../samples/lvdm_ped_psax/metrics.txt"
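The three commands above differ only in the dataset name. If you prefer, the same evaluations can be launched from a small Python loop (still run from the external/stylegan-v directory); this is simply a convenience wrapper around the exact commands shown above.

# Run the three evaluations in a loop; equivalent to the commands above.
# Run this from the external/stylegan-v directory.
import subprocess

for name in ["dynamic", "ped_a4c", "ped_psax"]:
    out = f"../../samples/lvdm_{name}/metrics.txt"
    cmd = [
        "python", "src/scripts/calc_metrics_for_dataset.py",
        "--real_data_path", f"../../data/reference/{name}",
        "--fake_data_path", f"../../samples/lvdm_{name}/jpg",
        "--mirror", "0", "--gpus", "1", "--resolution", "112",
        "--metrics", "fvd2048_16f,fvd2048_128f,fid50k_full,is50k",
    ]
    with open(out, "a") as f:
        subprocess.run(cmd, stdout=f, check=True)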

6. Save the LVDM for later use

Once the LVDM is trained, you can save it for later use with the following command:

mkdir -p models/lvdm
cp -r experiments/lvdm/checkpoint-500000/unet_ema/* models/lvdm/
cp experiments/lvdm/config.yaml models/lvdm/

This will save the selected EMA version of the model, ready to be loaded in any other script as a standalone model.
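For example, assuming the checkpoint follows the standard diffusers save_pretrained layout (a config.json whose _class_name field names the UNet class, stored next to the weights), it could be reloaded like this; verify the layout of models/lvdm before relying on this sketch.

# Load the exported LVDM UNet as a standalone model.
# Assumption: the checkpoint is in the standard diffusers save_pretrained format,
# i.e. models/lvdm contains a config.json whose "_class_name" names the UNet class.
import json
from pathlib import Path
import diffusers

model_dir = Path("models/lvdm")
class_name = json.loads((model_dir / "config.json").read_text())["_class_name"]
unet = getattr(diffusers, class_name).from_pretrained(str(model_dir))
print(unet.__class__.__name__, sum(p.numel() for p in unet.parameters()), "parameters")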