This is a quick walkthrough for DirectedDiffusion.
- Clone the repo and change into its root directory, e.g.,
git clone [email protected]:gladiator8072/conditioned-diffusion.git && cd conditioned-diffusion
- Make sure the GPUs are visible to the shell, e.g.,
export CUDA_VISIBLE_DEVICES="0,1"
- Our implementation uses diffusers version 0.5.1. Install it with:
pip3 install diffusers==0.5.1
- Install the diffusion and CLIP models.
The default model paths in the executable are
- CLIP: "assets/models/clip-vit-large-patch14"
- Diffusion: "assets/models/stable-diffusion-v1-4"
You can either install the models with scripts/install-model.sh, as described in the repo README, or install them anywhere you like. In the latter case, you need to pass the model paths to the -dp and -cp flags, e.g.,
-dp /path/to/diffusion -cp /path/to/clip
- Run the DDCmd help command to check that you get the help description.
If you do, all core modules and libraries were imported correctly.
python ./bin/DDCmd.py --help
Here is a command with all the essential arguments for running DirectedDiffusion:
python ./bin/DDCmd.py \
-roi 0.0,0.5,0.0,1.0 # fraction tuple
-ei 1,2 # associated prompt indices
-nt 20 # number of trailing attention indices
-s 1.0 # noise level
-ns 5 # number of Attention Editing steps
-ds 50 # number of diffusion steps (default: 50)
-m # image annotation (optional, useful for debugging)
-p "A tiger sitting on a car" # prompt
-n "Your note" # your note
-dp "/path/to/diffusion/model/folder" # default: assets/models/stable-diffusion-v1-4
-cp "/path/to/clip/model/folder" # default: assets/models/clip-vit-large-patch14
-f "/your/output/folder" # experiment output folder
python ./bin/DDCmd.py \
-roi 0.0,0.5,0.0,1.0 # \mathbf{r} = {r_left, r_right, r_top, r_bottom}
-ei 1,2 # \mathcal{I}
-nt 20 # converted to \mathcal{T} = {|P|+1,...,|P|+T}
-s 1.0 # c_g
-ns 5 # N
-ds 50 # T
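To make the -roi fraction tuple concrete, here is a minimal Python sketch that converts \mathbf{r} = {r_left, r_right, r_top, r_bottom} into pixel bounds. The helper names parse_roi and roi_to_pixels, the 512x512 image size, and the rounding are assumptions made for illustration; they are not taken from DDCmd.py.

```python
def parse_roi(text):
    """Parse a -roi string such as "0.0,0.5,0.0,1.0" into four floats."""
    values = tuple(float(v) for v in text.split(","))
    if len(values) != 4:
        raise ValueError(f"expected 4 fractions, got {len(values)}")
    return values


def roi_to_pixels(roi, width=512, height=512):
    """Map (left, right, top, bottom) fractions to integer pixel bounds.

    The 512x512 default is an assumption of this sketch.
    """
    left, right, top, bottom = roi
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError(f"invalid region fractions: {roi}")
    return (
        round(left * width),
        round(right * width),
        round(top * height),
        round(bottom * height),
    )


# Left half of the image, as in the -roi value above.
print(roi_to_pixels(parse_roi("0.0,0.5,0.0,1.0")))  # (0, 256, 0, 512)
```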
The parameters associated with each region are the region of interest (-roi, \mathcal{B}), the edited prompt indices (-ei, \mathcal{I}), and the number of trailing attention indices (-nt). Note that the trailing attention indices \mathcal{T} are derived from the user input -nt via \mathcal{T} = {|P|+1,…,|P|+T}. The sets must all have the same length.
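The conversion from -nt to \mathcal{T} can be sketched in a few lines of Python. The function name trailing_indices is hypothetical, and the sketch assumes that |P| is the number of whitespace-separated words in the prompt; the tokenizer used by the actual program may count differently.

```python
def trailing_indices(prompt, num_trailing):
    """Return the 1-based indices {|P|+1, ..., |P|+num_trailing}.

    Assumption: |P| is the number of whitespace-separated words.
    """
    prompt_length = len(prompt.split())
    return list(range(prompt_length + 1, prompt_length + num_trailing + 1))


# A 6-word prompt with -nt 5.
print(trailing_indices("A dog sitting on the chair", 5))  # [7, 8, 9, 10, 11]
```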
Note that using more regions may break the image synthesis, as pointed out in our limitations section. In our experience, one or two regions work best.
For example, take the prompt "A dog sitting on the chair".
With a single region:
-roi 0.0,0.5,0.0,1.0 # left part of the image
-ei 1,2 # the words "A" and "dog"
-nt 5 # 5 trailing indices, thus \mathcal{T} = \{7,8,9,10,11\}
-s 1.0 # Gaussian amplification
python ./bin/DDCmd.py -roi 0.5,1.0,0.0,0.5 -ei 1,2,3 -nt 10 -s 2.0 -ns 15 -p "A yellow car on a bridge" -m
With multiple regions:
-roi 0.0,0.5,0.0,1.0 0.5,1.0,0.0,1.0 # left and right halves of the image
-ei 1,2 5,6 # "A" "dog" for the first region, "the" "chair" for the second
-nt 5,5 # 5 trailing indices per region, thus \mathcal{T} = \{7,8,9,10,11\} for each
-s 1.0,1.0 # Gaussian amplification per region
python ./bin/DDCmd.py -roi 0.4,0.7,0.0,0.5 0.4,0.7,0.5,1.0 -ei 2,3 6,7 -nt 10,10 -s 1.0,1.0 -ns 10 -p "A red cube above a blue sphere" --seed 2483964026821236 -m
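The per-region arguments of the command above can be checked with a short Python sketch that pairs each region with its prompt words and verifies that -roi, -ei, and -nt describe the same number of regions. The function parse_regions is hypothetical and is not how DDCmd.py parses its arguments; it also assumes that prompt indices are 1-based positions of whitespace-separated words.

```python
def parse_regions(prompt, roi_args, ei_args, nt_arg):
    """Group per-region arguments and check that their counts agree.

    Assumption: -ei indices are 1-based positions of whitespace-separated words.
    """
    words = prompt.split()
    rois = [tuple(float(v) for v in arg.split(",")) for arg in roi_args]
    edits = [[int(v) for v in arg.split(",")] for arg in ei_args]
    trailings = [int(v) for v in nt_arg.split(",")]
    if not (len(rois) == len(edits) == len(trailings)):
        raise ValueError(
            f"region count mismatch: {len(rois)} -roi, "
            f"{len(edits)} -ei, {len(trailings)} -nt"
        )
    regions = []
    for roi, indices, nt in zip(rois, edits, trailings):
        regions.append(
            {"roi": roi, "words": [words[i - 1] for i in indices], "nt": nt}
        )
    return regions


regions = parse_regions(
    "A red cube above a blue sphere",
    ["0.4,0.7,0.0,0.5", "0.4,0.7,0.5,1.0"],
    ["2,3", "6,7"],
    "10,10",
)
for region in regions:
    print(region["roi"], region["words"], region["nt"])
```

With the values from the command, the first region is tied to "red cube" and the second to "blue sphere".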
We provide a grid search over the parameters -nt, -s, and -ns. DDCmd.py runs every combination of the values listed for these parameters.
E.g., the following command runs every combination of -nt 5 10 20, -ns 5 10, and -s 2.5, so 6 experiments are saved in a timestamped folder.
python ./bin/DDCmd.py -roi 0.5,1.0,0.0,0.5 -ei 1,2,3 -nt 5 10 20 -ns 5 10 -s 2.5 -p "A yellow car running on a bridge" -m
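The number of experiments is the product of the list lengths. A minimal Python sketch of the enumeration, using itertools.product, is shown here; the function name grid and the dictionary keys are illustrative assumptions, not the program's internals.

```python
from itertools import product


def grid(nt_values, ns_values, s_values):
    """Enumerate every (nt, ns, s) combination as a dict."""
    return [
        {"nt": nt, "ns": ns, "s": s}
        for nt, ns, s in product(nt_values, ns_values, s_values)
    ]


experiments = grid([5, 10, 20], [5, 10], [2.5])
print(len(experiments))  # 6
for experiment in experiments:
    print(experiment)
```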
We also provide a lazy way to run a grid search with a built-in parameter list, by specifying the -l1 or -l2 flag:
python ./bin/DDCmd.py -roi 0.5,1.0,0.0,0.5 -ei 1,2,3 -p "A yellow car running on a bridge" -m -l2
We also provide a simple script, bin/DDCmdMain.py, that runs the program, so you can easily adapt the code to your own purposes.
DD has also been ported to a Gradio UI. Most of the sliders and text fields in the web app correspond to arguments of the DDCmd.py command. The gallery output is useful for comparing results when running a grid search.
Note that the grid search covers all combinations of the slider parameters: DDSteps = [5, 10], GaussianCoef = [1.0, 1.5, 2.5], and Trailings = [10, 20, 30, 40]. Each run therefore performs 24 experiments and takes a while to finish. (Please leave any suggestions on this feature. Thank you!)
To run the Gradio app, run "python bin/DDGradio.py" in a terminal.