
The advantage 👍 of this network arch #3

Open

TabuaTambalam opened this issue Aug 29, 2022 · 1 comment

TabuaTambalam commented Aug 29, 2022

This arch seems more VRAM-efficient than the original LDM/SD.
On a Colab free-tier T4 it can run a [1,4,104,112] latent (an 832x896 image) without a CUDA OOM,
while the original can only manage [1,4,88,96] (704x768). Both under fp16.
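
For reference, here is the latent ↔ pixel relation I'm using (this assumes the standard SD VAE's 8x spatial downsampling; that factor is an assumption about this arch, not read from its code):

```python
# Latent shape for a given pixel size, assuming the usual 8x VAE
# downsampling factor (an assumption, not read from this repo):
def latent_shape(h, w, batch=1, channels=4, factor=8):
    assert h % factor == 0 and w % factor == 0
    return [batch, channels, h // factor, w // factor]

print(latent_shape(832, 896))  # [1, 4, 104, 112] -> fits on a T4 here
print(latent_shape(704, 768))  # [1, 4, 88, 96]   -> the original's ceiling
```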

The issues I encountered are:
Without re-training, clip_proj is empty, and image_embed apparently must be None (otherwise I get a conv error).
So is it possible to use image_embed without re-training?
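
The conv error looks like a plain channel mismatch. A minimal repro of what I think is happening (module names and channel counts here are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

# A conv_in trained for a 4-channel latent only (illustrative sizes):
conv_in = nn.Conv2d(4, 320, kernel_size=3, padding=1)

x_t = torch.randn(1, 4, 104, 112)
conv_in(x_t)  # works: image_embed=None, plain 4-channel input

image_embed = torch.randn(1, 4, 104, 112)
conv_in(torch.cat([x_t, image_embed], dim=1))
# RuntimeError: the weights expect 4 input channels but got 8
```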

The original LDM/SD has six other samplers from k-diffusion. You can see my minimal (zero-extra-dependency) ripoff of k-diffusion in this notebook:
https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb
(My ripoff also gets sigmas_karras and eta (ddim_eta) working, unlike all the other k-diffusion copypastas.)
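
For anyone who wants to try, wiring a unet into the real k-diffusion package looks roughly like this. It's a sketch assuming an eps-prediction unet and access to the schedule's alphas_cumprod; both are assumptions about this repo, and I use stand-ins so the snippet runs on its own:

```python
import torch
from k_diffusion import external, sampling

# Stand-ins so the sketch runs; swap in the repo's real unet and
# diffusion schedule (both are assumptions on my part):
unet = lambda x, t: torch.zeros_like(x)             # eps-prediction model
alphas_cumprod = torch.linspace(0.9991, 0.0047, 1000)

# Wrap the eps model for k-diffusion's sigma-based samplers.
denoiser = external.DiscreteEpsDDPMDenoiser(unet, alphas_cumprod, quantize=True)

# Karras noise schedule + an ancestral sampler with eta -- the two
# things most copypastas get wrong.
sigmas = sampling.get_sigmas_karras(
    n=30, sigma_min=denoiser.sigma_min, sigma_max=denoiser.sigma_max)
x = torch.randn(1, 4, 104, 112) * sigmas[0]         # start from pure noise
sample = sampling.sample_euler_ancestral(denoiser, x, sigmas, eta=0.8)
```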

Will this network arch get more samplers than PLMS and DDIM in the future?

Also, did you try JIT (torch.jit.trace()) on this network arch? Tracing helps flush out weird Pythonic constructs in the code.
I followed Ailia's instructions in axinc-ai/ailia-models#830 and
turned the original LDM/SD into a JIT module (that's what the notebook above is); I wonder if this arch can be JIT'd too.
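
For reference, the tracing call itself is short. Here's a runnable sketch with a stand-in module; the real unet's signature is my assumption, following the usual SD convention of (latent, timestep, text context):

```python
import torch
import torch.nn as nn

# Stand-in with the usual SD unet signature; replace with the repo's
# real unet -- the (x, t, context) signature is my assumption.
class UNetStandIn(nn.Module):
    def forward(self, x, t, context):
        return x

unet = UNetStandIn().eval()
example = (
    torch.randn(1, 4, 64, 64),   # x_t latent
    torch.tensor([500]),         # diffusion timestep
    torch.randn(1, 77, 768),     # CLIP text-encoder tokens
)
with torch.no_grad():
    traced = torch.jit.trace(unet, example, strict=False)
traced.save("unet_traced.pt")    # reload later with torch.jit.load(...)
```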

Jack000 (Owner) commented Sep 1, 2022

clip_proj should be removed. It was meant to project a (single) CLIP embedding to the DDPM timestep embedding dimension, to replicate GLIDE, which was the original goal of this project. Stable Diffusion doesn't use a single CLIP embedding but rather the 77 token embeddings from the CLIP text encoder, so this key isn't needed anymore.
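
In other words, something like this (dimensions are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

# GLIDE-style conditioning sketch: one global CLIP vector, projected to
# the timestep-embedding width and added to it (illustrative dims):
clip_dim, time_embed_dim = 768, 1024
clip_proj = nn.Linear(clip_dim, time_embed_dim)

time_emb = torch.randn(1, time_embed_dim)   # timestep embedding, post-MLP
clip_embedding = torch.randn(1, clip_dim)   # single CLIP image/text vector
emb = time_emb + clip_proj(clip_embedding)  # conditions every resblock

# SD instead cross-attends over the 77 token embeddings, so clip_proj
# has no role anymore.
```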

image_embed is used to give the unet an image for conditioning (for inpainting or upscaling). For normal use it should be set to None.
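
Roughly, that conditioning is a channel-concat at the unet input, something like the sketch below (channel counts are illustrative; an inpainting/upscaling checkpoint would have conv weights trained for the extra channels):

```python
import torch
import torch.nn as nn

# Channel-concat image conditioning, as used for inpainting/upscaling
# (illustrative channel counts, not the repo's exact values):
conv_in = nn.Conv2d(4 + 4, 320, kernel_size=3, padding=1)

x_t = torch.randn(1, 4, 64, 64)          # noisy latent
image_embed = torch.randn(1, 4, 64, 64)  # encoded conditioning image
h = conv_in(torch.cat([x_t, image_embed], dim=1))

# A model trained without the extra channels has a 4-channel conv_in,
# so for plain text-to-image use image_embed must stay None.
```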

The latent diffusion code uses the OpenAI unet code directly, with only slight modifications. That's why it was so easy to just patch the original OpenAI repo to use the LDM models. In theory they should be identical; I'm not sure what could cause differences in performance.

I actually have no idea about other samplers; I'll have to look into it.

Anyway, I'll update the code soon to resolve some of these issues. I'm currently busy training some new models.
