Fix and enhancements for alternate img2img script for stable diffusion XL #16761
base: dev
Fix with documentation
c479eba to 64a8f9d
Until seeing your PR today I wasn't even aware that we had this script, so I'm not even sure what the script is about.
From my testing, unfortunately, it isn't fixed.
I'm not sure what your test conditions were. Did you perhaps have a different config or something?
By default:
num_classes: sequential
and so it will fail the assert on self.num_classes is not None.
So, from what I can piece together, the vector key is lost.
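The failing check can be reproduced in isolation. The sketch below mirrors the assertion shown in the error trace (`must specify y if and only if the model is class-conditional`). `check_class_conditional` and `call_succeeds` are hypothetical helper names, and plain Python values stand in for the real tensor and config.

```python
def check_class_conditional(y, num_classes):
    """Mirror of the UNet assertion: y must be given iff num_classes is set."""
    assert (y is not None) == (num_classes is not None), \
        "must specify y if and only if the model is class-conditional"


def call_succeeds(y, num_classes):
    """Return True when the check passes, False when it raises."""
    try:
        check_class_conditional(y, num_classes)
        return True
    except AssertionError:
        return False


# Config with num_classes: sequential, but no vector passed as y.
sequential_without_vector = call_succeeds(y=None, num_classes="sequential")
# Same config, with some stand-in vector supplied as y.
sequential_with_vector = call_succeeds(y=[0.0, 0.0], num_classes="sequential")
# A model without class conditioning and without y.
plain_without_vector = call_succeeds(y=None, num_classes=None)
```

Only the first case fails, which matches the situation described here: the config sets `num_classes`, but the script does not hand a vector to the model.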
PR
without alt img2img
I believe related code in the normal pipeline should be in this section
stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py
Lines 192 to 255 in 021154d
if shared.sd_model.model.conditioning_key == "crossattn-adm":
    image_uncond = torch.zeros_like(image_cond)
    make_condition_dict = lambda c_crossattn, c_adm: {"c_crossattn": [c_crossattn], "c_adm": c_adm}
else:
    image_uncond = image_cond
    if isinstance(uncond, dict):
        make_condition_dict = lambda c_crossattn, c_concat: {**c_crossattn, "c_concat": [c_concat]}
    else:
        make_condition_dict = lambda c_crossattn, c_concat: {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}
if not is_edit_model:
    x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x])
    sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma])
    image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond])
else:
    x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x] + [x])
    sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma] + [sigma])
    image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond] + [torch.zeros_like(self.init_latent)])
denoiser_params = CFGDenoiserParams(x_in, image_cond_in, sigma_in, state.sampling_step, state.sampling_steps, tensor, uncond, self)
cfg_denoiser_callback(denoiser_params)
x_in = denoiser_params.x
image_cond_in = denoiser_params.image_cond
sigma_in = denoiser_params.sigma
tensor = denoiser_params.text_cond
uncond = denoiser_params.text_uncond
skip_uncond = False
if shared.opts.skip_early_cond != 0. and self.step / self.total_steps <= shared.opts.skip_early_cond:
    skip_uncond = True
    self.p.extra_generation_params["Skip Early CFG"] = shared.opts.skip_early_cond
elif (self.step % 2 or shared.opts.s_min_uncond_all) and s_min_uncond > 0 and sigma[0] < s_min_uncond and not is_edit_model:
    skip_uncond = True
    self.p.extra_generation_params["NGMS"] = s_min_uncond
    if shared.opts.s_min_uncond_all:
        self.p.extra_generation_params["NGMS all steps"] = shared.opts.s_min_uncond_all
if skip_uncond:
    x_in = x_in[:-batch_size]
    sigma_in = sigma_in[:-batch_size]
self.padded_cond_uncond = False
self.padded_cond_uncond_v0 = False
if shared.opts.pad_cond_uncond_v0 and tensor.shape[1] != uncond.shape[1]:
    tensor, uncond = self.pad_cond_uncond_v0(tensor, uncond)
elif shared.opts.pad_cond_uncond and tensor.shape[1] != uncond.shape[1]:
    tensor, uncond = self.pad_cond_uncond(tensor, uncond)
if tensor.shape[1] == uncond.shape[1] or skip_uncond:
    if is_edit_model:
        cond_in = catenate_conds([tensor, uncond, uncond])
    elif skip_uncond:
        cond_in = tensor
    else:
        cond_in = catenate_conds([tensor, uncond])
    if shared.opts.batch_cond_uncond:
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
    else:
        x_out = torch.zeros_like(x_in)
        for batch_offset in range(0, x_out.shape[0], batch_size):
            a = batch_offset
            b = a + batch_size
            x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(subscript_cond(cond_in, a, b), image_cond_in[a:b]))
which does preserve the vector
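To illustrate why that section preserves the vector, here is a minimal sketch of the two `make_condition_dict` variants from the non-`crossattn-adm` branch quoted above, written as named functions. Plain strings stand in for tensors, and the `crossattn` / `vector` key names for dict-style conditioning are an assumption based on this discussion, not something taken from the quoted lines.

```python
def make_condition_dict_for_dict(c_crossattn, c_concat):
    """Dict-style conditioning: every existing key is spread into the result."""
    return {**c_crossattn, "c_concat": [c_concat]}


def make_condition_dict_for_tensor(c_crossattn, c_concat):
    """Tensor-style conditioning: only the cross-attention input is wrapped."""
    return {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}


# Assumed dict-style conditioning layout (stand-in values, not real tensors).
cond_in = {"crossattn": "text-embedding", "vector": "pooled-vector"}
image_cond_in = "image-cond"

# The normal pipeline keeps whatever keys the conditioning dict already has.
preserved = make_condition_dict_for_dict(cond_in, image_cond_in)

# Building a fresh dict from the cross-attention entry alone drops the rest.
dropped = {"crossattn": cond_in["crossattn"]}

print(sorted(preserved))
print(sorted(dropped))
```

Because the dict branch spreads the incoming keys, anything stored next to the cross-attention input travels with it. A call that rebuilds the dict from a single entry has no such guarantee.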
Error trace
Traceback (most recent call last):
File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 74, in f
res = list(func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 53, in f
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\img2img.py", line 240, in img2img
processed = modules.scripts.scripts_img2img.run(p, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\scripts.py", line 781, in run
processed = script.run(p, *script_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\scripts\img2imgalt.py", line 249, in run
processed = processing.process_images(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 874, in process_images
res = process_images_inner(p)
^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 1014, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\scripts\img2imgalt.py", line 223, in sample_extra
rec_noise = find_noise_for_image(p, cond, uncond, cfg, st)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\scripts\img2imgalt.py", line 52, in find_noise_for_image
eps = shared.sd_model.model(x_in * c_in, t, {"crossattn": cond_in["c_crossattn"][0]} )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 22, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 34, in __call__
return self.__sub_func(self.__orig_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_unet.py", line 50, in apply_model
result = orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\wrappers.py", line 28, in forward
return self.diffusion_model(
^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\modules\sd_unet.py", line 91, in UNetModel_forward
return original_forward(self, x, timesteps, context, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\GitHub\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 979, in forward
assert (y is not None) == (
^^^^^^^^^^^^^^^^^^^^
AssertionError: must specify y if and only if the model is class-conditional
Looks like I had modified the forward method of the UNet. I will update this PR in a while so that it no longer needs this. I'm having some issues with toesd and controlnet.
if False:
    assert (y is not None) == (
        self.num_classes is not None
    ), f"must specify y if and only if the model is class-conditional, {y}, {self.num_classes}"
hs = []
t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)
emb = self.time_embed(t_emb)
if y is not None and y.shape[0] == x.shape[0]:
    if self.num_classes is not None:
        assert y.shape[0] == x.shape[0]
        emb = emb + self.label_emb(y)
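The guard in the modified snippet can be read as: add the label embedding only when `y` is supplied and matches the batch size. Below is a small sketch of that control flow, assuming the two `if` statements are nested. Lists stand in for tensors, and `add_label_embedding` and `label_emb` are hypothetical stand-ins rather than the real UNet members.

```python
def add_label_embedding(emb, x, y, num_classes, label_emb):
    """Return emb plus label_emb(y) when y is usable, otherwise emb unchanged."""
    if y is not None and len(y) == len(x):
        if num_classes is not None:
            return [e + l for e, l in zip(emb, label_emb(y))]
    return emb


def label_emb(y):
    """Stand-in label embedding: scale each entry by 10."""
    return [10.0 * v for v in y]


x = [0.0, 0.0]    # batch of two stand-in latents
emb = [1.0, 2.0]  # stand-in time embeddings, one per batch item

with_vector = add_label_embedding(emb, x, [1.0, 2.0], "sequential", label_emb)
without_vector = add_label_embedding(emb, x, None, "sequential", label_emb)
mismatched = add_label_embedding(emb, x, [1.0], "sequential", label_emb)
```

Note that with this guard a missing or mismatched `y` is silently ignored instead of being rejected, which is the opposite of what the original assertion enforces.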
9959f54 to d24fd8f
Hi, I have been trying to make this script faster and more stable for SDXL and SD1.5 models. After a lot of experimentation with different tweaks (150+ images generated and 20-30 code changes), these are the changes I have made so far in this PR to make it as stable as I can.
Examples using
Original image Example continued with settings changed (some tuning)
d24fd8f to 6b6396f
@arrmansa It looks fine. Can you confirm that it does not break or cause significant changes to the SD 1.5 version? Alternatively, it might be possible to make SD 1.5 and SDXL two separate scripts, considering they do not really take up much UI or storage space anyway.
I was thinking along the same lines.
The only current break is a pre-existing one where controlnets cause a bad result, so you have to first run it with controlnet disabled, then enable it and run it with controlnet enabled.
optional vector for sdxl
Better functions, better cache
Tested everything
I have tested this on more models, both SD 1.5 and SDXL. They seem to work well, except for SDXL inpaint models.
@arrmansa That's fine as long as old functionality is not removed. I assume you read the wiki entry about the script (img2img-alternative-test) before changing it and understand its purpose.
@silveroxides Yes, I went through the wiki and understand the script and its purpose.
@silveroxides I think something like this should be added to the wiki Features page if this PR is merged
Hello, how can I download this file? I went to https://github.com/arrmansa and cloned his repository, and I'm still getting TypeError: expected Tensor as element 0 in argument 0, but got dict
@darthmalak1986 You'll need to use the branch https://github.com/arrmansa/stable-diffusion-webui/tree/img2img_alt_sdxl_fix
This should fix #12381
Hi,
This shouldn't introduce any new bugs and shouldn't break the script for old sd1.5 models.
With this fix, img2img alt is working for me and should work in general.
I tested this on a Windows 11 machine with an RTX 3080 Ti (16 GB), using both SDXL and SD 1.5 models, LoRAs and ControlNets.
There were no crashes, and results were the same as before the fix.
Please let me know if any more changes are needed for this to be merged.
Thanks,
Arrman