SDXL Visual Sliders training new concept has no effect at all #95

Open
Moonlight63 opened this issue May 19, 2024 · 1 comment


Moonlight63 commented May 19, 2024

I've been reading through some of the other issues here trying to learn what I can about how this works. The most helpful comments I have seen so far are to simply adjust the LR for bad generations, and someone pointing to a copier LoRA method that I hadn't seen before. I had a thought experiment about training a lora for a concept/facial feature that a base model wouldn't have a pre-existing reference for. I decided to try generating a different-looking nose for generic faces, for making non-human characters that are consistent and controllable. Visual sliders seemed perfect because I could inpaint an original image to get the pairs, then train on the difference. I tried this, but the resulting lora seems to have no effect at all, and I'm wondering why. It looks like I am not the only one having this issue: #60
But that slider relies on text prompting, it looks like.

I tried creating models both with and without adding a prompt, eg:

- target: "" # what word for erasing the positive concept from
  positive: "" # concept to erase
  unconditional: "" # word to take the difference from the positive concept
  neutral: "" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 2048
  dynamic_resolution: false
  batch_size: 1

and:

- target: "nose" # what word for erasing the positive concept from
  positive: "nose, flat" # concept to erase
  unconditional: "nose" # word to take the difference from the positive concept
  neutral: "nose" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 2048
  dynamic_resolution: false
  batch_size: 1

And here are the config params (basically all default, but I tried more training steps after it didn't work, and I also saw a recommendation somewhere to use full attention for a difficult concept):

prompts_file: "trainscripts/imagesliders/data/prompts-xl.yaml"
pretrained_model:
  name_or_path: "stabilityai/stable-diffusion-xl-base-1.0" # you can also use .ckpt or .safetensors models
  v2: false # true if model is v2.x
  v_pred: false # true if model uses v-prediction
network:
  type: "c3lier" # or "c3lier" or "lierla"
  rank: 4
  alpha: 1.0
  training_method: "full" #xattn, noxattn, full
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 5000
  lr: 0.0002
  optimizer: "AdamW"
  lr_scheduler: "constant"
  max_denoising_steps: 50
save:
  name: "temp"
  path: "./models"
  per_steps: 500
  precision: "bfloat16"
logging:
  use_wandb: false
  verbose: false
other:
  use_xformers: true
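
A side note on `rank: 4` and `alpha: 1.0`: in kohya-style LoRA implementations the low-rank update is multiplied by `alpha / rank` before the user-facing strength is applied. The sketch below only illustrates that arithmetic. It assumes this repo's `c3lier` network follows the same convention, which is not verified here, and the function name is hypothetical.

```python
def lora_effective_scale(alpha: float, rank: int, strength: float = 1.0) -> float:
    """Factor applied to the low-rank update up(down(x)).

    Assumes the common alpha / rank convention (an assumption, not checked
    against this repository's LoRA code).
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return strength * alpha / rank


# Values from the config above: alpha 1.0, rank 4.
print(lora_effective_scale(alpha=1.0, rank=4))                # 0.25
print(lora_effective_scale(alpha=1.0, rank=4, strength=2.0))  # 0.5
```

Under that assumption, a small factor would weaken the effect but not remove it entirely, so it would not by itself explain an output that is identical at every strength.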

I set the resolution to 2048 because that's what my training images are.
I created base images, then inpainted the noses with a lora of Voldemort.
My theory is that if I can get this to work, I can create any kind of facial features, or really any concept, in 3D, and transfer them to SDXL models.

Here are 2 of the pairs of training images:
00290-2468369899
00196-1959257638
00040-3834321717
00154-778726337

I used dynamic prompts to create a few hundred random images with different age, eye size/color, hair color, skin tones, male/female, backgrounds, and distance from camera. Then I picked a few of the best ones and inpainted them.

The result is.... nothing.
portrait photo of a blonde woman:
00040-999923733

portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:1>:
00039-999923733

I have also tried large swings in the strength of the lora with no change.

This is pretty confusing, as I would expect the lora to have some effect after training on something, but nothing happened. I've tried different settings, and nothing I have done is working. I am starting to wonder if the lora itself is bugged.

I should mention that for generation, I dropped the lora into A1111. Nothing else. I've seen some people mentioning using an extension to keyframe strength over steps, but since I am getting no change at all, probably not going to help.
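
One way to check whether the lora file itself is "empty" is to look at the saved weights directly. The sketch below uses only the standard library and relies on the documented `.safetensors` layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). It lists tensors whose bytes are all zero. In kohya-style LoRA the "up" weights start at zero, so if those are still all zero after training, the lora cannot change the output at any strength. The helper names and the example path are hypothetical, and this does not check whether A1111 actually matches the key names in the file.

```python
import json
import struct


def read_safetensors_index(path):
    """Return (tensor_index, data_start) for a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header, 8 + header_len


def find_all_zero_tensors(path):
    """Return sorted names of non-empty tensors whose raw bytes are all zero."""
    index, data_start = read_safetensors_index(path)
    zero_names = []
    with open(path, "rb") as f:
        for name, info in sorted(index.items()):
            begin, end = info["data_offsets"]  # relative to data_start
            f.seek(data_start + begin)
            raw = f.read(end - begin)
            if raw and not any(raw):
                zero_names.append(name)
    return zero_names


# Example call (hypothetical path, adjust to wherever the lora was saved):
# print(find_all_zero_tensors("models/flatnose3_alpha1.0_rank4_full_last.safetensors"))
```

If the list contains every "up" weight, the problem is on the training side. If it is empty, the weights did change, and the loading side (key names, UI support for this network type) would be the next thing to look at.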

I haven't tried using lierla for the network or changing the noise scheduler. I usually use Euler in A1111 for generation, but I don't know if that makes a difference.

Is anyone else having problems training this way? Any pointers? I am interested in trying other visual concepts.

EDIT:
Here is the command I am using for training, btw. The README left a few things out. I think I did this correctly, but with things not working, maybe someone can sanity check me:

python trainscripts/imagesliders/train_lora-scale-xl.py --name 'flatnose3' --rank 4 --alpha 1 --config_file 'trainscripts/imagesliders/data/config-xl.yaml' --folder_main './trainscripts/imagesliders/nose/' --folders="smallsize, bigsize" --scales="-1, 1"

The readme says to create 2 folders, smallsize and bigsize, under the 'folder_main'. I then figured out that at some point this must have changed to allow for in-between values, so the folder names don't matter. So I set the scales for the smallsize and bigsize folders to -1 and 1. All of the base 'regular nose' images are in the 'smallsize' folder, and all of the 'flat nose' images are in the 'bigsize' folder.
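
A quick pre-flight check for this folder layout can be sketched as follows. It assumes image pairs are matched by identical filenames across the folders, which is an assumption about the training script worth verifying, and both helper names are hypothetical. It also strips whitespace around the comma-separated values. Whether the training script does the same for `--folders="smallsize, bigsize"` is not verified here. If it does not, the second folder name would be read with a leading space.

```python
import os


def parse_csv_arg(value, cast=str):
    """Split a comma-separated CLI value and strip whitespace around each item."""
    return [cast(item.strip()) for item in value.split(",") if item.strip()]


def check_paired_folders(folder_main, folders, scales):
    """Return a list of problems; an empty list means the layout looks consistent."""
    problems = []
    if len(folders) != len(scales):
        problems.append(f"{len(folders)} folders but {len(scales)} scales")
    names_per_folder = {}
    for folder in folders:
        path = os.path.join(folder_main, folder)
        if not os.path.isdir(path):
            problems.append(f"missing folder: {path}")
            continue
        names_per_folder[folder] = set(os.listdir(path))
    if names_per_folder:
        # Files present in every folder are treated as complete pairs.
        common = set.intersection(*names_per_folder.values())
        for folder, names in names_per_folder.items():
            for name in sorted(names - common):
                problems.append(f"{folder}/{name} has no same-named partner in every folder")
    return problems


folders = parse_csv_arg("smallsize, bigsize")
scales = parse_csv_arg("-1, 1", cast=float)
print(folders, scales)  # ['smallsize', 'bigsize'] [-1.0, 1.0]
# problems = check_paired_folders("./trainscripts/imagesliders/nose/", folders, scales)
```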

@rohitgandikota
Owner

Hi @Moonlight63 - thanks for the details you provided.

The entire setup process looks good to me. The one main thing I would try differently is to set the resolution parameter to 512 and see. I understand that your training images are 2048 resolution, but we noticed in our experiments that training sliders on a lower resolution (lower than the model's default) helps a lot.
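
To put the resolution numbers in perspective, here is a small arithmetic sketch. It relies on the fact that the SD/SDXL VAE downsamples each spatial side by a factor of 8. The function name is hypothetical, and the numbers only show the difference in latent size, not why lower resolutions train better.

```python
VAE_DOWNSCALE = 8  # spatial downsampling factor of the SD/SDXL VAE


def latent_side(resolution: int) -> int:
    """Side length of the latent for a square image of the given pixel size."""
    if resolution % VAE_DOWNSCALE:
        raise ValueError("resolution should be a multiple of 8")
    return resolution // VAE_DOWNSCALE


for res in (512, 1024, 2048):
    side = latent_side(res)
    print(f"{res}px -> {side}x{side} latent, {side * side} positions")
# 512px -> 64x64 latent, 4096 positions
# 1024px -> 128x128 latent, 16384 positions
# 2048px -> 256x256 latent, 65536 positions
```

So a 2048 setting works on 16 times as many latent positions as 512, and 4 times as many as SDXL's native 1024.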

Let me know if that helps.
