I've been reading through some of the other issues here, trying to learn what I can about how this works. The most helpful comments I've seen so far are to simply adjust the LR for bad generations, and someone pointing to a copier LoRA method that I hadn't seen before. I had a thought experiment about training a LoRA for a concept/facial feature that a base model wouldn't have a pre-existing reference for. I decided to try generating a different-looking nose for generic faces, for making non-human characters that are consistent and controllable. Visual sliders seemed perfect because I could inpaint an original image to get the pairs, then train on the difference. I tried this, but the resulting LoRA seems to have zero effect at all, and I'm wondering why. It looks like I am not the only one having this issue: #60
But that slider looks like it relies on text prompting.
I tried creating models both with and without adding a prompt, e.g.:
- target: "" # what word for erasing the positive concept from
positive: "" # concept to erase
unconditional: "" # word to take the difference from the positive concept
neutral: "" # starting point for conditioning the target
action: "enhance" # erase or enhance
guidance_scale: 4
resolution: 2048
dynamic_resolution: false
batch_size: 1
and:
- target: "nose" # what word for erasing the positive concept from
positive: "nose, flat" # concept to erase
unconditional: "nose" # word to take the difference from the positive concept
neutral: "nose" # starting point for conditioning the target
action: "enhance" # erase or enhance
guidance_scale: 4
resolution: 2048
dynamic_resolution: false
batch_size: 1
And here are the config params (basically all default, but I tried more training steps after it didn't work; I also saw a recommendation somewhere to use full attention if it's a difficult concept):
```yaml
prompts_file: "trainscripts/imagesliders/data/prompts-xl.yaml"
pretrained_model:
  name_or_path: "stabilityai/stable-diffusion-xl-base-1.0" # you can also use .ckpt or .safetensors models
  v2: false # true if model is v2.x
  v_pred: false # true if model uses v-prediction
network:
  type: "c3lier" # "c3lier" or "lierla"
  rank: 4
  alpha: 1.0
  training_method: "full" # xattn, noxattn, full
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 5000
  lr: 0.0002
  optimizer: "AdamW"
  lr_scheduler: "constant"
  max_denoising_steps: 50
save:
  name: "temp"
  path: "./models"
  per_steps: 500
  precision: "bfloat16"
logging:
  use_wandb: false
  verbose: false
other:
  use_xformers: true
```
I set the resolution to 2048 because that's what my training images are.
I created base images, then inpainted the noses with a LoRA of Voldemort.
My theory is that if I can get this to work, I can create any kind of facial feature, or really any concept, in 3D and transfer it to SDXL models.
Here are 2 of the pairs of training images:
I used Dynamic Prompts to create a few hundred random images with different ages, eye sizes/colors, hair colors, skin tones, male/female, backgrounds, and distances from the camera. Then I picked a few of the best ones and inpainted them.
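For reference, the template was along these lines (reconstructed for illustration; the exact wildcards and wording in my actual file were different):

```text
portrait photo of a {young|middle aged|old} {man|woman},
{blonde|brunette|black|red} hair, {large|small} {blue|green|brown} eyes,
{pale|tan|dark} skin, {close-up|medium shot|wide shot}, {studio|outdoor|indoor} background
```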
The result is.... nothing.
portrait photo of a blonde woman:
portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:1>:
I have also tried large swings in the strength of the LoRA with no change.
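To be concrete, by "large swings" I mean A1111 weights roughly like these (illustrative values, not an exact log of my runs):

```text
portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:0.5>
portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:2>
portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:-1>
```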
This is pretty confusing, as I would expect the LoRA to have some effect after training on something, but nothing happened. I've tried different settings, and nothing I have done is working. I am starting to wonder if the LoRA itself is bugged.
I should mention that for generation, I just dropped the LoRA into A1111, nothing else. I've seen some people mention using an extension to keyframe strength over steps, but since I am getting no change at all, that probably won't help.
I haven't tried using lierla for the network or changing the noise scheduler. I usually use Euler in A1111 for generation, but I don't know if that makes a difference.
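If I do try that, my understanding is that it's just these two lines in the config above (untested on my end):

```yaml
network:
  type: "lierla" # instead of "c3lier"
train:
  noise_scheduler: "euler_a" # instead of "ddim"
```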
Is anyone else having problems training this way? Any pointers? I am interested in trying other visual concepts.
EDIT:
Here is the command I am using for training, btw. The README left a few things out, but I think I did this correctly; since things aren't working, maybe someone can sanity-check me:
The README says to create 2 folders, smallsize and bigsize, under the 'folder_main'. I then figured out that at some point that must have changed to allow for in-between values, so the folder names themselves don't matter. So I renamed the smallsize and bigsize folders to -1 and 1. All of the base 'regular nose' images are in the 'smallsize' (-1) folder, and all of the 'flat nose' images are in the 'bigsize' (1) folder.
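In other words, my dataset layout looks roughly like this (the file names here are just placeholders):

```text
folder_main/      # my flatnose dataset folder
├── -1/           # was 'smallsize': base images, regular noses
│   ├── 000.png
│   └── ...
└── 1/            # was 'bigsize': inpainted images, flat noses
    ├── 000.png
    └── ...
```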
Hi @Moonlight63 - thanks for the details you provided.
The entire setup process looks good to me. The one main thing I would try differently is to set the resolution parameter to 512 and see. I understand that your training images are 2048 resolution, but we noticed in our experiments that training sliders at a lower resolution (lower than the model's default) helps a lot.
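Concretely, that's just this change in your prompts yaml entries, with everything else kept the same:

```yaml
resolution: 512 # instead of 2048; train below the model's default resolution
dynamic_resolution: false
```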