Control HowTo
As its name suggests, ControlNet can "control" the diffusion model you are using and force it to represent images in a specific way, based on information from an additional model supplied by the user.
These models can influence the image in different ways: for example, you can pose people or characters, or draw a scene based on spatial information (such as how far a person is from a car behind them), on pre-existing lineart, and so on. They work by supplying a "control image" (which can be pre-existing, or created on the fly before generation) that encodes, in a specific visual form, the information the ControlNet model needs.
Note
ControlNet models can be large
Using ControlNet decreases generation speed and increases resource usage, especially GPU VRAM
Choosing the model is important, because different models influence the generated image in different ways and to different degrees (strongly or weakly).
Different ControlNet models are available depending on the base model you're using (SD 1.5, SDXL, SD 3.5, Flux, ...).
Here we'll cover the commonly used ones:
Openpose
This model is used to pose people or characters in an image. It uses "stick figure" images that represent eyes, nose, ears and limbs (and sometimes fingers). It is a widely used model for posing.
Effect on composition: Light to medium
Depth
This model uses depth maps: grayscale images in which the brightness of each area shows how close or far items and people in a scene are (white: closest, black: farthest). It can be used to replicate settings, or to ensure items and people are placed appropriately. It has a very strong impact on the composition.
Effect on composition: Strong
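To make the grayscale convention concrete, here is a minimal sketch that turns a grid of known distances into 8-bit gray values, with the closest point white (255) and the farthest black (0). It assumes you already have per-pixel distances (for example exported from a 3D scene) and maps them linearly; real depth preprocessors estimate depth from a photo and may use a different scaling. The function name is hypothetical and plain Python lists are used to keep it self-contained.

```python
def distances_to_depth_map(distances):
    """Convert a 2D grid of distances into 8-bit grayscale depth values.

    Hypothetical helper for illustration: linear mapping where the closest
    point becomes 255 (white) and the farthest becomes 0 (black).
    """
    flat = [d for row in distances for d in row]
    near, far = min(flat), max(flat)
    span = far - near
    if span == 0:
        # Everything is at the same distance: use a uniform mid-gray
        return [[128 for _ in row] for row in distances]
    # Invert so that small distances (close objects) get bright values
    return [[round(255 * (far - d) / span) for d in row] for row in distances]


depth = distances_to_depth_map([[1.0, 2.0], [3.0, 5.0]])
print(depth)
```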
Lineart
This model uses pre-existing lineart to guide the generation of an image (imagine, for example, a lineart representation of a café). It can use lineart generated from photorealistic images, or from anime-like images. It has the weakest impact on the image composition.
Effect on composition: Weak
Canny
The Canny model uses control images that have gone through "edge detection" (identifying edges and contours in an image), which makes them look similar to lineart. It can be used effectively for scenes where lineart would be too complex, or where lineart can't really be used. The effect on the composition depends on the model: SD 1.5 models have a strong impact, while SDXL-based ones tend to be weaker.
Effect on composition: Variable (from weak to strong)
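Edge detection can be illustrated with a much simpler relative of Canny. The sketch below marks a pixel as an edge when the intensity change to its right or bottom neighbour is large, producing white lines on a black background, like a Canny control image. It is deliberately not the full Canny algorithm, which also smooths the image, thins edges with non-maximum suppression and uses two thresholds with hysteresis; in practice you would use SD.Next's Canny preprocessor or a library such as OpenCV (`cv2.Canny`). The function name and default threshold are hypothetical.

```python
def simple_edge_map(gray, threshold=64):
    """Mark pixels whose intensity changes sharply (simplified, NOT full Canny).

    `gray` is a 2D list of 0-255 intensities. Returns 255 for edge pixels and
    0 elsewhere (white edges on black). `threshold` is a hypothetical default.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences to the right and bottom neighbours (0 at borders)
            dx = gray[y][x + 1] - gray[y][x] if x + 1 < w else 0
            dy = gray[y + 1][x] - gray[y][x] if y + 1 < h else 0
            magnitude = (dx * dx + dy * dy) ** 0.5
            if magnitude > threshold:
                edges[y][x] = 255
    return edges


# A dark left half next to a bright right half gives one vertical edge line
image = [[0, 0, 200, 200] for _ in range(3)]
for row in simple_edge_map(image):
    print(row)
```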
Segmentation
This model uses control images that have been "segmented" (segmentation is a computer vision technique that identifies distinct elements in an image), in which the various elements are marked with different colors: for example, people are shown as red silhouettes, buildings as light blue, and so on (from about 50 to 100 distinct colors are used). It is very useful when the model has no knowledge of specific objects, to prevent other concepts from "bleeding in".
Effect on composition: Medium
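Conceptually, a segmentation control image is a grid of per-pixel class labels in which each label is replaced by that class's color. The sketch below shows that mapping with a small, hypothetical palette. Real segmentation ControlNets expect the exact palette of the dataset they were trained on, so these colors and class ids are for illustration only.

```python
# Hypothetical palette for illustration only: real segmentation ControlNets
# expect the specific palette of their training dataset, not these colors.
PALETTE = {
    0: (0, 0, 0),        # background
    1: (255, 0, 0),      # person: red silhouette
    2: (135, 206, 235),  # building: light blue
}


def colorize_segmentation(labels, palette=PALETTE):
    """Turn a 2D grid of class ids into a 2D grid of RGB tuples."""
    return [[palette[label] for label in row] for row in labels]


print(colorize_segmentation([[0, 1], [2, 1]]))
```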
Tiling
This is a special model type: in a sense it is not really a ControlNet, but instead allows you to use a pre-existing image to create a large output image from "tiles" of the original image. It can be used instead of the normal img2img process with resizing, as it allows much larger output sizes since each tile is generated separately.
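Splitting a large output into overlapping tiles is the core idea behind tiled generation. The sketch below computes the tile boxes for a given output size. The tile size, overlap and the choice to shift the last tile so it ends exactly at the border are assumptions for illustration, not SD.Next's actual defaults, and the function name is hypothetical.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes covering a width x height image.

    Hypothetical helper: neighbouring tiles overlap by at least `overlap`
    pixels so seams can be blended, and the last tile in each row/column is
    shifted back so it ends exactly at the image border.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


for box in tile_boxes(960, 512):
    print(box)
```

Each box can then be cropped from the upscaled source, generated on its own, and blended back over the overlapping region.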
Union and ProMax
These are special models that combine multiple types of control in a single model and can be used in place of any of the models listed above. When selecting a Union or ProMax model, you also need to select the control mode to use.
Control images are required for ControlNet to work: you can't use it without one. The obvious question is how to make one. There are two different ways.
SD.Next can generate the appropriate control image from any input image you supply using a "preprocessor", which, depending on the model used, turns your input image into a form suitable for ControlNet. There are as many preprocessors as there are ControlNet model types, so use the model descriptions above as a guide.
Note that preprocessors are additional models, so using them consumes more VRAM (if this is a concern, you can choose in the SD.Next options to unload them or move them to the CPU after use). Also, depending on the data they have been trained on, the accuracy of the resulting image varies.
Canny, Depth, Segmentation and Lineart preprocessors are recommended if you do not have control images at hand.
Alternatively, you can supply a pre-made control image. This is useful when the preprocessor's accuracy is not sufficient (in particular for OpenPose), or when you know how to create control images yourself. You will not need the additional VRAM for preprocessing, but of course you need to know how to make one.
Some examples of software that can be used to generate ControlNet input images:
- PoseMyArt can export images in OpenPose format
- Depth Anything HuggingFace page can generate depth maps
- Clip Studio Paint EX can be used to generate lineart from 3D models with the "Convert lines" feature
Note
The following step-by-step guide was created using the SD.Next ModernUI
The same options exist in the StandardUI as well, although their location in the UI may differ
First, enable Control by clicking on the "Control" checkbox near the preview area.
A new tab will appear; make sure "ControlNet" is selected.
Now you have to decide how many "units" (ControlNet models) to use. For most uses one is sufficient, but particularly complex scenarios may need more than one. You can set the number of units by increasing the "Units" number. This guide assumes you use one unit; the workflow is the same regardless of how many units you add. Bear in mind that the more units you use, the more VRAM is required.
Make sure the unit is enabled: the checkbox under "ControlNet Unit 1" should be checked. If it is not, click it to enable it.
Now you have to select the ControlNet model you want to use. Click on the "reload" icon to load the available ControlNet models and select the one you want from the list. It will be automatically downloaded and made available to SD.Next. You can check the console output or the log for progress information.
Note
No items will display unless you have loaded a checkpoint first.
Should you want to use a preprocessor, select it from the list next to the ControlNet combo box.
Now you have to decide how much your ControlNet model will affect the generation. The "CN" strength slider goes from 0 (no effect) to 1 (complete effect). You might want to experiment here depending on your needs: for OpenPose models, 1 is usually fine, but for depth and canny, lower strength may be required.
For specific uses (out of scope for this guide) you can also decide for how long ControlNet is active during the generation process. This is adjusted with the "CN start" and "CN end" values (0 is the start of the generation, 1 is the end).
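The interaction of "CN strength", "CN start" and "CN end" can be sketched as a per-step weight list. This sketch assumes SD.Next forwards these values to the diffusers pipeline arguments `controlnet_conditioning_scale`, `control_guidance_start` and `control_guidance_end`, and that step selection mirrors diffusers' rule (a step is skipped if it begins before `start` or ends after `end`). Treat it as an approximation of the behaviour, not SD.Next's code; the function name is hypothetical.

```python
def controlnet_schedule(steps, strength=1.0, start=0.0, end=1.0):
    """Return the ControlNet weight applied at each sampling step.

    Assumption: diffusers-style selection, where step i is active when
    i/steps >= start and (i+1)/steps <= end. Active steps use `strength`
    as the weight; inactive steps use 0 (ControlNet has no effect).
    """
    weights = []
    for i in range(steps):
        active = not (i / steps < start or (i + 1) / steps > end)
        weights.append(strength if active else 0.0)
    return weights


# Strength 0.8, ControlNet active only during the first half of 10 steps
print(controlnet_schedule(10, strength=0.8, start=0.0, end=0.5))
```

Ending ControlNet early like this fixes the overall composition in the first steps, then leaves the later, detail-oriented steps to the base model.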
Click on the upper arrow icon to upload your control image. Note: if you don't use a preprocessor, it must have the same aspect ratio as the image that will be generated.
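Before uploading a pre-made control image, you can check its aspect ratio against the output size and, if needed, compute a centered crop that matches it. The sketch below is a hypothetical helper pair, and the 1% tolerance is an arbitrary assumption. The returned box uses the (left, top, right, bottom) order that Pillow's `Image.crop` accepts, so it could be passed straight to it.

```python
def same_aspect_ratio(control_size, output_size, tolerance=0.01):
    """Check whether two (width, height) sizes have nearly the same aspect ratio.

    `tolerance` is the allowed relative difference (hypothetical default).
    """
    control_ratio = control_size[0] / control_size[1]
    output_ratio = output_size[0] / output_size[1]
    return abs(control_ratio - output_ratio) / output_ratio <= tolerance


def center_crop_box(control_size, output_size):
    """Largest centered (left, top, right, bottom) box with the output's aspect ratio."""
    cw, ch = control_size
    ow, oh = output_size
    if cw * oh > ch * ow:
        # Control image is too wide: keep the full height, trim the width
        new_w = ch * ow // oh
        left = (cw - new_w) // 2
        return (left, 0, left + new_w, ch)
    # Control image is too tall (or already matching): keep the full width, trim the height
    new_h = cw * oh // ow
    top = (ch - new_h) // 2
    return (0, top, cw, top + new_h)


print(same_aspect_ratio((1024, 1024), (832, 1216)))
print(center_crop_box((1024, 1024), (832, 1216)))
```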
If you have used a preprocessor, you can hit the preview icon to see how the image will be preprocessed.
Specific preprocessor settings can be changed in the Processor settings tab.
Tip
If you make a mistake, click the "Reset" icon to reset all values to their defaults.
Once everything is set up, write your prompt, set your image parameters, and hit Generate. You will get a preview of your control image and generation will begin.