
Image generators can be controlled by more than just text prompts, e.g. with ControlNet. In the past, however, these methods required a great deal of training. FreeControl aims to change that.

Researchers at the University of California at Los Angeles, the University of Wisconsin-Madison, and Innopeak Technology Inc. have developed a new method for controllable text-to-image (T2I) generation, which they call "FreeControl".

FreeControl can be fed with various conditions in addition to the text prompt, such as a sketch, depth map, or point cloud, and can be combined with the Stable Diffusion 1.5, 2.1, and SDXL 1.0 models. The method is similar to the widely used ControlNet, but does not require specially trained models.

Image: Mo et al.

FreeControl requires no training

FreeControl is designed to give users fine-grained spatial control over diffusion models. Unlike previous approaches, which require training an additional module for each combination of spatial condition, model architecture, and model checkpoint, FreeControl needs no such training.

FreeControl first extracts a set of key features from sample images: an input image that specifies the spatial structure, an image generated automatically from the full prompt, and further images generated from a stripped-down version of the prompt that keeps only the essential concept (e.g. a man) rather than all the details (e.g. a Lego man giving a speech). The extracted features capture image composition, content, and style as well as the underlying concept, and together they guide the generation of the final image.
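The general idea can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it uses a toy feature extractor and random tensors in place of the diffusion model's U-Net features, and a random orthonormal basis in place of the semantic basis the paper derives from seed images. It only shows the basic principle of training-free guidance, namely nudging the sample at each denoising step so that its projected features move toward those of the guidance input.

```python
import torch

# Toy stand-ins: in the real method, features would come from intermediate
# layers of the diffusion model's U-Net, and the basis from an analysis of
# seed-image features; both are random here for illustration only.
feature_extractor = torch.nn.Linear(64, 32)
semantic_basis = torch.linalg.qr(torch.randn(32, 8)).Q  # orthonormal columns

def project(features: torch.Tensor) -> torch.Tensor:
    """Project features onto the (assumed) semantic basis."""
    return features @ semantic_basis

# Target structure: projected features of the guidance input,
# e.g. a sketch or depth map (random data in this toy example).
guidance_input = torch.randn(1, 64)
target_structure = project(feature_extractor(guidance_input)).detach()

def structure_guidance_step(sample: torch.Tensor, step_size: float = 0.1) -> torch.Tensor:
    """Nudge the sample so its projected features move toward the target.
    A real sampler would apply such a correction at every denoising step."""
    sample = sample.detach().requires_grad_(True)
    loss = torch.nn.functional.mse_loss(
        project(feature_extractor(sample)), target_structure
    )
    loss.backward()
    return (sample - step_size * sample.grad).detach()

sample = torch.randn(1, 64)
for _ in range(10):  # stand-in for the reverse diffusion loop
    sample = structure_guidance_step(sample)
```

Because this kind of correction is computed at sampling time, no conditioning module has to be trained beforehand, which is what makes the approach independent of a specific model or checkpoint.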

Image: Mo et al.

FreeControl supports image-to-image

According to the authors, FreeControl produces high-quality images and achieves better results than other methods that guide image generation without training. The method can also easily be adapted for image-to-image guidance.

ControlNet has provided an initial approach to controlling the output of Stable Diffusion models under conditions other than text-only prompts, and is widely used. GLIGEN, released shortly afterward, went a step further by incorporating position and size information for the required objects in the image.

FreeControl looks like a logical and even more powerful development of these ideas: on the one hand it works without resource-intensive training, and on the other it accepts a whole range of different input conditions.

Summary
  • Researchers have developed a training-free method called FreeControl that enables controllable text-to-image generation and supports multiple conditions, model architectures, and checkpoints.
  • FreeControl enables fine-grained spatial control over diffusion models by extracting key features from sample images, which guide image composition, content, style, and the essential concept.
  • According to the authors, the method generates high-quality images, offers better control than previous training-free approaches, and can easily be adapted for image-to-image generation.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.