Stable Diffusion XL: An image model at Midjourney's level?

A new beta version of Stable Diffusion delivers much more aesthetic and photorealistic results than the previous version. Will this make commercial offerings obsolete?

While Stable Diffusion is the most developed open-source image model, it can't always match the quality and especially the accessibility of commercial competitors like Midjourney.

So far, its strength has been less in generating aesthetic images from a few commands and more in its openness and the possibility of further development by a constantly growing community.

Stable Diffusion XL: Beta available via DreamStudio and API

While Stable Diffusion v2.1 was already a visible leap over v1.5, at least in some scenarios, the latest version, Stable Diffusion XL (v2.2.2), marks a significant improvement. It is still under development, but a beta version is already available via the paid DreamStudio web interface and API. The code will be released on GitHub as usual once it is finished.

We are pleased to announce the latest release in our Stable Diffusion series of imaging solutions. SDXL offers a variety of image generation capabilities that are transformative across multiple industries, including graphic design and architecture, with results happening right before our eyes.

Tom Mason, CTO of Stability AI

Stable Diffusion XL comes with a number of enhancements that should pave the way for version 3. Exactly how the training material differs from previous versions is unknown. However, 80 million images are said to have been removed for v3 at the request of artists.

"Minimalistic home gym with rubber flooring, wall-mounted TV, weight bench, medicine ball, dumbbells, yoga mats, high-tech equipment, high detail, organized and efficient."

At 2.3 billion parameters, SDXL is also significantly larger than v2.1, which has 900 million. According to Stability AI CEO Emad Mostaque, the plan is to have a distilled version ready by the time of release and offer it as an alternative.

Stable Diffusion XL delivers more photorealistic results and a bit of text

In general, SDXL seems to deliver more accurate and higher-quality results, especially in the area of photorealism. Human anatomy, which even Midjourney struggled with for a long time, is also handled much better by SDXL, although the finger problem does not seem to have been solved yet.

"Skilled archer, bow and quiver of arrows, standing in forest clearing, intense, detailed, high detail, portrait"

In addition, Stable Diffusion XL can generate text in images for the first time. The results are not always perfect, and it may take several tries before the text is correct, but according to Stability AI, SDXL is the first publicly available generative AI model with this capability.

Stable Diffusion XL can render text, reportedly as the first publicly available generative AI model. Fingers and feet can still be a problem. | Image: Stability AI

As usual with Stable Diffusion, SDXL's capabilities go beyond text-to-image, supporting image-to-image (img2img) as well as the inpainting and outpainting features known from DALL-E 2. However, the maximum resolution of 512 x 512 pixels remains unchanged.

DreamStudio offers a limited free trial quota, after which the account must be recharged. 5,000 image generations cost about 10 US dollars.
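To put the quoted price in perspective, the per-image cost can be worked out directly. The following is a minimal sketch that assumes the rounded figures above (about 10 US dollars for 5,000 generations). The function name and its parameters are hypothetical and are not part of DreamStudio or the Stability AI API, and actual pricing may vary with image settings.

```python
def cost_per_image(total_usd: float, generations: int) -> float:
    """Return the average price of one image generation in US dollars.

    Hypothetical helper for illustration only; the inputs are the rounded
    figures quoted in the article, not an official price list.
    """
    if generations <= 0:
        raise ValueError("generations must be a positive integer")
    return total_usd / generations


# Figures quoted in the article: about 10 US dollars for 5,000 generations.
price = cost_per_image(10.0, 5000)
print(f"Approximate cost per image: ${price:.4f}")
print(f"Approximate images per dollar: {1 / price:.0f}")
```

Under these assumptions, a single generation costs roughly a fifth of a cent.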

"AI image generation is as good as done," CEO Mostaque said in a Q&A on the official Discord server shortly after SDXL's announcement. By the end of the year, he expects "pixel-perfect image generation" that is indistinguishable from real photos.
