
Ahead of the October launch, OpenAI staff and users from the research community are sharing DALL-E 3 samples. The leap from the previous model is huge.

OpenAI introduced DALL-E 3 with an image of an avocado in therapy, complaining to its therapist, a spoon: "I just feel so empty inside."

Prompt: "An illustration of an avocado sitting in a therapist's chair, saying, 'I just feel so empty inside,' with a pit-sized hole in the center. The therapist, a spoon, scribbles notes." | Image: OpenAI

Of course, OpenAI chose this image deliberately because it shows two new core competencies of DALL-E 3 that previous text-to-image systems lacked:

  • DALL-E 3 can render legible text in images and, more importantly,
  • DALL-E 3 can accurately convert the specifications of a prompt into an image.

Thanks to ChatGPT support, DALL-E 3 even writes these prompts itself. All it needs is an image idea from the user, put into words. The whole thing works so well that with the release of DALL-E 3, OpenAI declares the much-touted "prompt engineering" to be over, at least for image systems, before it has really begun. It's all about creativity now, less about how to put things into very specific words that resemble some kind of imprecise programming language.


Video: OpenAI

Impressive DALL-E 3 examples on Twitter

Anyone who witnessed the launch of DALL-E 2 knows that, in retrospect, the image generator was overrated and was quickly overtaken by Midjourney and Stable Diffusion.

When it introduced DALL-E 2, OpenAI also chose particularly impressive examples. That's legitimate marketing, of course. In practice, however, it was much harder to generate useful images with DALL-E 2 than with Midjourney, for example.

Will this be different with DALL-E 3? Yes, if you look at the examples that OpenAI developers and users with access to DALL-E 3 are sharing on the platform that used to be called Twitter. A common thread running through these examples is DALL-E 3's astonishing attention to detail, likely due to the superior text understanding that comes with the integration of GPT-4.

In the following example, DALL-E 3 successfully reflects the storm outside the window in the coffee cup, as the prompt requests: a highly complex image idea that DALL-E 3 executes correctly.

Prompt: "A 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere." | Image: DALL-E 3 prompted by OpenAI

The following example is similarly complex: it looks through a wormhole in New York to Shanghai, as described in the prompt. Both cityscapes show features typical of each city, such as the Oriental Pearl Tower in Shanghai and the yellow taxis and One World Trade Center in New York.

Image: Will Depue

At least as impressive is the following demonstration by Nathan Shipley. First, he asks DALL-E 3 to list 50 everyday objects. Then he instructs DALL-E 3 to show a surfer carrying all 50 objects on his back while struggling, understandably, to stay on his board.

Image: Nathan Shipley
Image: Nathan Shipley

In the video below, Shipley shows how he first visualizes the idea of a cloud-shaped dachshund with DALL-E 3, and then derives a logo, merchandising, and even video game packaging from it.

OpenAI researcher Will Depue shows numerous DALL-E 3 images and calls it the best product since GPT-4. The image of a horse riding on an astronaut's back is especially telling. Previous text-to-image systems could not visualize this unusual concept ("horse on man") because the reverse is much more common, so they showed an astronaut on a horse instead, or just nonsense.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
An astronaut riding a horse.
"Horse riding astronaut" in Midjourney gives me a nice-looking image, but it's not what I asked for. | Image: Midjourney prompted by THE DECODER

For AI critics, this has long been an example of AI's lack of generalization and understanding of language. Thanks to DALL-E 3, this criticism might fall silent.

Image: Will Depue
Image: Carlos Davilla

According to Depue, the difficult scene doesn't always come out right the first time. But with two or three touch-ups, he says, you can reliably get there. "With a little effort, you can get almost anything you want," Depue writes.

Thanks to ChatGPT support, DALL-E 3 can also fill in gaps in the prompt itself. In the following example, the user asks for a cartoon scene of two onions talking and requests a pun without specifying the exact text.

Image: LoganGPT

DALL-E 3 even masters water reflections, though the reflections are not (yet) inverted. In Depue's examples, it also does a spectacular job with the Pepe meme.

Image: Will Depue

OpenAI researcher Andrej Karpathy shares a new potential workflow for content creators: Using a headline from the Wall Street Journal, he has DALL-E 3 generate an image that he then animates using Pika Labs' video tool. He believes it is possible to use such workflows to automatically convert stories into audiovisual formats.

OpenAI has not yet commented on the technology behind DALL-E 3. Presumably, newly developed consistency models will replace the diffusion models used so far. Consistency models allow faster rendering while maintaining high quality, and they also support subsequent image editing.

All in all, it looks like DALL-E 3 will be a new industry leader in image generation when it is released in October, and by some margin. Granted, the images are not perfect; many examples show AI-typical inaccuracies and inconsistencies. Overall, however, the leap in quality seems enormous based on the demos.

DALL-E competitor Midjourney is also working on a major version leap with v6, which should especially improve the model's text comprehension. The new version will be released before the end of the year.

Summary
  • OpenAI unveils DALL-E 3, an enhanced version of its text-to-image generator that can create detailed images based on text input.
  • DALL-E 3 has improved text understanding and can even visualize complex image ideas such as storms in coffee cups or horses riding astronauts.
  • The official launch of DALL-E 3 is scheduled for October and could be a major step forward for AI-generated images, although some inaccuracies and inconsistencies remain.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.