
A leaker on Discord claims to have access to a new image model from OpenAI. It shows significant progress, especially in text rendering and prompt adherence.

The leaker first came forward in a Discord channel in May, claiming to be part of an alpha test of a new AI image model from OpenAI. At the time, he showed images generated specifically for the channel, which he said came from that model.

In mid-July, he reappeared and showed more examples that he claimed to have generated using a "closed alpha" test version of what may or may not be DALL-E 3. The model is currently accessible to about 400 people, according to the leaker.

The leaker was invited via email and claims to have been involved in testing DALL-E and DALL-E 2. According to him, the test version of the new image model is uncensored and can therefore generate scenes of violence and nudity, or copyrighted material such as company logos.

Subway would probably not be happy with this generation, and with this much blood and religious imagery, OpenAI is likely to censor images like this one in the final DALL-E 3. | Image: Kaamalauppias, Discord

The images show the typical DALL-E watermark in the lower right corner, but it could easily be faked. In any case, the new generations surpass the current capabilities of models like Midjourney and SD XL in terms of detail and text rendering.

According to the tester, the results are also "significantly" better than Google Parti, which was already far ahead of DALL-E 2 when Google presented it about a year ago. For comparison, the leaker tested prompts from the Parti paper. However, Midjourney is said to still be ahead in photorealistic generation.

Better text rendering and prompt precision

The leaker's demonstrations show that the potential DALL-E 3 model handles text much better, for example when a phrase in the prompt is supposed to appear verbatim in the image, as the following example shows.

Typos are part of the original prompt: "an image of an angel holding the sun and moon. above the angel, it says, "BE NOT AFRIAD" in the background is the entire universe. fantasy art, 8k reoslution, beautiful, emotional." | Image: via Discord

While errors still creep into the words, overall the new model shows a better understanding of language. Interestingly, in the example above the model writes "afraid" even though the prompt says "afriad", presumably a spelling error that the model corrected. This also suggests that text in the prompt is not rendered 1:1 in the image.

The new model's improved language understanding enables it to accurately render even complex image compositions with many abstract details, such as the following cheese-animal scene or the chilled wombat on a beach chair.

Relaxed wombat: "A wombat sits in a yellow beach chair, while sipping a martini that is on his laptop keyboard. The wombat is wearing a white panama hat and a floral Hawaiian shirt. Out-of-focus palm trees in the background. dslr photograph. wide-angle view." | Image: via Discord

The cheese-animal example is particularly impressive because many models suffer from so-called concept spillover, i.e. the image model mixes different content concepts. The potential DALL-E 3 model cleanly separates the concept of the cheese animal from that of the real animal.

Conveniently, one cheese animal even comes filled with ham. Prompt: "A group of farm animals (cows, sheep, and pigs) made out of cheese and ham, on a wooden board. There is a dog in the background eyeing the board hungrily." | Image: via Discord

The following Midjourney example with the same prompt illustrates concept spillover: here the cheese has not become a cow; instead, one of three dogs (rather than the single dog requested) has horns that look like they could be made of cheese.

Image: Midjourney prompted by THE DECODER

DALL-E 2 goes all in on cheese, not even trying to put a real animal in the picture, and simply sticks to a single concept.

DALL-E 2 is entirely cheesy. | Image: DALL-E 2 prompted by THE DECODER

If you search for the user "Kaamalauppias", you can find some more potential DALL-E 3 generations in this Discord channel.


OpenAI and others tinker with next-generation image AI

DALL-E 2 was quickly overtaken by Midjourney and Stable Diffusion after its launch, and then got lost in the hype surrounding ChatGPT and GPT-4. Of course, this does not mean that OpenAI has stopped working on image AI systems.

The first sign of this was the introduction of the Bing Image Creator, which according to Microsoft uses a "better version" of DALL-E 2. Details are not known, and even if it is effectively a "DALL-E 2.5", the Image Creator's results are not at the level of Midjourney or Stable Diffusion XL.

Since the introduction of DALL-E 2, a lot has happened in the field of image models in general, and companies like Meta have introduced new architectures that can generate images and in-image text more efficiently and accurately.

In particular, Meta's latest image model CM3leon, at least judging by the selected examples, seems to follow prompts with a level of detail similar to the potential DALL-E 3 generations shown above. Moreover, CM3leon was trained exclusively on licensed material.

Earlier this year, Google unveiled Muse, a high-speed AI image model that can also follow prompts more accurately than previous models and generate text.

In April, the OpenAI research team unveiled a new architecture called "Consistency Models," which generates images much faster than classic diffusion models like DALL-E 2 while maintaining high quality - a possible prelude to video generation.

Significant advances in AI image models have been made, then, but they haven't made it into a product yet. DALL-E 3 may soon change that.

Summary
  • A leak shows images allegedly generated with a new image model from OpenAI - possibly an early version of DALL-E 3.
  • The images clearly follow the prompt instructions more closely. In addition, the new model can generate words and sentences and integrate them into the image.
  • According to the leaker, the uncensored OpenAI image model is currently in a closed alpha phase with about 400 testers. It is not known if or when the product will be available to a wider audience.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.