ChatGPT's automatic prompt rewriting reduces DALL-E 3's performance, study finds


Research from the University of California, Berkeley shows that automatic prompt revision by a large language model significantly reduces the quality of images generated by DALL-E 3. This could limit users' ability to take full advantage of the model's capabilities.

UC Berkeley researchers conducted an online experiment with 1,891 participants to examine how automatic prompt rephrasing by a large language model (LLM) affects the image quality of DALL-E 3.

The results showed that LLM-based prompt revision reduced DALL-E 3's advantage over DALL-E 2 by nearly 58 percent. While DALL-E 3 users with prompt rewriting still outperformed DALL-E 2 users, the improvement was smaller than when prompts written for DALL-E 2 were passed directly to DALL-E 3.
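To make the 58 percent figure concrete, here is a minimal sketch of how such a reduction in advantage can be computed. The scores below are invented for illustration and are not taken from the study, and the function name `advantage_reduction` is hypothetical.

```python
def advantage_reduction(baseline, verbatim, revised):
    """Share of the newer model's advantage over the baseline that is lost
    when prompts are revised before being passed to the model."""
    full_advantage = verbatim - baseline      # advantage with unrevised prompts
    remaining_advantage = revised - baseline  # advantage left after revision
    if full_advantage == 0:
        raise ValueError("the newer model has no advantage to reduce")
    return 1 - remaining_advantage / full_advantage


# Hypothetical mean image-to-target similarity scores (NOT from the study)
dalle2 = 0.500
dalle3_verbatim = 0.600
dalle3_revised = 0.542

reduction = advantage_reduction(dalle2, dalle3_verbatim, dalle3_revised)
print(f"Advantage lost to prompt revision: {reduction:.0%}")
```

With these made-up numbers, revision keeps DALL-E 3 ahead of DALL-E 2 but gives up a bit more than half of the gap, which is the pattern the study describes.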

The study suggests AI-assisted prompt rewrites in their current form are not a cure-all. They may even hinder users from realizing a model's full potential if they don't align with the end user's goals. OpenAI uses ChatGPT's "prompt transformation" as a safety and moderation feature.

People prompt advanced AI more thoroughly

In the experiment, each participant was randomly assigned to one of three text-to-image models: DALL-E 2, the more capable DALL-E 3, or a version of DALL-E 3 with automatic prompt revision. The task was to write ten consecutive prompt attempts to reproduce a target image as accurately as possible.

In the experiment, participants had to reproduce a target image as accurately as possible using a prompt. | Image: Jahani et al.

Results showed DALL-E 3 outperformed DALL-E 2, with a significant difference in how closely the generated images matched the targets.

The researchers identified two main reasons for the performance gap between DALL-E 2 and DALL-E 3: the improved technical capabilities of DALL-E 3 and users adapting their prompting strategies. Notably, DALL-E 3 users wrote prompts that were longer, semantically more similar to one another, and richer in descriptive words, even though they didn't know which model they were using.
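As a rough illustration of the kind of prompt features mentioned here, the sketch below computes the average prompt length and the average pairwise overlap within a set of prompts. It rests on assumptions: word-level Jaccard overlap is used as a crude stand-in for semantic similarity (the article does not say how the study measured it), and the helper names and example prompts are hypothetical.

```python
from itertools import combinations


def tokens(prompt):
    """Lowercased set of words with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in prompt.split()} - {""}


def mean_length(prompts):
    """Average number of whitespace-separated words per prompt."""
    return sum(len(p.split()) for p in prompts) / len(prompts)


def mean_pairwise_jaccard(prompts):
    """Average word overlap (Jaccard index) across all pairs of prompts."""
    pairs = list(combinations(prompts, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        ta, tb = tokens(a), tokens(b)
        union = ta | tb
        total += len(ta & tb) / len(union) if union else 0.0
    return total / len(pairs)


# Hypothetical consecutive attempts by one participant (not study data)
attempts = [
    "a red cat",
    "a red cat sitting on green grass",
    "a fluffy red cat sitting on green grass at sunset",
]

print(f"mean length: {mean_length(attempts):.1f} words")
print(f"mean pairwise overlap: {mean_pairwise_jaccard(attempts):.2f}")
```

A measure of actual semantic similarity would typically compare text embeddings rather than raw word overlap; the word-set version is used here only to keep the example self-contained.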

The researchers suspect an interplay could develop as models advance: As the models improve, people continuously adapt their prompts to best utilize the latest model's capabilities. This suggests newer models won't make prompting obsolete. Instead, prompting will be the mechanism by which people tap into new models' capabilities.
