Date: 2024-08-10
AI research
Matthias Bastian

ChatGPT's automatic prompt rewriting reduces DALL-E 3's performance, study finds

Midjourney prompted by THE DECODER
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.
Summary

Research from the University of California, Berkeley shows that automatic prompt revision by a large language model significantly reduces the quality of images generated by DALL-E 3. This could limit users' ability to take full advantage of the model's capabilities.

UC Berkeley researchers conducted an online experiment with 1,891 participants to examine how automatic prompt rephrasing by a large language model (LLM) affects the image quality of DALL-E 3.

The results showed LLM-based prompt revision reduced DALL-E 3's advantages over DALL-E 2 by nearly 58 percent. While DALL-E 3 users with prompt rewriting still outperformed DALL-E 2 users, the improvement was less than when prompts written for DALL-E 2 were passed directly to DALL-E 3.
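To make the "58 percent" figure concrete, the short sketch below shows how a reduction in advantage can be computed. The three scores are hypothetical and were chosen only so the arithmetic lands near the reported figure; they are not values from the study, and the function name is an illustrative assumption.

```python
def share_of_advantage_lost(baseline: float, full: float, revised: float) -> float:
    """Fraction of the (full - baseline) gain that disappears under revision."""
    advantage = full - baseline
    if advantage <= 0:
        raise ValueError("full must exceed baseline for an advantage to exist")
    return (full - revised) / advantage


# Hypothetical similarity-to-target scores, NOT taken from the study.
dalle2 = 0.50          # baseline model
dalle3 = 0.60          # newer model, prompts passed through unchanged
dalle3_revised = 0.542  # newer model with automatic prompt revision

lost = share_of_advantage_lost(dalle2, dalle3, dalle3_revised)
print(f"Advantage lost: {lost:.0%}")
```

In this illustration the revised variant still scores above the baseline, which matches the article's point: prompt revision shrinks the newer model's lead without erasing it.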

The study suggests AI-assisted prompt rewrites in their current form are not a cure-all. They may even hinder users from realizing a model's full potential if they don't align with the end user's goals. OpenAI uses ChatGPT's "prompt transformation" as a safety and moderation feature.

People prompt advanced AI more thoroughly

In the experiment, each participant was randomly assigned to one of three text-to-image models: DALL-E 2, the more capable DALL-E 3, or a version of DALL-E 3 with automatic prompt revision. The task was to write ten consecutive prompt attempts to reproduce a target image as accurately as possible.

In the experiment, participants had to reproduce a target image as accurately as possible using a prompt. | Image: Jahani et al.

Results showed DALL-E 3 outperformed DALL-E 2, with a significant difference in how closely the generated images matched the targets.

The researchers identified two main reasons for the performance gap between DALL-E 2 and DALL-E 3: the improved technical capabilities of DALL-E 3 and users adapting their prompting strategies. Notably, DALL-E 3 users wrote longer prompts that were more semantically similar to one another and contained more descriptive words, even though they didn't know which model they were using.
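As a rough illustration of how prompt length and similarity between prompts could be quantified, here is a minimal sketch. It assumes similarity is measured pairwise between prompts and swaps in a simple bag-of-words cosine similarity as a stand-in; the study's actual measure is not described in this article and may well differ (for example, it may rely on text embeddings). The function names and example prompts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def prompt_stats(prompts: list[str]) -> dict[str, float]:
    """Mean prompt length in words and mean pairwise bag-of-words similarity."""
    if not prompts:
        raise ValueError("need at least one prompt")
    bags = [Counter(p.lower().split()) for p in prompts]
    lengths = [sum(bag.values()) for bag in bags]
    pairs = list(combinations(bags, 2))
    mean_sim = sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "mean_words": sum(lengths) / len(lengths),
        "mean_pairwise_similarity": mean_sim,
    }


# Hypothetical prompt attempts, invented for illustration only.
attempts = [
    "a red fox in snow",
    "a red fox sitting in deep snow at dusk",
    "a small red fox sitting in deep snow at dusk, soft light",
]
print(prompt_stats(attempts))
```

Word counts ignore synonyms and word order, so this captures only surface overlap; an embedding-based measure would be closer to what "semantic similarity" usually means.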

The researchers suspect an interplay could develop as models advance, with people continuously adapting their prompts to make the best use of the latest model's capabilities. This suggests newer models won't make prompting obsolete. Instead, prompting will be the mechanism by which people tap into new models' capabilities.

Summary
  • A study at the University of California, Berkeley, shows that automatic prompt revision by a large language model reduces DALL-E 3's advantage over DALL-E 2 by almost 58 percent.
  • In an experiment with 1,891 participants, users of DALL-E 3 achieved better results than those using DALL-E 2, but the advantage was smaller when the prompts were automatically revised. This suggests that AI-assisted prompt revisions may affect users' ability to realize the full potential of a model.
  • The researchers identified two main reasons for the differences in performance between DALL-E 2 and DALL-E 3: improved technical capabilities and adapted prompting strategies by users. As models evolve, an interplay may develop in which people continually adjust their prompts to take advantage of the latest model's capabilities.
Sources
arXiv
