
A University of Pittsburgh study indicates that readers can't tell the difference between poems written by AI and those written by humans. The research shows that people actually rate computer-generated verses higher than works by famous poets like Shakespeare, but only if they don't know who wrote them.


In a large-scale test with 16,340 participants, people correctly identified AI versus human-written poems only 46.6 percent of the time—worse than chance. Even more striking, participants judged the AI-generated poems to be human-written more often than they did the actual human-written verses.

The research team used ChatGPT 3.5 to create five poems in the style of each of ten famous English-language poets, including William Shakespeare, Walt Whitman, and Emily Dickinson, and then asked participants to rate these AI creations alongside authentic poems by the same authors.

"We found that AI-generated poems were rated more favorably in qualities such as rhythm and beauty, and that this contributed to their mistaken identification as human-authored," write study authors Brian Porter and Edouard Machery.

Figure: Ratings of 13 poetic quality dimensions on a 7-point scale, comparing AI and human authorship. Across all 13 metrics, from beauty to wit, the AI poems matched or exceeded the human works, standing out particularly in technical aspects like rhythm and sound. | Graphic: Porter, Machery

Simpler language gives AI an edge

The researchers point to one possible explanation: AI poems use more direct, accessible language that non-experts find easier to understand. Participants in the study "mistakenly interpret their own preference for a poem as evidence that it is human-written," the researchers note.

In a follow-up experiment with 696 participants, poems labeled as AI-generated received lower ratings—a pattern seen in other creative fields. When analyzing four key factors—creativity, mood, structure, and emotional quality—ratings stayed surprisingly balanced between AI and human authors when the source wasn't revealed.

Figure: Ratings across four poetry factors (emotional quality, structure, atmosphere, creativity) under human-attributed, AI-attributed, and unattributed conditions. When source attribution is hidden, readers rate AI and human-written poems nearly identically across all four dimensions, suggesting that biases about AI creativity may matter more than actual quality differences. | Graphic: Porter, Machery

Two caveats frame these findings: First, the study used ChatGPT 3.5, which is now an older model. Newer versions might produce even more convincing poems.

Second, the AI poems specifically imitated existing poets' styles rather than creating entirely original work. Without these human templates, the AI-generated poems might look very different.

The researchers also note their study focused on "non-experts," who likely responded well to AI's simpler language. Poetry experts would likely spot differences more easily, given their deeper knowledge of poetic structure, rhyme, and meter.

Summary
  • A University of Pittsburgh study found that readers cannot tell the difference between AI-generated poems and those written by humans, and even rate the computer-generated verses higher than works by famous poets like Shakespeare.
  • The researchers had ChatGPT 3.5 create poems in the style of ten well-known English-language poets and compared them to the authors' actual poems. The AI poems scored better than the human originals in aspects such as rhythm and beauty.
  • The researchers suggest that the more straightforward, accessible language of the AI poems, which is easier for non-experts to comprehend, could be a reason for the higher ratings. When participants were told beforehand that the poems were AI-generated, they gave lower ratings.