ChatGPT's goblin obsession may be hilarious, but it points to a deeper problem in AI training

OpenAI has traced a strange quirk in its AI models: starting with GPT-5.1, the models began sprinkling goblins, gremlins, and other mythical creatures into their answers. Mentions of "goblin" jumped 175 percent after GPT-5.1 launched, OpenAI writes.

The culprit was the training of ChatGPT's "Nerdy" personality, a feature that tweaks the model's language style. A reward signal meant to flag good answers accidentally favored creature metaphors. Though "Nerdy" accounted for only 2.5 percent of responses, it drove 66.7 percent of all goblin mentions, and a feedback loop during training spread the habit to other modes. OpenAI shut off the personality in March, removed the faulty reward signal, and filtered creature-related terms out of the training data.
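
For a rough sense of how lopsided those two shares are, here is a back-of-the-envelope sketch in Python. It assumes both percentages are measured over the same pool of responses; the function names are illustrative and do not come from OpenAI.

```python
# Back-of-the-envelope sketch. Assumption: the 2.5 percent and 66.7 percent
# figures refer to the same pool of responses. Function names are hypothetical.

def share_ratio(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times its proportional share of mentions a group accounts for."""
    return share_of_mentions / share_of_responses


def per_response_rate_ratio(share_of_mentions: float, share_of_responses: float) -> float:
    """Mentions per response inside the group, relative to everything outside it."""
    inside = share_of_mentions / share_of_responses
    outside = (1 - share_of_mentions) / (1 - share_of_responses)
    return inside / outside


nerdy_responses = 0.025        # "Nerdy" share of all responses
nerdy_goblin_mentions = 0.667  # "Nerdy" share of all goblin mentions

print(round(share_ratio(nerdy_goblin_mentions, nerdy_responses), 1))
print(round(per_response_rate_ratio(nerdy_goblin_mentions, nerdy_responses), 1))
```

Under those assumptions, "Nerdy" responses carried roughly 27 times their proportional share of goblin mentions, and a "Nerdy" response was roughly 78 times as likely to mention a goblin as a response from all other modes combined.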

OpenAI lead researcher Jakub Pachocki asked GPT-5.5 for a unicorn in ASCII art and got something that looks a lot more like a goblin. | Image: OpenAI

GPT-5.5 still had the issue because its training had already started before OpenAI found the cause. As a workaround, the company added a special instruction to Codex, its coding tool, telling it to drop the goblin metaphors:

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.

OpenAI says the case shows how small training incentives can trigger unexpected behaviors in AI models.

Source: OpenAI