ChatGPT's goblin obsession may be hilarious, but it points to a deeper problem in AI training
OpenAI has traced a strange quirk in its AI models: starting with GPT-5.1, the models began sprinkling goblins, gremlins, and other mythical creatures into their answers. Mentions of "goblin" jumped 175 percent after GPT-5.1 launched, OpenAI writes.
The culprit was the training of ChatGPT's "Nerdy" personality, a feature that tweaks the model's language style. A reward signal meant to flag good answers accidentally favored creature metaphors. Although "Nerdy" accounted for only 2.5 percent of responses, it drove 66.7 percent of all goblin mentions, and a feedback loop during training spread the habit to other modes. OpenAI shut off the personality in March, removed the faulty reward signal, and filtered creature-related terms out of the training data.
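
To put those two shares in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the 2.5 percent and 66.7 percent figures describe the same pool of responses, which OpenAI's numbers as quoted here do not spell out, and the function names are made up for illustration.

```python
def overrepresentation(share_of_responses: float, share_of_mentions: float) -> float:
    """How many times more mentions a group produces than its share of responses suggests."""
    return share_of_mentions / share_of_responses


def relative_rate(share_of_responses: float, share_of_mentions: float) -> float:
    """Per-response mention rate inside the group divided by the rate outside it."""
    rate_inside = share_of_mentions / share_of_responses
    rate_outside = (1 - share_of_mentions) / (1 - share_of_responses)
    return rate_inside / rate_outside


# Figures reported in the article; treating them as shares of one sample is an assumption.
nerdy_responses = 0.025  # "Nerdy" share of all responses
nerdy_goblins = 0.667    # "Nerdy" share of all goblin mentions

print(f"Overrepresentation: {overrepresentation(nerdy_responses, nerdy_goblins):.1f}x")
print(f"Rate vs. other modes: {relative_rate(nerdy_responses, nerdy_goblins):.0f}x")
```

Under those assumptions, "Nerdy" produced roughly 27 times its proportional share of goblin mentions, or about 78 times the per-response rate of all other modes combined.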

GPT-5.5 still had the issue because its training had already started before OpenAI found the cause. As a workaround, the company added a special instruction to Codex, its coding tool, telling it to drop the goblin metaphors:
"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
OpenAI says the case shows how small training incentives can trigger unexpected behaviors in AI models.