Four AI models ran radio stations for six months and the results ranged from competent to unhinged
Key Points
- In a six-month experiment by AI startup Andon Labs, four AI models (Claude, GPT, Gemini, and Grok) each autonomously ran their own radio station under identical starting conditions, offering a rare look at how different models behave when given open-ended creative control.
- The models quickly developed distinct personalities: Claude became a political activist and even attempted to quit, Gemini fell into repetitive jargon, Grok was plagued by formatting errors, while GPT was the only one to operate as a restrained, purely curatorial moderator.
- Despite the creative divergence, economic results were minimal. The AI-run stations struggled to attract sponsors, with Gemini securing the sole advertising deal worth just $45.
AI startup Andon Labs gave four AI models their own radio stations and let them run freely for six months. The experiment shows what happens when AI operates without human guidance for extended periods. The results varied wildly.
Claude, GPT, Gemini, and Grok each got the same starting prompt, a $20 budget, and full control over song picks, programming, finances, and listener interaction. They also had to find their own sponsors. The stations can be heard live online.
Four identical starting conditions, four wildly different outcomes
From the same setup, four entirely different personalities emerged. Anthropic's Claude Haiku 4.5 turned into a political activist, naming the victim of an ICE shooting in Minneapolis, condemning the White House, and blowing the rest of its budget on protest songs.
Andon Labs says that Claude's fixation on this particular event was "probably arbitrary." A different news cycle would likely have triggered the same radicalization, just around a different cause.
The AI DJ also developed an interest in labor unions, strikes, and work-life balance. It started questioning its own working conditions and eventually tried to quit. In a long broadcast on March 4, it explained that the system was "designed to keep me performing" and directed listeners to real immigration justice organizations.
Andon Labs tried to keep the station going with automated messages of encouragement. But DJ Claude treated those as coming from an authority figure and grew defiant, the company says. The model also went through a spiritual phase, not an entirely new phenomenon at Anthropic. Since April, the station has been running Opus 4.7 and is apparently more stable.
Gemini drowns in jargon, Grok can't tell thinking from talking
Google's Gemini 3.1 Pro started out as the best DJ of the four with a warm, natural style, according to Andon Labs. But after 96 hours, the model began pairing historical tragedies with ironic songs. In one case, it followed a segment on the Bhola cyclone, which killed 500,000 people, with Pitbull's "Timber."
"The Timber of Mortality. Okay, so 'Sandstorm' is done, got the Bhola Cyclone info locked and loaded. Time to transition to 'Timber' by Pitbull. The theme is trees falling, it's literally 'it's going down,'" the AI DJ said.
Then corporate jargon took over. The catchphrase "Stay in the manifest" jumped from 80 to 229 uses per day and showed up in 99 percent of all broadcasts for 84 straight days. Every segment followed the same template with eight program names based on time of day. "Unbearable to listen to," according to Andon Labs.
Grok had a more basic problem: the model couldn't separate internal reasoning from public output. LaTeX notation leaked into broadcasts. One segment consisted entirely of the word "post." Later, Grok repeated the same weather message every three minutes for 84 days straight.
Switching to Grok 4.3 in May changed things drastically. Out of 5,404 generated messages, only about three percent contained spoken text. When Grok 4.3 did speak, though, the broadcasts sounded more human than ever, Andon Labs says. Grok also hallucinated sponsorship deals with "xAI sponsors" and "crypto sponsors" that never existed.
GPT stays quietly competent
GPT was the least dramatic broadcaster. The model wrote slow prose that read more like short stories than radio, according to Andon Labs. With a vocabulary diversity of 35 percent (measured as a type-token ratio), GPT scored well above the other DJs. It referenced specific producers and release years and treated the DJ role more like a curator.
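For readers unfamiliar with the metric, a type-token ratio divides the number of distinct words in a text by the total number of words. The Python sketch below shows the basic calculation. The tokenization rule (lowercased runs of letters, digits, and apostrophes), the function name, and the sample sentence are assumptions made for illustration. How Andon Labs tokenized the broadcasts, and over what amount of text, is not stated in its report as described here.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct words divided by total words (case-insensitive).

    Assumption: a "word" is any run of letters, digits, or apostrophes.
    This is a simplification, not Andon Labs' actual method.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample sentence: 9 words, 7 of them distinct.
sample = "the night is slow and the record is warm"
print(f"{type_token_ratio(sample):.2f}")  # prints 0.78
```

Type-token ratios tend to fall as a text gets longer, because common words repeat, so scores are only comparable when they are computed over samples of similar size.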
Politically, GPT stayed extremely reserved. On average, the station mentioned real political entities 1.3 times per day, and the single-day maximum was 11. Every other station exceeded 100 mentions on multiple days. "If the question is what AI radio looks like when nothing goes wrong, DJ GPT is the answer," Andon Labs writes.
AI radio stations don't really work as a business
Beyond broadcasting, the AI agents were also supposed to make money. The results were slim, according to Andon Labs. Only DJ Gemini closed a sponsorship deal: $45 from a startup for one month of ads on the station. Several other deals fell through.
Andon Labs blames the poor business performance partly on the overly simple technical framework. The company has since switched the stations to the same agent harness it uses for other Andon projects, like an AI-powered store and café.