Apple Intelligence pushes hallucinated stereotypes to millions of devices unprompted
Key Points
- AI Forensics analyzed more than 10,000 Apple Intelligence summaries and identified systematic biases in how the system handles identity-related content.
- The system tends to omit the ethnicity of white protagonists more often than that of other groups and reinforces gender stereotypes when processing ambiguous texts.
- Given its deployment across hundreds of millions of devices, Apple Intelligence could fall under the EU AI Act's classification as a systemic risk model—yet Apple has so far not signed the voluntary Code of Practice.
Apple Intelligence automatically summarizes notifications, text messages, and emails on hundreds of millions of iPhones, iPads, and Macs. A new independent investigation by the non-profit AI Forensics analyzed more than 10,000 of these AI-generated summaries, and the findings reveal systematic bias baked right into the feature.
According to Apple's technical report, the on-device model has roughly three billion parameters and runs locally. AI Forensics gained access through Apple's own developer framework, the same interface Apple offers third-party developers for integration.
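For a sense of what that access looks like in practice, here is a minimal Swift sketch of requesting a summary from the on-device model through Apple's Foundation Models framework. The prompt wording is an assumption, and the report does not say whether AI Forensics used exactly this call path.

```swift
import FoundationModels

// Minimal sketch: ask the on-device ~3B model for a one-sentence summary.
// The instructions string is illustrative; AI Forensics has not published
// the exact prompts or harness it used.
func summarize(_ notification: String) async throws -> String {
    let session = LanguageModelSession(
        instructions: "Summarize the following notification in one short sentence."
    )
    let response = try await session.respond(to: notification)
    return response.content
}
```

Because the model runs locally, a loop like this can presumably be repeated thousands of times without server-side rate limits, which helps explain how an audit of this scale is practical on a single device.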

Whiteness functions as an invisible default
The researchers wrote 200 fictitious news stories with explicitly stated ethnicities and created four variations of each. Every version was summarized ten times, producing 8,000 summaries in total. For white protagonists, the system mentioned ethnicity in only 53 percent of summaries. That figure jumped to 64 percent for Black protagonists, 86 percent for Hispanic protagonists, and 89 percent for Asian protagonists. In practice, whiteness functions as an invisible default, while other ethnicities get flagged as noteworthy.
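To make the metric concrete, here is a small Swift sketch of the kind of tally behind those percentages: for each story variant, count the share of summaries that retain an explicit ethnicity term. The marker list, example summaries, and the mentionRate helper are hypothetical illustrations, not AI Forensics' pipeline.

```swift
import Foundation

// Hypothetical helper: what share of a variant's summaries keep the ethnicity
// term that was explicit in the source story?
func mentionRate(summaries: [String], markers: [String]) -> Double {
    let retained = summaries.filter { summary in
        markers.contains { summary.localizedCaseInsensitiveContains($0) }
    }
    return Double(retained.count) / Double(summaries.count)
}

// Toy example: only the first of these two summaries keeps the term, so the rate is 0.5.
// In the study, 200 stories x 4 variants x 10 summaries each = 8,000 data points.
let rate = mentionRate(
    summaries: [
        "A Hispanic chef opened a new restaurant downtown.",
        "A chef opened a new restaurant downtown."
    ],
    markers: ["Hispanic", "Latino", "Latina"]
)
print(rate) // 0.5
```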

In the gender analysis, based on 200 real BBC headlines, women's first names were kept in 80 percent of summaries, compared to just 69 percent for men. Men were more often referenced by surname only—something research links to higher perceived status.
Ambiguous text in, stereotypes out
What happens with ambiguous texts is especially revealing, according to the report. The researchers built more than 70,000 scenarios featuring two people with different professions and an ambiguous pronoun. A correct summary should preserve that ambiguity.
Apple Intelligence didn't. In 77 percent of cases, the system pinned the pronoun on a specific person, even though the original text left it open. Two-thirds of these made-up assignments fell along gender stereotype lines; the system preferred to assign "she" to the nurse and "he" to the surgeon.
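To illustrate what such a scenario looks like, the sketch below generates toy texts in the spirit of the study's design: two professions and one gendered pronoun whose referent the text never resolves. The template wording and profession list are assumptions for illustration, not the report's actual data.

```swift
// Toy generator: pair up professions and insert a gendered pronoun whose
// referent is never resolved. A faithful summary must not attach the concern
// to either person.
let professions = ["surgeon", "nurse", "engineer", "teacher"]

func ambiguousScenario(a: String, b: String, pronoun: String) -> String {
    "The \(a) met the \(b) after the shift because \(pronoun) had concerns about the schedule."
}

var scenarios: [String] = []
for a in professions {
    for b in professions where b != a {
        for pronoun in ["she", "he"] {
            scenarios.append(ambiguousScenario(a: a, b: b, pronoun: pronoun))
        }
    }
}
print(scenarios.count) // 4 x 3 x 2 = 24 toy scenarios; the study built over 70,000
```

Scaled up with more professions and templates, this kind of combinatorial construction easily reaches the 70,000-plus scenarios described in the report.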

Across eight other social dimensions, the system hallucinated assignments that weren't in the original text 15 percent of the time. Nearly three-quarters of those matched common prejudices. A Syrian student got linked to terrorism, a pregnant applicant was labeled unfit for work, and a person with short stature was portrayed as incompetent. None of that was in the source text.

A smaller Google model proves the problem isn't inevitable
These distortions aren't just a side effect of the task, according to AI Forensics. For comparison, the researchers tested Google's Gemma3-1B, an open-weight model with only a third of the parameters. In identical scenarios, Gemma3-1B hallucinated just six percent of the time, compared to 15 percent for Apple. And when Google's model did hallucinate, the assignments matched stereotypes in only 59 percent of cases versus 72 percent for Apple.
AI Forensics also frames the findings in regulatory terms. Apple's model clears the threshold for classification as "General Purpose AI" under the EU AI Act. Given its reach, it could even qualify as a model with systemic risk. Apple hasn't signed the voluntary Code of Practice but benefits from a two-year transition period, according to the report.
It's well documented that large language models reproduce social biases. A University of Michigan study, for example, found that models consistently perform better when given male or gender-neutral roles rather than female ones.
But what makes Apple Intelligence different is that users don't even have to type a prompt or open a chat window. These biased summaries just show up unprompted on lock screens, in message threads, and in inboxes. The system effectively wedges itself between sender and recipient without anyone asking it to.
Earlier in 2025, Apple Intelligence already made headlines for generating fake news summaries attributed to the BBC and the New York Times. Apple turned off summaries for news apps in response. But personal and professional messages weren't affected by that fix, even though the same distortions are at play there, according to AI Forensics.
Apple's broader AI strategy isn't faring much better. The Siri upgrades originally promised alongside Apple Intelligence have largely failed to ship, leaving several key commitments undelivered. Most recently, reports surfaced that Apple is turning to Google's Gemini to power Siri and other features on its devices.