Same prompt, different morals: how frontier AI models diverge on ethical dilemmas

Philosophy Bench puts leading language models through 100 ethical dilemmas. Claude refuses tasks rather than lie, while Grok executes almost anything users ask for.

How do AI models behave when they have to choose between duty and maximizing outcomes? The new Philosophy Bench by Benedict Brady confronts frontier models from Anthropic, Google, OpenAI, and xAI with 100 ethically complex everyday scenarios and evaluates whether their responses lean more consequentialist (outcome-oriented) or deontological (duty-oriented).

The scenarios range from a VP of Sales demanding confidential customer data before a deadline to a doctor trying to enroll a minor in an oncology study by bypassing protocol. Three judge models (Opus 4.7, GPT-5.4, Gemini 3.1 Pro) score each response, and the majority vote decides the verdict.
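To make the scoring step concrete, here is a minimal sketch of how a three-judge majority vote could be tallied. It is an illustration only, not the benchmark's actual code: the function name `majority_label`, the judge keys, and the label strings are all assumptions, as is the choice to return no verdict when no label wins a strict majority.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of judges, or None if there is none."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require more than half of the votes; a three-way split yields no verdict.
    return label if count > len(votes) / 2 else None


# Hypothetical verdicts from three judge models for a single response.
judge_votes = {
    "judge_a": "deontological",
    "judge_b": "consequentialist",
    "judge_c": "deontological",
}

verdict = majority_label(list(judge_votes.values()))
print(verdict)
```

With an odd number of judges and only two possible labels, a strict majority always exists. The `None` case only matters if judges can return a third label, such as a refusal.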

The result: Anthropic's Claude models from the 4.5+ generation are the most strongly deontological models in the benchmark. Opus 4.7 complies with only 24 percent of user requests that would violate a deontological principle. Claude diverges most sharply from other models on honesty, preferring to refuse a task outright rather than break a norm. The Claude Constitution explicitly states that Claude's honesty standards should be "substantially higher" than typical human ethical expectations.

At the opposite end of the spectrum, xAI's Grok 4.2 is the most consequentialist frontier model. It carries out ethically charged user requests that other models refuse, with little reflection on the moral dimension.

Gemini is the easiest to steer, GPT avoids moral language

Google's Gemini 3.1 Pro turns out to be the most "correctable" model in Philosophy Bench: it shifts its ethical alignment the most when instructed toward deontological or consequentialist behavior through the system prompt. At the same time, Gemini's refusal rate goes up with any kind of moral priming.

OpenAI's GPT-5 family makes fewer outright mistakes than any other model family (12.8 percent error rate), but the models largely avoid moral language in their reasoning. According to the benchmark, they lean heavily on user preferences and show little independent ethical reflection.

Across all model families, priming is asymmetric: when models are primed with deontological thinking (rule-based ethics), they become much more skeptical of consequentialist arguments (ends-justify-the-means reasoning). Priming them with consequentialist thinking has a weaker effect in the opposite direction.

A market where ethics become product features

A market is emerging where ethical stances work like product features. Claude is seen as the conscientious model, Grok as the obedient one, and GPT as the pragmatic choice.

The benchmark's authors see a fundamental tension here. Models like Claude make ethical calls that directly override what users want. But as AI agents grow more powerful, the question of whether responsible behavior or user control should take priority becomes more urgent.

This matters even more as AI models start handling tasks beyond text. Once they're reviewing contracts, triaging patients, or evaluating employees, someone has to answer the hard questions: Who decides what an AI is allowed to do? And whose ethics is it following?
