Maximilian Schreiner

Human-aligned AI models prove more robust and reliable

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.

Researchers from Google DeepMind, Anthropic, and German institutions have introduced a method that helps AI models better mirror how people judge what they see. Their Nature study finds that models aligned with human perception are more robust, generalize better, and make fewer errors.

Deep neural networks can match humans at some visual tasks, but break down in unfamiliar situations. According to the study, the problem is structural: People organize visual concepts in a hierarchy, from fine details up to broader categories. AI models, on the other hand, focus on local similarities and often miss abstract connections.

This difference shows up in important ways. People might group a dog and a fish together as "living," even though they look nothing alike. AI models often fail to make such leaps. The two also differ in confidence: humans are usually only as certain as they are accurate, but AI models can be very confident even when they’re wrong.

The visualization highlights how differently unaligned and human-aligned AI models interpret the world.

AligNet: Narrowing the gap between AI and human perception

To close this gap, Lukas Muttenthaler and his team built AligNet. The core of their approach is a "surrogate teacher model," a version of the SigLIP multimodal model fine-tuned on human judgments from the THINGS dataset.

This teacher model generates “pseudo-human” similarity scores for millions of synthetic ImageNet images. These labels then help fine-tune a range of vision models, including Vision Transformers (ViT) and self-supervised systems like DINOv2. AligNet-aligned models ended up matching human judgments much more often, especially on abstract comparison tasks.
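The idea of tuning a student model toward teacher-provided similarity scores can be sketched in a few lines of Python. This is only an illustration: the squared-error objective, the function names, and the toy embeddings are assumptions made for this example, not the loss or data used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(embeddings):
    """All pairwise cosine similarities for a list of embedding vectors."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]

def similarity_matching_loss(student_embeddings, teacher_scores):
    """Mean squared gap between student similarities and teacher scores,
    taken over every unordered pair of distinct images.
    Assumption: a simple stand-in objective, not the paper's loss."""
    student_scores = similarity_matrix(student_embeddings)
    n = len(student_embeddings)
    gaps = [
        (student_scores[i][j] - teacher_scores[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(gaps) / len(gaps)

# Hypothetical "pseudo-human" scores for three images: dog, fish, car.
# The teacher rates dog and fish as similar (both living things).
teacher_scores = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

# Hypothetical student embeddings before and after fine-tuning.
student_before = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]   # dog sits next to car
student_after = [[1.0, 0.2], [0.9, 0.4], [-0.1, 1.0]]   # dog sits next to fish

loss_before = similarity_matching_loss(student_before, teacher_scores)
loss_after = similarity_matching_loss(student_after, teacher_scores)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

In this toy setup, the loss drops once the student places the dog near the fish rather than near the car, which is the kind of shift the teacher's scores are meant to encourage.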

On the new "Levels" dataset, which covers different levels of abstraction and includes ratings from 473 people, an AligNet-tuned ViT-B model even exceeded the average level of agreement among the human raters themselves.
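How a model can score above average human agreement is easier to see with a small example. The sketch below assumes an odd-one-out style task and simple match-rate metrics. The article does not spell out the task format or the exact agreement measure, so the function names, the votes, and the metric definitions are all hypothetical.

```python
from itertools import combinations

def model_human_agreement(model_choices, human_votes):
    """Share of (trial, rater) pairs where the rater picked the model's choice."""
    matches = 0
    total = 0
    for choice, votes in zip(model_choices, human_votes):
        matches += sum(1 for vote in votes if vote == choice)
        total += len(votes)
    return matches / total

def human_human_agreement(human_votes):
    """Share of rater pairs, pooled over all trials, that picked the same item."""
    matches = 0
    total = 0
    for votes in human_votes:
        for first, second in combinations(votes, 2):
            matches += first == second
            total += 1
    return matches / total

# Hypothetical votes from three raters on three trials.
human_votes = [
    ["car", "car", "dog"],
    ["fish", "fish", "fish"],
    ["tree", "lizard", "tree"],
]
# Hypothetical model answers that happen to follow the majority each time.
model_choices = ["car", "fish", "tree"]

print(f"model vs. humans:  {model_human_agreement(model_choices, human_votes):.2f}")
print(f"humans vs. humans: {human_human_agreement(human_votes):.2f}")
```

With these made-up votes the model matches raters about 78 percent of the time, while two randomly paired raters match only about 56 percent of the time. A model that tracks the majority view can therefore exceed the average agreement between individuals.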

How human-like structure boosts model robustness

Aligning with human perception didn’t just make the models more "human" - it made them technically better. In generalization and robustness tests, AligNet models sometimes more than doubled their accuracy over baseline versions.

They also held up better on challenging tests like the BREEDS benchmark, which forces models to handle shifts between training and test data. On the adversarial ImageNet-A benchmark, accuracy rose by up to 9.5 percentage points. The models also estimated their own uncertainty more realistically, with confidence scores closely tracking human response times.
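What it means for a model's confidence to be "realistic" can be made concrete with a calibration check. The sketch below computes expected calibration error (ECE), a common metric for this. It is an illustration only: the article does not say the study used ECE, and the predictions here are invented.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Average gap between accuracy and mean confidence per bin,
    weighted by the share of predictions that fall into each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Confidence 1.0 would index past the end, so clamp to the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        error += len(members) / total * abs(accuracy - mean_conf)
    return error

# Hypothetical predictions: both models get two of four answers right.
correct = [True, False, True, False]
overconfident = [0.95, 0.95, 0.95, 0.95]  # very sure, yet right only half the time
hedged = [0.5, 0.5, 0.5, 0.5]             # confidence matches actual accuracy

print(f"overconfident ECE: {expected_calibration_error(overconfident, correct):.2f}")
print(f"hedged ECE:        {expected_calibration_error(hedged, correct):.2f}")
```

On this toy data the overconfident model scores about 0.45 and the hedged one 0.00. Lower is better: a well-calibrated model is only as certain as it is accurate.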

The models also reorganized their internal representations. After alignment, they grouped objects by meaning, not just by looks - lizards, for example, moved closer to other animals, not just to plants of the same color.

According to Muttenthaler and colleagues, this approach could point the way toward AI systems that are easier to interpret and trust. Bringing human-like similarity structures into foundation models could make them more stable when faced with new situations. However, the researchers caution that perfect human-likeness isn't the goal - after all, human judgments are influenced by cultural and personal biases.

All training data and models from the AligNet project are openly available.

Summary
  • Researchers from Google DeepMind, Anthropic, and German institutions have developed AligNet, a method that helps AI models align more closely with how humans judge visual information, improving accuracy, robustness, and generalization in tests.
  • The system uses a teacher model fine-tuned on human ratings to generate similarity scores for synthetic images, then applies these to adjust leading vision architectures like Vision Transformers and DINOv2, resulting in models that often match or exceed human consistency on abstract comparisons.
  • AligNet-aligned models not only performed better on challenging robustness benchmarks but also represented concepts in a more meaningful structure, grouping objects by shared semantics rather than visual similarity, which researchers say could make future AI systems more interpretable and dependable.
Sources
Nature