A new study links layer-time dynamics in Transformer models with real-time human processing. The findings suggest that AI models may not only arrive at outputs similar to humans' but could also follow similar "processing strategies" along the way.
Transformers, the architecture behind models like GPT and Vision Transformers (ViT), are good at producing human-like outputs. But researchers from Harvard, Brown, and the University of Tübingen wanted to go deeper. They asked whether the internal computations of these models resemble how humans process language and vision in real time.
Instead of focusing only on the final output by comparing what a model predicts to what a human says, the team examined how predictions change inside the model. Specifically, they analyzed how the model's internal signals evolve during a forward pass.
In neural networks, a forward pass is the process where input data flows layer by layer through the model, from the input layer to the output layer, producing a prediction at the end. The researchers wanted to see whether the way information shifts during this process mirrors the steps humans take while thinking.
What exactly did the researchers compare?
The study focused on how the model’s probability estimates changed across layers during a forward pass. These internal dynamics, meaning how confident the model is in each possible answer as it processes an input, were compared to human behavioral data. That included accuracy, reaction times (RT), typing behavior such as time to first keypress or number of backspaces, and mouse-tracking data like movement path and acceleration.
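To give a sense of what that looks like in practice, here is a minimal sketch of reading out a probability distribution at every layer during a single forward pass, assuming a Llama-2-style model from Hugging Face's transformers library and a "logit lens"-style readout (projecting each layer's hidden state through the model's final norm and unembedding). The checkpoint name and the readout method are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; the study used Llama-2-family models.
name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

prompt = "The capital of Illinois is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the embedding output plus one tensor per layer,
# each of shape (batch, seq_len, hidden_size).
layer_probs, layer_logits = [], []
for h in out.hidden_states[1:]:
    # "Logit lens"-style readout: project the last token's state at this layer
    # through the final norm and the unembedding matrix (an assumption).
    logits = model.lm_head(model.model.norm(h[:, -1, :]))
    layer_logits.append(logits)
    layer_probs.append(torch.softmax(logits, dim=-1))
```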
To make this comparison, the researchers extracted several "process metrics" from the model's layer-wise activity. These included Uncertainty (entropy), Confidence in the correct answer (log probability or reciprocal rank), Relative Confidence between the correct and an intuitive but wrong answer (log probability difference), and Boosting of the correct answer over the intuitive one (based on differences in raw model logits).
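From per-layer distributions like those above, the four metrics could be computed roughly as follows. The function name, interface, and the small numerical epsilon are ours; the paper may use slightly different definitions.

```python
import torch

def process_metrics(probs, logits, correct_id, intuitive_id):
    """Layer-wise process metrics for one stimulus (an illustrative sketch).

    probs, logits: tensors of shape (num_layers, vocab_size)
    correct_id:    token id of the correct answer (e.g. "Springfield")
    intuitive_id:  token id of the intuitive but wrong answer (e.g. "Chicago")
    """
    eps = 1e-12

    # Uncertainty: entropy of the distribution at each layer
    uncertainty = -(probs * torch.log(probs + eps)).sum(dim=-1)

    # Confidence: log probability of the correct answer ...
    confidence = torch.log(probs[:, correct_id] + eps)
    # ... or, alternatively, its reciprocal rank
    rank = (probs > probs[:, correct_id].unsqueeze(-1)).sum(dim=-1) + 1
    reciprocal_rank = 1.0 / rank.float()

    # Relative Confidence: log-probability difference, correct vs. intuitive answer
    relative_confidence = confidence - torch.log(probs[:, intuitive_id] + eps)

    # Boosting: difference in raw logits, correct vs. intuitive answer
    boosting = logits[:, correct_id] - logits[:, intuitive_id]

    return uncertainty, confidence, reciprocal_rank, relative_confidence, boosting
```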
Each of these metrics was measured in three ways: as the final value at the output layer, as an area-under-the-curve (AUC) summary across all layers, and as the point of maximum change between layers (Max-delta).
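Each layer-wise trajectory can then be reduced to those three summaries in a few lines. How the AUC is normalized, and whether Max-delta is reported as a layer index or a magnitude, are assumptions in this sketch.

```python
import numpy as np

def summarize_trajectory(metric_by_layer):
    """Final value, AUC, and Max-delta for one metric's layer-time curve (a sketch)."""
    m = np.asarray(metric_by_layer, dtype=float)
    final_value = m[-1]                               # value at the output layer
    auc = np.trapz(m) / (len(m) - 1)                  # area under the curve, normalized by depth
    deltas = np.diff(m)                               # change between adjacent layers
    max_delta_layer = int(np.argmax(np.abs(deltas))) + 1
    max_delta_value = deltas[max_delta_layer - 1]     # largest single-layer change
    return final_value, auc, max_delta_layer, max_delta_value
```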
They then used these metrics to predict human behavior. For each task, they compared a baseline regression model using only the final output to models that added one process metric at a time.
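In code, that comparison might look roughly like the nested-regression sketch below, shown here with synthetic numbers; the study's actual outcome variables, predictors, and statistical tests may differ.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data, one row per stimulus. In the study, the outcome would be a human
# behavioral measure (e.g. mean reaction time) and the predictors model-derived metrics.
n = 200
final_conf = rng.normal(size=n)                        # output-layer metric (baseline predictor)
auc_conf = 0.6 * final_conf + rng.normal(size=n)       # same metric summarized across layers
rt = 1.0 - 0.5 * final_conf - 0.3 * auc_conf + rng.normal(scale=0.5, size=n)  # toy outcome

baseline = sm.OLS(rt, sm.add_constant(final_conf)).fit()
augmented = sm.OLS(rt, sm.add_constant(np.column_stack([final_conf, auc_conf]))).fit()

# Does adding the layer-wise (process) metric explain behavior beyond the output alone?
print(f"baseline R^2:  {baseline.rsquared:.3f}")
print(f"augmented R^2: {augmented.rsquared:.3f}")
print(augmented.compare_lr_test(baseline))  # likelihood-ratio test of the nested models
```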
Five studies across domains and modalities
In a fact recall task (naming capitals like Springfield for Illinois, not the intuitive Chicago), models showed evidence of "two-stage" processing: intermediate layers often preferred the intuitive wrong answer before later layers boosted the correct one.
Process metrics significantly improved predictions of human accuracy and processing load (like backspaces). For instance, confidence metrics best predicted accuracy, while relative confidence and boosting metrics best predicted typing uncertainty (backspaces).
In a fact recognition version of the task, where participants chose between Springfield and Chicago, the pattern changed. Here, relative metrics provided the biggest gains over output measures for predicting both human accuracy and reaction time, consistent with a task that demands an explicit comparison between the two options.
In an exemplar categorization task using mouse-tracking, such as classifying "whale" as a mammal, process metrics again improved predictions. They were especially helpful for reaction times and mouse trajectory features like acceleration. Confidence, Relative Confidence, and Boosting showed the strongest effects.
The fourth study tackled syllogistic reasoning, which involves logic puzzles where people often respond based on belief rather than formal logic. The models showed similar belief-based biases. In this case, Confidence integrated across layers best predicted human accuracy and reaction time, especially on belief-driven questions.
Finally, extending to the vision modality, the team compared Vision Transformer (ViT) dynamics to human performance on out-of-distribution (OOD) object recognition. Even in these encoder-only models, process metrics improved predictions. Uncertainty integrated over time was particularly predictive of human accuracy and reaction time across various image datasets.
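For the vision studies, a comparable per-layer readout can be sketched for an encoder-only ViT by applying the model's final layer norm and classification head to each layer's [CLS] token. The checkpoint, the placeholder image path, and this readout are illustrative assumptions rather than the paper's exact setup.

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Illustrative checkpoint; the study used ViT-family models on OOD image sets.
name = "google/vit-base-patch16-224"
processor = ViTImageProcessor.from_pretrained(name)
model = ViTForImageClassification.from_pretrained(name)
model.eval()

image = Image.open("example.jpg")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Read out a class distribution at every encoder layer by applying the final layer norm
# and classification head to that layer's [CLS] token (an assumption about the readout).
layer_entropy = []
for h in out.hidden_states[1:]:
    logits = model.classifier(model.vit.layernorm(h[:, 0]))
    p = torch.softmax(logits, dim=-1)
    layer_entropy.append(-(p * torch.log(p + 1e-12)).sum(dim=-1))  # Uncertainty per layer
```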
Not just the right answers, but similar ways of thinking
Across all five studies, the results consistently showed that process metrics derived from layer-time dynamics significantly improved the prediction of human behavioral data. This included both accuracy and measures related to cognitive load or uncertainty, going beyond what could be explained by output metrics from the final layer.
This suggests a functional parallel: stimuli that are "difficult" for a model (requiring more layer-wise computation to settle on an answer) also tend to be difficult for humans, resulting in longer reaction times or more errors.
The researchers argue that large AI models should not be seen only as black boxes that map inputs to outputs. If their internal processing mirrors human reasoning, these models could help test cognitive theories, uncover patterns in human decision-making, or support AI systems that better recognize and communicate uncertainty.
A foundation for explainability - and new research questions
The study acts as a bridge between mechanistic AI interpretability and cognitive modeling. The authors acknowledge limitations: the tested models were pre-trained (Llama-2 and ViT families) and covered specific tasks. Whether findings generalize to other architectures or fine-tuned models is an open question.
It also remains unclear whether layer-time dynamics better reflect an individual's processing stream or an aggregate pattern across people. Still, the researchers see this as a foundation. The internal computations of AI models, and how their predictions evolve layer by layer during a forward pass, could offer new insight into how humans think in real time.