
A Polish-American start-up is taking a new approach to language model design, drawing inspiration directly from the structure of the human brain.


The architecture, called "(Baby) Dragon Hatchling" (BDH) and developed by Pathway, swaps the standard Transformer setup for a network of artificial neurons and synapses.

While most language models today use Transformer architectures that get better results by scaling up compute and inference, Pathway says these systems work very differently from the biological brain.

Transformers are notoriously hard to interpret, and their long-term behavior is tough to predict—a real problem for autonomous AI, where keeping systems under control is critical.

Diagram with four columns from left to right: Transformer, BDH-GPU, BDH, and Brain Models. Each column shows Attention Mechanism, Local Dynamics, and Feed-Forward Network. Arrows connect the architectures and show their relationship. Tensor-based models are shown on the left, local graph models on the right.
The BDH-GPU architecture acts as a bridge between conventional AI models and brain-like processing. Instead of abstract math structures, BDH organizes information as connections between artificial neurons and synapses, mirroring how the brain stores knowledge. | Image: Pathway

The human brain is a massively complex graph, made up of about 80 billion neurons and over 100 trillion connections. Past attempts to link language models and brain function haven't produced convincing results. Pathway's BDH takes a different tack, ditching fixed compute blocks for a dynamic network where artificial neurons communicate via synapses.

A key part of BDH is "Hebbian learning," a neuroscience principle summed up as "neurons that fire together wire together." When two neurons activate at the same time, the connection between them gets stronger. This means memory isn't stored in fixed slots, but in the strength of the connections themselves.

Pathway calls these local learning rules "equations of reasoning." By describing how individual neurons interact, they create the basis for complex reasoning inside the model.
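To make the principle concrete, here is a minimal toy sketch of a Hebbian update in Python. It illustrates the general rule described above, not Pathway's actual equations: synapses between neurons that fire at the same time are strengthened, so memory accumulates in the connection weights rather than in dedicated storage slots.

```python
import numpy as np

# Toy Hebbian update ("neurons that fire together wire together").
# Illustrative sketch only; not Pathway's actual learning rule.

rng = np.random.default_rng(0)
n_neurons = 8
w = np.zeros((n_neurons, n_neurons))   # synaptic strengths act as the memory
learning_rate = 0.1

def hebbian_step(w, activity, lr):
    """Strengthen w[i, j] in proportion to the joint activity of neurons i and j."""
    return w + lr * np.outer(activity, activity)

# Simulate a few time steps with sparse activity (only a few neurons fire at once).
for _ in range(5):
    activity = (rng.random(n_neurons) < 0.25).astype(float)
    w = hebbian_step(w, activity, learning_rate)

print(w.round(2))  # pairs of co-active neurons now have stronger connections
```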

Unlimited context window

In tests, BDH matched GPT-2's performance on language and translation tasks - notable because GPT-2, while now outdated, paved the way for later breakthroughs through scaling. The team ran BDH-GPU head-to-head against Transformer models with the same number of parameters (from 10 million up to 1 billion), using identical training time and data. BDH learned faster per data token and showed better loss reduction, especially on translation.

Line graph: Validation loss by model size for BDH-GPU, BDH-GPU', and GPTXL – BDH-GPU' close to GPTXL performance.
BDH scales only by adding more neurons, keeping other hyperparameters fixed. Still, it matches the prediction accuracy of more complex scaled models like GPTXL across all tested sizes, measured by next-token prediction in translation tasks. | Image: Pathway

BDH's design comes with another advantage: a theoretically unlimited context window. Since it stores information in synaptic connections instead of a limited cache, it can, in principle, process texts of any length.
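A rough sketch of why this removes the fixed context limit (an assumed illustration, not BDH's actual update rule): if each incoming token is folded into a fixed-size matrix of synaptic strengths, storage stays constant no matter how long the input is, unlike a Transformer's key-value cache, which grows with every token.

```python
import numpy as np

# Assumed toy sketch: memory lives in a fixed-size synaptic state matrix that is
# updated once per token, so storage does not grow with sequence length the way
# a Transformer's key-value cache does.

d = 16                                   # number of artificial neurons
state = np.zeros((d, d))                 # synaptic state; its size is fixed

def process_token(state, token_embedding, decay=0.99, lr=0.05):
    """Fold one token into the synaptic state via a Hebbian-style update."""
    activity = np.maximum(token_embedding, 0.0)      # sparse, non-negative firing
    return decay * state + lr * np.outer(activity, activity)

rng = np.random.default_rng(1)
for _ in range(10_000):                  # an arbitrarily long "context"
    state = process_token(state, rng.normal(size=d))

print(state.shape)                       # still (16, 16): no per-token cache
```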


At any given time, only about five percent of BDH's neurons are active. This sparse activation makes the model more efficient and much easier to interpret. With just a handful of neurons firing at once, it's easier to pinpoint which concepts or pieces of information the model is handling.
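As a rough illustration of why sparsity helps interpretability (this is not Pathway's measurement code): if only about five percent of units cross their firing threshold, the handful of active indices can simply be listed and inspected for a given input.

```python
import numpy as np

# Illustrative only: with a threshold that lets through roughly the top 5% of
# pre-activations, very few units fire, so attributing behavior to specific
# neurons becomes much easier.

rng = np.random.default_rng(2)
pre_activations = rng.normal(size=1000)
threshold = np.quantile(pre_activations, 0.95)       # keep the top ~5%
activity = np.where(pre_activations > threshold, pre_activations, 0.0)

active = np.flatnonzero(activity)
print(f"{active.size / activity.size:.1%} of neurons active")   # ~5.0%
print("active neuron indices:", active[:10])
```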

Interpretable synapses

Pathway's researchers found that BDH forms "monosemantic synapses" - connections that respond to specific concepts. In experiments with European parliamentary transcripts, some synapses lit up almost exclusively when the text mentioned currencies or country names.

Two neurons whose connection becomes stronger with successive responses to “US dollar.”
When neuron i activates for "US dollar" and neuron j fires shortly after, the connection between them is strengthened. | Image: Pathway

These synapses even worked across languages: the same connection lit up for both "British Pound" and "livre sterling." The system develops these interpretable structures on its own during training, with no need for hand-coding.

BDH also naturally develops a scale-free network structure with high modularity, features typical of biological information-processing systems such as the brain.


Applications and outlook

Pathway says BDH could open up new possibilities for model engineering. For example, researchers at Pathway showed that it's possible to combine different language models by merging their neuron layers, similar to linking computer programs together.
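The article doesn't spell out the merging procedure, but the intuition can be sketched loosely: because a BDH model is a graph of neurons and synapses, two trained networks could in principle be placed side by side and then wired together. The block-diagonal construction below is purely an assumed toy illustration, not Pathway's published method.

```python
import numpy as np

# Assumed toy illustration of "merging neuron layers": stack the synaptic
# matrices of two models block-diagonally, then let new cross connections
# between the two neuron populations be learned.

w_a = np.random.default_rng(3).normal(size=(4, 4))   # synapses of model A
w_b = np.random.default_rng(4).normal(size=(6, 6))   # synapses of model B

merged = np.zeros((10, 10))
merged[:4, :4] = w_a                                  # model A's neurons
merged[4:, 4:] = w_b                                  # model B's neurons
# off-diagonal blocks start at zero and could be trained to link the two models

print(merged.shape)                                   # one combined network
```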

The model's biological plausibility and interpretability could also have implications for AI safety. In neuroscience, the study provides evidence that features like modular networks, synaptic plasticity, and sparse activation can emerge directly from the underlying processes of reasoning.

The researchers argue that the brain's complex properties may not result from specific training methods, but from the basic requirements of language and reasoning itself.

Pathway sees BDH as a step toward a new theory for understanding how large language models behave at extreme scale. The ultimate goal is to establish mathematical guarantees for how reliably AI systems can reason over long periods of time, much like how thermodynamics describes the behavior of gases.

Progress on language models appears to be plateauing: as AI labs hit the limits of scaling data and compute, they're shifting their focus to inference and reasoning instead. The Transformer architecture likely isn't going away anytime soon, but more hybrid systems are starting to show up, including models that combine Transformers with new architectures like Mamba.

Summary
  • Pathway has introduced "Baby Dragon Hatchling" (BDH), a language model architecture modeled on the brain's structure, working with neurons and synapses instead of the Transformer method used in most other AI systems.
  • In evaluations, BDH performed as well as GPT-2 in language and translation tasks, learned more quickly and efficiently, processed texts without a fixed length limit, and was easier to interpret because only a few neurons are active at any time.
  • The team discovered that BDH develops synapses and modular network structures that are automatically interpretable, which may enable new strategies in model engineering, AI safety, and research on biological intelligence.