The neuroscientist Jean-Rémi King leads the Brain & AI team in Meta’s AI division. In an interview with The Decoder, he discusses the connection between AI and neuroscience, the challenges of long-term prediction in models, predictive coding, the question of multimodal systems, and the search for cognitive principles in artificial architectures.

The Decoder: Mr. King, let’s start with a simple question: How did Meta become interested in neuroscience in the first place? At first glance, it seems like an unusual path – from a social network to neuroscience research.

Jean Rémi King: Yes, I work at Meta within FAIR, the Fundamental AI Research lab. It was launched by Yann LeCun a little over ten years ago. The idea back then was to establish a lab dedicated to fundamental AI research. Even at that time, the broader industry—and Mark Zuckerberg in particular—recognized how impactful AI would be for the tech sector. So, it was critical for the company to remain at the cutting edge of knowledge in this area.

FAIR has grown quite a bit since then. Initially, most researchers were working in computer vision and natural language processing. At some point, there was a decision to ensure a more diverse portfolio of researchers, so that not everyone was thinking in the same way. A few physicists were hired, and I was brought on as a neuroscientist—likely to broaden that portfolio.

This didn’t come out of nowhere. Neuroscience and AI have been intertwined from the beginning. That’s why we talk about artificial neural networks. The idea of hierarchical layers in algorithms actually originates in systems neuroscience, and the two fields have shared many links over the years. I believe Yann and Joelle Pineau saw the importance of continuing to push in that direction, and that’s probably why I was hired.

That said, I always feel a bit awkward answering this question—no one ever told me this directly. I was just hired and then given the freedom to continue the research I had already been working on.

The Decoder: Has your work always been situated at the intersection of AI and neuroscience?

Jean Rémi King: I did my undergraduate degree in AI and cognitive science more than 20 years ago now, which feels a bit daunting to admit. Even then, I was positioned at the intersection of those two fields. As a teenager—and even as a kid—I was fascinated by robotics and the idea of building intelligent systems. Of course, back then, it was something of an AI winter.

After my undergraduate studies, I started to think that neuroscience might be a bit more mature as a field, so I shifted away from computer science. I pursued my master’s and PhD more heavily on the neuroscience side, using machine learning algorithms mainly as tools to analyze complex data—rather than as a means to build intelligent systems. At the time, it felt more like statistics on steroids than a scientific goal in itself.

But around 2011–2012, things began to accelerate in what we now call deep learning. That’s when I returned to the frontier between neuroscience and AI, this time with the goal of exploring whether there are general principles that shape our own reasoning—principles that could also apply to algorithms.

The Decoder: Has your research with AI changed your conceptual understanding of the brain?

Jean Rémi King: I think studying the brain is one of the ways you’re forced to reconsider what thinking really means. AI today also makes it clear that some of the concepts we take for granted—like reasoning or thinking—may need to be re-evaluated in light of what deep learning algorithms are now capable of.

For those of us working in the field, this is a profound source of curiosity and wonder. The idea that intelligence and reasoning can emerge from something as mechanical as cells interacting—action potentials firing in the brain—is a deeply compelling question.

So it’s not that AI made me rethink these ideas; rather, I was already deeply intrigued by the notion that thinking, at its core, must be grounded in physics. That’s what drew me to the field in the first place, and I think many of my peers followed a similar path.

The Decoder: Do you have a personal “favorite theory” of how the brain works? In your papers, you often mention predictive coding. Is that a framework you consider particularly promising?

Jean Rémi King: That’s a tricky one, because I think many of us have a love-hate relationship with predictive coding. It’s a framework that was first popularized by Rao and Ballard in the 1990s, and then widely promoted by Karl Friston in the 2000s within systems neuroscience.

Friston is a fascinating figure in science. He has both incredibly original ideas and a tendency to obscure them behind dense, often cryptic mathematics. Sometimes, when reading his equations, it takes a moment to realize they're actually familiar concepts—just expressed in very unusual formalisms. And in a way, that’s reflective of the theory itself.

There are many compelling ideas in the original formulation of predictive coding. But when it comes to making the theory precise enough to generate specific, testable predictions, it becomes quite difficult. That’s the challenge—translating these broad concepts into empirically grounded models.

That said, many of the general ideas are genuinely interesting. In AI, and in predictive coding, one central idea is that driving a system to minimize its prediction error can be a sufficient principle for intelligence to emerge. The notion is that by learning to predict the world, a system will build useful intermediate representations. This idea is at the heart of the theory.

But why this process is sufficient—why minimizing prediction error leads to intelligent representations—remains unclear. We see this optimization happening in our algorithms, where we can control and understand it. But presumably, something similar is happening in the brain. And yet, the fact that this alone might give rise to intelligent processing is still something we don’t fully understand. It may not be a necessary condition, but it increasingly seems to be a sufficient one.
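To make that principle concrete, here is a minimal sketch in which the only training signal is the error of predicting the next observation, and the hidden layer plays the role of the "intermediate representation" the theory says should emerge. The toy network and synthetic sine-wave data are assumptions for illustration, not anything King's team actually uses.

```python
# Minimal sketch of prediction-error minimization as a training principle.
# The toy network and synthetic sine-wave data are illustrative assumptions,
# not the models or datasets used by King's team.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic "world": a noisy sine wave. The task is to predict the next observation.
t = torch.linspace(0, 20, 1000)
x = torch.sin(t) + 0.1 * torch.randn_like(t)
inputs, targets = x[:-1].unsqueeze(1), x[1:].unsqueeze(1)

# A tiny predictor. Its hidden layer is the "intermediate representation"
# that, on the predictive-coding view, should become useful as prediction improves.
model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()
    prediction_error = loss_fn(model(inputs), targets)  # the only training signal
    prediction_error.backward()
    optimizer.step()

print(f"final prediction error: {prediction_error.item():.4f}")
```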

So, to answer your question—I don’t have a favorite theory. Like many of my peers, I’m more interested in exploring these large, sometimes unwieldy theories to see if they contain missing pieces—concepts that could help us better understand how the brain actually works.

The Decoder: In one of your earlier papers, you wrote that word sequences – the order of individual words – quickly become unpredictable, while their meaning may remain more stable. You suggest that for an intelligent system, it might be important not only to predict the next words, but to anticipate more abstract, hierarchical representations over longer timescales. I'm curious: Have you gained any new insights into this in your recent research – especially with regard to other modalities like images or videos, where similar challenges arise?

Jean Rémi King: What became apparent to us after publishing that paper—and I think this still holds true today—is that it’s not enough to simply predict what’s going to happen next, right after an observation. It's equally important to predict what will happen much later, well beyond the immediate moment. That’s a valuable goal, but in practice, it's extremely difficult.

Even today’s models that can handle multi-token prediction don’t scale particularly well. Building a model that can generate an entire paragraph or page at once is still incredibly challenging. This kind of long-range prediction just isn’t something current systems do easily.

My strong belief is that this is a genuinely hard problem: figuring out what kinds of architectures can support long-range inference in latent space. The classic transformer, as we have it today, remains limited in that regard.
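To illustrate that distinction, the sketch below contrasts a classic next-token loss with an objective that asks each position to predict a latent representation many steps ahead. The toy GRU encoder, the 32-step horizon, and the tensor shapes are illustrative assumptions, not a specific Meta architecture.

```python
# Toy contrast between next-token prediction and long-range prediction in latent
# space. The GRU encoder, the 32-step horizon, and all shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, seq_len, vocab, dim, horizon = 4, 128, 1000, 64, 32
tokens = torch.randint(0, vocab, (batch, seq_len))

embed = nn.Embedding(vocab, dim)
encoder = nn.GRU(dim, dim, batch_first=True)   # stand-in for any sequence model
to_vocab = nn.Linear(dim, vocab)
to_latent = nn.Linear(dim, dim)

hidden, _ = encoder(embed(tokens))             # (batch, seq_len, dim)

# 1) Classic next-token loss: each position predicts the token one step ahead.
logits = to_vocab(hidden[:, :-1])
next_token_loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))

# 2) Long-range latent loss: each position predicts the representation of the
#    sequence `horizon` steps later, not its exact tokens.
predicted_future = to_latent(hidden[:, :-horizon])
target_future = hidden[:, horizon:].detach()   # stop gradient through the target
latent_loss = F.mse_loss(predicted_future, target_future)

print(next_token_loss.item(), latent_loss.item())
```

The second loss says nothing about which exact tokens appear 32 steps later, only what their high-level representation should look like, which is the sense in which the prediction lives in latent space.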

Within our group, we've decided to take a step back from trying to invent those architectures ourselves—largely because so many teams are already working on this problem purely from an AI perspective. It seems unlikely that a breakthrough architecture will come directly from a neuroscience lab. However, we still collaborate with others working on adjacent challenges.

For example, at FAIR we have a team focused on computer vision for video. There, too, the goal isn’t just to predict the next video frame but to anticipate what might happen 10 seconds or even a minute later. That’s a massive challenge from a computer science standpoint.

We also have people working on code generation. In that context, it’s not useful to just predict the next character in a line of code. Ideally, you'd want a model to generate a full structure—say, a set of functions, which call classes, which interact with a dataset. Simply predicting the next token often isn't the best way to reason through that process.

So while we've explored these ideas, I wouldn't say we've solved anything. What we've mostly learned is just how difficult this problem truly is.

The Decoder: And what about progress on the AI side? Have recent developments captured human thinking more accurately in this regard – or are we still far from that?

Jean Rémi King: I do see progress—but it's clear that things haven't followed the trajectory many were predicting at the start of the LLM boom. When models like ChatGPT first emerged, there was a widespread belief that scaling was all we needed. People were saying, “Just make the models bigger, feed them more data, and intelligence will follow.”

Now, more than three years later, it's obvious that scaling alone isn't sufficient. Yes, performance improves with size, but not at a pace that’s reasonable or sustainable. A lot of companies have tried this brute-force approach, and while it works to some degree, it's not a magic bullet.

What this tells us is that we’re missing something fundamental. Take the human brain, for example. Children can learn language from just a few million words—a tiny fraction of the data required to train large language models. That discrepancy highlights just how inefficient our current architectures and optimization procedures are. It’s not that AI hasn't progressed—we’ve seen huge strides—but it’s clear that today's models are still limited in key ways.

We have made real engineering progress, especially in making inference more efficient and in compressing large models. People are now running powerful language models on single GPUs using distillation techniques, which makes the technology more accessible and easier to iterate on.
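As a rough illustration of what distillation involves, the sketch below trains a small "student" to match a larger "teacher" model's softened output distribution. The stand-in linear networks and the temperature value are assumptions, not any particular released model or recipe.

```python
# Minimal sketch of knowledge distillation: a small "student" learns to match a
# large "teacher" model's softened output distribution. The stand-in networks and
# temperature are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim_in, dim_student, T = 1000, 512, 64, 2.0
x = torch.randn(8, dim_in)                        # stand-in input features

teacher = nn.Linear(dim_in, vocab)                # placeholder for a large frozen model
student = nn.Sequential(nn.Linear(dim_in, dim_student), nn.ReLU(),
                        nn.Linear(dim_student, vocab))

with torch.no_grad():                             # the teacher only provides targets
    teacher_probs = F.softmax(teacher(x) / T, dim=-1)

student_log_probs = F.log_softmax(student(x) / T, dim=-1)
distill_loss = F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * T * T
distill_loss.backward()                           # gradients flow only into the student
print(f"distillation loss: {distill_loss.item():.4f}")
```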

We've also seen major advances in generating images and videos. But conceptually, we haven't yet had a breakthrough on the scale of the transformer’s introduction in 2017. There are interesting developments—like mixture-of-experts models and new attention mechanisms—but these are incremental, not transformative.

Still, I believe we will see another leap. It’s just a belief—there’s no data behind it—but I think a new architecture or training paradigm will eventually emerge, one that’s far more efficient than what we have now.

Another area where I think we’re stuck is hardware. We’re all working with GPUs, which are incredibly energy-intensive. When you compare that to the human brain—which operates on just a few watts of power—the contrast is striking. Current computing is far from energy-efficient.

This is an area where I believe a major paradigm shift is possible, though probably not imminent. Rethinking our hardware to compute more data with less energy could change the entire landscape. But for now, it's not a focus for most of the industry. It’s likely a longer-term challenge—but one that could redefine the bottlenecks we're working with today.

The Decoder: In your research, have you also systematically studied how model size affects similarity to neural processing in the brain?

Jean Rémi King: Yes, we do make that comparison now—it's become almost systematic. The general rule of thumb is that, broadly speaking, larger models tend to be more brain-like, but let me start from the beginning.

The first thing we've observed—and others have as well; it's now a fairly robust finding—is that when you train an AI system, say a large language model, on a text-based task, it ends up processing text in a way that resembles how the human brain does. We know this by comparing brain activity in people reading or listening to natural language with the activation patterns of AI models trained on similar tasks.

Using brain imaging techniques—like fMRI, MEG, or even electrophysiology—we can measure human neural responses while they read or listen to stories. Then we compare those neural patterns with the internal activations of models like LLMs. What we consistently find is that the more effectively these models are trained, especially on tasks like next-token prediction, the more their internal representations resemble those observed in the brain.
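In practice, this kind of comparison is often done with a linear encoding model: regress the recorded brain signal onto the model's activations and check how well held-out responses are predicted. Below is a minimal sketch on synthetic data; the array sizes, ridge penalties, and correlation scoring are illustrative assumptions rather than the team's exact pipeline.

```python
# Sketch of a linear encoding analysis: map a language model's activations to
# (here synthetic) brain responses and score the fit on held-out data.
# The random data, dimensions, and ridge penalties are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_words, n_features, n_voxels = 2000, 768, 50

X = rng.standard_normal((n_words, n_features))        # model activations per word
W = rng.standard_normal((n_features, n_voxels)) * 0.1
Y = X @ W + rng.standard_normal((n_words, n_voxels))  # synthetic "brain" responses

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=0)

encoder = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_train, Y_train)
Y_pred = encoder.predict(X_test)

# Per-voxel Pearson correlation between predicted and measured responses.
r = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean encoding score r = {np.mean(r):.3f}")
```

The better the held-out correlation, the more of the measured brain activity can be read off linearly from the model's internal representations; that is the sense of "brain-like" used in the comparisons described here.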

And this isn't limited to language. We’re seeing similar effects in models trained on images, video, motor actions, or even navigation tasks. A whole new interdisciplinary field is emerging around this—bridging neuroscience and AI by systematically comparing model representations with brain activity.

Within that field, a key question is: what factors make an algorithm more or less brain-like? We can examine variables like model size, the amount of training data, whether the model uses attention mechanisms, or whether it’s trained with supervised or unsupervised learning. By varying these parameters, we can begin to understand what influences the similarity between artificial and biological systems.

What we've observed so far is that all these factors do have some effect—but the strongest predictor of brain-like representations is simply whether the model is good at the task it was trained on. For example, in vision, models that perform well at object recognition or image segmentation tend to produce internal representations that align well with brain activity. In language, models that are strong at next-word prediction or translation tend to show the same effect.

So, the better a model is at solving its task—regardless of its architecture—the more likely its internal representations are to be linearly aligned with those of the brain. That’s our first, coarse-grained observation.

But when we dig deeper, it gets more complicated. We’ve seen models that perform extremely well but don’t closely resemble brain activity, and vice versa. So there are edge cases, and the relationship isn’t always consistent. Developing a global theory to explain all of this is still an open and difficult challenge—but as a first approximation, the link between task performance and brain-like representations is a useful one.

The Decoder: So that means a larger model is not automatically better in terms of brain similarity – training also plays a crucial role? For example, you've worked with language models like GPT-2 – what about a model like GPT-4 today?

Jean Rémi King: GPT-4 is unfortunately closed, so we cannot run this comparison on it, but within the company we do have access to open models. And when we run these comparisons, we do see that larger language models tend to resemble the brain more closely—in the sense that there's a stronger linear correspondence between their internal activations and those measured in the brain.

But it’s important to emphasize: it’s not just about size. These larger models also happen to be better at language processing. If you take a very large language model that’s poorly trained, it doesn’t tend to resemble the brain. So, it’s not size alone that matters—it’s what that size enables the model to do.

In other words, size matters only insofar as it contributes to performance. What ultimately determines whether a model’s internal representations align with those of the brain is how well it performs the task it was trained on.

If you have a large model that performs poorly at predicting text, then its internal representations generally won’t be brain-like. So, the key variable isn’t model size in itself—it’s performance.

The Decoder: Let’s talk about multimodal models. Do you think the success of human learning is based on integrating different modalities – and do you see parallels in the progress of multimodal AI models?

Jean Rémi King: This is a highly debated question in the field, and I want to emphasize that what I’m expressing here is my personal opinion—it’s not a scientific consensus.

It’s a long-standing debate in cognitive science. Throughout its history, researchers have argued both for and against the necessity of grounding language in sensory experience—having access to images, sounds, and the physical world for language to have meaning, and vice versa.

For instance, Francisco Varela was a prominent advocate of embodied cognition, emphasizing the idea that cognition—including language—must be rooted in sensory and motor systems. While he might not have used the term “multimodal learning,” his work aligns closely with that concept. On the other side of the spectrum, you have figures like Noam Chomsky and his school of thought in linguistics, who have strongly argued for the independence of language. According to that view, the human brain contains a language system capable of combining and manipulating words largely independently of other systems like vision or auditory perception.

Now, in terms of where we are today: multimodal models are not yet dominating the field. Despite significant effort to combine modalities—text with images, for instance—it’s still difficult to build a multimodal model that outperforms a unimodal one at its own task. Just having access to multiple input streams doesn’t automatically make a model better at processing each. In fact, it often makes training harder. These models still struggle to reach state-of-the-art performance across all included modalities.

Personally, I tend to lean toward the idea that language can function relatively independently from other modalities. If you look at findings from psychology and cognitive science, it's clear that people who are congenitally blind, for example, can reason perfectly well. On IQ tests and similar measures, their performance matches that of sighted individuals. The same holds for people who are deaf, although deafness can sometimes affect language development, depending on the context. Still, it appears that language—and the reasoning it often supports—can develop largely independently of vision and hearing.

That said, there’s something very compelling about the multimodal perspective. Language, after all, is sparse. We don’t encounter that much language in daily life—perhaps 13,000 to 20,000 words per day. And from an AI standpoint, we’re approaching the limit of how much text data is available for training models. There simply won’t be much more new text.

In contrast, other modalities—like images and video—are virtually limitless. We don’t process the entire corpus of online video today simply because we lack the computational infrastructure to handle it. But there's an enormous amount of untapped information and structure in those formats.

So I think there’s real potential in combining the strengths of both: the depth and structure of language with the scale and richness of visual or other sensory data. That intersection remains a very important and promising direction for future research.

The Decoder: One last question: What’s your view on reasoning models? Systems that explicitly try to draw inferences. Are there plans to study such models in your team?

Jean Rémi King: I’m not an expert in reasoning models, but I find the recent developments in that area really exciting. Concepts like chain-of-thought reasoning have been present in cognitive science for quite some time, so it’s great to see them now being formalized in AI. These aren’t just vague theories anymore—we have concrete models that attempt to test these ideas.

What’s particularly interesting is that some of these models explore whether it’s more effective to carry out reasoning as a sequence of words—a verbal chain of thought—or whether reasoning should take place in the latent space of abstract concepts, not necessarily expressed in language. There’s a lot of potential in exploring how reasoning can be "unrolled," how you can revisit earlier steps in a process, and how different formats of representation might influence that process.
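Just to illustrate the structural difference being described here, the toy sketch below contrasts the two styles: a verbal chain of thought decodes each intermediate step to text and feeds it back into the context, while latent reasoning iterates on a hidden state without ever producing words. The placeholder step functions are assumptions, not real model calls.

```python
# Toy contrast between a verbal chain of thought and reasoning in latent space.
# The placeholder step functions and dimensions are assumptions, not real model calls.
import torch
import torch.nn as nn

dim = 32
latent_step = nn.GRUCell(dim, dim)       # stand-in for one recurrent reasoning step

def verbal_chain_of_thought(generate, prompt, n_steps=3):
    """Each intermediate step is decoded to text and appended to the context."""
    context = prompt
    for _ in range(n_steps):
        thought = generate(context)          # text in, text out
        context = context + "\n" + thought   # the reasoning trace is explicit
    return context

def latent_reasoning(encode, n_steps=3):
    """Intermediate steps stay in the hidden state; nothing is decoded to words."""
    x = encode()                             # encode the problem once
    h = torch.zeros(1, dim)
    for _ in range(n_steps):
        h = latent_step(x, h)                # iterate in latent space
    return h                                 # only the final state is read out

# Toy usage with placeholder callables:
print(verbal_chain_of_thought(lambda ctx: f"intermediate step ({len(ctx)} chars so far)",
                              "2 + 2 * 3 = ?"))
print(latent_reasoning(lambda: torch.randn(1, dim)).shape)
```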

That said, it’s not my area of specialization, so I won’t comment in depth on the specific developments. But we are already seeing promising benefits from this line of research.

What I do find very encouraging is how this connects to reinforcement learning. The idea of fine-tuning large language models for agentic behavior ties directly back to core concepts in reinforcement learning. In many ways, the so-called “world models” used in LLM fine-tuning are just a new framing of ideas that have been present in reinforcement learning for a while.

All of this points to a broader, and very positive, trend in AI: different subfields—language modeling, reasoning, reinforcement learning—are no longer evolving in isolation. Instead, they’re increasingly converging, and that integration is proving to be incredibly powerful.

About Jean-Rémi King

Jean-Rémi King is a CNRS researcher at the École Normale Supérieure and currently works at Meta AI, where he leads the Brain & AI team. His team investigates the neural and computational foundations of human intelligence – with a focus on language – and develops deep learning algorithms to analyze brain activity (MEG, EEG, electrophysiology, fMRI).

The interview was conducted on March 14, 2025.

Interviewer: Maximilian Schreiner
