Andrej Karpathy, a former OpenAI researcher and former head of AI at Tesla, explains that when people "ask an AI," they are actually interacting with averaged responses from human data labelers, not a magical AI system.
"You're not asking an AI, you're asking some mashup spirit of its average data labeler," Karpathy says. To illustrate the point, he uses a typical tourism question: when someone asks about the "top 10 sights in Amsterdam," the AI generates an answer based on how human data labelers previously responded to similar questions.
For questions that don't appear in the training data, the system generates statistically similar responses, mimicking the answering patterns of the human labelers.
In particular, Karpathy cautions against asking AI systems about complex policy issues such as optimal governance: the answer, he says, would be roughly what the labeling team itself would produce if asked to research the question for an hour.
"The point is that asking an LLM how to run a government you might as well ask Mary from Ohio, for $10, allowing 30 minutes, some research, and she must comply with the 100-page labeling documentation written by the LLM company on how to answer those kinds of questions," Karpathy explains.
How AI assistants get their "personality"
Large language models go through two training stages. First, during pretraining, they learn from large amounts of Internet text and other data. Then, during fine-tuning, they train on example conversations between "human" and "assistant" roles, with human annotators writing the assistant's responses.
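To make the mechanics concrete, here is a minimal sketch in Python of what such fine-tuning data might look like: labeler-written human/assistant conversations flattened into training text. The role tags, field names, and format are illustrative assumptions, not any particular company's actual schema.

```python
# Hypothetical supervised fine-tuning data: conversations written or curated
# by human labelers, each turn tagged with a role.
conversations = [
    {
        "messages": [
            {"role": "human", "content": "Top 10 sights in Amsterdam?"},
            {"role": "assistant", "content": "1. Rijksmuseum\n2. Van Gogh Museum\n3. Anne Frank House\n..."},
        ]
    },
]

def to_training_text(conversation):
    """Flatten one labeled conversation into a single training string."""
    parts = []
    for msg in conversation["messages"]:
        # Illustrative role markers; real chat templates vary by model.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    return "\n".join(parts)

for conv in conversations:
    print(to_training_text(conv))
```

The model is then trained to continue text in this format, which is why its "assistant" behavior ends up reflecting whatever the labelers wrote.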
When AI models respond to controversial topics with phrases like "it's a debated question," it's because human labelers are instructed to use such language to maintain neutrality, Karpathy says.
The fine-tuning process teaches the model to act like a helpful assistant: it keeps the knowledge absorbed during pretraining but adapts its style to match the fine-tuning data. Many attribute ChatGPT's explosive success two years ago to this step, because it made users feel like they were talking to a real, understanding being rather than an advanced autocomplete system.
Expert knowledge comes from expert labelers
For specialized topics, companies hire relevant experts as data labelers. Karpathy notes that medical questions get answers from professional physicians, while top mathematicians like Terence Tao help with math problems.
The experts don't need to answer every possible question; the system just needs enough examples to learn how to imitate professional responses.
But that doesn't guarantee expert-level answers to all questions. The AI may lack the underlying knowledge or reasoning skills, although its answers typically outperform those of average Internet users, Karpathy says. So LLMs can be both very limited and very useful, depending on the use case.
The renowned AI researcher has previously criticized this human-feedback-driven approach, known as reinforcement learning from human feedback (RLHF). He considers it a stopgap solution because it lacks an objective success criterion, unlike systems such as DeepMind's AlphaGo, which could optimize directly against a measurable goal: winning the game.
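To illustrate the criticism, here is a toy Python sketch of how a reward model in RLHF is typically fit: it learns to reproduce labelers' preferences between two answers rather than checking against any objective measure of success. The scoring heuristic and function names are hypothetical, purely for illustration.

```python
import math

def reward_model(answer: str) -> float:
    # Stand-in for a learned reward model. This toy version simply prefers
    # longer, list-like answers, mimicking a superficial labeler preference;
    # nothing here measures whether the answer is actually correct.
    return 0.1 * len(answer.split()) + (1.0 if answer.startswith("1.") else 0.0)

def preference_loss(chosen: str, rejected: str) -> float:
    """Bradley-Terry style loss used to fit reward models to labeler choices."""
    margin = reward_model(chosen) - reward_model(rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The policy is later optimized against whatever this model rewards, so the
# system ends up optimizing agreement with labelers, not external ground truth.
print(preference_loss("1. Rijksmuseum 2. Van Gogh Museum 3. Anne Frank House",
                      "Go to Amsterdam and look around."))
```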
Karpathy, who recently left OpenAI along with several other senior AI researchers, has started his own AI education company.