
Researchers show how AI agents can learn from each other across generations, outperforming solo learners.

Researchers from the University of Oxford and Google DeepMind have examined how cultural accumulation - the buildup of knowledge and skills over generations - can be achieved in reinforcement learning (RL) agents. Cultural accumulation is considered key to humanity's success.

In the study, the team introduces two complementary RL models: the "In-Context" model, where accumulation occurs through rapid adaptation to new environments, and the "In-Weights" model, where accumulation happens through slower updating of network weights during training.

The "In-Context" model accumulates knowledge over multiple generations by learning from other agents within individual episodes. It also improves its behavior through independent learning, which lets it provide more useful demonstrations to the next generation.


In the "In-Weights" model, an agent's lifespan is equivalent to a full training cycle and the network parameters are equivalent to skills. Here, accumulation occurs more slowly over successive generations of agents, each trained from scratch but benefiting from the observations of the previous generation.
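The generational scheme above can be illustrated with a minimal toy sketch. This is not the paper's actual setup: the environment, reward rule, and tabular "policy" here are invented for illustration. Each generation trains from scratch but can copy actions from the previous generation's policy, mimicking in-weights accumulation.

```python
import random

def train_generation(env_steps, teacher=None):
    """Train a fresh toy agent; optionally let it observe a teacher's actions."""
    policy = {}  # state -> (best action found, its reward)
    for step in range(env_steps):
        state = step % 10  # toy cyclic state space
        if teacher is not None and random.random() < 0.5:
            # social learning: copy the previous generation's action
            action = teacher.get(state, random.randint(0, 3))
        else:
            # independent discovery: explore a random action
            action = random.randint(0, 3)
        reward = 1.0 if action == state % 4 else 0.0  # invented reward rule
        if reward > policy.get(state, (None, -1.0))[1]:
            policy[state] = (action, reward)
    # the trained policy becomes the next generation's teacher
    return {s: a for s, (a, _) in policy.items()}

teacher = None
for gen in range(3):  # successive generations, each trained from scratch
    teacher = train_generation(200, teacher)
```

Because each generation starts from a blank policy but inherits demonstrations, good actions discovered by chance propagate forward instead of being rediscovered.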

"Generational Intelligence" outperforms solo agents

To test their models, the researchers had artificial agents solve complex tasks in simulated environments. For example, in the "Traveling Salesman Problem (TSP)" environment, they had to find the shortest route between multiple cities.
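For scale, a tiny stand-alone version of the underlying optimization problem looks like this. The four cities and brute-force search are illustrative only; the paper's agents solve the task sequentially under partial observability rather than by enumeration.

```python
from itertools import permutations

# Four example cities on the corners of a 3x4 rectangle (invented instance).
cities = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}

def tour_length(order):
    """Euclidean length of a closed tour visiting the cities in `order`."""
    pts = [cities[c] for c in order] + [cities[order[0]]]
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Brute force over all orderings; feasible only for tiny instances.
best = min(permutations(cities), key=tour_length)
```

Here the shortest closed tour is the rectangle's perimeter of length 14; the search space grows factorially, which is why TSP is a demanding benchmark for learned strategies.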

The agents could only perceive a small portion of their environment at a time. However, each new generation could observe the previous generation solving the task and learn from it. In all cases, the accumulating agents outperformed agents that learned for only a single lifetime, given the same experience budget.

For in-context learning, the researchers found that teachers that are too reliable or too unreliable during training can hinder accumulation. A balance must therefore be struck between social learning and independent discovery. In-weights learning, on the other hand, helped avoid biases from relying on demonstrations too early.


The study authors see their results as the basis for an endless, population-based cycle of self-improvement for AI agents. At the same time, the models could also provide new tools to study cultural accumulation in humans.

Future work should address, among other things, learned curricula for guiding social learning and cultural transmission in competitive or cooperative multi-agent scenarios. However, they point out that powerful, self-improving AI systems also pose risks.

Summary
  • Researchers at the University of Oxford and Google DeepMind have developed reinforcement learning (RL) models that enable cultural accumulation - the buildup of knowledge and skills across generations - in AI agents.
  • In the "in-context" model, agents learn from other agents by quickly adapting to new environments, while in the "in-weights" model, accumulation occurs more slowly by updating network weights over successive generations.
  • In simulated complex tasks, the accumulating agents outperformed those that learned for only one lifetime. The models could form the basis for an endless cycle of self-improvement for AI agents, but they also pose risks.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.