Researchers show how AI agents can learn from each other across generations, outperforming solo learners.
Researchers from the University of Oxford and Google DeepMind have examined how cultural accumulation - the buildup of knowledge and skills over generations - can be achieved in reinforcement learning (RL) agents. Cultural accumulation is considered key to humanity's success.
In the study, the team introduces two RL models that mirror these dynamics: an "In-Context" model, where accumulation occurs through rapid in-episode adaptation to new environments, and an "In-Weights" model, where accumulation happens through slower updates to network weights during training.
The "In-Context" model accumulates knowledge over multiple generations by learning from other agents in individual episodes. But it also improves its behavior through independent learning to provide more useful demonstrations to the next generation.
In the "In-Weights" model, an agent's lifespan is equivalent to a full training cycle and the network parameters are equivalent to skills. Here, accumulation occurs more slowly over successive generations of agents, each trained from scratch but benefiting from the observations of the previous generation.
"Generational Intelligence" outperforms solo agents
To test their models, the researchers had artificial agents solve complex tasks in simulated environments. In the "Traveling Salesman Problem" (TSP) environment, for example, the agents had to find the shortest route that visits every city.
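For readers unfamiliar with the objective, the snippet below brute-forces the tour length that TSP agents are implicitly optimizing. The five made-up cities are an assumption, and the exhaustive search ignores the partial observability that makes the paper's version of the task hard.

```python
import math
import itertools

# Illustrative only: the tour-length objective behind a TSP-style task.
cities = [(0, 0), (2, 1), (3, 4), (1, 3), (5, 2)]  # made-up coordinates

def tour_length(order):
    # Total distance of the closed tour visiting cities in the given order.
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

best = min(itertools.permutations(range(len(cities))), key=tour_length)
print(best, round(tour_length(best), 3))
```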
The agents could only perceive a small portion of their environment at a time. However, each new generation could observe the previous generation solving the task and learn from it. In all cases, the accumulating agents outperformed agents that learned within a single lifetime, given the same total experience budget.
7/ Cultural accumulation even improves routes traveled in a partially observable Travelling Salesperson Problem, cutting down distances across generations entirely via in-context learning! pic.twitter.com/x776k0Cw62
- Jonny Cook (@JonnyCoook) June 6, 2024
For in-context learning, the researchers found that teachers that are too reliable or too unreliable during training can hinder accumulation, so a balance must be struck between social learning and independent discovery. In-weights learning, on the other hand, helped avoid biases introduced by relying on demonstrations too early in training.
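One simple, hypothetical way to probe this trade-off is to expose teacher reliability as a parameter and sweep it. The wrapper below is an assumption for illustration, not the paper's procedure.

```python
import random

# Hypothetical knob for the reliability trade-off: a teacher whose
# demonstrated action is correct only with probability `reliability`.
def noisy_teacher(best_action, num_actions, reliability=0.7):
    if random.random() < reliability:
        return best_action               # faithful demonstration
    return random.randrange(num_actions)  # misleading demonstration

print(noisy_teacher(best_action=2, num_actions=5))
```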
The study authors see their results as the basis for an endless, population-based cycle of self-improvement for AI agents. At the same time, the models could also provide new tools to study cultural accumulation in humans.
Future work should address, among other things, learned curricula for guiding social learning, as well as cultural transmission in competitive or cooperative multi-agent scenarios. At the same time, the researchers point out that powerful, self-improving AI systems also pose risks.