
Alibaba’s research lab Tongyi has introduced ZeroSearch, a new method for training large language models to handle search tasks—without relying on real web searches.


For chatbots to answer questions accurately, especially when their built-in knowledge isn’t enough, they need to learn how to find information on the fly. Most current approaches use reinforcement learning (RL) and depend on actual search engines like Google to teach this skill. But according to Alibaba’s team, this is expensive, hard to control, and doesn’t scale well.

ZeroSearch takes a different approach: instead of using real web searches during training, it simulates the search process with a second language model. This model generates short texts in response to search queries, providing either relevant or intentionally irrelevant information—mimicking real search results, but under full control of the researchers.

Three-stage search simulation

The Qwen-2.5 language model, the main model being trained, goes through a structured learning process. In each round, it decides whether it needs more information. If so, it formulates a query and sends it to the simulation model, then reviews the generated documents and produces a response. That response is scored, and the resulting reward updates the model through reinforcement learning.
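
To make this loop concrete, here is a minimal sketch of one such round in Python. The tag format and the helper names (`policy`, `sim_search`, `score_answer`) are illustrative assumptions, not the paper's actual interfaces.

```python
# Minimal sketch of one ZeroSearch-style training round. `policy` (the
# Qwen-2.5 model being trained), `sim_search` (the simulation LLM standing in
# for a search engine), and `score_answer` (the reward function) are all
# hypothetical placeholders.

def rollout(question: str, policy, sim_search, score_answer, max_turns: int = 4):
    context = question
    for _ in range(max_turns):
        step = policy.generate(context)  # model emits either a search query or an answer
        if step.startswith("<search>"):
            query = step.removeprefix("<search>").removesuffix("</search>")
            docs = sim_search(query)     # simulated documents instead of real web results
            context += step + f"<information>{docs}</information>"
        else:
            context += step              # final answer, e.g. "<answer>...</answer>"
            break
    reward = score_answer(context, question)  # e.g. exact match against the gold answer
    return context, reward                    # trajectory and reward drive the RL update
```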


At the start of training, the simulated search results are intentionally helpful. Over time, the quality is gradually reduced—a curriculum learning approach. This helps the model learn to draw useful conclusions even from unclear or conflicting information, much like searching the real internet.
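
As a rough picture of how such a schedule could work, the sketch below ramps up a noise probability over the course of training. The linear ramp, the 0.75 ceiling, and the `sim_model` interface are assumptions for illustration, not figures from the paper.

```python
import random

# Illustrative curriculum schedule: early rollouts mostly see useful documents,
# later rollouts see a growing share of noisy ones. The linear ramp and the
# 0.75 ceiling are assumed values, not taken from the paper.

def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.75) -> float:
    return p_start + (step / total_steps) * (p_end - p_start)

def simulated_search(query: str, sim_model, step: int, total_steps: int) -> str:
    p = noise_probability(step, total_steps)
    mode = "noisy" if random.random() < p else "useful"
    # `mode` selects which of two prompts the simulation model receives
    # (see the prompt sketch in the next paragraph).
    return sim_model.generate(query, mode=mode)
```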

The simulation model itself is fine-tuned beforehand, learning to generate both “useful” and “useless” search results. This distinction is controlled with subtle changes to the prompts—the instructions given to the model.
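
The prompt pair below illustrates the mechanism; the wording is invented for this example, and the actual ZeroSearch prompts will differ.

```python
# Hypothetical prompt pair showing how a small wording change can flip the
# simulation model between useful and misleading output. These are not the
# paper's actual prompts.

USEFUL_TEMPLATE = (
    "You are a search engine. Given the query below, write a short document "
    "that is accurate and helps answer it.\nQuery: {query}"
)

NOISY_TEMPLATE = (
    "You are a search engine. Given the query below, write a short document "
    "that sounds plausible but is irrelevant or misleading.\nQuery: {query}"
)

def build_sim_prompt(query: str, mode: str) -> str:
    template = USEFUL_TEMPLATE if mode == "useful" else NOISY_TEMPLATE
    return template.format(query=query)
```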

Successfully managing multi-step searches

Test runs show that the model can handle complex, multi-step search processes. In one example, it was asked, "Who is the spouse of the person who voices Smokey the Bear?" The simulated search first identified Sam Elliott as the voice actor. The model then conducted a second simulated search for Sam Elliott’s spouse, finding Katharine Ross. It combined both pieces of information correctly and produced an accurate answer.

This ability to break down a question into sub-questions and build on intermediate results is a key goal of ZeroSearch training.
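
Expressed in the tag format sketched above, such a two-hop episode might look like this (hand-written for illustration, not actual model output):

```python
# Hand-written illustration of a two-hop trajectory; not actual model output.
trace = (
    "<search>who voices Smokey the Bear</search>"
    "<information>Sam Elliott voices Smokey the Bear.</information>"
    "<search>Sam Elliott spouse</search>"
    "<information>Sam Elliott is married to Katharine Ross.</information>"
    "<answer>Katharine Ross</answer>"
)
```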

Significant cost savings and full control

Simulating the search process not only removes dependency on external search services, but also cuts costs dramatically. In experiments, running 64,000 searches through Google’s SerpAPI cost about $586 in API fees. By contrast, using the simulation model on four rented AWS A100 GPUs cost just $71 in compute time.
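
Taken at face value, those totals work out to roughly an eightfold saving per query:

```python
# Per-query cost from the reported totals (64,000 queries).
serpapi_total, sim_total, queries = 586.0, 71.0, 64_000

print(f"SerpAPI:    ${serpapi_total / queries:.5f} per query")  # ≈ $0.00916
print(f"Simulation: ${sim_total / queries:.5f} per query")      # ≈ $0.00111
print(f"Savings factor: {serpapi_total / sim_total:.1f}x")      # ≈ 8.3x
```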


Another benefit: the simulated search is always available, produces responses in a consistent style, and can be made harder or easier as needed. According to the team, this makes training more predictable and robust.

Outperforming Google search in training

The team evaluated ZeroSearch on seven well-known question-answering benchmarks, including Natural Questions, TriviaQA, and HotpotQA. It matched or outperformed approaches trained with real Google searches, especially when using a large simulation model with 14 billion parameters.

Smaller simulation models with 7 billion parameters also performed well. The decisive factor wasn't size alone, but whether the simulation model had been fine-tuned specifically for the task; models steered only through prompts performed much worse.

Alibaba has released some of its models on Hugging Face. More details and the code are available on GitHub.

Summary
  • Alibaba's Tongyi research lab has developed ZeroSearch, a method for training large language models for search tasks without having to access real search engines like Google. Instead, a second language model simulates the search results, allowing for full control and lower costs.
  • During training, the language model learns whether and how to formulate search queries, processes the simulated answers, and improves through a progressively more difficult curriculum. This enables the model to independently perform multi-step searches and derive meaningful answers from unclear information.
  • In tests on seven well-known question-answering datasets, ZeroSearch matched or outperformed methods trained with real web search. It was particularly effective when the simulation model was fine-tuned specifically for its task, and training costs were significantly lower than with real web searches.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.