Alibaba’s research lab Tongyi has introduced ZeroSearch, a new method for training large language models to handle search tasks—without relying on real web searches.
For chatbots to answer questions accurately, especially when their built-in knowledge isn’t enough, they need to learn how to find information on the fly. Most current approaches use reinforcement learning (RL) and depend on actual search engines like Google to teach this skill. But according to Alibaba’s team, this is expensive, hard to control, and doesn’t scale well.
ZeroSearch takes a different approach: instead of using real web searches during training, it simulates the search process with a second language model. This model generates short texts in response to search queries, providing either relevant or intentionally irrelevant information—mimicking real search results, but under full control of the researchers.
Three-stage search simulation
The Qwen-2.5 language model, which is the main model being trained, goes through a structured learning process. In each round, it decides whether it needs to search for more information. If so, it crafts a query and sends it to the simulation model. The Qwen-2.5 model then reads the generated documents and produces an answer, which is scored and fed back to the model as a reinforcement-learning reward.
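The loop described above can be sketched in a few lines. This is a hypothetical, heavily simplified illustration, not ZeroSearch's actual code: the function names (`simulate_search`, `rollout`) and the toy knowledge base are invented here, and stub functions stand in for the two language models.

```python
def simulate_search(query, helpful=True):
    """Stand-in for the simulation LLM: returns a short 'document'
    that is either relevant or deliberately unhelpful."""
    # Toy knowledge base, invented for this sketch.
    kb = {
        "voice of smokey the bear": "Sam Elliott voices Smokey the Bear.",
        "sam elliott spouse": "Sam Elliott is married to Katharine Ross.",
    }
    if helpful and query.lower() in kb:
        return kb[query.lower()]
    return "No relevant information found."

def rollout(question, queries, helpful=True):
    """One training rollout: the policy issues queries, collects the
    simulated documents, and answers from the accumulated context.
    (In the real system an RL reward would then score the answer.)"""
    context = [question]
    for q in queries:
        context.append(simulate_search(q, helpful))
    return context

ctx = rollout(
    "Who is the spouse of the person who voices Smokey the Bear?",
    ["voice of smokey the bear", "sam elliott spouse"],
)
```

In the actual system, both the query-crafting step and the answer step are performed by the trained model itself; here the queries are hard-coded to keep the sketch self-contained.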
At the start of training, the simulated search results are intentionally helpful. Over time, the quality is gradually reduced—a curriculum learning approach. This helps the model learn to draw useful conclusions even from unclear or conflicting information, much like searching the real internet.
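A curriculum like this can be expressed as a simple schedule for how often the simulation model is asked to produce an unhelpful document. The linear ramp below is an assumption for illustration; the schedule ZeroSearch actually uses may differ.

```python
def noise_probability(step, total_steps, p_start=0.0, p_end=0.5):
    """Hypothetical linear curriculum: the chance that a simulated
    search result is deliberately irrelevant grows as training
    progresses, so early rollouts see clean documents and later
    rollouts must cope with noise."""
    frac = min(step / total_steps, 1.0)  # clamp after the ramp ends
    return p_start + frac * (p_end - p_start)
```

At each rollout, the trainer would draw a random number and, with this probability, flip the simulation model into its "useless document" mode.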
The simulation model itself is fine-tuned beforehand, learning to generate both “useful” and “useless” search results. This distinction is controlled with subtle changes to the prompts—the instructions given to the model.
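The idea of steering result quality through a small prompt change can be sketched as follows. The wording of the template is invented for this example; the paper's actual fine-tuning prompts are not reproduced here.

```python
# Hypothetical prompt template: one word toggles whether the
# simulation model should produce a relevant or an irrelevant document.
PROMPT = (
    "Given the query '{query}', write a short document that is "
    "{quality} for answering it."
)

def build_prompt(query, helpful=True):
    quality = "useful" if helpful else "useless"
    return PROMPT.format(query=query, quality=quality)
```

Because the two modes differ only in this small cue, the same fine-tuned model can serve as both the "good search engine" and the "noisy search engine" during training.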
Successfully managing multi-step searches
Test runs show that the model can handle complex, multi-step search processes. In one example, it was asked, "Who is the spouse of the person who voices Smokey the Bear?" The simulated search first identified Sam Elliott as the voice actor. The model then conducted a second simulated search for Sam Elliott’s spouse, finding Katharine Ross. It combined both pieces of information correctly and produced an accurate answer.
This ability to break down a question into sub-questions and build on intermediate results is a key goal of ZeroSearch training.
Significant cost savings and full control
Simulating the search process not only removes dependency on external search services, but also cuts costs dramatically. In experiments, running 64,000 searches through Google’s SerpAPI cost about $586 in API fees. By contrast, using the simulation model on four rented AWS A100 GPUs cost just $71 in compute time.
Another benefit: the simulated search is always available, produces responses in a consistent style, and can be made harder or easier as needed. According to the team, this makes training more predictable and robust.
Outperforming Google searches in training
The team evaluated ZeroSearch on seven well-known question-answering benchmarks, including Natural Questions, TriviaQA, and HotpotQA. It matched or outperformed approaches trained with real Google searches, especially when using a large simulation model with 14 billion parameters.
Smaller models with 7 billion parameters also performed well. The key wasn’t just size, but whether the simulation model had been specifically fine-tuned for the task—models only controlled by prompts did much worse.
Alibaba has released some of its models on Hugging Face. More details and the code are available on GitHub.