
Baidu's Ernie 5.1 cuts 94 percent of pre-training costs while competing with top models


Key Points

  • Baidu has launched Ernie 5.1, a language model distilled from its larger predecessor Ernie 5.0, making it more resource-efficient while currently ranking first among Chinese models on the Search Arena Leaderboard.
  • The model uses a four-stage training pipeline with specialized expert models for code, logic, and agent tasks, designed to prevent different capabilities from interfering with each other during the learning process.
  • Ernie 5.1 is accessible through Baidu's platforms and integrated into various creative applications, but the model weights remain closed, making independent verification of its reported performance impossible.

Baidu has released Ernie 5.1, a language model built on the pre-training foundation of its predecessor Ernie 5.0 but with roughly a third of the total parameters and about half the active parameters per query.

Pre-training costs came in at just six percent of what comparable models require, according to Baidu. On the Search Arena Leaderboard, Ernie 5.1 scored 1,223 points as of May 9, placing 4th globally and 1st among Chinese models.

Bar chart of the Search Arena leaderboard with 15 models. Ernie 5.1 ranks 4th with 1,223 points, behind Claude Opus 4.6 Search (1,255), GPT-5.5 Search (1,242), and Claude Opus 4.7 (1,236).
On the Search Arena Leaderboard, Ernie 5.1 landed in 4th place with 1,223 points, behind two Claude Opus variants and GPT-5.5 Search. | Image: Baidu

In additional benchmarks, Baidu claims Ernie 5.1 beats DeepSeek-V4-Pro on autonomous AI agent tasks (tau3-bench, SpreadsheetBench-Verified) and comes close to Google's Gemini 3.1 Pro on knowledge and reasoning benchmarks (GPQA, MMLU-Pro). On a tough math benchmark (AIME26), the model with tool access lands just behind Gemini 3.1 Pro. Internal evaluations also show the model matching Western commercial models in creative writing, Baidu says.

Bar chart of the Text Arena leaderboard with 15 models. Ernie-5.1-Preview ranks 13th with 1,476 points. The list is led by Claude Opus 4.7 (Thinking) with 1,503 points, followed by Claude Opus 4.6 (Thinking), Claude Opus 4.6, and Claude Opus 4.7.
On the Text Arena Leaderboard, the pre-release Ernie 5.1 Preview sits at 13th place with 1,476 points. Claude Opus variants and Gemini 3.1 Pro hold the top spots. | Image: Baidu

Ernie 5.1 is a smaller model based on its predecessor

Baidu built Ernie 5.1 as a smaller sub-model from Ernie 5.0 using an approach the company calls the "Once-For-All elastic training framework." Instead of running a separate, expensive pre-training pass for each model size, the company optimizes an entire family of differently sized models in a single run.


Diagram of Once-For-All training in three sections. On the left, a stack of green transformer layers labeled "Elastic Depth"; in the middle, a grid of expert blocks labeled "Elastic Width"; on the right, three routing configurations with Top-K = 1, 2, and 4 labeled "Elastic Sparsity."
The Once-For-All framework simultaneously varies depth, expert count, and active experts per request in a single training run. Baidu extracted Ernie 5.1 as a smaller sub-model from this family. | Image: Baidu

The models share weights but differ in depth, width, and how many specialized expert blocks activate for a given query. Baidu picked what it considers the best configuration from this family for Ernie 5.1, which explains the low pre-training costs, since the heavy compute was already done for Ernie 5.0.
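
The idea of drawing differently sized sub-models from one shared set of weights can be illustrated with a minimal Python sketch. It is not Baidu's implementation: the class `SubModelConfig`, the functions `sample_config` and `slice_weights`, and the full-model dimensions are hypothetical, and the "keep the first layers and first experts" rule is an assumption made only for illustration. The Top-K choices of 1, 2, and 4 follow Baidu's diagram.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SubModelConfig:
    depth: int        # transformer layers kept ("elastic depth")
    num_experts: int  # experts kept per layer ("elastic width")
    top_k: int        # experts activated per token ("elastic sparsity")


# Hypothetical full-model dimensions, not figures published by Baidu.
FULL = SubModelConfig(depth=48, num_experts=64, top_k=4)


def sample_config(rng: random.Random) -> SubModelConfig:
    """Draw one sub-model configuration that fits inside the full model."""
    depth = rng.choice([FULL.depth // 2, FULL.depth * 3 // 4, FULL.depth])
    num_experts = rng.choice(
        [FULL.num_experts // 4, FULL.num_experts // 2, FULL.num_experts]
    )
    top_k = rng.choice([1, 2, 4])
    return SubModelConfig(depth, num_experts, top_k)


def slice_weights(full_weights, cfg: SubModelConfig):
    """Pick the shared weights a sub-model uses.

    Assumption: a sub-model keeps the first `depth` layers and the first
    `num_experts` experts in each of them.
    """
    return [layer[: cfg.num_experts] for layer in full_weights[: cfg.depth]]


# Toy "weights": one placeholder entry per (layer, expert) pair.
full_weights = [
    [(layer, expert) for expert in range(FULL.num_experts)]
    for layer in range(FULL.depth)
]
cfg = sample_config(random.Random(0))
sub_weights = slice_weights(full_weights, cfg)
```

In an elastic training run, a different configuration would be sampled at each step so that every sub-model is trained on the same shared parameters. Extracting a smaller model afterward then amounts to choosing one configuration rather than pre-training again.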

In addition, Baidu rebuilt its reinforcement learning infrastructure from the ground up. The key components—model updates, response generation, and evaluation—traditionally run tightly coupled. Baidu now runs them as separate subsystems that scale independently, coordinated by a central controller. Each component gets the right hardware, and a bottleneck in one step doesn't block the others, the company says.
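
A rough sketch of what such a decoupled setup could look like, using Python threads and queues as stand-ins for separately scaled subsystems. All names here (`generator`, `evaluator`, `trainer`, `run_controller`) are hypothetical, the reward is a toy placeholder, and nothing in it reflects Baidu's actual infrastructure.

```python
import queue
import threading


def generator(prompts, out_q):
    """Response generation: produce rollouts and hand them on."""
    for prompt in prompts:
        out_q.put((prompt, f"response to {prompt}"))  # stand-in for sampling
    out_q.put(None)  # sentinel: no more work


def evaluator(in_q, out_q):
    """Evaluation: score each rollout independently of the other stages."""
    while (item := in_q.get()) is not None:
        prompt, response = item
        reward = float(len(response))  # toy reward, for illustration only
        out_q.put((prompt, response, reward))
    out_q.put(None)


def trainer(in_q, update_log):
    """Model updates: consume scored rollouts as they arrive."""
    while (item := in_q.get()) is not None:
        update_log.append(item)  # stand-in for a gradient step


def run_controller(prompts):
    """Central controller: wires the stages together through bounded queues."""
    rollouts = queue.Queue(maxsize=8)
    scored = queue.Queue(maxsize=8)
    update_log = []
    workers = [
        threading.Thread(target=generator, args=(prompts, rollouts)),
        threading.Thread(target=evaluator, args=(rollouts, scored)),
        threading.Thread(target=trainer, args=(scored, update_log)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return update_log


updates = run_controller(["a", "b"])
```

Because each stage only talks to its neighbors through a queue, a slow stage fills its input queue instead of stalling the whole loop. In a real system, each stage could be given more workers or different hardware without touching the others.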

A persistent challenge in large-model reinforcement learning is drift between training and example generation caused by different computation settings. This can destabilize the whole process. Baidu addresses it with a standardized low-precision computation library, plus a correction mechanism for mixture-of-experts models that cuts drift in half without noticeably slowing things down.

Bar charts compare Ernie 5.1, DeepSeek V4 Pro, Claude Opus 4.6, and Gemini 3.1 Pro across eight benchmarks. In the upper block (Agentic), Ernie 5.1 scores 99.6 points on AIME26 with tools, just behind Gemini 3.1 Pro at 99.9; on SpreadsheetBench-Verified, the model reaches 72.5 points, ahead of DeepSeek V4 Pro but behind the other two competitors. In the lower block (Knowledge, Reasoning, Instruction Following), the scores are closer together, with Ernie 5.1 mostly coming in as the second- or third-strongest model.
Baidu's benchmark comparison against DeepSeek V4 Pro, Claude Opus 4.6, and Gemini 3.1 Pro. Top row shows agentic tasks, bottom row covers knowledge, reasoning, and instruction following. Ernie 5.1 leads in some categories but not all. | Image: Baidu

A four-stage pipeline tackles the "seesaw effect"

Baidu uses a four-stage fine-tuning process to address a well-known problem: training multiple skills at once often means gains in one area come at the cost of another. Baidu calls this the "seesaw effect": coding ability, logic, and creativity end up dragging each other down.


The pipeline starts with standard supervised training on a broad dataset. Stage two trains several specialized expert models in parallel, one each for code, reasoning, and agent tasks, each with its own evaluation signals.

Flowchart of the four-stage post-training pipeline. Stage 1: Unified Supervised Fine-Tuning on instruction data from chat, code, math, and tool use. Stage 2: parallel specialization into code, reasoning, and agent experts. Stage 3: On-Policy Distillation, in which a student model learns from multiple teacher models via token-level reverse KL. Stage 4: General Online RL on open-ended dialog data, with Ernie 5.1 as the end result.
Baidu's four-stage post-training pipeline: joint fine-tuning, then parallel expert training for code, reasoning, and agent tasks, followed by distillation into a student model, and finally open reinforcement learning for dialog and creative tasks. | Image: Baidu

In stage three, a single student model learns from all these teachers simultaneously by generating its own answers and comparing them against the experts' outputs. The final stage adds general reinforcement learning for open-ended dialog and creative tasks. Baidu says this step is necessary because teacher-student distillation tends to produce answers that are too polished and lack variety.
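
Baidu's pipeline diagram names token-level reverse KL as the stage-three objective. The sketch below shows what that quantity is for a single student-generated sequence. It uses plain Python lists as logits. The helper names `softmax`, `reverse_kl`, and `distillation_loss` are hypothetical, and pairing each sequence with exactly one teacher is an assumption, since Baidu does not describe how teachers are assigned.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token position."""
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    return sum(
        s * math.log(s / t)
        for s, t in zip(p_student, p_teacher)
        if s > 0.0
    )


def distillation_loss(student_seq, teacher_seq):
    """Mean per-token reverse KL over a sequence the student generated.

    Both arguments hold one logit vector per generated token; the teacher
    logits come from scoring the student's own tokens (on-policy).
    """
    per_token = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(per_token) / len(per_token)


# Toy example over a three-token vocabulary.
matching = distillation_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
mismatched = distillation_loss([[5.0, 0.0, 0.0]], [[0.0, 0.0, 5.0]])
```

Reverse KL weights the comparison by the student's own probabilities, so the student is penalized mainly for putting mass on tokens the teacher considers unlikely. That mode-seeking behavior is consistent with Baidu's remark that distilled answers can lack variety, which the final reinforcement learning stage is meant to offset.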

Available on creative platforms, but no open weights

Ernie 5.1 is available through ernie.baidu.com and a playground in Baidu AI Studio. The model will also roll out to more than ten creative platforms, including the role-playing platform Isekai Zero, creative agent Mulan AI, AI canvas app Diting Huanliu, and short drama generator Storymaster.

As with Ernie 5.0, Baidu hasn't released model weights, so the benchmark scores and efficiency claims can't be independently verified.

Baidu laid the groundwork for this leaner release with Ernie 5.0 in January 2026. That model processes text, images, audio, and video in a unified architecture using a mixture-of-experts structure with roughly 2.4 trillion total parameters, fewer than three percent of which activate per query.
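
Taken together with the ratios given for Ernie 5.1, these figures allow a back-of-the-envelope size estimate. The short Python calculation below only combines numbers stated in this article ("roughly a third" of total parameters, "about half" of active parameters, six percent of pre-training cost). The results are rough upper bounds implied by those statements, not figures published by Baidu.

```python
# Figures reported for Ernie 5.0 in this article.
ERNIE_5_0_TOTAL = 2.4e12            # roughly 2.4 trillion total parameters
ERNIE_5_0_ACTIVE_SHARE_MAX = 0.03   # fewer than three percent active per query

# Upper bound on Ernie 5.0's active parameters per query.
ernie_5_0_active_max = ERNIE_5_0_TOTAL * ERNIE_5_0_ACTIVE_SHARE_MAX

# Ernie 5.1: roughly a third of the total, about half of the active parameters.
ernie_5_1_total = ERNIE_5_0_TOTAL / 3
ernie_5_1_active_max = ernie_5_0_active_max / 2

# Six percent of comparable pre-training cost means a 94 percent reduction.
pretraining_saving = 1 - 0.06

print(f"Ernie 5.0 active parameters: under {ernie_5_0_active_max / 1e9:.0f}B")
print(f"Ernie 5.1 total parameters:  about {ernie_5_1_total / 1e9:.0f}B")
print(f"Ernie 5.1 active parameters: under {ernie_5_1_active_max / 1e9:.0f}B")
print(f"Pre-training cost reduction: {pretraining_saving:.0%}")
```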


Source: Baidu