With its new MiMo-7B model, Xiaomi aims to demonstrate that mathematical and programming tasks can be handled effectively by relatively small language models. According to the company, the results are intended to match or even surpass those achieved by larger competitors.
While many current open-source reasoning models use 32 billion parameters, Xiaomi opts for a 7B-parameter architecture and seeks to maximize efficiency through tailored pre- and post-training strategies.
The research team reports that MiMo-7B was pre-trained on roughly 25 trillion tokens, with the goal of exposing the model to reasoning patterns early on. To support this, the team developed new tools for extracting mathematical formulas and code from formats such as HTML and PDF, and applied a three-stage data-mixing process that emphasizes synthetically generated tasks.
During the final pre-training phase, the proportion of math and code data was increased to about 70 percent. The context length was extended to 32,768 tokens to allow the model to process more complex, extended reasoning.
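The staged mixing described above can be sketched as a stage-dependent sampling schedule. The weights below are illustrative assumptions; only the roughly 70 percent math-and-code share in the final stage is stated in the report, and the source category names are hypothetical:

```python
import random

# Hypothetical stage-wise mixture weights (illustrative only).
# Stage 3 reflects the reported ~70% math + code share.
STAGE_MIX = {
    1: {"web": 0.60, "math": 0.15, "code": 0.15, "synthetic": 0.10},
    2: {"web": 0.40, "math": 0.25, "code": 0.25, "synthetic": 0.10},
    3: {"web": 0.10, "math": 0.35, "code": 0.35, "synthetic": 0.20},
}

def sample_source(stage: int, rng: random.Random) -> str:
    """Pick the data source for the next training document."""
    weights = STAGE_MIX[stage]
    sources = list(weights)
    return rng.choices(sources, weights=[weights[s] for s in sources], k=1)[0]

rng = random.Random(0)
batch = [sample_source(3, rng) for _ in range(1000)]
```

In a real pipeline the schedule would also interact with deduplication and quality filtering; the point here is only that the math/code proportion rises sharply in the final stage.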
Another element of the training process is multi-token prediction (MTP), in which the model attempts to anticipate several subsequent tokens at once. This technique is designed to improve accuracy and speed up inference.
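A minimal sketch of the multi-token prediction idea, assuming a tiny stand-in for the transformer trunk: one hidden state per position feeds K separate output heads, one per future offset. All sizes and the greedy-draft step are illustrative, not MiMo's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, K = 100, 32, 3  # K = number of future tokens predicted at once

# One output head per future offset (t+1, t+2, ..., t+K).
W_heads = [rng.standard_normal((HIDDEN, VOCAB)) for _ in range(K)]

def mtp_logits(hidden_state: np.ndarray) -> list[np.ndarray]:
    """Return logits for tokens t+1 .. t+K from a single hidden state."""
    return [hidden_state @ W for W in W_heads]

h = rng.standard_normal(HIDDEN)
logits = mtp_logits(h)
draft = [int(np.argmax(step_logits)) for step_logits in logits]
```

During training the extra heads provide additional supervision per position; at inference the multi-token draft can be verified in a single forward pass, speculative-decoding-style, which is where the speedup comes from.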
Reinforcement learning with test-case-based rewards
After pre-training, two versions of the model were further refined using reinforcement learning (RL): MiMo-7B-RL-Zero was trained directly from the base model, while MiMo-7B-RL started from a version that had first undergone supervised fine-tuning (SFT). The training data comprises 130,000 verifiable math and programming tasks.
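"Verifiable" here means the reward can be computed by a rule-based check rather than a learned judge. A minimal sketch for the math case, with a hypothetical `####` answer-marker convention (the actual extraction format is an assumption):

```python
def math_reward(model_output: str, reference_answer: str) -> float:
    """Binary verifiable reward: 1.0 iff the extracted final answer
    matches the reference after light normalization, else 0.0."""
    def normalize(s: str) -> str:
        return s.strip().rstrip(".").replace(" ", "")

    # Hypothetical convention: the final answer follows a '####' marker.
    answer = model_output.rsplit("####", 1)[-1]
    return 1.0 if normalize(answer) == normalize(reference_answer) else 0.0
```

Because the check is exact, such rewards are hard to game but give no partial credit, which motivates the difficulty-weighted scheme used for code tasks.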
Special attention was paid to the reward system for code-related tasks, which uses a “Test Difficulty Driven Reward” to weight individual test cases by difficulty. This approach is intended to address the common issue of sparse rewards, where models receive little feedback for particularly challenging problems.
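One way to realize such a scheme is to give partial credit proportional to the difficulty of the tests a solution passes. This is a simplified sketch of the idea, not the paper's exact formula; the difficulty values would come from, e.g., empirical pass rates:

```python
def difficulty_weighted_reward(passed: list[bool],
                               difficulty: list[float]) -> float:
    """Each test case contributes reward proportional to its difficulty
    weight, so solving only the easy tests yields a small but nonzero
    reward instead of the all-or-nothing 0 of a pass/fail check."""
    total = sum(difficulty)
    earned = sum(d for ok, d in zip(passed, difficulty) if ok)
    return earned / total
```

For example, passing only an easy test (weight 1.0) out of tests weighted 1.0 and 3.0 yields a reward of 0.25, giving the policy a gradient signal even on problems it cannot yet fully solve.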
To improve training stability, an “Easy Data Re-Sampling” method was employed. Tasks that the model already handles well are sampled less frequently, increasing sampling efficiency without distorting training.
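A rough sketch of down-weighting already-solved tasks, assuming per-task pass rates tracked during training. The weighting rule and the floor value are illustrative assumptions, not MiMo's published procedure:

```python
import random

def resample_weights(pass_rates: dict[str, float],
                     floor: float = 0.1) -> dict[str, float]:
    """Weight each task roughly by how often the model still fails it,
    with a small floor so easy tasks are down-weighted, not dropped."""
    return {task: max(1.0 - rate, floor) for task, rate in pass_rates.items()}

def sample_task(pass_rates: dict[str, float], rng: random.Random) -> str:
    weights = resample_weights(pass_rates)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=1)[0]
```

Keeping a nonzero floor guards against catastrophic forgetting on easy tasks while spending most rollouts where the reward signal is still informative.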
Benchmark results and competitive performance
According to the report, MiMo-7B-RL scores 55.4 on the AIME 2025 math benchmark, 4.7 points higher than OpenAI's o1-mini. On LiveCodeBench v5 it reaches 57.8 percent, well ahead of Alibaba's 32B QwQ-Preview at 41.9 percent. However, Alibaba's recently released Qwen3-30B-A3B achieves 62.6 percent on the same benchmark, and even the much smaller Qwen3-4B surpasses the older 32B model at 54.2 percent. These results position MiMo-7B-RL as a competitive entry in the trend toward smaller, high-performing reasoning models.
The authors also note ongoing challenges. Maintaining a stable balance between math and code capabilities during RL training is difficult, and issues such as unintended language mixing (for example, Chinese output appearing in English-language tasks) remain unresolved.
Xiaomi has published MiMo-7B-Base, MiMo-7B-RL-Zero, and MiMo-7B-RL under an open license on GitHub. The company also sees the project as a methodological contribution, showing that smaller models can make inroads into areas traditionally dominated by larger systems through targeted training strategies.