China is falling behind in the AI race, according to a US government benchmark

A new report from the Center for AI Standards and Innovation (CAISI) claims Chinese AI models are losing ground to their US counterparts.

The agency recently put the new Chinese open-weight model Deepseek V4 Pro through its paces. The verdict: it's roughly eight months behind the leading US models. CAISI tested performance across cybersecurity, software development, math, natural sciences, and abstract reasoning.

CAISI calls Deepseek V4 the most capable Chinese AI model to date. But in private testing, it reportedly performs worse than Deepseek's own technical report suggests. Deepseek pitches the model as roughly on par with current US models like Opus 4.6 and GPT-5.4. CAISI says it's actually closer to the older GPT-5, especially on abstract reasoning, cybersecurity, and software development. Math is the one area where Deepseek V4 nearly matches the top US models.

According to CAISI, the gap between US and Chinese models keeps widening, with Deepseek V4 Pro landing at the level of GPT-5, which shipped eight months earlier. | Image: CAISI

The center, which likely has its own political agenda, sits within the National Institute of Standards and Technology (NIST). Its report paints a picture of a widening gap between US and Chinese models. Independent measurements such as the Artificial Analysis Intelligence Index suggest otherwise, showing the gap has stayed roughly constant.

The independent Artificial Analysis Intelligence Index tells a different story, with the gap between the US and China holding fairly steady over time. | Image: Artificial Analysis (screenshot)

Price might start to matter more than raw capability

On price, Deepseek V4 has a clear edge. It came in cheaper than the comparable GPT-5.4 mini in five of seven tests. And price is becoming a bigger factor as AI models are expected to run longer and handle more complex tasks. Meanwhile, top-tier US models keep getting pricier.

That matters because no one really knows yet how much these models actually boost productivity. Businesses don't have reliable ways to measure return on investment, especially once you factor in downstream effects like training, upskilling, and error checking.

Past a certain capability threshold, "good enough" performance at a low price could end up more attractive than top-tier performance at premium rates. Cursor, the Claude Code competitor reportedly being acquired by SpaceX, built its custom fine-tuned coding model on top of a Chinese open-weight model, making it significantly cheaper than what OpenAI and Anthropic offer.

OpenAI CEO Sam Altman seems torn on this. In a recent post on X, he wrote: "I keep thinking I want the models to be cheaper/faster more than I want them to be smarter, but it seems that just being smarter is still the most important thing."

Altman's view may also rest on the bet that smarter AI could help improve itself, speeding up progress across the board. OpenAI, Anthropic, and Chinese developers have all said recently that their own models are already accelerating their R&D work.

Source: CAISI