AI research
Jonathan Kemper

Meta's Code "World Model" aims to close the gap between code generation and code understanding

Jonathan writes for THE DECODER about how AI tools can improve both work and creative projects.

Meta's Code World Model (CWM) is designed not just to generate code but to understand how that code runs on a computer.


"To master coding, one must understand not just what code looks like but what it does when executed," Meta researchers explain. This kind of reasoning is key for real program understanding, which goes far beyond copying code patterns.

Meta says CWM is meant to act like a "neural debugger," able to simulate program behavior before any code is actually run. The model can predict whether a program will finish or get stuck in an infinite loop. In tests using Meta's new HaltEval benchmark, CWM reached 94 percent accuracy.
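
To make the halting question concrete, here is a minimal Python sketch of the runtime alternative to prediction: executing a function under a step budget. The names `halts_within`, `StepBudgetExceeded`, `countdown`, and `stuck` are illustrative assumptions and do not come from Meta or HaltEval. A budget can only report that a program did not finish in time, whereas the article describes CWM as predicting the outcome before any code is run.

```python
import sys


class StepBudgetExceeded(Exception):
    """Raised by the tracer when the traced code uses too many line steps."""


def halts_within(func, args, max_steps):
    """Return True if func(*args) returns within max_steps executed lines.

    Illustrative helper, not part of CWM or HaltEval. A False result only
    means "did not finish within the budget", not a proof of non-termination.
    """
    steps = 0

    def tracer(frame, event, arg):
        nonlocal steps
        if event == "line":
            steps += 1
            if steps > max_steps:
                # The exception propagates into the traced code and aborts it.
                raise StepBudgetExceeded
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
        return True
    except StepBudgetExceeded:
        return False
    finally:
        # Restore whatever tracer (if any) was active before the call.
        sys.settrace(previous)


def countdown(n):
    # Terminates: n moves toward the exit condition.
    while n > 0:
        n -= 1
    return n


def stuck(n):
    # Never terminates for n > 0: n moves away from the exit condition.
    while n > 0:
        n += 1
    return n
```

Calling `halts_within(countdown, (10,), 1000)` and `halts_within(stuck, (1,), 1000)` shows the two cases a halting benchmark distinguishes, at the cost of actually running the code.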

CWM tackles code problems by reasoning, generating code, and systematically testing its solutions. | Image: Meta

CWM can also work backward from a description: given only a brief account of what a program should do, it simulates execution and generates the corresponding code. The researchers demonstrate this with examples where CWM reconstructs functions from requirement descriptions and expected results, even when it has never seen the original code.


The model analyzes algorithm complexity too, estimating how a program's running time grows with input size. On the BigOBench benchmark, the 32-billion-parameter CWM ranks second for predicting time complexity and outperforms other open-source models of similar size.
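
As a rough illustration of what "time complexity for different input sizes" means, the following Python sketch measures growth empirically: it counts operations at several input sizes and fits the slope of log(operations) against log(n). This is a hand-written measurement under assumed helper names (`count_linear_scan`, `count_pair_comparisons`, `estimate_exponent`), not how CWM or BigOBench works; the benchmark asks a model to predict the complexity class from the code itself.

```python
import math


def count_linear_scan(n):
    """Count the steps of a single pass over n items (O(n))."""
    ops = 0
    for _ in range(n):
        ops += 1
    return ops


def count_pair_comparisons(n):
    """Count the comparisons made when checking every pair of n items (O(n^2))."""
    ops = 0
    for i in range(n):
        for _ in range(i + 1, n):
            ops += 1
    return ops


def estimate_exponent(op_counter, sizes):
    """Least-squares slope of log(ops) versus log(n).

    A slope near 1 suggests linear growth and a slope near 2 suggests
    quadratic growth. This only detects polynomial growth rates.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(op_counter(n)) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

With sizes such as `[100, 200, 400, 800]`, the estimate for `count_linear_scan` comes out at about 1 and the estimate for `count_pair_comparisons` at about 2.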

CWM learned from over 120 million Python program executions, tracking how variables change step by step. The researchers call these "execution traces." During training, the model looked at both the code and the state of all local variables after every line, which helped it learn programming language semantics in a new way.
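
The idea of an execution trace can be sketched with Python's standard `sys.settrace` hook, which reports each line a function is about to execute together with its local variables. The helper `trace_locals` and the sample function `running_sum` are illustrative assumptions; the output format here is not Meta's trace format, which the article does not spell out.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) and record a snapshot of its locals at every line event.

    CPython fires a "line" event just before a line executes, so each
    snapshot shows the state left behind by the previously executed line.
    Illustrative helper, not the format used to train CWM.
    """
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        elif event == "return":
            steps.append(("return", arg))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Restore whatever tracer (if any) was active before the call.
        sys.settrace(previous)
    return result, steps


def running_sum(values):
    total = 0
    for v in values:
        total += v
    return total
```

Calling `trace_locals(running_sum, [1, 2])` returns the function's result plus a list of steps in which `total` and `v` can be followed as they change, ending with a `("return", ...)` entry that carries the return value.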

For realistic training, the team built more than 35,000 executable Docker containers from GitHub projects. Each container was a ready-to-use development environment so code and tests could run without extra setup.

CWM uses structured traces to predict Python program execution one step at a time. | Image: Meta

Training happened in three phases: first, the model learned programming basics with 8 trillion tokens; then it trained on code execution with 5 trillion tokens; finally, it handled complex tasks through reinforcement learning across four environments, covering software engineering, competitive programming, and mathematical reasoning.

Benchmark results

On SWE-bench Verified, a key test of software engineering skills, the 32-billion-parameter model scored 65.8 percent with test-time scaling and 53.9 percent without it. That puts it ahead of many smaller open-source models, but larger models, like Qwen3-Coder at up to 480 billion parameters, still lead the category.


Bar chart shows SWE-bench Verified results from open-weight and proprietary AI models. CWM achieves 53.9% (base) and 65.8% (test-time scaling). Other open-weight models range between 51.6% and 62.4%. Proprietary models achieve between 61.6% and 80.2%.
CWM and other open source and proprietary models on basic and multi-step software engineering tasks. | Image: Meta

CWM also scores 68.6 percent on LiveCodeBench, 96.6 percent on Math-500, and 76 percent on the AIME 2024 math competition. On CruxEval Output for code comprehension, it reaches 94.3 percent in reasoning mode.

Open for research

Meta has released CWM as an open-weights model under a non-commercial research license, sharing both the final model and intermediate training checkpoints through Hugging Face.

After quantization, the 32-billion-parameter model can run on a single Nvidia H100 with 80 GB of memory, and it supports context windows of up to 131,000 tokens.

Meta emphasizes that CWM is purely a research model focused on programming and mathematical reasoning. It hasn't been tuned for general chat or production use.

Summary
  • Meta has introduced its Code World Model (CWM), an open-weights AI model with 32 billion parameters that can generate code and simulate how it runs and affects a computer system.
  • Trained on over 120 million Python program runs, CWM uses execution traces to keep track of variable states line by line, achieving high scores on benchmarks such as HaltEval (94% accuracy) and SWE-bench Verified (65.8%).
  • The model is available for non-commercial research and can run on a single Nvidia H100 after quantization. It is specifically designed for programming and mathematical reasoning but is not intended for general applications or production use.
Sources
Meta