
Meta's Code World Model (CWM) is designed not just to generate code but to understand how that code runs on a computer.


"To master coding, one must understand not just what code looks like but what it does when executed," Meta researchers explain. This kind of reasoning is key for real program understanding, which goes far beyond copying code patterns.

Meta says CWM is meant to act like a "neural debugger," able to simulate program behavior before any code is actually run. The model can predict whether a program will finish or get stuck in an infinite loop. In tests using Meta's new HaltEval benchmark, CWM reached 94 percent accuracy.
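The article does not reproduce any HaltEval tasks, but a pair of toy functions (purely illustrative, not actual benchmark items) shows the kind of judgment involved: decide from the code alone whether execution ever finishes.

```python
# Illustrative only; these are not real HaltEval problems.

def counts_down(n: int) -> int:
    # Halts for any starting n: the loop variable strictly decreases toward 0.
    while n > 0:
        n -= 1
    return n

def skips_one(n: int) -> int:
    # Halts for odd n (5 -> 3 -> 1) but loops forever for even n,
    # because subtracting 2 never changes the parity of n.
    while n != 1:
        n -= 2
    return n
```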

CWM tackles code problems by reasoning, generating code, and systematically testing its solutions. | Image: Meta

CWM can also work backward from a description: given only a brief description of what a program should do, it simulates the expected execution and generates the corresponding code. The researchers demonstrate this with examples where CWM reconstructs functions from requirement descriptions and expected results, even though it has never seen the original code.
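As a hypothetical illustration (the paper's own examples are not reproduced here), such a task might give the model nothing but a short brief and a few expected outputs, from which it has to recover a working function:

```python
# Hypothetical task: no reference implementation, only a brief plus examples.
BRIEF = """
Write running_max(xs): element i of the result is the largest value
seen so far in xs[:i + 1].

Examples:
    running_max([3, 1, 4, 1, 5]) -> [3, 3, 4, 4, 5]
    running_max([])              -> []
"""

# One implementation consistent with the brief, the kind of function the
# model would be expected to reconstruct by checking it against the examples.
def running_max(xs):
    out, best = [], float("-inf")
    for x in xs:
        best = max(best, x)
        out.append(best)
    return out
```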


The model also analyzes algorithmic complexity, estimating how a program's running time grows with input size. On the BigOBench benchmark, CWM ranks second for predicting time complexity and outperforms other open-source models in its 32-billion-parameter size class.
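The article does not show a BigOBench item, but the shape of the task is roughly this: given a function, state how its running time scales with the input size n.

```python
# Illustrative example, not an actual BigOBench problem.

def has_duplicate(xs):
    # Two nested loops over the same list: time grows as O(n^2),
    # with O(1) extra space.
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False
```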

CWM learned from over 120 million Python program executions, tracking how variables change step by step. The researchers call these records "execution traces." During training, the model saw both the code and the state of all local variables after every line, grounding its grasp of programming-language semantics in how code actually behaves rather than just how it looks.
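Meta's exact trace format is not shown in the article, but Python's built-in sys.settrace hook gives a rough feel for what a line-by-line trace of local variable states looks like:

```python
import sys

def trace_locals(frame, event, arg):
    # Report the local variable state at each line event, i.e. the state
    # left behind by the previously executed line of the traced function.
    if event == "line":
        print(f"line {frame.f_lineno}: {dict(frame.f_locals)}")
    return trace_locals

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(trace_locals)   # trace all newly created frames
demo(3)
sys.settrace(None)           # stop tracing
```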

For realistic training, the team built more than 35,000 executable Docker containers from GitHub projects. Each container was a ready-to-use development environment so code and tests could run without extra setup.
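The article does not name the images or tooling Meta used, but the idea of a self-contained, executable environment can be sketched with the Docker CLI: run a repository's test suite inside a prebuilt image, with no setup on the host. The image name below is hypothetical.

```python
import subprocess

# Hypothetical image name; the article does not identify Meta's actual containers.
IMAGE = "example/repo-env:latest"

# Run the project's tests inside the prebuilt container, no host setup required.
result = subprocess.run(
    ["docker", "run", "--rm", IMAGE, "pytest", "-q"],
    capture_output=True,
    text=True,
)
print(result.returncode)
print(result.stdout)
```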

CWM uses structured traces to predict Python program execution one step at a time. | Image: Meta

Training happened in three phases: first, the model learned programming basics with 8 trillion tokens; then it trained on code execution with 5 trillion tokens; finally, it handled complex tasks through reinforcement learning across four environments, covering software engineering, competitive programming, and mathematical reasoning.

Benchmark results

CWM's performance shows up in the benchmarks. On SWE-bench Verified, a key test of software engineering skills, the 32-billion-parameter model scored 65.8 percent with test-time scaling and 53.9 percent without it. That puts it ahead of many smaller open-source models, but larger ones, such as Qwen3-Coder with up to 480 billion parameters, still lead the category.

CWM compared with other open-weight and proprietary models on SWE-bench Verified: CWM reaches 53.6 percent (base) and 65.8 percent (with test-time scaling), while other open-weight models range from 51.6 to 62.4 percent and proprietary models from 61.6 to 80.2 percent. | Image: Meta

CWM also scores 68.6 percent on LiveCodeBench, 96.6 percent on Math-500, and 76 percent on the AIME 2024 math competition. On CruxEval Output, a code comprehension test, it reaches 94.3 percent in reasoning mode.

Open for research

Meta has released CWM as an open-weights model under a non-commercial research license, sharing both the final model and intermediate training checkpoints through Hugging Face.

The 32-billion-parameter model can run on a single Nvidia H100 with 80 GB of memory, and it supports context windows up to 131,000 tokens.
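Assuming a standard Hugging Face transformers setup, loading the released weights might look roughly like the sketch below; the repository ID is a guess, so check Meta's model card for the exact name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "facebook/cwm"  # assumed repository name; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # ~64 GB of weights at 32B params, within an 80 GB H100
    device_map="auto",
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```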

Meta emphasizes that CWM is purely a research model focused on programming and mathematical reasoning. It hasn't been tuned for general chat or production use.

Summary
  • Meta has introduced its Code World Model (CWM), an open-weights AI model with 32 billion parameters that can generate code and simulate how it runs and affects a computer system.
  • Trained on over 120 million Python program runs, CWM uses execution traces to keep track of variable states line by line, achieving high scores on benchmarks such as HaltEval (94% accuracy) and SWE-bench Verified (65.8%).
  • The model is available for non-commercial research and can run on a single Nvidia H100 after quantization. It is specifically designed for programming and mathematical reasoning but is not intended for general applications or production use.