
A new study by researchers at Bytedance Research and Tsinghua University shows that current AI video models like OpenAI's Sora can create impressive visuals but fail to understand the physical laws that govern them.


While companies like OpenAI want their video AI models to simulate reality accurately, the research reveals significant limitations in how these systems process basic physics.

The scientists tested video generators' capabilities across three scenarios: predictions within known patterns, outside known patterns, and new combinations of familiar elements. Their goal was to determine whether these models truly learn physical laws or simply copy patterns from training data.

Testing the limits of training data

The researchers found that these AI models don't actually learn universal rules. Instead, they rely on surface-level features from their training data, following a strict hierarchy: color takes top priority, followed by size, speed, and shape.


Testing revealed a consistent pattern: the models perform nearly perfectly in familiar scenarios but fail when faced with unknown situations—even with basic physics like straight-line motion or collisions.

Co-author Bingyi Kang demonstrated this limitation on X: when the team trained the model on fast balls moving left to right and bouncing back, then tested it with slow balls, the generated balls suddenly changed direction after just a few frames (visible in the video at 1:55).
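The setup Kang describes can be sketched with a toy simulator. This is not the authors' code, and the velocity ranges here are hypothetical; it only illustrates the out-of-distribution protocol: train on one velocity range, then test on velocities outside it, where the ground-truth physics still predicts smooth straight-line motion.

```python
import random

def simulate_ball(v, steps=20, width=10.0, x=0.0):
    """Ground-truth uniform motion in 1D with elastic wall bounces."""
    trajectory = []
    for _ in range(steps):
        x += v
        if x < 0 or x > width:  # bounce off a wall
            v = -v
            x = max(0.0, min(width, x))
        trajectory.append(x)
    return trajectory

# Training distribution: only fast balls (hypothetical range |v| in [2, 4]).
train_velocities = [random.uniform(2.0, 4.0) for _ in range(1000)]

# Out-of-distribution test: a slow ball the model never saw. The physics
# stays trivial (constant velocity), but the study found video models
# produced sudden, unphysical direction changes on such inputs.
ood_trajectory = simulate_ball(v=0.5)
print(ood_trajectory[:5])  # → [0.5, 1.0, 1.5, 2.0, 2.5]
```

The point of the contrast: the ground-truth rule generalizes to any velocity, while a model that only memorizes training-range trajectories does not.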

Video: Kang et al.

Scaling isn't the solution

The study shows that simply scaling up models and expanding training data produces only modest gains. While larger models handle familiar patterns and combinations better, they still fail to understand basic physics or work with scenarios beyond their training.

Kang suggests that these systems might work in narrow, specific cases where the training data thoroughly covers the intended use case.

Recommendation

"Personally, I think, if there is a specific scenario and the data coverage is good enough, an overfitted world model is possible," he noted.

However, such limited systems wouldn't qualify as true world models, since the core purpose of a world model is to generalize beyond its training data. Given that it's practically impossible to capture every detail of the world or universe in training data, true world models would need to understand and apply fundamental principles rather than merely memorize patterns.

Reality check for OpenAI

These findings challenge OpenAI's vision for Sora, which the company calls "GPT-1 for video" and plans to develop into a true world model through scaling. OpenAI claims Sora already shows basic understanding of physical interactions and 3D geometry. Other companies, including RunwayML and Google DeepMind, are pursuing similar world model concepts.

But the study suggests those ambitions are premature. "Our study suggests that naively scaling is insufficient for video generation models to discover fundamental physical laws," the researchers concluded.


Meta's head of AI, Yann LeCun, shared that skepticism when OpenAI published its Sora paper, calling the approach of predicting the world by generating pixels "wasteful and doomed to failure."

That said, many would be delighted to see OpenAI finally release Sora, the video generator it unveiled in mid-February 2024.

Summary
  • Researchers from Bytedance and Tsinghua University have found that current video AI models, such as OpenAI's Sora, can generate impressive images but lack understanding of the underlying physical laws.
  • The models were tested in three scenarios, revealing that they do not learn universal rules but instead rely on superficial features from the training data, leading to failure in unfamiliar situations, even when simple physical processes are involved.
  • The researchers stress that simply scaling up the models is insufficient for discovering fundamental physical laws, tempering expectations for video models like Sora, which some AI labs are trying to develop into true world models.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.