While Microsoft was shaking up local AI agents, MBZUAI introduced something completely different: a world model called PAN. Instead of generating single video clips that instantly forget everything, PAN builds and maintains a continuous digital world that updates every time you give it a new instruction.
Most video models wipe their memory the moment a clip ends. PAN doesn’t. If you tell it to turn left, then speed up, then pick up a block, each action continues from the last. It’s not producing disconnected visuals—it’s simulating cause and effect. That’s why researchers call it a world model rather than a video generator.
The model uses Qwen2.5-VL for reasoning and a video generator adapted from Wan 2.1. Instead of letting visuals destroy consistency, PAN keeps reasoning in a stable internal space and then translates those states into video. That prevents objects from morphing or drifting during long sequences, something most video models completely fail at.
A big part of PAN’s stability comes from its causal, chunk-based refinement system. The model only looks at past frames—not future ones—which forces it to respect continuity. They even add slight noise to prevent it from over-focusing on tiny pixel details and losing track of the scene’s big picture.
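The two mechanisms above can be sketched in a few lines. This is a minimal illustration, not PAN's actual implementation: a block-causal attention mask (frames may attend within their own chunk and to earlier chunks, never to future ones) plus light Gaussian noise on conditioning frames so the model can't latch onto exact pixel values. The `chunk_size` and `sigma` values here are made up for the example.

```python
import numpy as np

def block_causal_mask(num_frames: int, chunk_size: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True iff frame i may attend to frame j.

    Frames in the same chunk see each other; earlier chunks are visible,
    future chunks are not -- this is what enforces temporal continuity.
    """
    chunk_ids = np.arange(num_frames) // chunk_size
    return chunk_ids[None, :] <= chunk_ids[:, None]

def noisy_context(frames: np.ndarray, sigma: float = 0.1, seed: int = 0) -> np.ndarray:
    """Add slight Gaussian noise to past frames used as conditioning,
    discouraging the model from over-fitting to tiny pixel details."""
    rng = np.random.default_rng(seed)
    return frames + rng.normal(0.0, sigma, size=frames.shape)

# 6 frames in chunks of 2: frame 0 cannot see frame 5, but frame 5 sees frame 0.
mask = block_causal_mask(6, 2)
```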
Training this model was a massive project. MBZUAI used 960 Nvidia H200 GPUs, recaptioned thousands of videos to emphasize motion and cause-and-effect, and filtered out anything that wouldn’t help with real-world simulation. The payoff is substantial: PAN scores 58.6% overall on action-simulation tasks, ahead of several commercial world models, most of whose makers don’t publish comparable numbers.
PAN even works as a planning tool. Plugged into an O3-style reasoning loop, it hits 56.1% accuracy on simulation tasks, making it strong enough to act as the “what happens if I do this?” module inside future AI agents.
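The planning idea reduces to a simple loop: simulate each candidate action with the world model, score the predicted outcome, and act on the best one. The sketch below uses a toy one-dimensional world model as a stand-in; in PAN's case the simulator would be the learned video world model and the scorer would come from the reasoning loop. All names here are illustrative.

```python
def toy_world_model(state: float, action: float) -> float:
    """Stand-in for a learned world model: state is a 1-D position and an
    action shifts it. A real world model would predict future frames/latents."""
    return state + action

def plan_one_step(world_model, state, actions, score):
    """Greedy 'what happens if I do this?' loop: simulate every candidate
    action, then return the one whose predicted outcome scores highest."""
    return max(actions, key=lambda a: score(world_model(state, a)))

goal = 5.0
score = lambda s: -abs(goal - s)  # closer to the goal is better
best = plan_one_step(toy_world_model, 0.0, [-1.0, 0.5, 2.0], score)
# best is 2.0, the action that moves furthest toward the goal at 5.0
```

A fuller planner would roll out multi-step action sequences rather than a single step, but the simulate-score-select structure is the same.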
This is the direction the industry is moving toward—AI systems that can understand actions, predict consequences, and maintain stable worlds over time. PAN is one of the first open-source models to make that idea feel real.