Quantitative Video World Model Evaluation for Geometric-Consistency
Abstract
Generative video models show promise as implicit world models, but assessing their 3D physical realism remains difficult. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and weakly diagnostic for geometric failures. We introduce PDI-Bench (Perspective Distortion Index), a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, we obtain object-centric observations via segmentation and point tracking, lift them to 3D world-space coordinates via monocular reconstruction, and compute a set of projective-geometry residuals capturing three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. To support systematic evaluation, we build PDI-Dataset, covering diverse scenarios. Across state-of-the-art video generators, PDI reveals consistent geometry-specific failure modes missed by perceptual metrics, and provides a diagnostic signal for progress toward physically grounded video generation and world models.
Overview of the PDI-Bench Evaluation.
(Top) Qualitative samples from our dataset, featuring Ground Truth (GT) videos and generated sequences from state-of-the-art models.
(Bottom) The corresponding PDI-Scores for GT and each model. Lower scores indicate better adherence to 3D physical laws (scale alignment, motion consistency, and structural rigidity).
PDI-Bench Leaderboard
Quantitative comparison of physical consistency on PDI-Bench. Lower PDI is better.
| Rank | Model | Organization | PDI Score ↓ |
|---|---|---|---|
| 1 | Ground Truth (GT) | Real World | 0.1206 |
| 2 | Seedance 2.0 | ByteDance | 0.2422 |
| 3 | CogVideoX-3 | Zhipu AI | 0.2480 |
| 4 | Veo 3.1 | | 0.4521 |
| 5 | Wan 2.2 | Alibaba | 0.5595 |
| 6 | Sora | OpenAI | 0.8255 |
| 7 | HunyuanVideo | Tencent Hunyuan | 0.8825 |
PDI-Bench pipeline.
The three key perspectives for geometric consistency.
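To make two of these perspectives concrete, the following is a minimal sketch of how a scale-depth residual and a structural-rigidity residual could be computed once tracked points have been lifted to 3D. It is an illustration under stated assumptions, not the PDI-Bench implementation: the function names `scale_depth_residual` and `rigidity_residual`, the input layouts, and the normalizations are all hypothetical. The scale-depth check relies only on the standard pinhole relation that the image-space size of an object with fixed physical size is proportional to 1 / depth.

```python
import numpy as np


def scale_depth_residual(apparent_size, depth):
    """Coefficient of variation of (apparent size * depth) over frames.

    Hypothetical helper. Under a pinhole camera, an object of fixed physical
    size has image-space size proportional to 1 / depth, so the product
    size * depth should stay constant; any spread signals a mismatch.
    """
    product = np.asarray(apparent_size, dtype=float) * np.asarray(depth, dtype=float)
    return float(np.std(product) / np.mean(product))


def rigidity_residual(points):
    """Mean relative change of pairwise 3D distances w.r.t. the first frame.

    Hypothetical helper. `points` has shape (T, N, 3): N tracked points on
    one object over T frames. A rigid object preserves every pairwise
    distance, which gives a residual of 0.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise difference vectors per frame: shape (T, N, N, 3).
    diffs = points[:, :, None, :] - points[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)          # (T, N, N)
    # Keep each unordered pair once (upper triangle, no diagonal).
    rows, cols = np.triu_indices(points.shape[1], k=1)
    pair = dists[:, rows, cols]                     # (T, num_pairs)
    reference = pair[0]
    return float(np.mean(np.abs(pair - reference) / reference))


if __name__ == "__main__":
    # An object receding from the camera with consistent perspective scaling.
    depth = np.array([2.0, 4.0, 8.0])
    size = 100.0 / depth
    print("scale-depth residual:", scale_depth_residual(size, depth))

    # A triangle that is translated (rigid) versus uniformly stretched.
    base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    rigid = np.stack([base, base + np.array([0.0, 0.0, 5.0])])
    stretched = np.stack([base, 2.0 * base])
    print("rigid residual:", rigidity_residual(rigid))
    print("stretched residual:", rigidity_residual(stretched))
```

Both residuals are zero for geometrically consistent input and grow as the generated object violates the corresponding constraint, which matches the "lower is better" convention of the leaderboard. How the actual benchmark normalizes and combines its three metrics into a single PDI score is not specified here.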
Examples
Bear
Black Swan
Kite Surf
Stroller
Soccer Ball
Car Shadow