A Simple Baseline for Streaming Video Understanding Paper • 2604.02317 • Published 14 days ago • 72
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published 13 days ago • 37
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published 29 days ago • 109
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published 22 days ago • 91
Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation Paper • 2603.18795 • Published 27 days ago • 16
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14, 2025 • 112
LRM: Large Reconstruction Model for Single Image to 3D Paper • 2311.04400 • Published Nov 8, 2023 • 52
MVDream: Multi-view Diffusion for 3D Generation Paper • 2308.16512 • Published Aug 31, 2023 • 106