Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published 20 days ago • 44
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Paper • 2510.27684 • Published Oct 31 • 22
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published Nov 22, 2024 • 21
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 24
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28 • 46
Large Motion Model for Unified Multi-Modal Motion Generation Paper • 2404.01284 • Published Apr 1, 2024
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation Paper • 2501.09782 • Published Jan 16
Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos Paper • 2501.13335 • Published Jan 23
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation Paper • 2411.19921 • Published Nov 29, 2024
Controllable Human-centric Keyframe Interpolation with Generative Prior Paper • 2506.03119 • Published Jun 3 • 2
TokensGen: Harnessing Condensed Tokens for Long Video Generation Paper • 2507.15728 • Published Jul 21 • 7