MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Paper
• 2508.14879
• Published
• 69
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper
• 2508.19247
• Published
• 43
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
Paper
• 2508.17437
• Published
• 37
Multi-View 3D Point Tracking
Paper
• 2508.21060
• Published
• 23
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Paper
• 2509.09676
• Published
• 35
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Paper
• 2509.12201
• Published
• 106
3D-LLM: Injecting the 3D World into Large Language Models
Paper
• 2307.12981
• Published
• 40
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper
• 2510.08551
• Published
• 34
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper
• 2510.08673
• Published
• 126
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets
Paper
• 2510.19944
• Published
• 21
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper
• 2510.23607
• Published
• 179
Error-Driven Scene Editing for 3D Grounding in Large Language Models
Paper
• 2511.14086
• Published
• 7
Depth Anything 3: Recovering the Visual Space from Any Views
Paper
• 2511.10647
• Published
• 99
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
Paper
• 2510.27606
• Published
• 31
NaTex: Seamless Texture Generation as Latent Color Diffusion
Paper
• 2511.16317
• Published
• 16
MiMo-Embodied: X-Embodied Foundation Model Technical Report
Paper
• 2511.16518
• Published
• 26
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
Paper
• 2512.01030
• Published
• 20
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Paper
• 2512.03000
• Published
• 37
SIMA 2: A Generalist Embodied Agent for Virtual Worlds
Paper
• 2512.04797
• Published
• 25
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper
• 2512.05044
• Published
• 17
ProPhy: Progressive Physical Alignment for Dynamic World Simulation
Paper
• 2512.05564
• Published
• 6
Voxify3D: Pixel Art Meets Volumetric Rendering
Paper
• 2512.07834
• Published
• 45
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
Paper
• 2512.10881
• Published
• 30
SS4D: Native 4D Generative Model via Structured Spacetime Latents
Paper
• 2512.14284
• Published
• 14
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
Paper
• 2512.16913
• Published
• 34
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper
• 2512.14614
• Published
• 71
Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics
Paper
• 2512.15340
• Published
• 3
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation
Paper
• 2512.17495
• Published
• 20
3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework
Paper
• 2512.17459
• Published
• 12
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
Paper
• 2512.17012
• Published
• 47
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Paper
• 2512.16793
• Published
• 75
MatSpray: Fusing 2D Material World Knowledge on 3D Geometry
Paper
• 2512.18314
• Published
• 9
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
Paper
• 2512.19526
• Published
• 12
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
Paper
• 2601.05138
• Published
• 18
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes
Paper
• 2601.05249
• Published
• 47
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
Paper
• 2601.01321
• Published
• 19
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams
Paper
• 2601.02281
• Published
• 33
MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm
Paper
• 2512.02895
• Published
• 5