Bim - a JuanRafap Collection

updated 5 days ago

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20, 2025 • 69
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published Aug 20, 2025 • 37
Multi-View 3D Point Tracking

Paper • 2508.21060 • Published Aug 28, 2025 • 23
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11, 2025 • 35
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15, 2025 • 106
3D-LLM: Injecting the 3D World into Large Language Models

Paper • 2307.12981 • Published Jul 24, 2023 • 40
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

Paper • 2510.08551 • Published Oct 9, 2025 • 34
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 126
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

Paper • 2510.19944 • Published Oct 22, 2025 • 21
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published Oct 27, 2025 • 179
Error-Driven Scene Editing for 3D Grounding in Large Language Models

Paper • 2511.14086 • Published Nov 18, 2025 • 7
Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13, 2025 • 99
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

Paper • 2510.27606 • Published Oct 31, 2025 • 31
NaTex: Seamless Texture Generation as Latent Color Diffusion

Paper • 2511.16317 • Published Nov 20, 2025 • 16
MiMo-Embodied: X-Embodied Foundation Model Technical Report

Paper • 2511.16518 • Published Nov 20, 2025 • 26
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

Paper • 2512.01030 • Published Nov 30, 2025 • 20
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Paper • 2512.03000 • Published Dec 2, 2025 • 37
SIMA 2: A Generalist Embodied Agent for Virtual Worlds

Paper • 2512.04797 • Published Dec 4, 2025 • 25
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Paper • 2512.05044 • Published Dec 4, 2025 • 17
ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Paper • 2512.05564 • Published Dec 5, 2025 • 6
Voxify3D: Pixel Art Meets Volumetric Rendering

Paper • 2512.07834 • Published Dec 8, 2025 • 45
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

Paper • 2512.10881 • Published Dec 11, 2025 • 30
SS4D: Native 4D Generative Model via Structured Spacetime Latents

Paper • 2512.14284 • Published Dec 16, 2025 • 14
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Paper • 2512.16913 • Published Dec 18, 2025 • 34
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published Dec 16, 2025 • 71
Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics

Paper • 2512.15340 • Published Dec 17, 2025 • 3
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Paper • 2512.17495 • Published Dec 19, 2025 • 20
3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework

Paper • 2512.17459 • Published Dec 19, 2025 • 12
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Paper • 2512.17012 • Published Dec 18, 2025 • 47
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Paper • 2512.16793 • Published Dec 18, 2025 • 75
MatSpray: Fusing 2D Material World Knowledge on 3D Geometry

Paper • 2512.18314 • Published Dec 20, 2025 • 9
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

Paper • 2512.19526 • Published Dec 22, 2025 • 12
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Paper • 2601.05138 • Published Jan 8 • 18
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Paper • 2601.05249 • Published Jan 8 • 47
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

Paper • 2601.01321 • Published Jan 4 • 19
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Paper • 2601.02281 • Published Jan 5 • 33
MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm

Paper • 2512.02895 • Published Dec 2, 2025 • 5