Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 5 days ago • 46
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO Paper • 2505.13031 • Published May 19 • 4
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14 • 36
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11 • 61
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27 • 30
TEXGen: a Generative Diffusion Model for Mesh Textures Paper • 2411.14740 • Published Nov 22, 2024 • 17
Image Inpainting via Iteratively Decoupled Probabilistic Modeling Paper • 2212.02963 • Published Dec 6, 2022
Is synthetic data from generative models ready for image recognition? Paper • 2210.07574 • Published Oct 14, 2022
Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing Paper • 2207.09935 • Published Jul 20, 2022
GO-NeRF: Generating Virtual Objects in Neural Radiance Fields Paper • 2401.05750 • Published Jan 11, 2024