In-Video Instructions: Visual Signals as Generative Control • Paper • 2511.19401 • Published 15 days ago • 30
Artificial Hippocampus Networks for Efficient Long-Context Modeling • Paper • 2510.07318 • Published Oct 8 • 30
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps • Paper • 2505.18675 • Published May 24 • 25
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding • Paper • 2505.16990 • Published May 22 • 22
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering • Paper • 2503.16422 • Published Mar 20 • 14
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling • Paper • 2502.20378 • Published Feb 27 • 5