Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition • Paper • 2603.13904 • Published 14 days ago • 2
WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching • Paper • 2603.24836 • Published 3 days ago • 1
Vega: Learning to Drive with Natural Language Instructions • Paper • 2603.25741 • Published 2 days ago • 4
AVO: Agentic Variation Operators for Autonomous Evolutionary Search • Paper • 2603.24517 • Published 3 days ago • 5
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models • Paper • 2603.25744 • Published 2 days ago • 8
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale • Paper • 2603.25040 • Published 3 days ago • 100
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models • Paper • 2603.25502 • Published 2 days ago • 43
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration • Paper • 2603.24800 • Published 3 days ago • 47
CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare • Paper • 2603.24157 • Published 3 days ago • 8
Toward Physically Consistent Driving Video World Models under Challenging Trajectories • Paper • 2603.24506 • Published 3 days ago • 3
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning • Paper • 2603.21289 • Published 6 days ago • 17
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents • Paper • 2603.24329 • Published 3 days ago • 17
Repurposing Geometric Foundation Models for Multi-view Diffusion • Paper • 2603.22275 • Published 5 days ago • 43
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models • Paper • 2603.21854 • Published 5 days ago • 3
Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates • Paper • 2603.22350 • Published 6 days ago • 1
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published 16 days ago • 21
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought • Paper • 2603.22847 • Published 5 days ago • 23
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use • Paper • 2603.08262 • Published 19 days ago • 42