EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models Paper • 2512.14666 • Published 10 days ago • 8
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 8 days ago • 183
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published 11 days ago • 63
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation Paper • 2511.20256 • Published Nov 25 • 27
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published Nov 14 • 44
nvidia/diar_streaming_sortformer_4spk-v2 Automatic Speech Recognition • Updated 10 days ago • 9.31k • 86
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13 • 95
Revisiting Multimodal Positional Encoding in Vision-Language Models Paper • 2510.23095 • Published Oct 27 • 20
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4 • 101
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks Paper • 2510.18455 • Published Oct 21 • 17