FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies • Paper • 2605.27284 • Published 24 days ago • 8
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments • Paper • 2605.30280 • Published 22 days ago • 143
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution • Paper • 2501.02976 • Published Jan 6, 2025 • 56
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement • Paper • 2411.06558 • Published Nov 10, 2024 • 36