DSR_Suite
Collection
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
3 items
ARC focuses on computer vision, speech, and natural language processing, with work spanning topics such as speech and video generation, enhancement, retrieval, and understanding, as well as AutoML. Guided by research developments and industry trends, ARC pursues exploration, innovation, and technological breakthroughs.
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries