RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark Paper • 2509.24897 • Published Sep 29 • 46
VideoChat-R1 Collection VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • 4 items • Updated Sep 28 • 8
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21 • 68
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29 • 39
view article Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate +2 Jun 13, 2024 • 61
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation Paper • 2503.19622 • Published Mar 25 • 31
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video Paper • 2310.01324 • Published Oct 2, 2023 • 2
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 26
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024 • 6