video LM
StreamChat: Chatting with Streaming Video (arXiv:2412.08646)
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation (arXiv:2412.04432)
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation (arXiv:2412.00927)
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions (arXiv:2412.09596)
Apollo: An Exploration of Video Understanding in Large Multimodal Models (arXiv:2412.10360)
VidTok: A Versatile and Open-Source Video Tokenizer (arXiv:2412.13061)
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models (arXiv:2412.18609)
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction (arXiv:2501.03218)
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (arXiv:2501.03895)
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models (arXiv:2501.02955)