TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Paper • 2512.14698 • Published 9 days ago • 18
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published about 1 month ago • 42
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5 • 52
CaRe Collection CaReBench data, CaRe models and all the contrastively trained MLLMs (including InternVL2, MiniCPM-V 2.6, LLaVA NeXT Video, Qwen2-VL and Tariser). • 6 items • Updated Mar 17 • 1
CaRe Collection CaReBench data, CaRe models and all the contrastively trained MLLMs (including InternVL2, MiniCPM-V 2.6, LLaVA NeXT Video, Qwen2-VL and Tariser). • 6 items • Updated Mar 17 • 1
CaRe Collection CaReBench data, CaRe models and all the contrastively trained MLLMs (including InternVL2, MiniCPM-V 2.6, LLaVA NeXT Video, Qwen2-VL and Tariser). • 6 items • Updated Mar 17 • 1
CaRe Collection CaReBench data, CaRe models and all the contrastively trained MLLMs (including InternVL2, MiniCPM-V 2.6, LLaVA NeXT Video, Qwen2-VL and Tariser). • 6 items • Updated Mar 17 • 1
CaRe Collection CaReBench data, CaRe models and all the contrastively trained MLLMs (including InternVL2, MiniCPM-V 2.6, LLaVA NeXT Video, Qwen2-VL and Tariser). • 6 items • Updated Mar 17 • 1