HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
Paper: https://arxiv.org/pdf/2506.02975
Code: https://github.com/Tencent/HaploVLM/tree/main/haploomni