-
OmniGen2: Exploration to Advanced Multimodal Generation
Paper • 2506.18871 • Published • 78 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 115 -
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
Paper • 2502.05415 • Published • 21 -
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper • 2408.12528 • Published • 51
Collections
Discover the best community collections!
Collections including paper arxiv:2505.02567
-
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
Paper • 2505.18125 • Published • 112 -
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 81 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 60
-
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Paper • 2505.02471 • Published • 15 -
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
Self-Rewarding Vision-Language Model via Reasoning Decomposition
Paper • 2508.19652 • Published • 84
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Hierarchical Reasoning Model
Paper • 2506.21734 • Published • 46 -
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Paper • 2507.07955 • Published • 25 -
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 132
-
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
Paper • 2505.03418 • Published • 9 -
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
Paper • 2505.03821 • Published • 25 -
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Paper • 2505.04601 • Published • 29
-
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
OmniGen2: Exploration to Advanced Multimodal Generation
Paper • 2506.18871 • Published • 78 -
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
Paper • 2506.17202 • Published • 10 -
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Paper • 2506.18095 • Published • 66
-
OmniGen2: Exploration to Advanced Multimodal Generation
Paper • 2506.18871 • Published • 78 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 115 -
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
Paper • 2502.05415 • Published • 21 -
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper • 2408.12528 • Published • 51
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Hierarchical Reasoning Model
Paper • 2506.21734 • Published • 46 -
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Paper • 2507.07955 • Published • 25 -
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 132
-
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
Paper • 2505.18125 • Published • 112 -
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 81 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 60
-
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
Paper • 2505.03418 • Published • 9 -
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
Paper • 2505.03821 • Published • 25 -
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Paper • 2505.04601 • Published • 29
-
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
OmniGen2: Exploration to Advanced Multimodal Generation
Paper • 2506.18871 • Published • 78 -
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
Paper • 2506.17202 • Published • 10 -
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Paper • 2506.18095 • Published • 66
-
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Paper • 2505.02471 • Published • 15 -
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Paper • 2505.02567 • Published • 80 -
Self-Rewarding Vision-Language Model via Reasoning Decomposition
Paper • 2508.19652 • Published • 84