Collections
Discover the best community collections!
Collections including paper arxiv:2601.08303
-
MMaDA: Multimodal Large Diffusion Language Models
Paper • 2505.15809 • Published • 98 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 55 -
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Paper • 2506.18095 • Published • 66 -
Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models
Paper • 2506.19103 • Published • 42
-
iFormer: Integrating ConvNet and Transformer for Mobile Application
Paper • 2501.15369 • Published • 13 -
VisPlay: Self-Evolving Vision-Language Models from Images
Paper • 2511.15661 • Published • 43 -
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
Paper • 2601.08303 • Published • 18
-
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Paper • 2511.21691 • Published • 36 -
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
Paper • 2601.08303 • Published • 18 -
Serverless ImgGen Hub
♨ 20 • Highly hackable hub w/ Flux, SD 3.5, LoRAs, no GPUs required
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 63 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper • 2501.06282 • Published • 53 -
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Paper • 2502.13128 • Published • 41 -
PAFT: Prompt-Agnostic Fine-Tuning
Paper • 2502.12859 • Published • 15 -
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
Paper • 2601.08303 • Published • 18
-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 311 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 18 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 15 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8