-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 119 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 98 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 65
Collections
Collections including paper arxiv:2512.14691
-
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 119 -
KlingAvatar 2.0 Technical Report
Paper • 2512.13313 • Published • 43 -
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 93 -
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 219
-
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper • 2512.02472 • Published • 55 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 146 -
Video Reasoning without Training
Paper • 2510.17045 • Published • 8 -
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 273
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 10 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 17 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 25
-
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Paper • 2405.15668 • Published -
On Large Multimodal Models as Open-World Image Classifiers
Paper • 2503.21851 • Published • 5 -
Benchmarking Large Language Models for Image Classification of Marine Mammals
Paper • 2410.19848 • Published -
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
Paper • 2501.07783 • Published • 8
-
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 119 -
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
Paper • 2512.10655 • Published • 10 -
Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"
Paper • 2512.11883 • Published • 7 -
Rethinking Expert Trajectory Utilization in LLM Post-training
Paper • 2512.11470 • Published • 10
-
Scaling Spatial Intelligence with Multimodal Foundation Models
Paper • 2511.13719 • Published • 47 -
Thinking with Images via Self-Calling Agent
Paper • 2512.08511 • Published • 23 -
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
Paper • 2512.15713 • Published • 17 -
In Pursuit of Pixel Supervision for Visual Pre-training
Paper • 2512.15715 • Published • 11