Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2512.14691

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 98
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 65

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 43
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 93
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 219

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published Dec 2, 2025 • 55
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19, 2025 • 8
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Paper • 2512.02835 • Published Dec 2, 2025 • 10
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Paper • 2512.05044 • Published Dec 4, 2025 • 17
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Paper • 2512.05343 • Published Dec 5, 2025 • 25

Multimodal Image Classification

What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

Paper • 2405.15668 • Published May 24, 2024
On Large Multimodal Models as Open-World Image Classifiers

Paper • 2503.21851 • Published Mar 27, 2025 • 5
Benchmarking Large Language Models for Image Classification of Marine Mammals

Paper • 2410.19848 • Published Oct 22, 2024
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published Jan 14, 2025 • 8

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models

Paper • 2512.10655 • Published Dec 11, 2025 • 10
Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"

Paper • 2512.11883 • Published Dec 9, 2025 • 7
Rethinking Expert Trajectory Utilization in LLM Post-training

Paper • 2512.11470 • Published Dec 12, 2025 • 10

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
3AM: Segment Anything with Geometric Consistency in Videos

Paper • 2601.08831 • Published Jan 13 • 34

3d gaussian splatting awesome models

depth-anything/DA3NESTED-GIANT-LARGE

Depth Estimation • 2B • Updated Nov 13, 2025 • 101k • 43
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119

Scaling Spatial Intelligence with Multimodal Foundation Models

Paper • 2511.13719 • Published Nov 17, 2025 • 47
Thinking with Images via Self-Calling Agent

Paper • 2512.08511 • Published Dec 9, 2025 • 23
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Paper • 2512.15713 • Published Dec 17, 2025 • 17
In Pursuit of Pixel Supervision for Visual Pre-training

Paper • 2512.15715 • Published Dec 17, 2025 • 11

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 98
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 65

Multimodal Image Classification

What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

Paper • 2405.15668 • Published May 24, 2024
On Large Multimodal Models as Open-World Image Classifiers

Paper • 2503.21851 • Published Mar 27, 2025 • 5
Benchmarking Large Language Models for Image Classification of Marine Mammals

Paper • 2410.19848 • Published Oct 22, 2024
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published Jan 14, 2025 • 8

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models

Paper • 2512.10655 • Published Dec 11, 2025 • 10
Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"

Paper • 2512.11883 • Published Dec 9, 2025 • 7
Rethinking Expert Trajectory Utilization in LLM Post-training

Paper • 2512.11470 • Published Dec 12, 2025 • 10

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 43
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 93
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 219

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
3AM: Segment Anything with Geometric Consistency in Videos

Paper • 2601.08831 • Published Jan 13 • 34

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published Dec 2, 2025 • 55
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19, 2025 • 8
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273

3d gaussian splatting awesome models

depth-anything/DA3NESTED-GIANT-LARGE

Depth Estimation • 2B • Updated Nov 13, 2025 • 101k • 43
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Paper • 2512.02835 • Published Dec 2, 2025 • 10
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Paper • 2512.05044 • Published Dec 4, 2025 • 17
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Paper • 2512.05343 • Published Dec 5, 2025 • 25

Scaling Spatial Intelligence with Multimodal Foundation Models

Paper • 2511.13719 • Published Nov 17, 2025 • 47
Thinking with Images via Self-Calling Agent

Paper • 2512.08511 • Published Dec 9, 2025 • 23
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Paper • 2512.15713 • Published Dec 17, 2025 • 17
In Pursuit of Pixel Supervision for Visual Pre-training

Paper • 2512.15715 • Published Dec 17, 2025 • 11

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs