PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models • Paper • 2603.28763 • Published 3 days ago • 4
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation • Paper • 2603.29029 • Published 3 days ago • 8
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization • Paper • 2603.29664 • Published 2 days ago • 39
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward • Paper • 2603.26599 • Published 6 days ago • 47
AVControl: Efficient Framework for Training Audio-Visual Controls • Paper • 2603.24793 • Published 8 days ago • 25
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting • Paper • 2603.25745 • Published 7 days ago • 13
RealMaster: Lifting Rendered Scenes into Photorealistic Video • Paper • 2603.23462 • Published 9 days ago • 31
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model • Paper • 2603.21986 • Published 10 days ago • 120
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning • Paper • 2603.17024 • Published 16 days ago • 106
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer • Paper • 2603.19227 • Published 14 days ago • 42
Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass • Paper • 2603.12789 • Published 21 days ago • 3
Learning Latent Proxies for Controllable Single-Image Relighting • Paper • 2603.15555 • Published 17 days ago • 8
Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching • Paper • 2603.15016 • Published 17 days ago • 10
WildActor: Unconstrained Identity-Preserving Video Generation • Paper • 2603.00586 • Published Feb 28 • 37
Enhancing Spatial Understanding in Image Generation via Reward Modeling • Paper • 2602.24233 • Published Feb 27 • 58