AlirezaDoC's Collections
Papers
updated
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models • Paper • 2602.12036 • Published • 92
Reinforcement Learning for Self-Improving Agent with Skill Library • Paper • 2512.17102 • Published • 36
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation • Paper • 2512.23705 • Published • 45
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models • Paper • 2512.19995 • Published • 16
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone • Paper • 2512.22615 • Published • 49
TimeBill: Time-Budgeted Inference for Large Language Models • Paper • 2512.21859 • Published • 25
Evaluating Parameter Efficient Methods for RLVR • Paper • 2512.23165 • Published • 27
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem • Paper • 2512.24873 • Published • 105
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models • Paper • 2512.24618 • Published • 151
SpotEdit: Selective Region Editing in Diffusion Transformers • Paper • 2512.22323 • Published • 39
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards • Paper • 2512.21625 • Published • 4
Nested Browser-Use Learning for Agentic Information Seeking • Paper • 2512.23647 • Published • 19
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models • Paper • 2512.15560 • Published • 25
Distribution Matching Variational AutoEncoder • Paper • 2512.07778 • Published • 29
Self-Improving VLM Judges Without Human Annotations • Paper • 2512.05145 • Published • 20
OmniPSD: Layered PSD Generation with Diffusion Transformer • Paper • 2512.09247 • Published • 48
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations • Paper • 2512.21004 • Published • 13
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion • Paper • 2512.19535 • Published • 12
Multi-hop Reasoning via Early Knowledge Alignment • Paper • 2512.20144 • Published • 7
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion • Paper • 2512.19678 • Published • 30
Step-DeepResearch Technical Report • Paper • 2512.20491 • Published • 86
3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework • Paper • 2512.17459 • Published • 12
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models • Paper • 2512.21337 • Published • 31
KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs • Paper • 2601.01046 • Published • 14
DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer • Paper • 2601.01425 • Published • 52
Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes • Paper • 2601.02356 • Published • 14
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation • Paper • 2601.02256 • Published • 33
Recursive Language Models • Paper • 2512.24601 • Published • 89
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation • Paper • 2601.02204 • Published • 62
K-EXAONE Technical Report • Paper • 2601.01739 • Published • 92
mHC: Manifold-Constrained Hyper-Connections • Paper • 2512.24880 • Published • 307
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation • Paper • 2601.15369 • Published • 21
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning • Paper • 2601.16163 • Published • 14
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models • Paper • 2601.15165 • Published • 72
Learning to Discover at Test Time • Paper • 2601.16175 • Published • 42
ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion • Paper • 2601.16148 • Published • 12
Behavior Knowledge Merge in Reinforced Agentic Models • Paper • 2601.13572 • Published • 24
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published • 52
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model • Paper • 2601.15892 • Published • 53
Agentic-R: Learning to Retrieve for Agentic Search • Paper • 2601.11888 • Published • 19
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning • Paper • 2601.10129 • Published • 12
Agentic Reasoning for Large Language Models • Paper • 2601.12538 • Published • 195
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development • Paper • 2601.11077 • Published • 65
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems • Paper • 2601.11004 • Published • 30
Language of Thought Shapes Output Diversity in Large Language Models • Paper • 2601.11227 • Published • 9
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning • Paper • 2601.09667 • Published • 91
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer • Paper • 2601.14250 • Published • 47
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR • Paper • 2601.14251 • Published • 24
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models • Paper • 2601.11087 • Published • 11
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning • Paper • 2602.11748 • Published • 30
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR • Paper • 2602.05261 • Published • 49
LatentMem: Customizing Latent Memory for Multi-Agent Systems • Paper • 2602.03036 • Published • 14
Reinforced Attention Learning • Paper • 2602.04884 • Published • 28
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation • Paper • 2602.03796 • Published • 57
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations • Paper • 2602.05885 • Published • 28
VLS: Steering Pretrained Robot Policies via Vision-Language Models • Paper • 2602.03973 • Published • 22
Efficient Autoregressive Video Diffusion with Dummy Head • Paper • 2601.20499 • Published • 8
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs • Paper • 2602.03048 • Published • 33
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought • Paper • 2601.23184 • Published • 36
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding • Paper • 2602.01785 • Published • 93
Balancing Understanding and Generation in Discrete Diffusion Models • Paper • 2602.01362 • Published • 14
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization • Paper • 2601.21358 • Published • 7
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently • Paper • 2602.02619 • Published • 50
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models • Paper • 2601.22060 • Published • 154
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss • Paper • 2602.02493 • Published • 42
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System • Paper • 2602.02488 • Published • 32
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning • Paper • 2601.21468 • Published • 25
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery • Paper • 2601.19325 • Published • 79
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory • Paper • 2601.16296 • Published • 28
Self-Distillation Enables Continual Learning • Paper • 2601.19897 • Published • 26
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation • Paper • 2601.20614 • Published • 118
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation • Paper • 2601.21406 • Published • 5
Reinforcement Learning via Self-Distillation • Paper • 2601.20802 • Published • 40
Visual Personalization Turing Test • Paper • 2601.22680 • Published • 2
TTCS: Test-Time Curriculum Synthesis for Self-Evolving • Paper • 2601.22628 • Published • 35
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models • Paper • 2602.04515 • Published • 38
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation • Paper • 2601.22153 • Published • 69
Beyond Imitation: Reinforcement Learning for Active Latent Planning • Paper • 2601.21598 • Published • 9
Linear representations in language models can change dramatically over a conversation • Paper • 2601.20834 • Published • 21
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep • Paper • 2601.19895 • Published • 23
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability • Paper • 2601.18778 • Published • 40
Endless Terminals: Scaling RL Environments for Terminal Agents • Paper • 2601.16443 • Published • 16
iFSQ: Improving FSQ for Image Generation with 1 Line of Code • Paper • 2601.17124 • Published • 32