Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 7 days ago • 133
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 6 days ago • 78
Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking Paper • 2601.02669 • Published 15 days ago • 2
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs Paper • 2601.03559 • Published 13 days ago • 12
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published 25 days ago • 26
Article Building the Open Agent Ecosystem Together: Introducing OpenEnv • Oct 23, 2025 • 145
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24, 2025 • 12
GTA1 Collection A collection of GUI grounding models trained with GRPO. • 5 items • Updated Oct 31, 2025 • 4
FLEX: Continuous Agent Evolution via Forward Learning from Experience Paper • 2511.06449 • Published Nov 9, 2025 • 12
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published Nov 10, 2025 • 77
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 33
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 105
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12, 2025 • 17
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper • 2511.09515 • Published Nov 12, 2025 • 18