Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published Sep 26, 2025 • 118
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks Paper • 2406.02818 • Published Jun 4, 2024 • 2
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent Paper • 2507.02259 • Published Jul 3, 2025 • 5