AgentFold: Long-Horizon Web Agents with Proactive Context Management Paper • 2510.24699 • Published Oct 28, 2025
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures Paper • 2510.14616 • Published Oct 16, 2025
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes Paper • 2510.14763 • Published Oct 16, 2025
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents Paper • 2510.02271 • Published Oct 2, 2025
BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair Paper • 2508.09129 • Published Aug 12, 2025
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development Paper • 2505.16975 • Published May 22, 2025