CooperBench: Why Coding Agents Cannot be Your Teammates Yet Paper • 2601.13295 • Published Jan 19, 2026 • 3
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions Paper • 2506.23046 • Published Jun 29, 2025 • 1
AutoLibra: Agent Metric Induction from Open-Ended Feedback Paper • 2505.02820 • Published May 5, 2025 • 3
Mind the Gap! Static and Interactive Evaluations of Large Audio Models Paper • 2502.15919 • Published Feb 21, 2025 • 4
EgoNormia: Benchmarking Physical Social Norm Understanding Paper • 2502.20490 • Published Feb 27, 2025 • 6
Grounded Persuasive Language Generation for Automated Marketing Paper • 2502.16810 • Published Feb 24, 2025 • 13