RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21, 2024 • 25
RLVR Collection Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated Mar 31, 2025 • 13
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31, 2025 • 54
Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT Viewer • Updated Feb 19, 2025 • 110k • 206 • 215
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published Feb 18, 2025 • 29
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 57