DeepSeek-R1 / .eval_results
burtenshaw HF Staff
Fix task_id to diamond (matching benchmark eval.yaml)
00c6b9d verified