Qwen2.5-1.5B-GRPO-MATH-1EPOCH

Description:

Qwen2.5-1.5B fine-tuned with GRPO (Group Relative Policy Optimization) on the MATH dataset, for one epoch as the model name indicates.
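A minimal loading sketch, assuming the checkpoint works with the standard `transformers` causal-LM classes, as Qwen2.5 base models do. The repository id is the one shown on the model page; the prompt wording in `build_prompt` is hypothetical, since this card does not document the prompt format used during training.

```python
MODEL_ID = "back-prop/Qwen2.5-GRPO-1.5B"  # repository id as listed on the model page


def build_prompt(problem: str) -> str:
    """Wrap a math problem in a plain instruction.

    Assumption: this wording is illustrative only; the card does not
    state which prompt template was used for GRPO training.
    """
    return (
        "Solve the following math problem step by step.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )


def generate_solution(problem: str, max_new_tokens: int = 512) -> str:
    """Load the model and greedily decode a solution for one problem."""
    # Imported lazily so build_prompt stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # The weights are published in BF16, so load them in that dtype.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_solution("What is the sum of the first 10 positive integers?")` downloads the weights on first use. Greedy decoding is chosen here only to make runs repeatable; sampling may suit other uses better.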

Citation

@article{shao2024deepseekmath,
  title     = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author    = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
  journal   = {arXiv preprint arXiv:2402.03300},
  year      = {2024},
}
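The cited paper introduces GRPO, which scores each sampled answer relative to the other answers drawn for the same prompt instead of using a learned value function. The sketch below shows only that group-relative normalization. The 0/1 correctness rewards, the use of the population standard deviation, and the `eps` guard are assumptions for illustration; this card does not describe the reward or training setup actually used.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampling group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps keeps the division defined when every answer gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four answers sampled for a single problem:
# 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When all answers in a group score the same, every advantage is zero and that group contributes no learning signal.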


Format: Safetensors
Model size: 2B params
Tensor type: BF16

Model repository: back-prop/Qwen2.5-GRPO-1.5B
Base model: Qwen/Qwen2.5-1.5B (this model is a fine-tune of it)
