Model Card for Qwen3-0.6B-GRPO-GSM8K-Think-Liger

This model is a fine-tuned version of Qwen/Qwen3-0.6B. It has been trained using TRL.

Quick start

from modelscope import pipeline
from modelscope.msdatasets import MsDataset

# Load the GSM8K test split and take the first problem.
ds = MsDataset.load('modelscope/gsm8k', subset_name='main', split='test', trust_remote_code=True)
sample = ds[0]
question = sample['question']
true_answer = sample['answer']
print(question, true_answer)

# Build a text-generation pipeline for the fine-tuned model (requires a CUDA GPU).
generator = pipeline(
    task="text-generation",
    model="yuanmodel/Qwen3-0.6B-GRPO-GSM8K-Think",
    device="cuda",
    trust_remote_code=True
)

# Send the question as a single user message; the reply includes the <think> block.
output = generator([{"role": "user", "content": question}], max_new_tokens=512)
print("Raw output:", output)
print("answer", output["message"]["content"])

Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market? Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.
She makes 9 * 2 = $<<9*2=18>>18 every day at the farmer's market.
#### 18


Raw output: {'message': {'role': 'assistant', 'content': "<think>\nOkay, let me try to figure out how much Janet makes every day at the farmers' market. Hmm, so first, I need to break down her daily activities and calculate each part. Let me start by listing out what she does each day.\n\nJanet's ducks lay 16 eggs per day. So, if she does that every day, the total eggs laid would be 16 eggs. Then, she eats three eggs for breakfast. So, that's 3 eggs. Then, she bakes muffins for her friends every day with four eggs per muffin. So, 4 eggs per muffin times the number of days she bakes. Since it's a daily activity, that's 4 times 1, so 4 eggs total for muffins. \n\nAfter that, the remainder of the eggs is sold at the farmers' market for $2 per egg. So, the eggs remaining after breakfast and muffins would be 16 eggs minus 3 eggs (breakfast) minus 4 eggs (muffins) equals 9 eggs. Then, 9 eggs multiplied by $2 per egg gives the total amount for the market.\n\nLet me check the math step by step to make sure I didn't make a mistake. 16 eggs total. Subtract 3 for breakfast: 16 - 3 = 13 eggs left. Then 13 - 4 = 9 eggs. Multiply 9 by 2: 18 dollars. So, $18 should be the amount she makes at the market. That seems right.\n\nWait, just to confirm, 16 - 3 is 13, 13 - 4 is 9, 9*2 is 18. Yep, that adds up. I don't think I missed anything here. So the answer should be $18.\n</think>\n\nJanet's ducks lay 16 eggs per day. She eats 3 eggs for breakfast, and bakes 4 eggs for muffins daily. The remainder is sold at the farmers' market for $2 per egg.\n\n1. Total eggs: $16$\n2. Eggs eaten: $16 - 3 = 13$\n3. Eggs remaining: $13 - 4 = 9$\n4. Revenue from market: $9 \\times 2 = 18$\n\n**Answer:** Janet makes $\\boxed{18}$ dollars every day at the farmers' market."}}
answer <think>
Okay, let me try to figure out how much Janet makes every day at the farmers' market. Hmm, so first, I need to break down her daily activities and calculate each part. Let me start by listing out what she does each day.

Janet's ducks lay 16 eggs per day. So, if she does that every day, the total eggs laid would be 16 eggs. Then, she eats three eggs for breakfast. So, that's 3 eggs. Then, she bakes muffins for her friends every day with four eggs per muffin. So, 4 eggs per muffin times the number of days she bakes. Since it's a daily activity, that's 4 times 1, so 4 eggs total for muffins. 

After that, the remainder of the eggs is sold at the farmers' market for $2 per egg. So, the eggs remaining after breakfast and muffins would be 16 eggs minus 3 eggs (breakfast) minus 4 eggs (muffins) equals 9 eggs. Then, 9 eggs multiplied by $2 per egg gives the total amount for the market.

Let me check the math step by step to make sure I didn't make a mistake. 16 eggs total. Subtract 3 for breakfast: 16 - 3 = 13 eggs left. Then 13 - 4 = 9 eggs. Multiply 9 by 2: 18 dollars. So, $18 should be the amount she makes at the market. That seems right.

Wait, just to confirm, 16 - 3 is 13, 13 - 4 is 9, 9*2 is 18. Yep, that adds up. I don't think I missed anything here. So the answer should be $18.
</think>

Janet's ducks lay 16 eggs per day. She eats 3 eggs for breakfast, and bakes 4 eggs for muffins daily. The remainder is sold at the farmers' market for $2 per egg.

1. Total eggs: $16$
2. Eggs eaten: $16 - 3 = 13$
3. Eggs remaining: $13 - 4 = 9$
4. Revenue from market: $9 \times 2 = 18$

**Answer:** Janet makes $\boxed{18}$ dollars every day at the farmers' market.
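
The reply above keeps the reasoning inside <think>...</think> and puts the final result in \boxed{...}. A minimal sketch for separating the two, assuming replies follow that layout. The helper names split_reasoning and extract_boxed are hypothetical and are not part of ModelScope or TRL:

```python
import re
from typing import Optional, Tuple


def split_reasoning(content: str) -> Tuple[str, str]:
    """Return (reasoning, answer_text) from a reply that may contain a <think> block."""
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match is None:
        # No reasoning block: treat the whole reply as the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer_text = content[match.end():].strip()
    return reasoning, answer_text


def extract_boxed(answer_text: str) -> Optional[str]:
    """Return the contents of the last boxed expression in the text, if any."""
    found = re.findall(r"\\boxed\{([^{}]*)\}", answer_text)
    return found[-1].strip() if found else None


# Shortened stand-in for output["message"]["content"].
example = (
    "<think>\n16 - 3 is 13, 13 - 4 is 9, 9*2 is 18.\n</think>\n\n"
    "**Answer:** Janet makes $\\boxed{18}$ dollars every day."
)
reasoning, answer_text = split_reasoning(example)
print("Reasoning:", reasoning)
print("Boxed answer:", extract_boxed(answer_text))
```

Without a system prompt the model is free to choose its own answer format, so a parser like this may miss replies that do not use \boxed{...}.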

Quick start with chat template

from modelscope import pipeline
from modelscope.msdatasets import MsDataset

SYSTEM_PROMPT = """
Please solve the math problem step by step. Use <think> tags to show your reasoning process, then provide the final numerical answer.

IMPORTANT: Your final answer must be a pure number only, following these rules:
- No currency symbols: use 42, not $42 or 42 dollars
- No thousand separators: use 16000, not 16,000 or 16 000  
- No decimal if integer: use 42, not 42.0 or 42.00
- No units or text: use 144, not 144 gallons or 144 total

Examples:
- Good: 42, 16000, 144, 999000
- Bad: $42, 16,000, 42.0, 144 gallons

Format:
<think>
Your step-by-step reasoning here...
</think>
Final answer: [pure number only]
"""


ds = MsDataset.load('modelscope/gsm8k', subset_name='main', split='test', trust_remote_code=True)
sample = ds[0]
question = sample['question']
true_answer = sample['answer']
print("Question:", question)
print("True answer:", true_answer)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question}
]

generator = pipeline(
    task="text-generation",
    model="yuanmodel/Qwen3-0.6B-GRPO-GSM8K-Think",
    device="cuda",
    trust_remote_code=True
)

output = generator(messages, max_new_tokens=512)
print("Raw output:", output)
print("Model answer:", output["message"]["content"])

Raw output: {'message': {'role': 'assistant', 'content': "<think>\nOkay, let's see. Janet's ducks lay 16 eggs per day. She eats three eggs for breakfast and four eggs for baking muffins each day. Then she sells the rest at $2 per egg. So, I need to calculate how many eggs she has left each day, multiply that by $2, and that should give the total amount she makes.\n\nFirst, let's find out the total eggs per day. She lays 16 eggs, eats 3 + 4 = 7 eggs. So total eggs left = 16 - 7 = 9 eggs per day. Then multiply by $2: 9 * 2 = $18. So, the answer should be $18.\n</think>\n\nFinal answer: 18"}}
Model answer: <think>
Okay, let's see. Janet's ducks lay 16 eggs per day. She eats three eggs for breakfast and four eggs for baking muffins each day. Then she sells the rest at $2 per egg. So, I need to calculate how many eggs she has left each day, multiply that by $2, and that should give the total amount she makes.

First, let's find out the total eggs per day. She lays 16 eggs, eats 3 + 4 = 7 eggs. So total eggs left = 16 - 7 = 9 eggs per day. Then multiply by $2: 9 * 2 = $18. So, the answer should be $18.
</think>

Final answer: 18
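
With the system prompt in place, the reply ends in a "Final answer: <number>" line, which can be compared with the GSM8K reference (the number after "####"). The following is a sketch of such a check under those assumptions. The function names are hypothetical, and this is not necessarily the scoring used for the results reported in this card:

```python
import re
from typing import Optional


def gold_answer(gsm8k_answer: str) -> str:
    """GSM8K reference solutions end with '#### <number>'."""
    return gsm8k_answer.split("####")[-1].strip().replace(",", "")


def predicted_answer(content: str) -> Optional[str]:
    """Return the number after the last 'Final answer:' in the reply, if present."""
    found = re.findall(r"Final answer:\s*(-?[\d,]*\.?\d+)", content)
    if not found:
        return None
    return found[-1].replace(",", "")


def is_correct(content: str, gsm8k_answer: str) -> bool:
    """Compare prediction and reference numerically, so 18 and 18.0 both match."""
    pred = predicted_answer(content)
    if pred is None:
        return False
    try:
        return float(pred) == float(gold_answer(gsm8k_answer))
    except ValueError:
        return False


reply = "<think>\n16 - 7 = 9 eggs, 9 * 2 = 18.\n</think>\n\nFinal answer: 18"
reference = "She makes 9 * 2 = $<<9*2=18>>18 every day.\n#### 18"
print(is_correct(reply, reference))
```

In the quick start above, `reply` would be `output["message"]["content"]` and `reference` would be `true_answer`.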

Training procedure

This model was trained with GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
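
The core idea of GRPO is to sample several completions for the same prompt, score each one, and normalize each reward against the mean and standard deviation of its group instead of using a learned value function. The sketch below illustrates only that normalization step. The function name, the population standard deviation, and the eps value are assumptions for illustration; this is not the TRL implementation and does not reflect the reward functions used to train this model:

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the statistics of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption here)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two correct (reward 1.0), two wrong (reward 0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. If all completions in a group score the same, every advantage is zero and that group contributes no learning signal.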

GSM8K Validation and Test Set Evaluation Results

Model                                  GSM8K-Val   GSM8K-Test   Improvement
Qwen/Qwen3-0.6B (official)             -           59.59%       -
Qwen/Qwen3-0.6B (local reproduction)   53.25%      56.33%       -
Qwen/Qwen3-0.6B GRPO-checkpoint-160    -           59.67%       +5.92%
Qwen/Qwen3-0.6B GRPO-checkpoint-180    62.8%       60.65%       +7.66%

"official" refers to results published by ModelScope or HuggingFace.
"local reproduction" refers to results reproduced by the local pipeline.
"GRPO-checkpoint-160/180" are intermediate/final checkpoints trained in this project.
"Improvement" is relative to the local reproduction Qwen3-0.6B baseline.

Framework versions

TRL: 0.19.0
Transformers: 4.52.4
Pytorch: 2.7.1
Datasets: 3.6.0
Tokenizers: 0.21.2
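
To compare a local environment against the versions listed above, something like the following can be used. It is a sketch that relies only on the standard library; the PyPI distribution names (for example torch for Pytorch) and the handling of local build suffixes such as "+cu126" are assumptions:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

EXPECTED = {
    "trl": "0.19.0",
    "transformers": "4.52.4",
    "torch": "2.7.1",
    "datasets": "3.6.0",
    "tokenizers": "0.21.2",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            # Drop local build suffixes such as '+cu126' before comparing.
            installed = version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

Other versions may work as well; the listed ones are simply those recorded for this training run.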

Citations

Cite GRPO as:

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Safetensors

Model size: 0.6B params
Tensor type: F32

Model tree for x32/Qwen3-0.6B-GRPO-GSM8K-Think

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Finetuned: this model