This is a fine-tune of Gemma, trained with GRPO using a reward function that incentivises outputs of roughly 40 characters beginning with "I", so that the model produces TL;DR summaries of Reddit comments.
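The exact reward function is not given here, so the following is only a minimal sketch of what such a reward might look like. The name `tldr_reward`, the 50/50 weighting between the two criteria, and the linear length penalty are all assumptions made for illustration; only the "~40 characters" target and the leading "I" come from the description above.

```python
TARGET_LEN = 40  # from the description: outputs of roughly 40 characters


def tldr_reward(completion: str, target_len: int = TARGET_LEN) -> float:
    """Hypothetical reward in [0, 1] (not the author's actual function).

    Half of the score is for starting with "I"; the other half decays
    linearly as the length moves away from target_len.
    """
    text = completion.strip()
    # Assumed weighting: 0.5 for the leading "I".
    starts_with_i = 0.5 if text.startswith("I") else 0.0
    # Assumed shape: full credit at target_len, zero once the length is
    # off by target_len characters or more.
    distance = abs(len(text) - target_len)
    length_score = 0.5 * max(0.0, 1.0 - distance / target_len)
    return starts_with_i + length_score


def reward_batch(completions: list[str]) -> list[float]:
    """Score a group of sampled completions, one reward per completion."""
    return [tldr_reward(c) for c in completions]
```

GRPO scores a group of sampled completions for each prompt and pushes the policy towards the ones that score above the group average, so a function like `reward_batch` would be called once per group. The signature would likely need adapting to whatever the chosen training library expects.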
Model size: 0.3B params (Safetensors)
Tensor type: F32
Model tree for alex-treebeard/gemma-3-270m-it-tldr