TFPI
Collection
ICLR 2026: Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners https://arxiv.org/abs/2509.26226 • 14 items
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B