Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
Paper: arxiv:2505.24850
High-fidelity knowledge distillation following the DeepSeek-R1 recipe (arxiv:2501.12948): the student is fine-tuned directly on reasoning traces generated by a stronger teacher, as described in DeepSeek-R1 §2.4, with negative (incorrect) teacher traces additionally harnessed in the spirit of REDI (arxiv:2505.24850); a sketch of that objective follows.
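To make the idea concrete, here is a minimal sketch of an asymmetrically weighted positive/negative trace loss in the REDI spirit: standard SFT cross-entropy pushes the student toward correct teacher traces, while a down-weighted term pushes it away from incorrect ones. This is an illustration, not the exact REDI objective; the `neg_weight` value and the assumption that logits and labels are already shift-aligned are mine.

```python
import torch.nn.functional as F

def redi_style_loss(logits_pos, labels_pos, logits_neg, labels_neg, neg_weight=0.5):
    """Asymmetric positive/negative trace loss (sketch, not the paper's exact form).

    logits_*: (batch, seq, vocab) student logits on teacher traces, shift-aligned.
    labels_*: (batch, seq) target token ids, -100 on masked positions.
    neg_weight: illustrative down-weighting of the negative-trace term.
    """
    # Positive traces: ordinary next-token cross-entropy (maximize likelihood).
    pos_nll = F.cross_entropy(
        logits_pos.reshape(-1, logits_pos.size(-1)), labels_pos.reshape(-1),
        ignore_index=-100,
    )
    # Negative traces: same NLL, subtracted so minimizing the total loss
    # lowers the student's likelihood of incorrect reasoning traces.
    neg_nll = F.cross_entropy(
        logits_neg.reshape(-1, logits_neg.size(-1)), labels_neg.reshape(-1),
        ignore_index=-100,
    )
    return pos_nll - neg_weight * neg_nll
```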
A pre/post distillation comparison is reported for the student model.
pip install trl transformers torch datasets accelerate peft trackio
# Distill with Sovereign-Omega + OpenR1 traces
STUDENT_MODEL=Qwen/Qwen2.5-3B-Instruct \
OPENR1_SAMPLES=50000 \
python distillation_pipeline.py
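A rough sketch of what a script like distillation_pipeline.py could do with these environment variables, showing only the OpenR1 SFT leg. The dataset name (open-r1/OpenR1-Math-220k), the assumption that it exposes a conversational `messages` column, and all hyperparameters are illustrative choices, not the actual pipeline.

```python
import os

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Configuration via environment variables, mirroring the commands above.
student_model = os.environ.get("STUDENT_MODEL", "Qwen/Qwen2.5-3B-Instruct")
num_samples = int(os.environ.get("OPENR1_SAMPLES", "50000"))

# Assumed source of the OpenR1 reasoning traces (hypothetical choice).
dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")
dataset = dataset.shuffle(seed=42).select(range(min(num_samples, len(dataset))))

# Plain SFT on teacher traces; trl applies the chat template to the
# "messages" column (assumed to exist) before tokenization.
config = SFTConfig(
    output_dir=f"distilled-{student_model.split('/')[-1]}",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
    logging_steps=10,
)

trainer = SFTTrainer(model=student_model, args=config, train_dataset=dataset)
trainer.train()
```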
# Larger student on multi-GPU
STUDENT_MODEL=Qwen/Qwen2.5-7B-Instruct \
OPENR1_SAMPLES=220000 \
accelerate launch --config_file deepspeed_zero3.yaml distillation_pipeline.py
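The deepspeed_zero3.yaml referenced above is an Accelerate config file; a minimal example of what such a file might contain is shown below, where the GPU count and gradient-accumulation setting are placeholder assumptions.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3                  # shard optimizer state, gradients, and parameters
  zero3_init_flag: true
  offload_optimizer_device: none
  offload_param_device: none
  gradient_accumulation_steps: 8
mixed_precision: bf16
num_machines: 1
num_processes: 8                 # one process per GPU (assumed 8-GPU node)
machine_rank: 0
main_training_function: main
use_cpu: false
```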