moro72842 committed · Commit 86a2d05 · verified · 1 Parent(s): f362f1d

Add distillation pipeline + evaluation harness + report generator

Files changed (1): README.md ADDED (+52 -0)

---
license: apache-2.0
tags:
- distillation
- knowledge-distillation
- sovereign-omega
- reasoning
base_model:
- Qwen/Qwen2.5-3B-Instruct
datasets:
- moro72842/Sovereign-Omega-SFT-V1
- open-r1/OpenR1-Math-220k
---

# Edge-Omega-Distilled-V1

## Distillation Pipeline

High-fidelity knowledge distillation following the DeepSeek-R1 recipe (arXiv:2501.12948):
- **Teacher traces**: Sovereign-Omega-SFT-V1 + OpenR1-Math-220k (verified)
- **Method**: SFT on teacher reasoning traces (trace-based KD)
- **Student**: Qwen2.5-3B-Instruct + LoRA (r=64, α=128)
- **Config**: lr=1e-5, 3 epochs, cosine schedule, packing=True (see the training sketch below)

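The actual `distillation_pipeline.py` is not reproduced in this card; the following is a minimal sketch of an equivalent TRL `SFTTrainer` setup under the hyperparameters above. The dataset split and field layout, output directory, and batch-size settings are assumptions, not values taken from the pipeline.

```python
# Sketch only: an SFT run on teacher traces with the LoRA/optimizer settings listed above.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumption: the teacher traces are stored as chat-style rows; adjust to the real schema.
train_dataset = load_dataset("moro72842/Sovereign-Omega-SFT-V1", split="train")

peft_config = LoraConfig(
    r=64,                        # LoRA rank from the card
    lora_alpha=128,              # α from the card
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="edge-omega-distilled-v1",  # hypothetical output path
    learning_rate=1e-5,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    packing=True,
    per_device_train_batch_size=2,         # illustrative, not from the card
    gradient_accumulation_steps=8,         # illustrative, not from the card
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```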

## Why Trace-Based KD (Not Logit-Based)

From DeepSeek-R1 §2.4 and REDI (arXiv:2505.24850):
- Trace-based SFT: **72.6% AIME 2024** (32B student)
- Equivalent RL without distillation: **47.0% AIME 2024** (same base)
- The ~25-point gap indicates that trace-based KD is substantially more effective for reasoning at this scale (the two objectives are contrasted in the sketch below)

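To make the distinction concrete, here is a minimal PyTorch sketch of the two objectives (illustrative, not code from the pipeline): trace-based KD is plain next-token cross-entropy on text the teacher wrote, so it only needs the teacher's generated traces, whereas logit-based KD matches the teacher's full token distribution with a KL term and therefore needs access to teacher logits at training time.

```python
import torch
import torch.nn.functional as F

def trace_kd_loss(student_logits: torch.Tensor, teacher_trace_ids: torch.Tensor) -> torch.Tensor:
    """Trace-based KD: ordinary next-token cross-entropy on teacher-written traces."""
    # student_logits: (batch, seq, vocab); teacher_trace_ids: (batch, seq)
    return F.cross_entropy(
        student_logits[:, :-1].reshape(-1, student_logits.size(-1)),
        teacher_trace_ids[:, 1:].reshape(-1),
    )

def logit_kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                  temperature: float = 2.0) -> torch.Tensor:
    """Logit-based KD: KL divergence between softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```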

## Evaluation

Pre/post-distillation comparison on the following benchmarks (an example harness invocation follows the list):
- GPQA Diamond (198 PhD-level MCQs)
- ARC-Challenge (1,172 science MCQs)

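One way to run this comparison is with EleutherAI's lm-evaluation-harness. This is a sketch, not the repository's own evaluation harness: the `simple_evaluate` call and the task names are assumptions and vary across harness versions.

```python
# Sketch of the pre/post comparison with lm-evaluation-harness (pip install lm-eval).
# Task names differ between harness versions; check `lm-eval --tasks list` before running.
from lm_eval import simple_evaluate

def score(model_id_or_path: str) -> dict:
    results = simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id_or_path},dtype=bfloat16",
        tasks=["arc_challenge", "gpqa_diamond_zeroshot"],
        batch_size=8,
    )
    return results["results"]

baseline = score("Qwen/Qwen2.5-3B-Instruct")   # pre-distillation
distilled = score("edge-omega-distilled-v1")   # hypothetical path to the distilled checkpoint
print(baseline)
print(distilled)
```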

## Launch

```bash
pip install trl transformers torch datasets accelerate peft trackio

# Distill with Sovereign-Omega + OpenR1 traces
STUDENT_MODEL=Qwen/Qwen2.5-3B-Instruct \
OPENR1_SAMPLES=50000 \
python distillation_pipeline.py

# Larger student on multi-GPU
STUDENT_MODEL=Qwen/Qwen2.5-7B-Instruct \
OPENR1_SAMPLES=220000 \
accelerate launch --config_file deepspeed_zero3.yaml distillation_pipeline.py
```
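
After training, the distilled adapter can be loaded for inference roughly as follows. This is a sketch: the adapter path is a hypothetical output location, not a published repo id.

```python
# Minimal inference sketch; point adapter_path at wherever distillation_pipeline.py saves the LoRA weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-3B-Instruct"
adapter_path = "edge-omega-distilled-v1"  # hypothetical local output directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)  # attach the distilled LoRA adapter

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```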