---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
tags:
- eagle
- eagle3
- speculative-decoding
- draft-model
- sglang
- qwen3
- code
language:
- en
- zh
pipeline_tag: text-generation
---
# SGLang-EAGLE3-Qwen3-Coder-30B-A3B-Instruct
This is an **EAGLE3 draft model** for speculative decoding with [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct).
## Model Description
EAGLE3 is the third generation of EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative decoding technique that uses a lightweight draft model to predict several future tokens, which the target model then verifies in a single parallel pass. This typically accelerates inference by 2-3x with no loss in output quality.
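The draft-and-verify loop above can be illustrated with a toy sketch. Both "models" below are stand-in arithmetic functions (not real LMs, and not the EAGLE architecture), and only greedy decoding is shown; the point is that the verified output always matches what the target model alone would have produced.

```python
def target_next(ctx):
    """Stand-in for the expensive target model (ground truth)."""
    return (sum(ctx) * 7 + 3) % 50

def draft_next(ctx):
    """Stand-in for the cheap draft model: right most of the time."""
    t = target_next(ctx)
    return t if t % 5 else (t + 1) % 50  # deliberately wrong on some steps

def speculative_step(ctx, k=5):
    # 1) Draft proposes k tokens autoregressively.
    proposal, cur = [], list(ctx)
    for _ in range(k):
        tok = draft_next(cur)
        proposal.append(tok)
        cur.append(tok)
    # 2) Target verifies all k positions in one (conceptually parallel) pass:
    #    accept the matching prefix, correct the first mismatch.
    accepted, cur = [], list(ctx)
    for tok in proposal:
        correct = target_next(cur)
        if tok == correct:
            accepted.append(tok)
            cur.append(tok)
        else:
            accepted.append(correct)  # replace first mismatch, stop
            break
    else:
        accepted.append(target_next(cur))  # bonus token when all k match
    return accepted

ctx, out = [1, 2, 3], []
while len(out) < 12:
    out.extend(speculative_step(ctx + out))
print(out[:12])
```

Because every accepted token is checked against the target model, the generated sequence is identical to plain greedy decoding with the target; the draft model only changes how many target forward passes are needed.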
### Key Features
- **Target Model**: Qwen3-Coder-30B-A3B-Instruct (30B parameters, 3B active)
- **Draft Model Size**: ~350MB (single transformer layer)
- **Training Data**: OpenPromptContainer (OPC) regenerated dataset
- **Training Steps**: 295,000 (Epoch 1)
- **Framework**: Trained with [SpecForge](https://github.com/sgl-project/SpecForge)
### Training Metrics
| Metric | Value |
|--------|-------|
| First Token Accuracy (acc_0) | 88.19% |
| Average Accuracy (7 positions) | 85.19% |
| Training Epochs | 1+ (295k steps) |
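As a rough back-of-envelope reading of the table: if each of the 7 draft positions were accepted independently with probability ~0.85 (the reported average accuracy), the expected number of draft tokens accepted per verification step would be the sum of the acceptance probabilities of each prefix length. Real acceptance is position-dependent and tree drafting complicates this further, so treat it only as an indicator.

```python
# Crude estimate: expected accepted draft tokens per target forward pass,
# assuming each position is accepted independently with p = 0.85.
p, k = 0.85, 7
expected_accepted = sum(p**i for i in range(1, k + 1))
print(round(expected_accepted + 1, 2))  # +1 for the token the target itself adds
```

Under this crude independence assumption, roughly 4.85 tokens land per target forward pass.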
## Usage
### With SGLang
```python
import sglang as sgl

# Launch with EAGLE3 speculative decoding
llm = sgl.Engine(
    model_path="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    speculative_algorithm="EAGLE",
    speculative_draft_model_path="sgl-project/SGLang-EAGLE3-Qwen3-Coder-30B-A3B-Instruct",
    speculative_num_steps=5,
    speculative_eagle_topk=8,
    speculative_num_draft_tokens=64,
)

# Generate text
output = llm.generate("Write a Python function to sort a list:")
print(output)
```
### With SGLang Server
```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --speculative-algorithm EAGLE \
    --speculative-draft-model-path sgl-project/SGLang-EAGLE3-Qwen3-Coder-30B-A3B-Instruct \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 64 \
    --tp 8
```
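Once the server is up, it can be queried through SGLang's OpenAI-compatible endpoint. The port below is SGLang's default (30000); adjust it if you pass `--port`. The `|| true` just keeps the snippet from failing when no server is running.

```shell
payload='{
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "messages": [{"role": "user", "content": "Write a Python function to sort a list."}],
  "max_tokens": 256
}'

# Requires the server launched above to be running on localhost:30000.
curl -s http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$payload" || true
```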
## Model Architecture
The EAGLE3 draft model is a lightweight transformer that:
- Shares embeddings with the target model
- Uses a single transformer layer (hidden_size=2048, intermediate_size=12288)
- Predicts multiple future tokens autoregressively
- Uses the target model's hidden states as input
```json
{
  "architectures": ["LlamaForCausalLMEagle3"],
  "hidden_size": 2048,
  "intermediate_size": 12288,
  "num_attention_heads": 32,
  "num_key_value_heads": 4,
  "num_hidden_layers": 1,
  "vocab_size": 151936
}
```
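For a sense of scale, the config above implies the following rough parameter count for the single decoder layer, assuming a Llama-style layer (SwiGLU MLP, grouped-query attention). Embeddings are shared with the target model and the LM head and any EAGLE3-specific projections are not counted, so this is a lower bound on the checkpoint size.

```python
# Rough parameter count for one Llama-style decoder layer with the
# dimensions from the config above (norm weights omitted as negligible).
hidden, inter = 2048, 12288
heads, kv_heads = 32, 4
head_dim = hidden // heads       # 64
kv_dim = kv_heads * head_dim     # 256

attn = hidden * hidden * 2       # q_proj + o_proj
attn += hidden * kv_dim * 2      # k_proj + v_proj
mlp = hidden * inter * 3         # gate, up, and down projections
layer = attn + mlp
print(f"{layer / 1e6:.1f}M params, ~{layer * 2 / 1e6:.0f} MB in bf16")
```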
## Training Details
- **Framework**: SpecForge with SGLang backend
- **Hardware**: 4x NVIDIA H200 GPUs (TP=4)
- **Batch Size**: 1 per GPU
- **Learning Rate**: 1e-4 with cosine annealing
- **Max Sequence Length**: 4096
- **Attention Backend**: FlexAttention
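The cosine-annealing schedule named above follows the standard formula; a minimal sketch is below. The warmup length and floor learning rate are not stated on this card, so treating the floor as 0 and the horizon as the 295k training steps is an illustrative assumption.

```python
import math

def cosine_lr(step, total_steps=295_000, peak=1e-4, floor=0.0):
    """Standard cosine annealing from `peak` down to `floor` over `total_steps`."""
    t = min(step, total_steps) / total_steps
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * t))

print(cosine_lr(0), cosine_lr(147_500), cosine_lr(295_000))
```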
## Citation
If you use this model, please cite:
```bibtex
@article{li2024eagle,
  title={EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2401.15077},
  year={2024}
}

@misc{sglang2024,
  title={SGLang: Efficient Execution of Structured Language Model Programs},
  author={Zheng, Lianmin and others},
  year={2024},
  url={https://github.com/sgl-project/sglang}
}
```
## License
This model is released under the Apache 2.0 License, following the base model's license.