# Steelman-14B-Ada v0.3 -- GGUF
Quantized GGUF of Steelman-14B-Ada v0.3 for use with Ollama, llama.cpp, or any GGUF-compatible runtime.
## Download

| File | Size | Quantization | Notes |
|---|---|---|---|
| steelman-r7-coder-base-q8_0.gguf | ~15 GB | Q8_0 (8-bit) | Highest quality, recommended |
Legacy GGUFs from prior rounds are kept for reproducibility but are superseded by the R7 version.
## Benchmark

- **62.4%** on Steelman Eval v4 (754 prompts, 10 categories, strict GNAT compilation + functional scoring) -- outperforms Claude Opus 4.6 (12.7%), GPT-5.4 (12.9%), and every other frontier model tested.
- **85.4%** compile rate on HumanEval-Ada -- the highest of any model tested, including all frontier models.
See the model card for full benchmark tables, per-category breakdown, and evaluation methodology.
## Usage with Ollama

- Download `steelman-r7-coder-base-q8_0.gguf`.
- Create a `Modelfile`:
```
FROM ./steelman-r7-coder-base-q8_0.gguf
TEMPLATE "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{ .Prompt }}

### Response:
{{ .Response }}"
SYSTEM "You are an expert Ada 2022 and SPARK programmer."
PARAMETER stop "### Instruction:"
PARAMETER temperature 0.0
PARAMETER num_ctx 32768
```
- Create and run:

```shell
ollama create steelman -f Modelfile
ollama run steelman "Write an Ada procedure implementing a producer-consumer pattern with protected objects"
```
**Important:** This model uses an Alpaca prompt template, not ChatML. Using the wrong template will severely degrade output quality.
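For programmatic use, a running Ollama instance can be queried over its local REST API (`POST /api/generate`). A minimal sketch; the endpoint, payload shape, and default port follow Ollama's documented API, and the model name `steelman` assumes the `ollama create` step above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(instruction: str) -> dict:
    # Ollama applies the Modelfile's Alpaca TEMPLATE server-side,
    # so we send only the raw instruction text.
    return {
        "model": "steelman",
        "prompt": instruction,
        "stream": False,                  # return one JSON object instead of a stream
        "options": {"temperature": 0.0},  # match the Modelfile's deterministic setting
    }

def generate(instruction: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(instruction)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires `ollama serve` to be running, e.g.:
# generate("Write an Ada function that reverses a String.")
```

Because the template lives in the `Modelfile`, clients never need to reproduce the Alpaca wrapper themselves when going through Ollama.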
## Usage with llama.cpp

```shell
llama-cli -m steelman-r7-coder-base-q8_0.gguf \
  --temp 0 -n 2048 -c 32768 \
  -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write an Ada 2022 generic package implementing a bounded stack with SPARK Pre/Post contracts.

### Response:
"
```
## Performance
Q8_0 vs full precision: 0.5 percentage point difference on the 754-prompt eval (61.9% Q8_0 vs 62.4% fp16). Quantization has negligible impact on output quality.
Runs on any machine with 24GB+ RAM. GPU optional -- the model runs from system RAM via Ollama. On 64GB systems, the model stays in page cache for sub-second reloads between sessions.
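The ~15 GB file size is consistent with the quantization format: Q8_0 stores each block of 32 weights as 32 one-byte values plus a 2-byte scale, i.e. about 8.5 bits per weight. A back-of-the-envelope check, where the 14.8B parameter count is an approximation for a 14B-class model:

```python
# Rough size estimate for a Q8_0 quantization of a 14B-class model.
params = 14.8e9        # approximate parameter count (assumption)
bits_per_weight = 8.5  # Q8_0: 8-bit weights + a per-block fp16 scale
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # roughly 15.7 GB, in line with the ~15 GB file above
```

The same arithmetic explains the 24 GB RAM floor: the weights alone need ~16 GB, and the KV cache for a 32K context plus OS overhead consumes the rest.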
## License
Apache 2.0 (same as base model Qwen2.5-Coder-14B).