# 🪄 GPT-OSS 20B — FableFlux (MXFP4)

- **Author:** garethpaul
- **Base Model:** openai/gpt-oss-20b
- **Adapter Dataset:** garethpaul/children-stories-dataset
- **Format:** MXFP4 quantized (safetensors)
- **License:** MIT
## ✨ Overview
This model is a fine-tuned version of GPT-OSS 20B, trained with QLoRA on the Children Stories Dataset.
It is tuned for structured children’s story generation with JSON-formatted output, and is designed to run efficiently in vLLM using MXFP4 quantization.
- Architecture: Mixture-of-Experts (MoE) with GPT-OSS layout
- Quantization: MXFP4 (blockwise 4-bit floating-point)
- Context length: 8192 tokens
- Files: 6 × safetensors shards (~42 GB total)
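MXFP4 stores weights in small blocks that share one power-of-two scale, with each element kept as a 4-bit E2M1 float. As a rough illustration of the idea (not the actual kernel vLLM uses, and with a simplified scale choice), a quantize/dequantize round trip for one tensor might look like:

```python
import math

# Representable magnitudes of a 4-bit E2M1 element.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]

def nearest_fp4(x):
    """Snap a scaled value to the nearest representable FP4 magnitude."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # clamp to the largest FP4 magnitude
    return sign * min(FP4_VALUES, key=lambda v: abs(v - mag))

def quantize_dequantize(values, block_size=32):
    """Blockwise MXFP4-style round trip: each block shares a
    power-of-two scale; each element becomes a 4-bit float."""
    out = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        amax = max(abs(x) for x in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # Power-of-two scale so the largest element fits within ±6.
        scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
        out.extend(scale * nearest_fp4(x / scale) for x in block)
    return out
```

The shared scale is what keeps the format cheap: a block of 32 weights costs 32 × 4 bits plus a single scale, rather than 32 full-precision floats.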
## 📖 Example Usage
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

messages = [
    {"role": "system", "content": "Always respond in JSON with keys: title, characters, setting, story, moral."},
    {"role": "user", "content": "Tell me a bedtime story about a brave little car."},
]

resp = client.chat.completions.create(
    model="garethpaul/gpt-oss-20b-fableflux-mxfp4",
    messages=messages,
    max_tokens=700,
    temperature=0.7,
    top_p=0.9,
)
print(resp.choices[0].message.content)
```
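Because the system prompt pins the reply to a fixed set of JSON keys, the output is easy to consume programmatically. A minimal sketch, using a hypothetical reply string (`raw`) and an illustrative helper (`parse_story`) that tolerates stray text around the JSON object:

```python
import json

# Hypothetical raw model reply (the system prompt requests these keys).
raw = ('{"title": "The Brave Little Car", "characters": ["Milo the car"], '
       '"setting": "a quiet hillside town", "story": "...", '
       '"moral": "Courage comes in small packages."}')

def parse_story(text):
    """Extract and validate the JSON story object from a model reply."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    story = json.loads(text[start:end + 1])
    missing = {"title", "characters", "setting", "story", "moral"} - story.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return story

story = parse_story(raw)
print(story["title"])
```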
## 🚀 Running with vLLM
```bash
pip install vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/

vllm serve garethpaul/gpt-oss-20b-fableflux-mxfp4 \
  --max-model-len 8192 \
  --tensor-parallel-size 1
```
## 📊 Training Details
- Method: QLoRA (rank=8, α=16)
- Dataset: ~10K synthetic children’s stories ([dataset viewer](https://huggingface.co/datasets/garethpaul/children-stories-dataset/viewer))
- Target objective: Structured JSON story generation
- Merged: LoRA merged into base → re-exported to MXFP4
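The merge step in the last bullet folds the low-rank adapter back into the frozen base weights before re-export. A minimal NumPy sketch of that arithmetic, with random stand-in matrices (not the actual checkpoints) and the rank/alpha values from this card:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16  # rank=8, alpha=16 as in training

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01     # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01    # LoRA up-projection (stand-in for trained values)

# Adapted forward pass: y = W @ x + (alpha / r) * B @ (A @ x).
# Merging folds the low-rank update into the base weight once:
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d_in)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
```

After the merge the adapter matrices are no longer needed at inference time, which is why the merged weight can be re-exported directly to MXFP4.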