🪄 GPT-OSS 20B — FableFlux (MXFP4)

Author: garethpaul
Base Model: openai/gpt-oss-20b
Adapter Dataset: garethpaul/children-stories-dataset
Format: MXFP4 quantized (safetensors)
License: MIT


✨ Overview

This model is a fine-tuned version of GPT-OSS 20B, trained with QLoRA on the Children Stories Dataset.
It is optimized for structured children's story generation with friendly JSON-style output (a sample appears after the list below), and it is packaged to run efficiently in vLLM using MXFP4 quantization.

  • Architecture: Mixture-of-Experts (MoE) with GPT-OSS layout
  • Quantization: MXFP4 (blockwise 4-bit floating-point)
  • Context length: 8192 tokens
  • Files: 6 × safetensors shards (~42 GB total)
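
As a concrete illustration of the JSON-style output mentioned above, a reply might look like the example below. The story text here is invented purely for illustration; only the key names mirror the system prompt shown in the usage example that follows.

{
  "title": "Beep the Brave Little Car",
  "characters": ["Beep", "Mama Van", "Rusty the Tow Truck"],
  "setting": "A quiet garage on the edge of town",
  "story": "Beep was the smallest car in the garage, but when the storm knocked out the lights, he rolled out into the rain to find help...",
  "moral": "You don't have to be big to be brave."
}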

📖 Example Usage

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

messages = [
    {"role": "system", "content": "Always respond in JSON with keys: title, characters, setting, story, moral."},
    {"role": "user", "content": "Tell me a bedtime story about a brave little car."}
]

resp = client.chat.completions.create(
    model="garethpaul/gpt-oss-20b-fableflux-mxfp4",
    messages=messages,
    max_tokens=700,
    temperature=0.7,
    top_p=0.9,
)

print(resp.choices[0].message.content)
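
Because the system prompt constrains the reply to JSON, the content can usually be parsed directly. A minimal sketch, assuming the model honors the instruction (it is instructed, not guaranteed, so keep the fallback):

import json

try:
    story = json.loads(resp.choices[0].message.content)
    print(story["title"])
    print(story["moral"])
except json.JSONDecodeError:
    # Fall back to the raw text if the reply isn't valid JSON.
    print(resp.choices[0].message.content)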

🚀 Running with vLLM

pip install vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/
vllm serve garethpaul/gpt-oss-20b-fableflux-mxfp4 \
  --max-model-len 8192 \
  --tensor-parallel-size 1
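
Once the server is up, a quick sanity check is to confirm it exposes the model before sending a full chat request. This uses the same OpenAI-compatible endpoint as the example above and assumes the default port 8000:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# The id should match the repo name passed to `vllm serve`.
for model in client.models.list().data:
    print(model.id)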

📊 Training Details

🔗 Related Repos
