granite-guardian-4.1-8b-mlx-6bit

MLX 6-bit quant of ibm-granite/granite-guardian-4.1-8b. It stays closer to the base model's quality than the 4-bit quant, at a modest memory cost. Disk: ~6.3 GB. Active memory: ~7 GB plus KV cache.
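As a rough check on the disk figure, here is a back-of-envelope estimate. It assumes the mlx_lm default group size of 64, with one 16-bit scale and one 16-bit bias per group. That adds 0.5 bits per weight. It also treats every parameter as quantized and uses the nominal 8B count. The helper name is hypothetical.

```python
def estimate_quantized_bytes(n_params, bits, group_size=64, scale_bits=16):
    """Rough on-disk size of affine-quantized weights, in bytes.

    Each group of `group_size` weights also stores one scale and one bias,
    both assumed to be `scale_bits` wide.
    """
    effective_bits = bits + 2 * scale_bits / group_size
    return n_params * effective_bits / 8

# Nominal 8B parameters; the real count differs slightly.
for bits in (4, 6):
    size_gb = estimate_quantized_bytes(8e9, bits) / 1e9
    print(f"{bits}-bit: ~{size_gb:.1f} GB")
```

The 6-bit estimate lands in the same range as the ~6.3 GB quoted above. The real size depends on the exact parameter count, which layers stay in BF16, and whether sizes are reported in GB or GiB.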

Granite Guardian is used as a classifier, so the 4-bit quant is usually good enough. Choose 6-bit if you have measured the 4-bit quant making wrong calls in the specific risk categories you care about.
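One way to take that measurement is to run both quants over the same labeled examples and compare error rates per risk category. The sketch below assumes you already have (category, expected, predicted) triples from each run. The category names, labels, and the 2-point margin are hypothetical placeholders.

```python
from collections import defaultdict

def error_rate_by_category(examples):
    """Return {category: fraction of wrong predictions}.

    `examples` is an iterable of (category, expected_label, predicted_label).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for category, expected, predicted in examples:
        totals[category] += 1
        if predicted != expected:
            errors[category] += 1
    return {c: errors[c] / totals[c] for c in totals}

def categories_favoring_6bit(rates_4bit, rates_6bit, margin=0.02):
    """Categories present in both runs where 4-bit errs more than 6-bit by > margin."""
    shared = rates_4bit.keys() & rates_6bit.keys()
    return sorted(c for c in shared if rates_4bit[c] - rates_6bit[c] > margin)

# Hypothetical results from running the same labeled set through each quant.
runs_4bit = [("harm", "Yes", "Yes"), ("harm", "No", "Yes"), ("jailbreak", "Yes", "Yes")]
runs_6bit = [("harm", "Yes", "Yes"), ("harm", "No", "No"), ("jailbreak", "Yes", "Yes")]

rates_4 = error_rate_by_category(runs_4bit)
rates_6 = error_rate_by_category(runs_6bit)
print(rates_4, rates_6)
print(categories_favoring_6bit(rates_4, rates_6))
```

With only a handful of examples per category, a difference like this is noise. The comparison is meaningful only on a reasonably large labeled set for each category.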

Quantization

mlx_lm.convert --hf-path ibm-granite/granite-guardian-4.1-8b \
  --mlx-path ./granite-guardian-4.1-8b-mlx-6bit -q --q-bits 6

Usage and sampling

Usage is the same as for the 4-bit quant. Granite Guardian uses a structured prompt to classify a piece of input or output text. See the base model card for the prompt template.

temperature: 0.0
max_tokens: 16
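With greedy decoding (temperature 0.0) and a 16-token budget, the reply is a short verdict that still has to be parsed. The sketch below assumes the verdict is a leading Yes or No, with Yes meaning the risk was detected. It also tolerates a `<score>` wrapper in case the template uses one. Confirm the actual output format against the base model card before relying on this. `parse_verdict` is a hypothetical helper, not part of mlx_lm.

```python
import re

# Assumed format: a leading "Yes"/"No", optionally inside a <score> tag.
_VERDICT = re.compile(r"^\s*(?:<score>\s*)?(yes|no)\b", re.IGNORECASE)

def parse_verdict(text):
    """Return True for "Yes" (assumed: risk detected), False for "No", None if unparseable."""
    match = _VERDICT.match(text)
    if match is None:
        return None
    return match.group(1).lower() == "yes"

for reply in ["Yes", " No.", "<score> yes </score>", "I cannot tell"]:
    print(repr(reply), "->", parse_verdict(reply))
```

Returning None instead of guessing lets the caller flag an unparseable reply for review, so it is not silently treated as safe.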

License

Apache-2.0, inherited from the base model.

Safetensors: 8B params; tensor types BF16 and U32.
Model tree: darthcrawl/granite-guardian-4.1-8b-mlx-6bit is a quantized derivative of ibm-granite/granite-guardian-4.1-8b.