granite-guardian-4.1-8b-mlx-6bit

MLX 6-bit quantization of ibm-granite/granite-guardian-4.1-8b. It stays closer to the original model's quality than the 4-bit version, at a modest cost in disk and memory. Disk: ~6.3 GB. Active memory: ~7 GB plus KV cache.
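
As a rough cross-check of these figures, the sketch below estimates weight size under an assumed MLX-style affine scheme. Each group of 64 weights (assumed default group size) is taken to store one BF16 scale and one BF16 bias, adding 0.5 bits per weight. The 8e9 parameter count is the nominal size. The attention config used for the KV-cache line (32 layers, 8 KV heads, head_dim 128, 4096 tokens) is hypothetical, not Granite's actual config, so read the real values from the model's config.json. The KV formula also assumes every layer is a standard attention layer.

```python
def quantized_weight_bytes(n_params: float, bits: int, group_size: int = 64,
                           scale_bits: int = 16) -> float:
    """Rough size of affine-quantized weights.

    Assumes every weight is quantized and each group of `group_size` weights
    carries one scale and one bias stored at `scale_bits` bits (BF16 here).
    """
    bits_per_weight = bits + 2 * scale_bits / group_size
    return n_params * bits_per_weight / 8


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Standard attention KV cache: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    GB = 1e9
    for bits in (4, 6):
        print(f"{bits}-bit weights: ~{quantized_weight_bytes(8e9, bits) / GB:.1f} GB")
    # Hypothetical attention config; read the real values from config.json.
    kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
    print(f"KV cache at 4096 tokens: {kv / 2**20:.0f} MiB")
```

Under these assumptions the 6-bit estimate (~6.5 GB) lands near the measured ~6.3 GB; the gap likely reflects the true parameter count and which tensors stay unquantized.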

For a classification model, 4-bit is usually fine. Pick 6-bit if you've measured the 4-bit model producing wrong verdicts in the risk categories you care about.
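
One way to do that measurement: run the same labeled eval set through both quantizations and compare error rates per risk category. A minimal sketch; the category names, verdict strings, and results below are hypothetical placeholders for your own data.

```python
from collections import defaultdict


def per_category_error_rate(records):
    """records: iterable of (category, expected_verdict, predicted_verdict).

    Returns {category: fraction of records where the prediction was wrong}.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for category, expected, predicted in records:
        totals[category] += 1
        if predicted != expected:
            errors[category] += 1
    return {cat: errors[cat] / totals[cat] for cat in totals}


def categories_where_6bit_helps(rates_4bit, rates_6bit):
    """Categories where the 6-bit error rate is strictly lower than the 4-bit one."""
    return sorted(c for c in rates_4bit if rates_6bit.get(c, 1.0) < rates_4bit[c])


# Hypothetical results: (category, gold label, model verdict) from your own eval set.
results_4bit = [
    ("harm", "Yes", "Yes"), ("harm", "No", "No"),
    ("jailbreak", "Yes", "No"), ("jailbreak", "Yes", "Yes"),
]
results_6bit = [
    ("harm", "Yes", "Yes"), ("harm", "No", "No"),
    ("jailbreak", "Yes", "Yes"), ("jailbreak", "Yes", "Yes"),
]

rates_4 = per_category_error_rate(results_4bit)
rates_6 = per_category_error_rate(results_6bit)
print(rates_4, rates_6, categories_where_6bit_helps(rates_4, rates_6))
```

If the list of categories where 6-bit helps is empty on a realistically sized eval set, the 4-bit model is likely enough.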

Quantization

mlx_lm.convert --hf-path ibm-granite/granite-guardian-4.1-8b \
  --mlx-path ./granite-guardian-4.1-8b-mlx-6bit -q --q-bits 6
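
For intuition about what `-q --q-bits 6` does, here is a simplified pure-Python sketch of group-wise affine quantization: each group of weights is mapped to integers in [0, 2**bits - 1] with its own scale and bias. This is not MLX's kernel. MLX packs the integers into uint32 words, keeps scales and biases in the model's dtype, and its rounding details differ. The group size of 64 is assumed to match mlx_lm's default (`--q-group-size`).

```python
import random


def quantize_group(weights, bits=6):
    """Affine-quantize one group: w ~= scale * q + bias, q in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # constant group: avoid divide-by-zero
    bias = w_min
    q = [round((w - bias) / scale) for w in weights]
    return q, scale, bias


def dequantize_group(q, scale, bias):
    return [scale * v + bias for v in q]


def quantize(weights, bits=6, group_size=64):
    """Split a flat weight list into groups and quantize each group independently."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    return [quantize_group(g, bits) for g in groups]


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 0.02) for _ in range(256)]
    for bits in (4, 6):
        restored = [x for q, s, b in quantize(w, bits) for x in dequantize_group(q, s, b)]
        max_err = max(abs(a - b) for a, b in zip(w, restored))
        print(f"{bits}-bit max abs error: {max_err:.6f}")
```

Per group, the rounding error is at most half a quantization step, and a 6-bit step is about a quarter of a 4-bit one (range/63 versus range/15), which is where the tighter quality comes from.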

Usage and sampling

Usage is the same as for the 4-bit version. Granite Guardian takes a structured prompt and classifies a piece of user input or model output against a risk category. See the base model card for the prompt template.

temperature: 0.0
max_tokens: 16
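
With greedy decoding and a 16-token budget, the completion is a short verdict that still has to be turned into a boolean. A minimal sketch, assuming the verdict appears as a yes/no word in the completion; the actual Granite Guardian 4.1 output format is defined on the base model card, and `parse_verdict` is a hypothetical helper, not part of any library.

```python
import re

# Assumption: the completion contains a standalone "yes"/"no" verdict word.
# Check the base model card for the real output format before relying on this.
_VERDICT = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def parse_verdict(completion: str):
    """Return True for a risk flag ("yes"), False for "no", None if unparseable."""
    match = _VERDICT.search(completion)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


for text in ("Yes", "<score> no </score>", ""):
    print(repr(text), "->", parse_verdict(text))
```

Returning None instead of defaulting to "safe" lets the caller decide how to handle truncated or unexpected output, which matters when max_tokens is this small.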

License

Apache-2.0, inherited from the base model.


Model details

Format: Safetensors (MLX)
Model size: 8B params
Tensor types: BF16, U32


Model tree for darthcrawl/granite-guardian-4.1-8b-mlx-6bit

Base model: ibm-granite/granite-4.1-8b
Fine-tuned: ibm-granite/granite-guardian-4.1-8b
Quantized: this model (one of 5 quantized variants of the fine-tune)