How to use darthcrawl/granite-guardian-4.1-8b-mlx-6bit with MLX:
# Download the model from the Hub
# (the quotes keep zsh, the default macOS shell, from treating the brackets as a glob pattern)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir granite-guardian-4.1-8b-mlx-6bit darthcrawl/granite-guardian-4.1-8b-mlx-6bit
granite-guardian-4.1-8b-mlx-6bit
MLX 6-bit quant of ibm-granite/granite-guardian-4.1-8b. It should track the original model's outputs more closely than the 4-bit quant, at a modest cost in disk space and memory. Disk: ~6.3 GB. Active memory: ~7 GB plus KV cache.
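As a rough sanity check on the disk figure, the weight size can be estimated from the bit width. This is a back-of-the-envelope sketch, not a measurement: it assumes exactly 8e9 parameters, all of them quantized, with a group size of 64 and a 16-bit scale plus a 16-bit bias stored per group. The function name and its defaults are illustrative.

```python
def quantized_weight_bytes(n_params, bits, group_size=64, group_overhead_bits=32):
    """Estimate the weight storage for group-wise quantization.

    Assumption: each group of `group_size` weights carries a 16-bit scale
    and a 16-bit bias, i.e. 32 extra bits per group.
    """
    bits_per_weight = bits + group_overhead_bits / group_size
    return n_params * bits_per_weight / 8


# Nominal 8B parameters; the real count and any unquantized tensors shift this.
for bits in (4, 6):
    gigabytes = quantized_weight_bytes(8e9, bits) / 1e9
    print(f"{bits}-bit: ~{gigabytes:.1f} GB")
```

Under these assumptions the 6-bit estimate lands in the same range as the ~6.3 GB quoted above, roughly 2 GB more than the 4-bit estimate. The exact figure depends on the true parameter count and on which tensors stay unquantized.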
For a classification model, 4-bit is usually fine. Pick 6-bit if you've measured 4-bit producing wrong calls in your specific risk categories.
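One way to make that measurement is to run both quants over a labeled evaluation set and compare error rates per risk category. The sketch below assumes you have already collected each model's predicted label; the record layout, the category names, and the sample data are hypothetical and only show the bookkeeping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (category, expected_label, predicted_by_4bit, predicted_by_6bit)
Record = Tuple[str, str, str, str]


def error_rates_by_category(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {category: (4-bit error rate, 6-bit error rate)}."""
    totals = defaultdict(int)
    wrong_4bit = defaultdict(int)
    wrong_6bit = defaultdict(int)
    for category, expected, pred_4bit, pred_6bit in records:
        totals[category] += 1
        wrong_4bit[category] += pred_4bit != expected
        wrong_6bit[category] += pred_6bit != expected
    return {c: (wrong_4bit[c] / n, wrong_6bit[c] / n) for c, n in totals.items()}


# Made-up records for illustration only, not measured results.
sample = [
    ("harm", "yes", "yes", "yes"),
    ("harm", "no", "yes", "no"),
    ("jailbreak", "yes", "yes", "yes"),
    ("jailbreak", "no", "no", "no"),
]
rates = error_rates_by_category(sample)
for category, (err_4bit, err_6bit) in rates.items():
    print(f"{category}: 4-bit {err_4bit:.0%} wrong, 6-bit {err_6bit:.0%} wrong")
```

If the 4-bit error rate is clearly higher in a category you care about, that is the case for paying the extra memory of the 6-bit quant.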
Quantization
mlx_lm.convert --hf-path ibm-granite/granite-guardian-4.1-8b \
--mlx-path ./granite-guardian-4.1-8b-mlx-6bit -q --q-bits 6
Usage and sampling
Usage is the same as for the 4-bit quant. Granite Guardian uses a structured prompt to classify a piece of input or output text. See the base model card for the prompt template.
temperature: 0.0
max_tokens: 16
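A minimal sketch of applying these settings with the mlx-lm Python API follows. It assumes a recent mlx-lm release in which `generate` accepts a `sampler` built by `make_sampler`, and it takes the prompt as an already-rendered string, because the template itself comes from the base model card. The `parse_label` helper and its yes/no labels are an assumption about the output format, not something this card confirms; adjust them to whatever the base model actually emits.

```python
from typing import Optional, Sequence

MODEL_PATH = "darthcrawl/granite-guardian-4.1-8b-mlx-6bit"


def parse_label(text: str, labels: Sequence[str] = ("yes", "no")) -> Optional[str]:
    """Return the first expected label the completion starts with, else None.

    The default labels are an assumption; check the base model card.
    """
    cleaned = text.strip().lower()
    for label in labels:
        if cleaned.startswith(label.lower()):
            return label
    return None


def classify(prompt: str, model_path: str = MODEL_PATH) -> Optional[str]:
    """Run one classification with temperature 0.0 and max_tokens 16."""
    # Imported lazily so parse_label can be used without MLX installed.
    from mlx_lm import generate, load
    from mlx_lm.sample_utils import make_sampler

    model, tokenizer = load(model_path)
    completion = generate(
        model,
        tokenizer,
        prompt=prompt,  # rendered with the base model's prompt template
        max_tokens=16,
        sampler=make_sampler(temp=0.0),  # greedy decoding
    )
    return parse_label(completion)
```

Greedy decoding keeps the verdict deterministic for a given prompt, and the short token budget is enough because the model only needs to emit a label. For repeated calls, load the model once and reuse it rather than calling `load` on every request.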
License
Apache-2.0, inherited from the base model.
Model size: 8B params
Tensor type: BF16, U32
Model tree for darthcrawl/granite-guardian-4.1-8b-mlx-6bit
Base model: ibm-granite/granite-4.1-8b
Finetuned: ibm-granite/granite-guardian-4.1-8b