Mistral-Small-3.2-24B-Instruct-2506-MLX-2bit
MLX quantized version of Mistral Small 3.2 24B Instruct 2506.
Quantization
- Method: Q2 (2-bit integer quantization)
- Bits per weight: 2
- Details: Uniform 2-bit integer quantization with group size 64.
- Converted with: mlx-lm
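The scheme above can be sketched in plain Python. This is an illustrative toy of per-group affine quantization (a scale and bias per group of 64 weights), not MLX's actual kernel; the helper names and formulas are assumptions for illustration only:

```python
# Illustrative sketch of uniform 2-bit quantization with group size 64.
# The per-group affine (scale + bias) formulation mirrors the general idea,
# but MLX's real packing and kernels may differ.

GROUP_SIZE = 64
LEVELS = 2 ** 2 - 1  # 2-bit -> integer codes 0..3

def quantize_group(weights):
    """Quantize one group of floats to 2-bit codes plus (scale, bias)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # guard against constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + bias for c in codes]

group = [i / 63 for i in range(GROUP_SIZE)]  # toy weights in [0, 1]
codes, scale, bias = quantize_group(group)
recon = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(f"max reconstruction error: {max_err:.4f}")
```

With only 4 representable levels per group, reconstruction error is bounded by half the group's scale, which is why 2-bit quantization trades noticeable quality for memory savings.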
Usage

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-2bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
Base Model
- Model: Mistral Small 3.2 24B Instruct 2506
- Parameters: 24B
- Architecture: Mistral Small 3.2
- License: Apache 2.0
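At 2 bits per weight, the quantized weights are roughly 8× smaller than BF16. A back-of-the-envelope estimate, assuming a 16-bit scale and 16-bit bias per group of 64 (which adds 0.5 bits per weight; actual MLX storage overhead may differ):

```python
# Rough weight-memory estimate for 24B parameters.
# The per-group overhead figure (16-bit scale + 16-bit bias) is an assumption.
params = 24e9
bf16_gb = params * 16 / 8 / 1e9          # 16 bits per weight
bits_per_weight = 2 + (16 + 16) / 64     # 2-bit codes + per-group scale/bias
q2_gb = params * bits_per_weight / 8 / 1e9
print(f"BF16: ~{bf16_gb:.0f} GB, 2-bit (g64): ~{q2_gb:.1f} GB")
# → BF16: ~48 GB, 2-bit (g64): ~7.5 GB
```

This is why the 2-bit variant fits on consumer Apple-silicon machines where the BF16 weights would not.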
Model tree
- Base model: mistralai/Mistral-Small-3.1-24B-Base-2503