
Mistral-Small-3.2-24B-Instruct-2506-MLX-2bit

MLX quantized version of Mistral Small 3.2 24B Instruct 2506.

Quantization

  • Method: Q2 (2-bit integer quantization)
  • Bits per weight: 2
  • Details: Uniform 2-bit integer quantization with group size 64.
  • Converted with: mlx-lm
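
The group-wise scheme in the bullets above can be sketched as follows. This is a minimal NumPy illustration of affine 2-bit quantization with group size 64 (per-group scale and minimum), not MLX's actual packed storage format, which interleaves quantized weights, scales, and biases differently.

```python
import numpy as np

def quantize_2bit(w, group_size=64):
    # Affine 2-bit quantization: each group of 64 weights is mapped to
    # integer levels 0..3 using a per-group scale and minimum.
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 3.0                 # 2 bits -> 4 levels
    scale = np.where(scale == 0, 1.0, scale)      # guard constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, 3).astype(np.uint8)
    return q, scale, w_min

def dequantize_2bit(q, scale, w_min):
    # Reconstruction error is bounded by scale / 2 per element.
    return q * scale + w_min

# Illustrative round trip on random weights.
w = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
q, scale, w_min = quantize_2bit(w)
w_hat = dequantize_2bit(q, scale, w_min)
```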

Usage

MLX targets Apple silicon Macs. Install the package, then load and run the model:

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-2bit")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
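
Before loading, a back-of-envelope memory estimate is useful. The figures below assume all 24B parameters are stored at 2 bits with one 16-bit scale and one 16-bit bias per group of 64 weights; the real checkpoint will differ somewhat because some layers (e.g. embeddings or norms) may stay at higher precision.

```python
# Rough size estimate for a 2-bit, group-size-64 quantized 24B model.
params = 24e9
bits_per_weight = 2
group_size = 64

weight_bytes = params * bits_per_weight / 8            # packed 2-bit weights
# One 16-bit scale + one 16-bit bias per group (assumption, see lead-in).
overhead_bytes = (params / group_size) * 2 * 2
total_gb = (weight_bytes + overhead_bytes) / 1e9
print(f"~{total_gb:.1f} GB")  # roughly 7.5 GB under these assumptions
```

So the quantized weights should fit comfortably in the unified memory of a 16 GB Mac, with headroom for the KV cache.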

Base Model

mistralai/Mistral-Small-3.2-24B-Instruct-2506