llama.cpp Quantizations of deepseek-ai/DeepSeek-V3.1-Terminus

BF16 weights and the importance matrix (imatrix) are adopted from unsloth/DeepSeek-V3.1-Terminus-GGUF. (Huge fan of unsloth.)

A personalized replication of low-bit mixed-precision quantization using the --tensor-type option in llama.cpp.

IQ1_S uses a more dynamic tensor-type mix for extreme compression.

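As a sketch of how such a mix can be produced with llama.cpp's llama-quantize tool (the file names, tensor patterns, and override types below are illustrative assumptions, not the exact recipe used for these files):

```shell
# Hypothetical example: quantize to IQ1_S while holding selected tensors
# at higher precision via --tensor-type overrides (patterns are examples).
./llama-quantize \
    --imatrix imatrix.dat \
    --tensor-type attn_kv_a_mqa=q8_0 \
    --tensor-type ffn_down_exps=iq2_xxs \
    DeepSeek-V3.1-Terminus-BF16.gguf \
    DeepSeek-V3.1-Terminus-IQ1_S.gguf \
    IQ1_S
```

Each --tensor-type flag matches tensors by name pattern and pins them to the given type, overriding the default IQ1_S mix for those tensors.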
- IQ1_S : 137.66 GiB (1.76 BPW)
- IQ1_M : 151.25 GiB (1.94 BPW)
- Q2_K_L : 231.55 GiB (2.96 BPW)
- Q4_K_M : 376.89 GiB (4.82 BPW)
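The BPW figures above follow directly from file size and the 671B parameter count; a quick sanity check:

```python
# Verify bits-per-weight (BPW): file size in bits divided by parameter count.
GIB = 1024 ** 3      # GiB -> bytes
PARAMS = 671e9       # total parameters, including MoE experts

def bpw(size_gib: float) -> float:
    return size_gib * GIB * 8 / PARAMS

print(f"IQ1_S:  {bpw(137.66):.2f} BPW")   # ~1.76
print(f"Q4_K_M: {bpw(376.89):.2f} BPW")   # ~4.82
```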

Download Guide

```python
# !pip install huggingface_hub hf_transfer
import os

# Enable accelerated downloads via hf_transfer.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

# Fetch only the files for one quant; adjust the pattern as needed.
snapshot_download(
    repo_id="bobchenyx/DeepSeek-V3.1-Terminus-GGUF",
    local_dir="bobchenyx/DeepSeek-V3.1-Terminus-GGUF",
    allow_patterns=["*IQ1_M*"],  # or e.g. "*Q2_K_L*", "*Q4_K_M*"
)
```
Model Details

- Format: GGUF
- Model size: 671B params
- Architecture: deepseek2