llama.cpp Quantizations of deepseek-ai/DeepSeek-V3.1-Terminus

BF16 weights and the importance matrix (imatrix) are adopted from unsloth/DeepSeek-V3.1-Terminus-GGUF. (Huge fan of unsloth.)

A personalized replication of low-bit mixed-precision quantization using the --tensor-type option in llama.cpp.

IQ1_S uses a more dynamic tensor-type mix for extreme compression.

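As a sketch of how such a mix can be produced with llama.cpp's llama-quantize tool (the file names, tensor patterns, and override types below are illustrative assumptions, not the exact recipe used for these files):

```shell
# Hypothetical example: quantize to IQ1_S while holding selected tensors
# at higher precision via --tensor-type overrides (patterns are examples).
./llama-quantize \
    --imatrix imatrix.dat \
    --tensor-type attn_kv_a_mqa=q8_0 \
    --tensor-type ffn_down_exps=iq2_xxs \
    DeepSeek-V3.1-Terminus-BF16.gguf \
    DeepSeek-V3.1-Terminus-IQ1_S.gguf \
    IQ1_S
```

Each --tensor-type flag matches tensors by name pattern and pins them to the given type, overriding the default IQ1_S mix for those tensors.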
- IQ1_S : 137.66 GiB (1.76 BPW)
- IQ1_M : 151.25 GiB (1.94 BPW)
- Q2_K_L : 231.55 GiB (2.96 BPW)
- Q4_K_M : 376.89 GiB (4.82 BPW)
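The BPW figures above follow directly from file size and the 671B parameter count; a quick sanity check:

```python
# Verify bits-per-weight (BPW): file size in bits divided by parameter count.
GIB = 1024 ** 3      # GiB -> bytes
PARAMS = 671e9       # total parameters, including MoE experts

def bpw(size_gib: float) -> float:
    return size_gib * GIB * 8 / PARAMS

print(f"IQ1_S:  {bpw(137.66):.2f} BPW")   # ~1.76
print(f"Q4_K_M: {bpw(376.89):.2f} BPW")   # ~4.82
```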

Download Guide

```python
# !pip install huggingface_hub hf_transfer
import os

# Enable accelerated downloads via hf_transfer.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

# Fetch only the files for one quant; adjust the pattern as needed.
snapshot_download(
    repo_id="bobchenyx/DeepSeek-V3.1-Terminus-GGUF",
    local_dir="bobchenyx/DeepSeek-V3.1-Terminus-GGUF",
    allow_patterns=["*IQ1_M*"],  # or e.g. "*Q2_K_L*", "*Q4_K_M*"
)
```
Model Details

- Format: GGUF
- Model size: 671B params
- Architecture: deepseek2