Add IQ2_KT

Files changed (2)

README.md +61 -0
images/perplexity.png +2 -2

README.md CHANGED

@@ -445,6 +445,67 @@ numactl -N "$SOCKET" -m "$SOCKET" \
 </details>
 ## IQ2_KS 193.144 GiB (2.472 BPW)
 Final estimate: PPL = 3.9583 +/- 0.02433

 </details>
+## IQ2_KT 204.592 GiB (2.619 BPW)
+Final estimate: PPL = 3.8109 +/- 0.02294
+Note: the KT quants are better suited to full GPU offload, because computing the trellis on the CPU bottlenecks token generation.
+<details>
+<summary>👈 Secret Recipe</summary>
+```bash
+#!/usr/bin/env bash
+custom="
+## Attention [0-60] (GPU)
+blk\..*\.attn_k_b\.weight=q8_0
+blk\..*\.attn_v_b\.weight=q8_0
+# Balance of attn tensors
+blk\..*\.attn_kv_a_mqa\.weight=q8_0
+blk\..*\.attn_q_a\.weight=q8_0
+blk\..*\.attn_q_b\.weight=q8_0
+blk\..*\.attn_output\.weight=q8_0
+## First Three Dense Layers [0-2] (GPU)
+blk\..*\.ffn_down\.weight=q8_0
+blk\..*\.ffn_(gate|up)\.weight=q8_0
+## Shared Expert [3-60] (GPU)
+blk\..*\.ffn_down_shexp\.weight=q8_0
+blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
+## Routed Experts [3-60] (CPU)
+blk\..*\.ffn_down_exps\.weight=iq3_kt
+blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kt
+## Token embedding and output tensors (GPU)
+token_embd\.weight=iq6_k
+output\.weight=iq6_k
+"
+# Drop comment lines, then join the remaining rules with commas (needs GNU sed for -z)
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+SOCKET=0
+numactl -N "$SOCKET" -m "$SOCKET" \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/DeepSeek-V3.1-GGUF/imatrix-DeepSeek-V3.1-Q8_0.dat \
+    /mnt/raid/models/ubergarm/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-256x20B-safetensors-BF16-00001-of-00030.gguf \
+    /mnt/raid/models/ubergarm/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-IQ2_KT.gguf \
+    IQ2_KT \
+    192
+```
+</details>
 ## IQ2_KS 193.144 GiB (2.472 BPW)
 Final estimate: PPL = 3.9583 +/- 0.02433
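
The recipe in this hunk is written as a commented, multi-line string and then flattened into a single comma-separated value before being passed to `--custom-q`. The sketch below isolates that flattening step so it can be checked on its own. It assumes GNU `sed` (the `-z` flag is a GNU extension), and `flatten_recipe` and `sample` are hypothetical names introduced only for illustration; they are not part of `llama-quantize`.

```bash
#!/usr/bin/env bash
# Sketch only: flatten_recipe is a hypothetical helper name.
# Reads a recipe on stdin, drops lines starting with '#', and joins the
# remaining lines with commas. Requires GNU sed because of -z.
flatten_recipe() {
  grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
}

# Shortened, illustrative recipe: three rules copied from the recipe above.
sample='
## Attention
blk\..*\.attn_k_b\.weight=q8_0
## Routed Experts
blk\..*\.ffn_down_exps\.weight=iq3_kt

## Token embedding
token_embd\.weight=iq6_k
'

flat=$(printf '%s\n' "$sample" | flatten_recipe)
printf '%s\n' "$flat"
```

With the sample above this prints `blk\..*\.attn_k_b\.weight=q8_0,blk\..*\.ffn_down_exps\.weight=iq3_kt,token_embd\.weight=iq6_k`. The `sed -z` call treats the whole input as one record, so runs of newlines (including the blank lines left behind by the leading and trailing quotes) collapse into single commas, and the last two substitutions trim the stray comma at each end.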

images/perplexity.png CHANGED

Git LFS Details

SHA256: fe25f4620ec79012373f48dd61d15a9bb982c50ebc2f2685d02e045689839d5c
Pointer size: 131 Bytes
Size of remote file: 173 kB

Git LFS Details

SHA256: edc71eb2278d31ecaa810bec58afcaa578395fb708970471788e6c2a7e13e978
Pointer size: 131 Bytes
Size of remote file: 179 kB