Phi-4-mini-instruct-exl3 / README.md

turboderp

metadata

license: mit
base_model: microsoft/Phi-4-mini-instruct
base_model_relation: quantized
quantized_by: turboderp
tags:
  - exl3

EXL3 quants of Phi-4-mini-instruct

At the moment, all of these are converted with 8-bpw output layers (the H8 suffix in the list below). I'm currently investigating why there's a small but noticeable drop in accuracy at 6-bpw; it likely has to do with the tied embeddings.

2.00 bits per weight / H8
2.25 bits per weight / H8
2.50 bits per weight / H8
3.00 bits per weight / H8
3.50 bits per weight / H8
4.00 bits per weight / H8
5.00 bits per weight / H8
6.00 bits per weight / H8
8.00 bits per weight / H8
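
To get a rough feel for how the options above compare in size, the sketch below multiplies a parameter count by the average bits per weight. The parameter count (about 3.8 billion) and the helper names are assumptions for illustration only; the estimate ignores the 8-bpw output layer, the embeddings, and any file overhead, so actual downloads will differ.

```python
def estimate_weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Rough storage for n_params weights at an average bits-per-weight."""
    return n_params * bits_per_weight / 8


# Assumption: approximate parameter count, used only for illustration.
ASSUMED_PARAMS = 3_800_000_000

# The bitrates listed in this README.
BPW_OPTIONS = [2.0, 2.25, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 8.0]


def size_table(n_params: int = ASSUMED_PARAMS, options=BPW_OPTIONS) -> dict:
    """Map each bitrate to an estimated size in GB (1 GB = 1e9 bytes)."""
    return {bpw: estimate_weight_bytes(n_params, bpw) / 1e9 for bpw in options}


if __name__ == "__main__":
    for bpw, gb in size_table().items():
        print(f"{bpw:.2f} bpw -> ~{gb:.2f} GB")
```

Treat these numbers as a lower-bound ballpark when choosing a quant for a given amount of VRAM; the KV cache and activations need additional memory on top.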