Mamba GGUF

This repository contains the Mamba base models, converted to GGUF for use with llama.cpp. Each model is available in a range of precisions: 2, 3, 4, 5, 6, 8, 16, and 32 bits.

To download a model, click "Files and versions" at the top of the page, choose your desired model size, and then click the "📦LFS ↓" button next to the quantization you want.
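
If you would rather script the download than click through the page, the following is a minimal sketch using the huggingface_hub Python package. It assumes that each GGUF filename ends with its quantization tag (for example "...Q4_K_M.gguf"), which may not hold for every file here. The repo_id argument is whatever this repository's ID is, and the filenames in the usage note are hypothetical placeholders, not actual files.

```python
from pathlib import PurePosixPath


def match_quant(filenames, quant):
    """Return the .gguf files whose name ends with the given quant tag.

    Assumption: the tag (e.g. "Q4_K_M") is the last part of the filename
    before the ".gguf" extension. The comparison is case-insensitive.
    """
    tag = quant.upper()
    return [
        name
        for name in filenames
        if name.lower().endswith(".gguf")
        and PurePosixPath(name).stem.upper().endswith(tag)
    ]


def download_quant(repo_id, quant):
    """Download the first file in repo_id matching quant; return its local path."""
    # Imported lazily so match_quant stays usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    matches = match_quant(list_repo_files(repo_id), quant)
    if not matches:
        raise FileNotFoundError(f"no {quant} GGUF file found in {repo_id}")
    return hf_hub_download(repo_id=repo_id, filename=matches[0])
```

For example, match_quant(["README.md", "demo.Q4_K_M.gguf", "demo.Q4_0.gguf"], "Q4_K_M") keeps only "demo.Q4_K_M.gguf". If a repository holds several model sizes, filter the file list by size first, since download_quant simply takes the first match.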

The following table, adapted from TheBloke, summarizes the trade-offs of each precision:

Quant method	Use case
Q2_K	smallest, significant quality loss - not recommended for most purposes
Q3_K_S	very small, high quality loss
Q3_K_M	very small, high quality loss
Q3_K_L	small, substantial quality loss
Q4_0	legacy; small, very high quality loss - prefer using Q3_K_M
Q4_K_S	small, greater quality loss
Q4_K_M	medium, balanced quality - recommended
Q5_0	legacy; medium, balanced quality - prefer using Q4_K_M
Q5_K_S	large, low quality loss - recommended
Q5_K_M	large, very low quality loss - recommended
Q6_K	very large, extremely low quality loss
Q8_0	very large, extremely low quality loss - not recommended
F16	half precision - almost identical to the original
F32	original precision - recommended by the Mamba authors
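
To make the size side of this trade-off concrete, here is a rough sketch that estimates file size from the parameter count. It covers only the float formats and the simple legacy block formats, whose bits per weight follow from storing 32 weights per block plus one 16-bit scale. The parameter count in the usage note is a hypothetical round number, not one of the models in this repository. Real files will differ from these estimates, because GGUF files also carry metadata and some tensors may be kept at higher precision.

```python
# Bits per weight. For the legacy block formats, each block holds
# 32 weights at the nominal width plus one 16-bit scale factor.
BITS_PER_WEIGHT = {
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5
    "Q5_0": (32 * 5 + 16) / 32,  # 5.5
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5
    "F16": 16.0,
    "F32": 32.0,
}


def estimate_size_bytes(n_params, quant):
    """Rough file size in bytes if every weight used the given format."""
    return n_params * BITS_PER_WEIGHT[quant] / 8


def size_ratio(quant, baseline="F32"):
    """Size of `quant` relative to `baseline` (1.0 means same size)."""
    return BITS_PER_WEIGHT[quant] / BITS_PER_WEIGHT[baseline]
```

With a hypothetical 1,000,000,000-parameter model, estimate_size_bytes gives about 4.0 GB for F32, 2.0 GB for F16, 1.06 GB for Q8_0, and 0.56 GB for Q4_0. The K-quant rows in the table mix several block types within one file, so they are left out of this sketch.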