Mamba GGUF
These are the Mamba base models, converted to GGUF for use with llama.cpp, in a variety of precisions (2, 3, 4, 5, 6, 8, 16, and 32-bit).
Please click "Files and versions" at the top of the page to choose your desired model size, and then click the "LFS" download button next to your desired quantization.
Here is a table adapted from TheBloke explaining the various precisions:
| Quant method | Use case |
|---|---|
| Q2_K | significant quality loss - not recommended for most purposes |
| Q3_K_S | very small, high quality loss |
| Q3_K_M | very small, high quality loss |
| Q3_K_L | small, substantial quality loss |
| Q4_0 | legacy; small, very high quality loss - prefer using Q3_K_M |
| Q4_K_S | small, greater quality loss |
| Q4_K_M | medium, balanced quality - recommended |
| Q5_0 | legacy; medium, balanced quality - prefer using Q4_K_M |
| Q5_K_S | large, low quality loss - recommended |
| Q5_K_M | large, very low quality loss - recommended |
| Q6_K | very large, extremely low quality loss |
| Q8_0 | very large, extremely low quality loss - not recommended |
| F16 | half precision - almost identical to the original |
| F32 | original precision - recommended by the Mamba authors |
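To get a rough feel for how the precisions above translate into download size, here is a minimal sketch that multiplies a parameter count by the nominal bit width in each quant name. The `NOMINAL_BITS` mapping, the `nominal_size_gib` helper, and the example parameter count are illustrative assumptions, not part of llama.cpp or these files. Treat the result as a lower bound: block-wise quantization also stores scale factors, and some tensors may be kept at higher precision, so real GGUF files are typically larger.

```python
# Nominal bits per weight, read off the quant names in the table above.
# Assumption: this ignores per-block scale factors and any tensors kept at
# higher precision, so actual file sizes will usually be larger.
NOMINAL_BITS = {
    "Q2_K": 2,
    "Q3_K_S": 3, "Q3_K_M": 3, "Q3_K_L": 3,
    "Q4_0": 4, "Q4_K_S": 4, "Q4_K_M": 4,
    "Q5_0": 5, "Q5_K_S": 5, "Q5_K_M": 5,
    "Q6_K": 6,
    "Q8_0": 8,
    "F16": 16,
    "F32": 32,
}


def nominal_size_gib(n_params: int, quant: str) -> float:
    """Return a rough lower bound on weight storage, in GiB, for one quant."""
    bits = NOMINAL_BITS[quant]
    return n_params * bits / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical parameter count for illustration only; substitute the
    # size of the model you actually plan to download.
    n_params = 1_000_000_000
    for quant in NOMINAL_BITS:
        print(f"{quant:7s} at least ~{nominal_size_gib(n_params, quant):.2f} GiB")
```

Comparing these estimates against your available RAM or VRAM is a quick first filter; the quality notes in the table then help choose among the quants that fit.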