Open4bits / Qwen3 0.6B GGUF

This repository provides GGUF-format quantized builds of the Qwen3 0.6B model, published by Open4bits for efficient local inference using llama.cpp-compatible runtimes.

The underlying Qwen3 model architecture and weights are owned by the original model authors. This repository contains only converted and quantized GGUF files and does not include training code or datasets.

These builds are intended for fast, low-memory inference on CPUs and GPUs across a wide range of hardware.


Model Overview

Qwen3 0.6B is a small-scale transformer language model designed for lightweight text generation tasks.
The GGUF format enables efficient execution in environments such as llama.cpp, llama-cpp-python, and compatible frontends.

This repository includes multiple quantization variants to balance quality, speed, and memory usage.


Model Details

  • Model family: Qwen3
  • Model size: 0.6B parameters
  • Format: GGUF
  • Task: Text Generation
  • Compatibility: llama.cpp, llama-cpp-python, GGUF-compatible runtimes

Available Files

The following quantized variants are provided:

FP16

  • qwen3-0.6b-f16.gguf — 1.51 GB

Q8

  • qwen3-0.6b-Q8_0.gguf — 805 MB

Q6

  • qwen3-0.6b-Q6_K.gguf — 623 MB

Q5

  • qwen3-0.6b-Q5_0.gguf — 544 MB
  • qwen3-0.6b-Q5_1.gguf — 581 MB
  • qwen3-0.6b-Q5_K_M.gguf — 551 MB
  • qwen3-0.6b-Q5_K_S.gguf — 544 MB

Q4

  • qwen3-0.6b-Q4_0.gguf — 469 MB
  • qwen3-0.6b-Q4_K_M.gguf — 484 MB
  • qwen3-0.6b-Q4_K_S.gguf — 471 MB
  • qwen3-0.6b-IQ4_NL.gguf — 470 MB
  • qwen3-0.6b-IQ4_XS.gguf — 452 MB
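
Individual files can be fetched with the huggingface_hub client. A minimal sketch, assuming the repository id Open4bits/Qwen3-0.6b-gguf and the Q4_K_M variant listed above:

from huggingface_hub import hf_hub_download

# Download one quantized variant; the file is cached locally and the
# resolved filesystem path is returned.
model_path = hf_hub_download(
    repo_id="Open4bits/Qwen3-0.6b-gguf",
    filename="qwen3-0.6b-Q4_K_M.gguf",
)
print(model_path)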

Intended Use

These GGUF builds are intended for:

  • Local text generation
  • CPU or low-VRAM GPU inference (see the GPU-offload sketch after this list)
  • Embedded and edge deployments
  • Research, experimentation, and prototyping
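
For GPU offload, a minimal sketch assuming a GPU-enabled build of llama-cpp-python (e.g., compiled with CUDA or Metal support); the n_gpu_layers value shown is illustrative, not a tuned recommendation:

from llama_cpp import Llama

# Offload transformer layers to the GPU; requires a GPU-enabled build.
llm = Llama(
    model_path="qwen3-0.6b-Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # -1 offloads all layers; lower values suit low-VRAM GPUs
)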

Usage

Example usage with llama-cpp-python:

from llama_cpp import Llama

# Load a quantized GGUF build; n_ctx sets the context window in tokens.
llm = Llama(
    model_path="qwen3-0.6b-Q4_K_M.gguf",
    n_ctx=2048,
)

# Generate a completion; max_tokens raises the library default (16),
# which would otherwise truncate the answer.
output = llm(
    "Write a short explanation of quantization.",
    max_tokens=256,
)
print(output["choices"][0]["text"])
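
For longer outputs, tokens can be streamed as they are generated rather than returned all at once. A brief sketch reusing the model loaded above; the sampling values (max_tokens, temperature) are illustrative choices, not recommendations from the model authors:

# Stream tokens as they are produced instead of waiting for the full reply.
stream = llm(
    "Explain GGUF quantization in two sentences.",
    max_tokens=128,
    temperature=0.7,
    stream=True,
)
for chunk in stream:
    print(chunk["choices"][0]["text"], end="", flush=True)
print()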

Limitations

  • Output quality is limited by the small (0.6B-parameter) model size
  • Lower-bit quantizations trade some accuracy for smaller files and faster inference (a quick comparison sketch follows this list)
  • Not instruction-tuned; reliable task-following requires external prompting strategies
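
To gauge how much a lower-bit quantization changes behavior on your own prompts, the same greedy generation can be run through two variants and the outputs compared side by side. A rough sketch, assuming both files are present locally:

from llama_cpp import Llama

PROMPT = "Briefly define quantization."

# temperature=0.0 gives greedy decoding, so each file's output is
# deterministic and any differences reflect the quantization level.
for path in ("qwen3-0.6b-Q8_0.gguf", "qwen3-0.6b-Q4_K_M.gguf"):
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=64, temperature=0.0)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"].strip())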

License

The files in this repository are distributed under the Apache License 2.0, consistent with the upstream model licensing.

The original Qwen3 model and associated intellectual property are owned by the original model authors.


Support

If you find this model useful, please consider supporting the project, for example by liking the repository. Your support helps us continue releasing and maintaining high-quality open models.
