---
language:
- en
license: apache-2.0
library_name: gguf
tags:
- reranker
- gguf
- llama.cpp
base_model: Qwen/Qwen3-Reranker-0.6B
---

# Qwen3-Reranker-0.6B-Q4_K_M-GGUF

This model was converted to GGUF format from [Qwen/Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the [original model card](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) for more details on the model.

## Model Information

- **Base Model**: [Qwen/Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B)
- **Quantization**: Q4_K_M
- **Format**: GGUF (GPT-Generated Unified Format)
- **Converted with**: llama.cpp

## Quantization Details

This repository contains a **Q4_K_M** quantization of the original model. For comparison, common GGUF precision levels are:

- **F16**: full 16-bit floating point; highest quality, largest size
- **Q8_0**: 8-bit quantization; high quality, good balance of size and speed
- **Q4_K_M**: 4-bit k-quant (medium); smaller size and faster inference with a modest quality trade-off

## Usage

This model can be used with llama.cpp and other GGUF-compatible inference engines. In llama.cpp, reranker models are typically served through `llama-server` with reranking enabled; a fuller request example is shown at the end of this card.

```bash
# Example: serve the reranker with llama.cpp (requires a build with reranking support)
./llama-server -m Qwen3-Reranker-0.6B-Q4_K_M.gguf --reranking
```

## Model Files

| Quantization | Use Case |
|--------------|----------|
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size and performance |
| Q4_K_M | Good quality, smallest size, fastest inference |

## Citation

If you use this model, please cite the original model. See the [original model card](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) for citation information.

## License

This model inherits the license of the original model. Please refer to the [original model card](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) for license details.

## Acknowledgements

- Original model by the authors of [Qwen/Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B)
- GGUF conversion via llama.cpp by ggml.ai
- Converted and uploaded by [sinjab](https://huggingface.co/sinjab)
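
## Example: Reranking via llama-server

The sketch below serves this GGUF file with `llama-server` and scores candidate documents against a query over HTTP. The `--reranking` flag, the `/v1/rerank` route, and the request/response field names follow recent llama.cpp builds and may differ in older versions; the port, query, and documents are placeholders for illustration.

```bash
# Start the server with reranking enabled (requires a llama.cpp build
# whose llama-server includes reranking support).
./llama-server -m Qwen3-Reranker-0.6B-Q4_K_M.gguf --reranking --port 8080 &

# Score candidate documents against a query. The response contains a
# "results" array with an "index" and a "relevance_score" per document.
curl http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
        "query": "What is GGUF?",
        "documents": [
          "GGUF is a binary file format for storing models for inference with llama.cpp.",
          "Paris is the capital of France."
        ]
      }'
```

Higher `relevance_score` values indicate documents the model judges more relevant to the query.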