Qwen3.6-27B NVFP4

This is an AutoRound NVFP4 quantization of Qwen/Qwen3.6-27B.

The checkpoint uses the compressed-tensors nvfp4-pack-quantized format produced through AutoRound's llm_compressor export path. It is intended for runtimes that support this format, such as recent vLLM builds with NVFP4/compressed-tensors support on compatible NVIDIA hardware.
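As a sketch of how such a checkpoint is typically served (the invocation below is an illustrative assumption, not a tested configuration for this repository), a recent vLLM build can load the compressed-tensors checkpoint directly:

```shell
# Assumes a vLLM build with compressed-tensors NVFP4 support
# and a compatible NVIDIA GPU; model id taken from this repo.
vllm serve qmxme/Qwen3.6-27B-NVFP4 --max-model-len 8192
```

vLLM reads the quantization scheme from the checkpoint's config, so no explicit quantization flag should be needed.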

Quantization details

  • Base model: Qwen/Qwen3.6-27B
  • Base snapshot: 6a9e13bd6fc8f0983b9b99948120bc37f49c13e9
  • Quantizer: AutoRound 0.12.3
  • Format: compressed-tensors nvfp4-pack-quantized
  • Weight format: 4-bit float, group size 16, FP8 scales
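To make the weight format concrete, here is a minimal numpy sketch of NVFP4 dequantization: two 4-bit FP4 (E2M1) codes per byte, with one scale per group of 16 values. The nibble order and the use of plain floats for the FP8 scales are assumptions for illustration; real runtimes use fused kernels on the packed safetensors.

```python
import numpy as np

# FP4 E2M1 code points: positive half first, then the sign-bit mirror.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0])

def dequant_nvfp4(packed: np.ndarray, scales: np.ndarray,
                  group_size: int = 16) -> np.ndarray:
    """Unpack uint8 pairs of 4-bit codes and apply one scale per group.

    packed: uint8 array, two FP4 codes per byte (low nibble assumed first).
    scales: one float per group of `group_size` unpacked values
            (in the real format these are stored as FP8).
    """
    lo = packed & 0x0F
    hi = packed >> 4
    codes = np.stack([lo, hi], axis=-1).reshape(packed.shape[0], -1)
    values = FP4_E2M1[codes]
    return values.reshape(-1, group_size) * scales.reshape(-1, 1)
```

At group size 16 each group carries 16 x 4 bits of codes plus one 8-bit scale, which is where most of the size reduction comes from.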

Quantized weights:

  • language-model MLP projections: gate_proj, up_proj, down_proj

Kept in bf16:

  • token embeddings
  • lm_head
  • visual tower
  • MTP tensors
  • linear_attn.*
  • self_attn.*

Size

  • Original indexed tensor size: about 51.75 GiB
  • Quantized indexed tensor size: about 28.05 GiB
  • Repository folder size: about 30 GiB
  • Indexed tensor size reduction: about 45.8%
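The reduction figure follows directly from the two indexed tensor sizes:

```python
original_gib = 51.75   # original indexed tensor size
quantized_gib = 28.05  # quantized indexed tensor size
reduction = (original_gib - quantized_gib) / original_gib
print(f"{reduction:.1%}")  # 45.8%
```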

Notes

This model has not been benchmarked here. Please run your own validation for the runtime, context length, and workload you plan to use.
