Qwen3.6-27B AWQ 4-bit (native)

AWQ 4-bit quantization of Qwen3.6-27B (dense VL) with thinking + vision preserved, optimized for AMD RDNA4 (gfx1201) inference with SGLang.

Model Details

Base model Qwen/Qwen3.6-27B
Architecture Qwen3.5 dense+DeltaNet hybrid + vision tower
Parameters 27B
Layers 48 (mixed full-attention + DeltaNet linear-attn)
Context 262K (native)
Modalities text + image + video (no audio)
Quantization Native AWQ 4-bit, group_size=128, fused Triton GEMM
Calibration AWQ via llmcompressor, 256 samples × 1024 tokens, thinking_vision recipe; DeltaNet in_proj_a/b and vision tower kept BF16
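For context, a calibration setup like the one described above could be expressed as an llmcompressor-style YAML recipe. This is an illustrative sketch only: the modifier name, scheme string, and ignore regexes are assumptions, not the actual recipe used for this model.

```yaml
# Illustrative sketch, not the shipped recipe: AWQ W4A16 (4-bit weights,
# group_size 128) while skipping DeltaNet in_proj_a/b and the vision tower.
quant_stage:
  quant_modifiers:
    AWQModifier:
      targets: ["Linear"]
      scheme: W4A16
      ignore:
        - "lm_head"
        - "re:.*in_proj_a.*"
        - "re:.*in_proj_b.*"
        - "re:visual.*"
```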

Performance (2x AMD Radeon AI PRO R9700, TP=2)

sglang.bench_serving, single user, FP8 KV cache:

Context TPOT (ms) tok/s
128 41.5 24.1
8192 42.2 23.7
32768 54.5 18.3
65536 70.4 14.2
131072 102.4 9.8
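The two columns above are redundant by construction: single-user decode throughput is the reciprocal of time-per-output-token, tok/s = 1000 / TPOT_ms. A quick sanity check of the table:

```python
# Sanity-check the benchmark table: tok/s = 1000 / TPOT (ms).
tpot_ms = {128: 41.5, 8192: 42.2, 32768: 54.5, 65536: 70.4, 131072: 102.4}

for ctx, ms in tpot_ms.items():
    # Each computed value matches the tok/s column to one decimal place.
    print(f"{ctx:>6} ctx: {1000 / ms:.1f} tok/s")
```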

Dense attention scales quadratically, so throughput falls off past 16K context, unlike the 35B-A3B MoE, which stays flat. For long-context coding/agent workloads on RDNA4, prefer the Qwen3.6-35B-A3B AWQ (~21 tok/s flat through 131K).

Notes

  • Same calibration recipe + DeltaNet preservation as the 35B variant; differs only in dense-vs-MoE architecture.
  • Native AWQ format (not compressed-tensors); the mattbucci/Qwen3.6-27B-AWQ-CT variant remains available for cross-engine compatibility.
  • Greedy decode (temperature=0) loops on the Qwen3 family; use temperature=0.7, top_p=0.95, top_k=20. SGLang picks these up automatically via sampling_defaults='model'.
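The recommended sampling settings can be baked into an OpenAI-compatible request payload. A minimal sketch (the model name matches this repo; the payload shape is the standard chat-completions format):

```python
import json

def chat_payload(prompt: str) -> dict:
    """Build a chat-completions payload with the card's recommended sampling."""
    return {
        "model": "mattbucci/Qwen3.6-27B-AWQ",
        "messages": [{"role": "user", "content": prompt}],
        # Avoid greedy decode (temperature=0), which loops on this family.
        "temperature": 0.7,
        "top_p": 0.95,
        "top_k": 20,
    }

print(json.dumps(chat_payload("Hello"), indent=2))
```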

Usage with SGLang (RDNA4)

git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
./scripts/setup.sh
MODEL=mattbucci/Qwen3.6-27B-AWQ scripts/launch.sh qwen36-27b
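Once the server is up, it can be queried through its OpenAI-compatible endpoint, including with images, since vision is preserved in this quant. A hedged sketch: the port (30000, SGLang's default) is an assumption; adjust it to match launch.sh.

```python
import base64
import json
import urllib.request

def vision_message(image_b64: str, question: str) -> dict:
    """Build one user message pairing a base64-encoded PNG with a question."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": question},
        ],
    }

def ask(image_path: str, question: str,
        url: str = "http://localhost:30000/v1/chat/completions") -> str:
    """Send an image + question to the local SGLang server; return the reply."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    body = {
        "model": "mattbucci/Qwen3.6-27B-AWQ",
        "messages": [vision_message(b64, question)],
        "temperature": 0.7, "top_p": 0.95, "top_k": 20,
    }
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```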

Hardware

Calibrated and benchmarked on 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 64 GB total VRAM) with ROCm 7.2 + SGLang v0.5.10 + RDNA4 patches.

