Qwen3.6-27B AWQ 4-bit (native)

AWQ 4-bit quantization of Qwen3.6-27B (dense VL) with thinking + vision preserved, optimized for AMD RDNA4 (gfx1201) inference with SGLang.

Model Details

Base model Qwen/Qwen3.6-27B
Architecture Qwen3.5 dense+DeltaNet hybrid + vision tower
Parameters 27B
Layers 48 (mixed full-attention + DeltaNet linear-attn)
Context 262K (native)
Modalities text + image + video (no audio)
Quantization Native AWQ 4-bit, group_size=128, fused Triton GEMM
Calibration AWQ via llmcompressor, 256 samples × 1024 tokens, thinking_vision recipe; DeltaNet in_proj_a/b and vision tower kept BF16
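For context, a calibration setup like the one described above could be expressed as an llmcompressor-style YAML recipe. This is an illustrative sketch only: the modifier name, scheme string, and ignore regexes are assumptions, not the actual recipe used for this model.

```yaml
# Illustrative sketch, not the shipped recipe: AWQ W4A16 (4-bit weights,
# group_size 128) while skipping DeltaNet in_proj_a/b and the vision tower.
quant_stage:
  quant_modifiers:
    AWQModifier:
      targets: ["Linear"]
      scheme: W4A16
      ignore:
        - "lm_head"
        - "re:.*in_proj_a.*"
        - "re:.*in_proj_b.*"
        - "re:visual.*"
```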

Performance (2x AMD Radeon AI PRO R9700, TP=2)

sglang.bench_serving, single user, FP8 KV cache:

Context TPOT (ms) tok/s
128 41.5 24.1
8192 42.2 23.7
32768 54.5 18.3
65536 70.4 14.2
131072 102.4 9.8
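The two columns above are redundant by construction: single-user decode throughput is the reciprocal of time-per-output-token, tok/s = 1000 / TPOT_ms. A quick sanity check of the table:

```python
# Sanity-check the benchmark table: tok/s = 1000 / TPOT (ms).
tpot_ms = {128: 41.5, 8192: 42.2, 32768: 54.5, 65536: 70.4, 131072: 102.4}

for ctx, ms in tpot_ms.items():
    # Each computed value matches the tok/s column to one decimal place.
    print(f"{ctx:>6} ctx: {1000 / ms:.1f} tok/s")
```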

Dense attention scales quadratically, so throughput falls off past 16K context, unlike the 35B-A3B MoE, which stays flat. For long-context coding/agent workloads on RDNA4, prefer the Qwen3.6-35B-A3B AWQ (~21 tok/s flat through 131K).

Notes

  • Same calibration recipe + DeltaNet preservation as the 35B variant; differs only in dense-vs-MoE architecture.
  • Native AWQ format (not compressed-tensors); the mattbucci/Qwen3.6-27B-AWQ-CT variant remains available for cross-engine compatibility.
  • Greedy decode (temperature=0) loops on the Qwen3 family; use temperature=0.7, top_p=0.95, top_k=20. SGLang picks these up automatically via sampling_defaults='model'.
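The recommended sampling settings can be baked into an OpenAI-compatible request payload. A minimal sketch (the model name matches this repo; the payload shape is the standard chat-completions format):

```python
import json

def chat_payload(prompt: str) -> dict:
    """Build a chat-completions payload with the card's recommended sampling."""
    return {
        "model": "mattbucci/Qwen3.6-27B-AWQ",
        "messages": [{"role": "user", "content": prompt}],
        # Avoid greedy decode (temperature=0), which loops on this family.
        "temperature": 0.7,
        "top_p": 0.95,
        "top_k": 20,
    }

print(json.dumps(chat_payload("Hello"), indent=2))
```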

Usage with SGLang (RDNA4)

git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
./scripts/setup.sh
MODEL=mattbucci/Qwen3.6-27B-AWQ scripts/launch.sh qwen36-27b
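Once the server is up, it can be queried through its OpenAI-compatible endpoint, including with images, since vision is preserved in this quant. A hedged sketch: the port (30000, SGLang's default) is an assumption; adjust it to match launch.sh.

```python
import base64
import json
import urllib.request

def vision_message(image_b64: str, question: str) -> dict:
    """Build one user message pairing a base64-encoded PNG with a question."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": question},
        ],
    }

def ask(image_path: str, question: str,
        url: str = "http://localhost:30000/v1/chat/completions") -> str:
    """Send an image + question to the local SGLang server; return the reply."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    body = {
        "model": "mattbucci/Qwen3.6-27B-AWQ",
        "messages": [vision_message(b64, question)],
        "temperature": 0.7, "top_p": 0.95, "top_k": 20,
    }
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```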

Hardware

Calibrated and benchmarked on 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 64 GB total VRAM) with ROCm 7.2 + SGLang v0.5.10 + RDNA4 patches.

