# Qwen3.6-27B AWQ 4-bit (native)
AWQ 4-bit quantization of Qwen3.6-27B (dense VL) with thinking + vision preserved, optimized for AMD RDNA4 (gfx1201) inference with SGLang.
## Model Details
| | |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Architecture | Qwen3.6 dense + DeltaNet hybrid + vision tower |
| Parameters | 27B |
| Layers | 48 (mixed full-attention + DeltaNet linear-attn) |
| Context | 262K (native) |
| Modalities | text + image + video (no audio) |
| Quantization | Native AWQ 4-bit, group_size=128, fused Triton GEMM |
| Calibration | AWQ via llmcompressor, 256 samples × 1024 tokens, `thinking_vision` recipe; DeltaNet `in_proj_a`/`in_proj_b` and vision tower kept BF16 |
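A back-of-envelope check of the weight footprint follows from the table above. This is a sketch, not a measured number: it assumes all 27B parameters are quantized to 4 bits with one FP16 scale per group of 128, and ignores the BF16-kept modules (DeltaNet `in_proj_a`/`in_proj_b`, vision tower), so it is a lower bound.

```python
# Rough lower bound on quantized weight VRAM.
# Assumptions (not from the model card): 27e9 params at 4-bit,
# plus one FP16 scale per 128-weight group; BF16-kept modules ignored.
params = 27e9
bits_per_weight = 4 + 16 / 128  # 4-bit weight + amortized FP16 scale
weight_gib = params * bits_per_weight / 8 / 2**30

print(f"~{weight_gib:.1f} GiB quantized weights")
```

At roughly 13 GiB of weights, the model fits comfortably across 2× 32 GB R9700s with room left for the FP8 KV cache at long context.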
## Performance (2× AMD Radeon AI PRO R9700, TP=2)
`sglang.bench_serving`, single user, FP8 KV cache:
| Context | TPOT (ms) | tok/s |
|---|---|---|
| 128 | 41.5 | 24.1 |
| 8192 | 42.2 | 23.7 |
| 32768 | 54.5 | 18.3 |
| 65536 | 70.4 | 14.2 |
| 131072 | 102.4 | 9.8 |
Dense attention scales quadratically with context, so throughput falls off past 16K, unlike the 35B-A3B MoE, which stays flat. For long-context coding/agent workloads on RDNA4, prefer the Qwen3.6-35B-A3B AWQ (~21 tok/s, flat through 131K).
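The two benchmark columns are reciprocals: at a single-user batch size, decode tok/s is just 1000 / TPOT(ms). A quick check against the table:

```python
# Sanity check: single-user decode throughput = 1000 / TPOT (ms).
# TPOT values are taken from the benchmark table above.
tpot_ms = {128: 41.5, 8192: 42.2, 32768: 54.5, 65536: 70.4, 131072: 102.4}

for ctx, ms in tpot_ms.items():
    print(f"{ctx:>6} ctx: {1000 / ms:.1f} tok/s")
```

Each computed value matches the tok/s column, confirming the benchmark was decode-bound with no batching overlap.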
## Notes
- Same calibration recipe + DeltaNet preservation as the 35B variant; differs only in dense-vs-MoE architecture.
- Native AWQ format (not compressed-tensors); see the `mattbucci/Qwen3.6-27B-AWQ-CT` variant for cross-engine compatibility.
- Greedy decode (`temperature=0`) loops on the Qwen3 family; use `temperature=0.7, top_p=0.95, top_k=20`. SGLang picks this up automatically via `sampling_defaults='model'`.
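If you want to set the recommended sampling parameters explicitly rather than relying on `sampling_defaults='model'`, they can be passed per-request through SGLang's native `/generate` endpoint. A minimal sketch, assuming the server listens on SGLang's default port 30000; adjust the host to your launch configuration:

```python
# Sketch: explicit sampling settings via SGLang's native /generate endpoint.
# The port (30000, SGLang's default) is an assumption.
import json
import urllib.request

payload = {
    "text": "Explain AWQ group-wise quantization in one paragraph.",
    "sampling_params": {
        "temperature": 0.7,  # greedy (0) tends to loop on Qwen3-family models
        "top_p": 0.95,
        "top_k": 20,
        "max_new_tokens": 512,
    },
}

def generate(prompt: str, host: str = "http://localhost:30000") -> str:
    """POST a prompt to /generate and return the completion text."""
    body = dict(payload, text=prompt)
    req = urllib.request.Request(
        f"{host}/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Call `generate(...)` once the server from the Usage section below is running.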
## Usage with SGLang (RDNA4)
```shell
git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
./scripts/setup.sh
MODEL=mattbucci/Qwen3.6-27B-AWQ scripts/launch.sh qwen36-27b
```
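Since vision is preserved in this quant, the launched server can also be queried with images through its OpenAI-compatible `/v1/chat/completions` endpoint. A sketch, assuming the default port 30000 and the repo ID as the served model name; both depend on your `launch.sh` configuration:

```python
# Sketch: multimodal request against the OpenAI-compatible endpoint.
# Port 30000 and the model name are assumptions; match your launch config.
import json
import urllib.request

def vision_messages(question: str, image_url: str) -> list:
    """Build an OpenAI-style multimodal message list (text + image)."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

def chat(messages: list, host: str = "http://localhost:30000") -> str:
    """Send a chat completion request and return the reply text."""
    body = {
        "model": "mattbucci/Qwen3.6-27B-AWQ",
        "messages": messages,
        "temperature": 0.7,
        "top_p": 0.95,
    }
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

For example, `chat(vision_messages("What is in this image?", url))` once the server is up.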
## Hardware
Calibrated and benchmarked on 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 64 GB total VRAM) with ROCm 7.2 + SGLang v0.5.10 + RDNA4 patches.