# Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4
AWQ INT4 quantization of FINAL-Bench/Darwin-9B-Opus optimized for low-VRAM consumer hardware (RTX 3060 6 GB).
Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).
- Founder: Ilia Bolotnikov
- Organization: AMAImedia.com
- X (Twitter): @AMAImediacom
- LinkedIn: Ilia Bolotnikov
- Telegram: @djbionicl
- NOESIS version: v14.7
- Release date: 2026-04
## ⚠️ License notice
This model is derived from FINAL-Bench/Darwin-9B-Opus, which is licensed under Apache 2.0. This AWQ quantization retains the same Apache 2.0 license; see the LICENSE file in this repository for the full text.
## Model summary
| Property | Value |
|---|---|
| Base model | FINAL-Bench/Darwin-9B-Opus |
| Underlying architecture | Qwen3_5ForConditionalGeneration (VLM — text + vision encoder) |
| Model type | qwen3_5 |
| Original precision | BF16 safetensors (~18 GB) |
| Quantized precision | AWQ INT4 (group_size=128, GEMM, zero_point=True) |
| Text vocab size | 248 320 |
| Context length | 131 072 tokens |
| Hidden size (text) | 4 096 |
| Layers (text) | 32 (hybrid: 24 GDN/linear_attention + 8 full_attention, interval=4) |
| Languages | 201 (native Qwen3.5-9B multilingual) |
| Disk footprint | ~4.7 GB |
| Inference VRAM | ~5.2 GB (text-only path, no vision input) |
| Quantization library | AutoAWQ 0.2.9 |
| Calibration set | 128 diverse prompts (code/reasoning/chat/research), max_seq_len=512 |
| RNG seed | 1729 (NOESIS reproducibility lock) |
Architecture note: Darwin-9B-Opus is a merge model built with the Darwin V5 methodology (MRI-guided per-tensor diagnostics 70% + evolutionary genome optimization 30%, implemented via direct PyTorch DARE-TIES).
- Father: Qwen/Qwen3.5-9B (original pre-training + RLHF)
- Mother: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled (LoRA SFT on Claude 4.6 Opus reasoning chains)
The underlying Qwen3.5 backbone is the VLM branch (`Qwen3_5ForConditionalGeneration`) with a vision encoder and a hybrid text decoder (`qwen3_5_text` sub-config): 24 GatedDeltaNet/linear_attention layers + 8 full self-attention layers (full_attn at every 4th layer). AWQ quantization targets the text decoder weights only: GDN layers have their MLP quantized; full_attention layers get self_attn + MLP quantized.
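As a concrete illustration of that layer-targeting rule, here is a minimal sketch (not the NOESIS pipeline; module names such as `self_attn.q_proj` are assumptions based on standard Qwen-style naming):

```python
# Sketch of per-layer AWQ target selection for the hybrid decoder described
# above. Module names follow common Qwen-style conventions and are an
# assumption, not taken from the NOESIS pipeline.
FULL_ATTN_INTERVAL = 4   # full self-attention at every 4th layer
NUM_LAYERS = 32          # 24 GDN/linear_attention + 8 full_attention

def quant_targets(layer_idx: int) -> list[str]:
    """Return the sub-modules to quantize in a given decoder layer."""
    mlp = ["mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
    if (layer_idx + 1) % FULL_ATTN_INTERVAL == 0:
        # full_attention layer: quantize self-attention projections + MLP
        return ["self_attn.q_proj", "self_attn.k_proj",
                "self_attn.v_proj", "self_attn.o_proj"] + mlp
    # GDN/linear_attention layer: quantize the MLP only
    return mlp

targets = {i: quant_targets(i) for i in range(NUM_LAYERS)}
assert sum((i + 1) % FULL_ATTN_INTERVAL == 0 for i in range(NUM_LAYERS)) == 8
```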
## Why this quantization
The original Darwin-9B-Opus weights are BF16 (~18 GB), which does not fit on a 6 GB consumer GPU. This AWQ build:
- Fits inside ~5.2 GB VRAM for text-only inference on an RTX 3060 6 GB (see the quick check after this list)
- Uses the GEMM kernel, compatible with `device_map={"": 0}` (no CPU offload)
- Provenance-tracked (`noesis_provenance.json` ships alongside the model)
- Calibrated on diverse multilingual prompts matching Darwin's broad training domain
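To verify the fit on your own card, a quick check you can run right after the generation call in the usage example below (a sketch; the ~5.2 GB figure above is the reference measurement):

```python
# Report PyTorch's peak tensor allocations after loading + generating (sketch;
# actual VRAM use also includes the CUDA context, so expect slightly more).
import torch

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak allocated: {peak_gib:.2f} GiB")
```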
## Quantization methodology
This checkpoint was produced with a proprietary quantization pipeline developed by AMAImedia as part of the NOESIS DHCF-FNO framework (v14.7). The Qwen3.5 hybrid architecture used in Darwin-9B-Opus is not supported by upstream AutoAWQ; quantizing it required custom engineering developed internally at AMAImedia.
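For orientation only, the sketch below shows what an equivalent run looks like through the standard AutoAWQ API with the settings from the table above (group_size=128, GEMM, zero_point=True; 128 calibration prompts at max_seq_len=512, which match AutoAWQ's calibration defaults). Because upstream AutoAWQ does not support this hybrid architecture, this generic call will not work on Darwin-9B-Opus as-is, and the calibration prompts shown are stand-ins.

```python
# Generic AutoAWQ quantization sketch with the settings listed above.
# NOTE: upstream AutoAWQ cannot handle the hybrid GDN/full-attention layout,
# so this exact call fails on Darwin-9B-Opus; it only illustrates the API.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_id = "FINAL-Bench/Darwin-9B-Opus"
out_dir = "Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4"

quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)

# Stand-in calibration set: the real run used 128 diverse
# code/reasoning/chat/research prompts (seed 1729).
calib_prompts = [
    "Write a Python function that merges two sorted lists.",
    "Summarize the trade-offs between REST and GraphQL.",
    # ...126 more diverse prompts
]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_prompts)
model.save_quantized(out_dir)
tokenizer.save_pretrained(out_dir)
```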
## How to use
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "amaimedia/Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map={"": 0},
    torch_dtype=torch.float16,
    fuse_layers=False,
)

prompt = "Explain the difference between REST and GraphQL with code examples."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
Note: Vision inputs are not supported through AutoAWQ's text-only path. For multimodal use, load the BF16 base model with `trust_remote_code=True`.
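A minimal loading sketch for that multimodal path, assuming the base repository's custom modeling code registers with the transformers Auto classes:

```python
# Sketch: load the BF16 base model for multimodal use (assumes the repo ships
# custom code wired to the Auto classes via trust_remote_code).
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

base_id = "FINAL-Bench/Darwin-9B-Opus"
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # original precision, ~18 GB of weights
    device_map="auto",           # will spill to CPU on a 6 GB card
    trust_remote_code=True,
)
```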
## NOESIS context
In NOESIS this model serves as a multilingual broad-domain teacher for Specialists M4-CHAT, M5-CODE, and M6-RESEARCH during knowledge distillation. It is loaded sequentially (per the NOESIS swapping protocol) onto the RTX 3060, producing top-K=512 logits at temperature=4.0.
⚠️ KD pipeline note: Darwin-9B-Opus has `vocab_size=248320` (Qwen3.5 extended vocab), while NOESIS student models use the Qwen3-8B native vocab of 151 936 tokens. Logit extraction requires truncating the vocab head to index 151 936 via `purify_logits()` before ensemble aggregation in `build_ensemble_labels.py`. Proposed KD weight: w=0.30.
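A minimal sketch of that truncation step (the real `purify_logits()` lives in `build_ensemble_labels.py`; this reimplementation is an assumption based on the description above):

```python
import torch

STUDENT_VOCAB = 151_936  # Qwen3-8B native vocab size
TEMPERATURE = 4.0        # KD softening temperature
TOP_K = 512              # logits kept per position

def purify_logits(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Drop the teacher's extended-vocab tail (ids >= 151 936) so teacher
    and student distributions index the same token space. Sketch only."""
    return teacher_logits[..., :STUDENT_VOCAB]

def topk_soft_labels(teacher_logits: torch.Tensor):
    """Top-K soft labels at the KD temperature (sketch)."""
    probs = torch.softmax(purify_logits(teacher_logits) / TEMPERATURE, dim=-1)
    return torch.topk(probs, k=TOP_K, dim=-1)  # (values, indices)
```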
NOESIS specialists overview:
| ID | Role | Size |
|---|---|---|
| M1 | ASR (150+ langs) | 10B/3B |
| M2 | Dubbing LM (30 langs full) | 10B/3B |
| M3 | TTS + voice cloning | 10B/3B |
| M4 | Chat + creative writing | 10B/3B |
| M5 | Code + math | 10B/3B |
| M6 | Deep research (1M ctx) | 10B/3B |
| M7 | Prompt engineering | 4B/0.8B |
| M8 | Quality control (PRM) | 4B/0.8B |
| M9 | Orchestrator + routing | 4B/0.8B |
## Acknowledgements & citation
Base model: Darwin-9B-Opus by FINAL-Bench (derived from Qwen3.5-9B + Claude Opus distillation).
```bibtex
@misc{darwin9b_opus,
  title     = {Darwin-9B-Opus},
  author    = {FINAL-Bench},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}
}
```
Quantization & NOESIS integration:
```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```