Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4

AWQ INT4 quantization of FINAL-Bench/Darwin-9B-Opus, optimized for low-VRAM consumer hardware (RTX 3060 6 GB).

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).


⚠️ License notice

This model is derived from FINAL-Bench/Darwin-9B-Opus, which is licensed under Apache 2.0. This AWQ quantization retains the same Apache 2.0 license; see the LICENSE file in this repository for the full text.


Model summary

Base model: FINAL-Bench/Darwin-9B-Opus
Underlying architecture: Qwen3_5ForConditionalGeneration (VLM: text + vision encoder)
Model type: qwen3_5
Original precision: BF16 safetensors (~18 GB)
Quantized precision: AWQ INT4 (group_size=128, GEMM kernel, zero_point=True)
Text vocab size: 248,320
Context length: 131,072 tokens
Hidden size (text): 4,096
Layers (text): 32 (hybrid: 24 GDN/linear_attention + 8 full_attention, interval=4)
Languages: 201 (native Qwen3.5-9B multilingual)
Disk footprint: ~4.7 GB
Inference VRAM: ~5.2 GB (text-only path, no vision input)
Quantization library: AutoAWQ 0.2.9
Calibration set: 128 diverse prompts (code/reasoning/chat/research), max_seq_len=512
RNG seed: 1729 (NOESIS reproducibility lock)

Architecture note: Darwin-9B-Opus is a merge model built with the Darwin V5 methodology (MRI-guided per-tensor diagnostics 70% + evolutionary genome optimization 30%, implemented via direct PyTorch DARE-TIES).

  • Father: Qwen/Qwen3.5-9B (original pre-training + RLHF)
  • Mother: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled (LoRA SFT on Claude 4.6 Opus reasoning chains)

The underlying Qwen3.5 backbone is the VLM branch (Qwen3_5ForConditionalGeneration) with a vision encoder and a hybrid text decoder (qwen3_5_text sub-config): 24 GatedDeltaNet/linear_attention layers + 8 full self-attention layers (full_attn at every 4th layer). AWQ quantization targets the text decoder weights only: GDN layers have their MLP quantized; full_attention layers get self_attn + MLP quantized.
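
For orientation, here is a minimal sketch of how that per-layer quantization plan can be expressed. It is not the actual pipeline code; the layer-type labels and sub-module names follow common Qwen-style conventions and are assumptions, not values verified against this repository.

def modules_to_quantize(layer_types):
    """Return {layer_index: [sub-module names]} selected for INT4 AWQ (hypothetical names)."""
    plan = {}
    for i, kind in enumerate(layer_types):
        if kind == "full_attention":
            # full self-attention layers: attention projections + MLP
            plan[i] = ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
                       "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
        else:
            # GatedDeltaNet / linear-attention layers: MLP only
            plan[i] = ["mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
    return plan

# 32 text-decoder layers with full attention at every 4th layer (interval=4)
layer_types = ["full_attention" if (i + 1) % 4 == 0 else "linear_attention"
               for i in range(32)]
quant_plan = modules_to_quantize(layer_types)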


Why this quantization

The original Darwin-9B-Opus weights are in BF16 (~18 GB), which does not fit on a 6 GB consumer GPU. This AWQ build:

  1. Fits inside ~5.2 GB of VRAM for text-only inference on an RTX 3060 6 GB
  2. Uses the GEMM kernel, compatible with device_map={"": 0} (no CPU offload)
  3. Ships with provenance tracking (noesis_provenance.json is distributed alongside the model)
  4. Is calibrated on diverse multilingual prompts matching Darwin's broad training domain

Quantization methodology

This checkpoint was produced with a proprietary quantization pipeline developed by AMAImedia as part of the NOESIS DHCF-FNO framework (v14.7). The Qwen3.5 hybrid architecture used in Darwin-9B-Opus is not supported by upstream AutoAWQ, so quantization required original in-house engineering work.
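
For orientation only, the sketch below shows how the reported settings would be expressed with the public AutoAWQ API. This is not the AMAImedia pipeline (which added custom support for the hybrid architecture); the calibration prompt list is elided and the call assumes AutoAWQ 0.2.9 semantics.

import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative sketch: INT4, group_size=128, GEMM kernel, zero_point=True,
# 128 calibration prompts at max_seq_len=512, seed 1729 (values from the model summary).
torch.manual_seed(1729)

base_id = "FINAL-Bench/Darwin-9B-Opus"
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_pretrained(base_id, trust_remote_code=True)

calib_prompts: list[str] = []  # fill with the 128 diverse code/reasoning/chat/research prompts

model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=calib_prompts,
    max_calib_samples=128,
    max_calib_seq_len=512,
)
model.save_quantized("Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4")
tokenizer.save_pretrained("Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4")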


How to use

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "amaimedia/Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map={"": 0},         # keep the entire model on GPU 0 (no CPU offload)
    torch_dtype=torch.float16,
    fuse_layers=False,          # leave AutoAWQ layer fusion disabled for this checkpoint
)

prompt = "Explain the difference between REST and GraphQL with code examples."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Note: Vision inputs are not supported through AutoAWQ's text-only path. For multimodal use, load the BF16 base model with trust_remote_code=True.
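
For the multimodal path, a hedged loading sketch follows. The auto class used here is an assumption (the base repository's remote code determines the correct one), so adjust it if the base model card specifies otherwise.

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

base_id = "FINAL-Bench/Darwin-9B-Opus"

# Assumption: the VLM registers under the image-text-to-text auto class;
# check the base repository's auto_map in config.json if this fails to load.
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # original BF16 weights (~18 GB); requires far more than 6 GB of VRAM
    device_map="auto",
    trust_remote_code=True,
)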


NOESIS context

In NOESIS this model serves as a multilingual broad-domain teacher for Specialists M4-CHAT, M5-CODE, and M6-RESEARCH during knowledge distillation. It is loaded sequentially (per the NOESIS swapping protocol) onto the RTX 3060, producing top-K=512 logits at temperature=4.0.

⚠️ KD pipeline note: Darwin-9B-Opus has vocab_size=248,320 (Qwen3.5 extended vocabulary), while NOESIS student models use the Qwen3-8B native vocabulary of 151,936 entries. Logit extraction therefore requires truncating the vocab head to index 151,936 via purify_logits() before ensemble aggregation in build_ensemble_labels.py. Proposed KD weight: w=0.30.
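
The implementation of purify_logits() is not published; the sketch below is a hypothetical reading of the step described above (plain index truncation followed by top-K soft-target extraction at the stated KD temperature), not the actual NOESIS code.

import torch

STUDENT_VOCAB = 151_936   # Qwen3-8B native vocabulary (student side)
TOP_K = 512               # top-K logits kept per position
KD_TEMPERATURE = 4.0      # distillation temperature

def purify_logits(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical: drop the teacher's extended vocab entries beyond the student range."""
    return teacher_logits[..., :STUDENT_VOCAB]

def top_k_soft_targets(teacher_logits: torch.Tensor):
    """Return (indices, probabilities) of the top-K teacher tokens at the KD temperature."""
    logits = purify_logits(teacher_logits) / KD_TEMPERATURE
    values, indices = torch.topk(logits, TOP_K, dim=-1)
    probs = torch.softmax(values, dim=-1)  # renormalize over the retained K entries
    return indices, probs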

NOESIS specialists overview:

ID   Role                         Size
M1   ASR (150+ langs)             10B/3B
M2   Dubbing LM (30 langs full)   10B/3B
M3   TTS + voice cloning          10B/3B
M4   Chat + creative writing      10B/3B
M5   Code + math                  10B/3B
M6   Deep research (1M ctx)       10B/3B
M7   Prompt engineering           4B/0.8B
M8   Quality control (PRM)        4B/0.8B
M9   Orchestrator + routing       4B/0.8B

Acknowledgements & citation

Base model: Darwin-9B-Opus by FINAL-Bench (derived from Qwen3.5-9B + Claude Opus distillation).

@misc{darwin9b_opus,
  title     = {Darwin-9B-Opus},
  author    = {FINAL-Bench},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}
}

Quantization & NOESIS integration:

@misc{noesis_v14,
  title     = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}