Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4

AWQ INT4 quantization of FINAL-Bench/Darwin-9B-Opus, optimized for low-VRAM consumer hardware (RTX 3060 6 GB).

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).


⚠️ License notice

This model is derived from FINAL-Bench/Darwin-9B-Opus, which is licensed under Apache 2.0. This AWQ quantization retains the same Apache 2.0 license; see the LICENSE file in this repository for the full text.


Model summary

Base model: FINAL-Bench/Darwin-9B-Opus
Underlying architecture: Qwen3_5ForConditionalGeneration (VLM: text + vision encoder)
Model type: qwen3_5
Original precision: BF16 safetensors (~18 GB)
Quantized precision: AWQ INT4 (group_size=128, GEMM kernel, zero_point=True)
Text vocab size: 248,320
Context length: 131,072 tokens
Hidden size (text): 4,096
Layers (text): 32 (hybrid: 24 GDN/linear_attention + 8 full_attention, interval=4)
Languages: 201 (native Qwen3.5-9B multilingual)
Disk footprint: ~4.7 GB
Inference VRAM: ~5.2 GB (text-only path, no vision input)
Quantization library: AutoAWQ 0.2.9
Calibration set: 128 diverse prompts (code/reasoning/chat/research), max_seq_len=512
RNG seed: 1729 (NOESIS reproducibility lock)

Architecture note: Darwin-9B-Opus is a merge model built with the Darwin V5 methodology (MRI-guided per-tensor diagnostics 70% + evolutionary genome optimization 30%, implemented via direct PyTorch DARE-TIES).

  • Father: Qwen/Qwen3.5-9B (original pre-training + RLHF)
  • Mother: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled (LoRA SFT on Claude 4.6 Opus reasoning chains)

The underlying Qwen3.5 backbone is the VLM branch (Qwen3_5ForConditionalGeneration) with a vision encoder and a hybrid text decoder (qwen3_5_text sub-config): 24 GatedDeltaNet/linear_attention layers + 8 full self-attention layers (full_attn at every 4th layer). AWQ quantization targets the text decoder weights only: GDN layers have their MLP quantized; full_attention layers get self_attn + MLP quantized.
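
For orientation, here is a minimal sketch of how that per-layer quantization plan can be expressed. It is not the actual pipeline code; the layer-type labels and sub-module names follow common Qwen-style conventions and are assumptions, not values verified against this repository.

def modules_to_quantize(layer_types):
    """Return {layer_index: [sub-module names]} selected for INT4 AWQ (hypothetical names)."""
    plan = {}
    for i, kind in enumerate(layer_types):
        if kind == "full_attention":
            # full self-attention layers: attention projections + MLP
            plan[i] = ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
                       "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
        else:
            # GatedDeltaNet / linear-attention layers: MLP only
            plan[i] = ["mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
    return plan

# 32 text-decoder layers with full attention at every 4th layer (interval=4)
layer_types = ["full_attention" if (i + 1) % 4 == 0 else "linear_attention"
               for i in range(32)]
quant_plan = modules_to_quantize(layer_types)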


Why this quantization

The original Darwin-9B-Opus weights are in BF16 (~18 GB), which does not fit on a 6 GB consumer GPU. This AWQ build:

  1. Fits inside ~5.2 GB of VRAM for text-only inference on an RTX 3060 6 GB
  2. Uses the GEMM kernel, compatible with device_map={"": 0} (no CPU offload)
  3. Ships with provenance tracking (noesis_provenance.json is distributed alongside the model)
  4. Is calibrated on diverse multilingual prompts matching Darwin's broad training domain

Quantization methodology

This checkpoint was produced with a proprietary quantization pipeline developed by AMAImedia as part of the NOESIS DHCF-FNO framework (v14.7). The Qwen3.5 hybrid architecture used in Darwin-9B-Opus is not supported by upstream AutoAWQ, so quantization required original in-house engineering work.
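
For orientation only, the sketch below shows how the reported settings would be expressed with the public AutoAWQ API. This is not the AMAImedia pipeline (which added custom support for the hybrid architecture); the calibration prompt list is elided and the call assumes AutoAWQ 0.2.9 semantics.

import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative sketch: INT4, group_size=128, GEMM kernel, zero_point=True,
# 128 calibration prompts at max_seq_len=512, seed 1729 (values from the model summary).
torch.manual_seed(1729)

base_id = "FINAL-Bench/Darwin-9B-Opus"
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_pretrained(base_id, trust_remote_code=True)

calib_prompts: list[str] = []  # fill with the 128 diverse code/reasoning/chat/research prompts

model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=calib_prompts,
    max_calib_samples=128,
    max_calib_seq_len=512,
)
model.save_quantized("Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4")
tokenizer.save_pretrained("Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4")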


How to use

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "amaimedia/Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map={"": 0},         # keep the entire model on GPU 0 (no CPU offload)
    torch_dtype=torch.float16,
    fuse_layers=False,          # leave AutoAWQ layer fusion disabled for this checkpoint
)

prompt = "Explain the difference between REST and GraphQL with code examples."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Note: Vision inputs are not supported through AutoAWQ's text-only path. For multimodal use, load the BF16 base model with trust_remote_code=True.
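
For the multimodal path, a hedged loading sketch follows. The auto class used here is an assumption (the base repository's remote code determines the correct one), so adjust it if the base model card specifies otherwise.

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

base_id = "FINAL-Bench/Darwin-9B-Opus"

# Assumption: the VLM registers under the image-text-to-text auto class;
# check the base repository's auto_map in config.json if this fails to load.
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # original BF16 weights (~18 GB); requires far more than 6 GB of VRAM
    device_map="auto",
    trust_remote_code=True,
)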


NOESIS context

In NOESIS this model serves as a multilingual broad-domain teacher for Specialists M4-CHAT, M5-CODE, and M6-RESEARCH during knowledge distillation. It is loaded sequentially (per the NOESIS swapping protocol) onto the RTX 3060, producing top-K=512 logits at temperature=4.0.

⚠️ KD pipeline note: Darwin-9B-Opus has vocab_size=248,320 (Qwen3.5 extended vocabulary), while NOESIS student models use the Qwen3-8B native vocabulary of 151,936 entries. Logit extraction therefore requires truncating the vocab head to index 151,936 via purify_logits() before ensemble aggregation in build_ensemble_labels.py. Proposed KD weight: w=0.30.
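
The implementation of purify_logits() is not published; the sketch below is a hypothetical reading of the step described above (plain index truncation followed by top-K soft-target extraction at the stated KD temperature), not the actual NOESIS code.

import torch

STUDENT_VOCAB = 151_936   # Qwen3-8B native vocabulary (student side)
TOP_K = 512               # top-K logits kept per position
KD_TEMPERATURE = 4.0      # distillation temperature

def purify_logits(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical: drop the teacher's extended vocab entries beyond the student range."""
    return teacher_logits[..., :STUDENT_VOCAB]

def top_k_soft_targets(teacher_logits: torch.Tensor):
    """Return (indices, probabilities) of the top-K teacher tokens at the KD temperature."""
    logits = purify_logits(teacher_logits) / KD_TEMPERATURE
    values, indices = torch.topk(logits, TOP_K, dim=-1)
    probs = torch.softmax(values, dim=-1)  # renormalize over the retained K entries
    return indices, probs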

NOESIS specialists overview:

ID   Role                         Size
M1   ASR (150+ langs)             10B/3B
M2   Dubbing LM (30 langs full)   10B/3B
M3   TTS + voice cloning          10B/3B
M4   Chat + creative writing      10B/3B
M5   Code + math                  10B/3B
M6   Deep research (1M ctx)       10B/3B
M7   Prompt engineering           4B/0.8B
M8   Quality control (PRM)        4B/0.8B
M9   Orchestrator + routing       4B/0.8B

Acknowledgements & citation

Base model: Darwin-9B-Opus by FINAL-Bench (derived from Qwen3.5-9B + Claude Opus distillation).

@misc{darwin9b_opus,
  title     = {Darwin-9B-Opus},
  author    = {FINAL-Bench},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}
}

Quantization & NOESIS integration:

@misc{noesis_v14,
  title     = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}