me²TARA – Qwen3‑4B‑Thinking (GGUF, Q4_K_M)

This repository contains a GGUF quantized version of Qwen/Qwen3-4B-Thinking-2507, prepared for use with llama.cpp and compatible runtimes, and used as the core thinking model inside the me²TARA empathetic assistant.

  • Base model: Qwen/Qwen3-4B-Thinking-2507
  • Architecture: Qwen3 (4B parameters, thinking‑tuned)
  • Format: GGUF
  • Quantization: Q4_K_M (a good balance of output quality against RAM use and speed)
  • Intended use: General reasoning and instruction following with warm, practical assistant behavior, for local / offline inference.

Note: Safety, routing and domain logic are handled by the me²TARA backend. This repo provides the quantized Qwen/Qwen3-4B-Thinking-2507 backbone.


Available files

Filename                                   | Quant type | Size    | Notes
meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf | Q4_K_M     | ~2.3 GB | Default quant, recommended

More quantizations (e.g., Q5_K_M, Q8_0) can be added later to this repo as additional .gguf files.


Prompt format (recommended)

The model uses a Qwen‑style chat template. A simple, robust pattern is:

<|im_start|>system
You are me²TARA, an emotionally intelligent AI assistant built on top of a Qwen3‑4B‑Thinking base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant

Example:

<|im_start|>system
You are me²TARA, an emotionally intelligent AI assistant built on top of a Qwen3‑4B‑Thinking base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
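The template can also be filled in programmatically. The following is a minimal Python sketch, not part of the me²TARA code base: `render_prompt` and `SYSTEM_PROMPT` are hypothetical names, and the string layout simply mirrors the pattern shown above (each message body on its own line, followed by `<|im_end|>` on the next line).

```python
# Hypothetical helper: fills the single-turn template shown above.
SYSTEM_PROMPT = (
    "You are me²TARA, an emotionally intelligent AI assistant built on top of "
    "a Qwen3-4B-Thinking base model. Always answer clearly, kindly, and with "
    "practical steps the user can take."
)


def render_prompt(system_prompt: str, user_message: str) -> str:
    """Return a single-turn prompt that ends with an open assistant turn."""
    return (
        "<|im_start|>system\n"
        f"{system_prompt}\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_message}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = render_prompt(
    SYSTEM_PROMPT,
    "How can I improve my sleep quality and manage stress naturally?",
)
print(prompt)
```

The returned string ends with the open assistant turn, so the model continues from there.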

Example usage (llama.cpp)

Basic interactive chat

./llama-simple-chat -m /path/to/meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf

With explicit system prompt

./llama-cli \
  -m /path/to/meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf \
  -e \
  -p "<|im_start|>system\nYou are me²TARA, an emotionally intelligent AI assistant built on top of a Qwen3‑4B‑Thinking base model. Always answer clearly, kindly, and with practical steps the user can take.\n<|im_end|>\n<|im_start|>user\nHow can I improve my sleep quality and manage stress naturally?\n<|im_end|>\n<|im_start|>assistant\n"

Adjust flags like -n (max tokens to generate), --temp, --top-p, and --top-k according to your hardware and latency/quality trade‑offs.
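When llama-cli is driven from a script, the same flags can be assembled in code. The sketch below is illustrative only: `build_llama_cli_command` is a hypothetical helper, the default values are placeholders rather than settings recommended by this repository, and the flag spellings (`-n`, `--temp`, `--top-p`, `--top-k`) are the ones llama.cpp's CLI uses.

```python
import shlex


def build_llama_cli_command(
    model_path: str,
    prompt: str,
    max_tokens: int = 512,  # placeholder value, tune for your latency budget
    temp: float = 0.6,      # placeholder value
    top_p: float = 0.95,    # placeholder value
    top_k: int = 20,        # placeholder value
) -> list:
    """Build the argument list for a one-shot llama-cli run."""
    return [
        "./llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(max_tokens),
        "--temp", str(temp),
        "--top-p", str(top_p),
        "--top-k", str(top_k),
    ]


cmd = build_llama_cli_command(
    "/path/to/meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf",
    "<|im_start|>user\nHello\n<|im_end|>\n<|im_start|>assistant\n",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with the `<|im_start|>` markers and the newlines inside the prompt.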


Downloading via huggingface-cli

pip install -U "huggingface_hub[cli]"

huggingface-cli download \
  meetara-lab/meetara-qwen3-4b-thinking-gguf \
  --include "meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf" \
  --local-dir .

This will download only the Q4_K_M file into the current directory.
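The same download can be done from Python with the `huggingface_hub` library. This is a minimal sketch: `download_model` is a hypothetical wrapper, and the repository id assumes the model is published under the meetara-lab organization as `meetara-lab/meetara-qwen3-4b-thinking-gguf`.

```python
# Assumed repository id (organization/name) and the file listed under "Available files".
REPO_ID = "meetara-lab/meetara-qwen3-4b-thinking-gguf"
FILENAME = "meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf"


def download_model(local_dir: str = ".") -> str:
    """Download the Q4_K_M file and return its local path."""
    # Imported here so this module can be loaded even when
    # huggingface_hub is not installed yet.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=local_dir,
    )


# Usage (needs network access and `pip install -U huggingface_hub`):
#   model_path = download_model()
```

The returned path can be passed directly to the -m flag of llama-cli.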


Intended behavior / me²TARA flavor

Compared to the raw Qwen/Qwen3-4B-Thinking-2507 model, this quantization is used in me²TARA to:

  • Respond in a warm, supportive tone while still being precise.
  • Emphasize practical, step‑by‑step guidance (especially for education, wellness, and daily‑life questions).
  • Handle follow‑up turns naturally when used with the me²TARA backend (e.g., if a user says "yes please" after an offer, it should deliver the promised plan or template rather than repeating the question).

On its own, it behaves like a strong general thinking model with a gentle style; downstream systems can add domain‑specific prompts and safety policies.
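A short follow-up such as "yes please" can only be resolved if the earlier offer is still present in the prompt, because the model keeps no state between calls. How the me²TARA backend manages history is not described here. The sketch below is one simple way to do it when running the model standalone: `render_conversation` is a hypothetical helper, the conversation is invented for illustration, and the layout follows the prompt format shown earlier.

```python
from typing import Iterable, Tuple


def render_conversation(system_prompt: str, turns: Iterable[Tuple[str, str]]) -> str:
    """Render a system prompt plus (role, content) turns, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system_prompt}\n<|im_end|>\n"]
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>\n")
    # Leave the final assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Invented example conversation: the assistant's offer stays in the prompt,
# so "yes please" has something to refer to.
history = [
    ("user", "I keep putting off my exam revision."),
    ("assistant", "That sounds stressful. Would you like a simple 7-day revision plan?"),
    ("user", "yes please"),
]
prompt = render_conversation(
    "You are me²TARA, an emotionally intelligent AI assistant.",
    history,
)
print(prompt)
```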


Credits

  • Base model and original training: Qwen/Qwen3-4B-Thinking-2507 by Alibaba Cloud's Tongyi Lab.
  • Quantization and me²TARA integration: meetara‑lab.

If you use this GGUF in your work, please also cite the original Qwen3 paper/model in addition to this repository.
