me²TARA – Qwen3‑4B‑Thinking (GGUF, Q4_K_M)

This repository contains a GGUF quantized version of Qwen/Qwen3-4B-Thinking-2507, prepared for use with llama.cpp and compatible runtimes, and used as the core thinking model inside the me²TARA empathetic assistant.

Base model: Qwen/Qwen3-4B-Thinking-2507
Architecture: Qwen3 (4B parameters, thinking‑tuned)
Format: GGUF
Quantization: Q4_K_M (a good balance of quality against RAM use and speed)
Intended use: General-purpose reasoning with warm, practical assistant behavior for local / offline inference.

Note: Safety, routing and domain logic are handled by the me²TARA backend. This repo provides the quantized Qwen/Qwen3-4B-Thinking-2507 backbone.

Available files

| Filename | Quant type | Size | Notes |
| --- | --- | --- | --- |
| meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf | Q4_K_M | ~2.3 GB | Default quant, recommended |

More quantizations (e.g., Q5_K_M, Q8_0) can be added later to this repo as additional .gguf files.

Prompt format (recommended)

The model uses a Qwen‑style chat template. A simple, robust pattern is:

<|im_start|>system
You are me²TARA, an emotionally intelligent AI assistant built on top of a Qwen3‑4B‑Thinking base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant

Example:

<|im_start|>system
You are me²TARA, an emotionally intelligent AI assistant built on top of a Qwen3‑4B‑Thinking base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
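
For scripted use, the template above can be assembled with a small shell function. This is a minimal sketch: `build_prompt` is a hypothetical helper name, not something shipped with this repository or with llama.cpp, and it simply reproduces the line layout shown above for a single turn.

```shell
# Hypothetical helper: build a single-turn prompt in the layout shown above.
# $1 = system prompt, $2 = user message.
build_prompt() {
  printf '<|im_start|>system\n%s\n<|im_end|>\n<|im_start|>user\n%s\n<|im_end|>\n<|im_start|>assistant\n' "$1" "$2"
}
```

The result can be passed to a runtime as one argument, for example `-p "$(build_prompt "$SYSTEM" "$QUESTION")"`. Note that command substitution drops the trailing newline.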

Example usage (llama.cpp)

Basic interactive chat

./llama-simple-chat -m /path/to/meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf

With explicit system prompt

./llama-cli \
  -m /path/to/meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf \
  -p "$(printf '<|im_start|>system\nYou are me²TARA, an emotionally intelligent AI assistant built on top of a Qwen3‑4B‑Thinking base model. Always answer clearly, kindly, and with practical steps the user can take.\n<|im_end|>\n<|im_start|>user\nHow can I improve my sleep quality and manage stress naturally?\n<|im_end|>\n<|im_start|>assistant')"

Adjust flags like -n (max tokens), --temp, --top-p, --top-k, etc. according to your hardware and latency/quality trade‑offs.
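
One way to keep these settings in one place is a small wrapper around llama-cli, using its `-n`, `--temp`, `--top-p` and `--top-k` spellings. The sketch below is an assumption-laden illustration: `run_llama` and the `LLAMA_*` variable names are hypothetical, and the default values are placeholders rather than recommendations from this repository.

```shell
# Illustrative defaults only; tune them for your own hardware and use case.
LLAMA_MODEL="${LLAMA_MODEL:-/path/to/meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf}"
LLAMA_MAX_TOKENS="${LLAMA_MAX_TOKENS:-512}"
LLAMA_TEMP="${LLAMA_TEMP:-0.7}"
LLAMA_TOP_P="${LLAMA_TOP_P:-0.9}"
LLAMA_TOP_K="${LLAMA_TOP_K:-40}"

# Hypothetical wrapper: run llama-cli with the settings above, then pass
# through any extra arguments (for example -p "...").
# LLAMA_CLI can point at another binary, or at `echo` for a dry run.
run_llama() {
  "${LLAMA_CLI:-./llama-cli}" \
    -m "$LLAMA_MODEL" \
    -n "$LLAMA_MAX_TOKENS" \
    --temp "$LLAMA_TEMP" \
    --top-p "$LLAMA_TOP_P" \
    --top-k "$LLAMA_TOP_K" \
    "$@"
}
```

Running `LLAMA_CLI=echo run_llama -p "hello"` prints the assembled arguments without loading the model, which is a convenient way to review the command first.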

Downloading via `huggingface-cli`

pip install -U "huggingface_hub[cli]"

huggingface-cli download \
  meetara-lab/meetara-qwen3-4b-thinking-gguf \
  --include "meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf" \
  --local-dir .

This will download only the Q4_K_M file into the current directory.
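
After the download finishes, a quick sanity check can catch a truncated or mis-saved file. GGUF files begin with the four ASCII bytes `GGUF`, so the sketch below only inspects that header; `is_gguf` is a hypothetical helper name, and the check does not validate the rest of the file.

```shell
# Hypothetical helper: succeed only if $1 is a regular file that starts
# with the 4-byte GGUF magic.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example: report on the file downloaded above.
check_download() {
  if is_gguf "$1"; then
    echo "OK: $1 looks like a GGUF file"
  else
    echo "WARNING: $1 is missing or does not start with the GGUF magic" >&2
    return 1
  fi
}
```

For example, `check_download ./meetara-qwen3-4b-thinking-gguf-Q4_K_M.gguf` returns a non-zero status if the header does not match.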

Intended behavior / me²TARA flavor

Compared to the raw Qwen/Qwen3-4B-Thinking-2507 model, this quantization is used in me²TARA to:

Respond in a warm, supportive tone while still being precise.
Emphasize practical, step‑by‑step guidance (especially for education, wellness, and daily‑life questions).
Handle follow‑up turns naturally (e.g., if a user says "yes please" after an offer, it should deliver the promised plan or template rather than repeating the question), when used with the me²TARA backend.

On its own, it behaves like a strong general thinking model with a gentle style; downstream systems can add domain‑specific prompts and safety policies.
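
When the model is used without the me²TARA backend, a follow-up turn such as "yes please" only makes sense if the earlier turns are included in the prompt. The sketch below shows one way to render such a history in the same chat layout used in this README. The helper names are hypothetical, and the assistant's earlier reply is invented purely for illustration.

```shell
# Hypothetical helper: render one chat turn. $1 = role, $2 = content.
chat_turn() {
  printf '<|im_start|>%s\n%s\n<|im_end|>\n' "$1" "$2"
}

# Build a prompt in which the user accepts an earlier offer.
build_followup_prompt() {
  chat_turn system "You are me²TARA, an emotionally intelligent AI assistant. Always answer clearly, kindly, and with practical steps the user can take."
  chat_turn user "How can I improve my sleep quality and manage stress naturally?"
  # Invented example reply, standing in for the model's previous answer.
  chat_turn assistant "Here are a few ideas to start with. Would you like a 7-day plan?"
  chat_turn user "yes please"
  # Leave the assistant turn open so the model writes the next reply.
  printf '<|im_start|>assistant\n'
}
```

Because the earlier offer is part of the prompt, the model has the context it needs to deliver the plan instead of asking again.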

Credits

Base model and original training: Qwen/Qwen3-4B-Thinking-2507 by Alibaba Cloud's Tongyi Lab.
Quantization and me²TARA integration: meetara-lab.

If you use this GGUF in your work, please also cite the original Qwen3 paper/model in addition to this repository.
