# nani-qwen-3.5-2B
A fine-tuned Qwen3.5-2B for crypto wallet tool calling. Trained with Unsloth LoRA on 2,029 examples covering 103 blockchain tools from agentek.
## Quickstart

```bash
# Download the GGUF
huggingface-cli download NaniDAO/nani-qwen-3.5-2B-gguf-q4km \
  --local-dir ~/models/nani-2b-q4km

# Register with Ollama
cd ~/models/nani-2b-q4km
ollama create nani -f Modelfile

# Run
ollama run nani --system "$(cat system-prompt.txt)" "resolve vitalik.eth"
```
## Model Details

| | |
|---|---|
| Base model | Qwen3.5-2B |
| Method | LoRA (r=16, alpha=16, dropout=0.05) |
| Training data | 2,029 examples, 103 tools |
| Epochs | 3 (optimal — 4th showed no improvement) |
| Hardware | 1x T4 GPU (Kaggle) |
| Quantization | Q4_K_M (this repo) |
| Context | 4096 tokens |
| License | Same as Qwen3.5-2B |
## Eval Results

Evaluated on 50 held-out examples. Base = Qwen3.5-2B without fine-tuning.

| Metric | Base | Nani | Delta |
|---|---|---|---|
| Tool call accuracy | 98.0% | 96.0% | -2.0% |
| Correct function | 75.5% | 81.6% | +6.1% |
| Correct params | 67.3% | 71.4% | +4.1% |
| Format valid | 100.0% | 100.0% | — |
| Has `<think>` block | 28.0% | 100.0% | +72.0% |
| No-tool correct | 0.0% | 100.0% | +100.0% |
The model learned to pick the right tool (+6.1%), use correct parameters (+4.1%), reason before acting (100% thinking), and know when NOT to call a tool (+100%). The small drop in tool call accuracy (-2%) is mostly sampling noise — 3 of 4 "failures" succeed on rerun.
## Tool Call Format

The model uses Qwen3.5's native XML tool calling format:

```xml
<think>
The user wants to resolve an ENS name. I'll use resolveENS.
</think>
<tool_call>
<function=resolveENS>
<parameter=name>vitalik.eth</parameter>
</function>
</tool_call>
```
Tool results are passed back as `tool` role messages, then the model generates a final response.
## System Prompt Format

Tools must be defined in the system prompt as newline-separated JSON inside `<tools>` tags. This matches the training data format exactly:

```
# Tools
You have access to the following functions:

<tools>
{"type":"function","function":{"name":"resolveENS","description":"Resolves an ENS name to an Ethereum address","parameters":{"type":"object","properties":{"name":{"type":"string","description":"The ENS name to resolve"}},"required":["name"]}}}
{"type":"function","function":{"name":"getBalance","description":"Get the native token (ETH) balance for an address","parameters":{"type":"object","properties":{"address":{"type":"string","description":"The wallet address (0x...)"},"chainId":{"type":"number","description":"Chain ID (1=Ethereum, 8453=Base)"}},"required":["address"]}}}
</tools>

If you choose to call a function ONLY reply in the following format with NO suffix:

<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- You may provide optional reasoning BEFORE the function call, but NOT after
- If there is no function call available, answer the question like normal
</IMPORTANT>

You are Nani, a crypto wallet assistant.
```
Keep tool schemas simple — use `{"type": "string", "description": "..."}` per property. Complex schemas with `anyOf`, `$ref`, or nested objects confuse the 2B model.
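As a sketch, the system prompt above can be assembled programmatically. The `build_system_prompt` helper and the trimmed instruction text are illustrative, not part of the model's API; for real use, reproduce the full prompt format shown above verbatim:

```python
import json

# Flat, single-level schemas -- the shape the 2B model handles best.
TOOLS = [
    {"type": "function", "function": {
        "name": "resolveENS",
        "description": "Resolves an ENS name to an Ethereum address",
        "parameters": {"type": "object", "properties": {
            "name": {"type": "string",
                     "description": "The ENS name to resolve"}},
            "required": ["name"]}}},
]

def build_system_prompt(tools):
    """Render newline-separated JSON tool definitions inside <tools> tags,
    matching the training-data format described above."""
    tool_lines = "\n".join(json.dumps(t, separators=(",", ":")) for t in tools)
    return (
        "# Tools\n"
        "You have access to the following functions:\n"
        f"<tools>\n{tool_lines}\n</tools>\n"
        "If you choose to call a function ONLY reply in the specified "
        "<tool_call> format with NO suffix.\n"
        "You are Nani, a crypto wallet assistant."
    )
```

Compact `separators=(",", ":")` keeps each tool definition on a single line, as the training data expects.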
## Supported Tools
Trained on 103 tools from agentek. Top tools by training examples:
| Tool | Examples | Category |
|---|---|---|
| intentSwap | 183 | DEX trading |
| intentTransfer | 157 | Token transfers |
| resolveENS | 71 | ENS resolution |
| getBalance | 64 | Balance queries |
| getBalanceOf | 59 | ERC20 balances |
| resolveWNS | 58 | WNS resolution |
| getCryptoPrice | 55 | Price data |
| lookupENS | 54 | Reverse ENS |
| getQuote | 53 | Swap quotes |
| getFearAndGreedIndex | 50 | Market sentiment |
Full coverage includes: ENS/WNS, ERC20, Uniswap V3, Aave, bridging (Across), security (ScamSniffer), Blockscout explorer, gas estimation, DefiLlama yields, NFTs, and more.
## Ollama Modelfile

```
FROM ./nani-qwen-3.5-2B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER num_ctx 4096
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}<|im_end|>
{{ else if eq .Role "tool" }}<|im_start|>tool
{{ .Content }}<|im_end|>
{{ end }}{{- end }}<|im_start|>assistant
"""
SYSTEM """You are Nani, a crypto wallet assistant."""
```
Do NOT add `</tool_call>` as a stop token — it prevents the model from outputting the closing tag, breaking tool call parsing.
## Ollama API Usage

Call Ollama directly and parse tool calls from the text output:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "nani",
  "messages": [
    {"role": "system", "content": "<your system prompt with tools>"},
    {"role": "user", "content": "resolve vitalik.eth"}
  ],
  "stream": false
}'
```
Parse the `<tool_call>` XML from `message.content`, execute the tool, then send the result back:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "nani",
  "messages": [
    {"role": "system", "content": "<your system prompt with tools>"},
    {"role": "user", "content": "resolve vitalik.eth"},
    {"role": "assistant", "content": "<tool_call>\n<function=resolveENS>\n<parameter=name>vitalik.eth</parameter>\n</function>\n</tool_call>"},
    {"role": "tool", "content": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"}
  ],
  "stream": false
}'
```
## Tool Call Parsing

Regex to extract tool calls from model output:

```javascript
const regex = /<tool_call>\s*<function=(\w+)>([\s\S]*?)<\/function>\s*(?:<\/tool_call>)?/g;
const paramRegex = /<parameter=(\w+)>([\s\S]*?)<\/parameter>/g;
```

All parameter values are strings — coerce them to numbers or booleans based on the tool's schema before execution.
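A Python sketch of the same extraction plus schema-based coercion. The schema dict shape is assumed from the system-prompt examples above (`properties` with a `type` per parameter); names like `parse_tool_call` are illustrative:

```python
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=(\w+)>([\s\S]*?)</function>\s*(?:</tool_call>)?")
PARAM_RE = re.compile(r"<parameter=(\w+)>([\s\S]*?)</parameter>")

def coerce(value, prop_schema):
    """Convert a raw string to the type declared in the tool's JSON schema."""
    t = prop_schema.get("type", "string")
    v = value.strip()
    if t == "number":
        return float(v) if "." in v else int(v)
    if t == "integer":
        return int(v)
    if t == "boolean":
        return v.lower() == "true"
    return v

def parse_tool_call(text, schemas):
    """Return (function_name, params) for the first tool call, or None."""
    m = TOOL_CALL_RE.search(text)
    if not m:
        return None
    name, body = m.group(1), m.group(2)
    props = schemas.get(name, {}).get("properties", {})
    params = {k: coerce(v, props.get(k, {}))
              for k, v in PARAM_RE.findall(body)}
    return name, params
```

Making `</tool_call>` optional in the pattern tolerates truncated generations, matching the JavaScript regex above.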
## Training Config

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tokenize with enable_thinking=True (critical)
text = tokenizer.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=False,
                                     enable_thinking=True)

TrainingArguments(
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    optim="adamw_8bit",
)
```
`enable_thinking=True` is critical. With it disabled (v2), the model regressed -64% in tool accuracy because the tokenizer mangled the `<think>` blocks present in 97% of the training data.
## Local Testing
See nani-local for a Vite + React test app that connects to Ollama and executes real agentek tools with streaming, tool call visualization, and regex-based tool filtering (5 tools per message).
## HuggingFace Repos
| Repo | Format | Size |
|---|---|---|
| NaniDAO/nani-qwen-3.5-2B | Merged fp16 | ~5GB |
| NaniDAO/nani-qwen-3.5-2B-gguf-q4km | GGUF Q4_K_M | ~1.3GB |
## Vision Support
Qwen3.5-2B is a vision-language model (VLM), and the base weights include a vision encoder. However, this GGUF only contains the text/language model weights. The vision encoder (mmproj) was not included in the GGUF conversion.
Current state:
- Text-based tool calling works fully via Ollama
- Image input is NOT supported in this GGUF
- Ollama does not yet support Qwen3.5's mmproj format
- llama.cpp supports it with a separate `--mmproj` flag, but this requires reconversion
To add vision later: reconvert from the merged fp16 model using llama.cpp's `convert_hf_to_gguf.py`, which extracts the mmproj automatically, then run llama.cpp directly with `--model nani.gguf --mmproj nani-mmproj-f16.gguf`.
## Known Limitations
- Text-only — vision encoder not included in this GGUF (see above)
- 2B model size — sometimes hallucinates tool names or picks wrong parameters. Works best with 5 or fewer tools in the system prompt.
- Simple schemas only — complex JSON schemas with `anyOf`, `$ref`, or nested objects confuse the model. Keep tool definitions flat: `{"type": "string", "description": "..."}` per property.
- Training data imbalance — 78 of 103 tools have fewer than 15 examples. Performance on underrepresented tools is weaker.
## Future Work
The model has plateaued with the current ~2K dataset. Next improvements require more data:
- Expand to 4,000-5,000 examples
- Cover all 150+ agentek tools (currently 103)
- Balance tool distribution (cap at 50, min 20-25 per tool)
- More multi-tool chains (currently 47, target 150+)
- More no-tool conversations (currently 94, target 250+)
- Parameter edge cases (optional params, wei values, chain variants)
- Vision encoder GGUF extraction + Ollama/llama.cpp support