---
base_model: nvidia/Orchestrator-8B
tags:
- llama-cpp
- gguf-my-repo
- reinforcement-learning
- tool-calling
- multi-reasoning
- orchestrator-model
- orchestration
language:
- en
pipeline_tag: reinforcement-learning
---
# AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF
This model was converted to GGUF format from [`nvidia/Orchestrator-8B`](https://huggingface.co/nvidia/Orchestrator-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/nvidia/Orchestrator-8B) for more details on the model.
## Use with ollama
```bash
root@90dd7d73d62b:/# ollama pull hf.co/AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF:Q8_0
pulling manifest
pulling 7ba8f19c5542: 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.7 GB
pulling eb4402837c78: 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 4a6ce91d86a8: 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 99 B
pulling 9dfdfd94d3aa: 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 552 B
verifying sha256 digest
writing manifest
success
root@90dd7d73d62b:/# ollama run hf.co/AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF:Q8_0
>>> Hello
Okay, the user said "Hello". I need to respond appropriately. Since they just greeted me, I should acknowledge their greeting and offer assistance. Let me make sure my response is friendly and
open-ended. Maybe something like, "Hello! How can I assist you today?" That sounds good. I should keep it simple and inviting.
Hello! How can I assist you today? 😊
```
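Once pulled, the model can also be queried through Ollama's local REST API. A minimal sketch, assuming the default server on `localhost:11434` and the tag pulled above (the prompt text is just an example):
```bash
# Non-streaming chat request against the locally pulled model
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF:Q8_0",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
```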
## Chat template
```jinja
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n\n\nFor each function call, return a json object with function name and arguments within XML tags:\n\n{\"name\": , \"arguments\": }\n<|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
```
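Rendered, a minimal conversation (no tools, thinking enabled) produces the ChatML-style prompt below. This is traced by hand from the template above, so treat it as illustrative; the system and user strings are placeholders:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
```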
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF --hf-file orchestrator-8b-q8_0.gguf -p "The meaning to life and the universe is"
```
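For an interactive chat session that applies the model's chat template rather than raw completion, a sketch assuming your llama.cpp build supports the `-cnv` (conversation) flag:
```bash
llama-cli --hf-repo AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF --hf-file orchestrator-8b-q8_0.gguf -cnv
```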
### Server:
```bash
llama-server --hf-repo AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF --hf-file orchestrator-8b-q8_0.gguf -c 2048
```
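Once running, the server exposes an OpenAI-compatible endpoint. A minimal request sketch, assuming the default port 8080 (if your build supports it, adding `--jinja` to the server command applies the chat template above, including the tool-calling format, server-side):
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 128
}'
```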
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
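On newer llama.cpp revisions the Makefile build has been replaced by CMake; a rough equivalent of the step above (add `-DGGML_CUDA=ON` for Nvidia GPUs):
```
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
```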
Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF --hf-file orchestrator-8b-q8_0.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF --hf-file orchestrator-8b-q8_0.gguf -c 2048
```