Instructions to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("alpha-ai/qwen2.5-reason-thought-lite-GGUF", dtype="auto")

llama-cpp-python

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="alpha-ai/qwen2.5-reason-thought-lite-GGUF",
	filename="qwen2.5-reason-thought-lite.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Use Docker

docker model run hf.co/alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Ollama:
```
ollama run hf.co/alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M
```

Unsloth Studio

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alpha-ai/qwen2.5-reason-thought-lite-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alpha-ai/qwen2.5-reason-thought-lite-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for alpha-ai/qwen2.5-reason-thought-lite-GGUF to start chatting

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Docker Model Runner:
```
docker model run hf.co/alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M
```

Lemonade

How to use alpha-ai/qwen2.5-reason-thought-lite-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull alpha-ai/qwen2.5-reason-thought-lite-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.qwen2.5-reason-thought-lite-GGUF-Q4_K_M

List all available models

lemonade list

Website - https://www.alphaai.biz

Uploaded Model

Developed by: alphaaico
License: apache-2.0
Finetuned from model: Qwen/Qwen2.5-3B-Instruct

This model, qwen2.5-reason-thought-lite, is a fine-tuned version of Qwen1.5 designed to not only reason through problems but also introspect on the reasoning process itself before delivering the final response. Its unique selling proposition (USP) is that it generates both a detailed reasoning and an internal thought on why that reasoning was made, all before presenting the final answer.

Overview

qwen2.5-reason-thought-lite has been finetuned using GRPO and advanced reward modelling techniques—including custom functions such as sequence_format_reward_func—to enforce a strict response structure and encourage deep reasoning. While we won't divulge all the details, these techniques ensure that the model generates responses in a precise sequence that includes both a detailed reasoning process and a subsequent internal reflection before providing the final answer.

Model Details

Base Model: Qwen/Qwen2.5-3B-Instruct
Fine-tuned by: alphaaico
Training Framework: Unsloth and Hugging Face’s TRL library
Finetuning Techniques: GRPO and additional reward modelling methods

Prompt Structure

The model is designed to generate responses in the following exact format:

Respond in the following exact format:
<reasoning>
[Your detailed reasoning here...]
</reasoning>
<thought>
[Your internal thought process about the reasoning...]
</thought>
<answer>
[Your final answer here...]
</answer>

Key Features

Enhanced Reasoning & Introspection: Produces detailed reasoning enclosed in <reasoning> tags and follows it with an internal thought process (the "why" behind the reasoning) enclosed in <thought> tags before giving the final answer in <answer> tags.
Structured Output: The response format is strictly enforced, making it easy to parse and integrate into downstream applications.
Optimized Inference: Fine-tuned using Unsloth and TRL for faster and more efficient performance on consumer hardware.
Versatile Deployment: Supports multiple quantization formats, including GGUF and 16-bit, to accommodate various hardware configurations.

Quantization Levels Available

q4_k_m
q5_k_m
q8_0
16 Bit (https://huggingface.co/alpha-ai/qwen2.5-reason-thought-lite)

Ideal Configuration for Using the Model

Temperature: 0.8
Top-p: 0.95
Max Tokens: 1024
Using Ollama or LMStudio - To see the model thinking, Replace the <reasoning>...</reasoning> tokens with <think>...</think> tokens.

Use Cases

qwen1.5-reason-thought-lite is best suited for:

Conversational AI: Empowering chatbots and virtual assistants with multi-step reasoning and introspective capabilities.
AI Research: Investigating advanced reasoning and decision-making processes.
Automated Decision Support: Enhancing business intelligence, legal reasoning, and financial analysis systems with structured, step-by-step outputs.
Educational Tools: Assisting students and professionals in structured learning and problem solving.
Creative Applications: Generating reflective and detailed content for storytelling, content creation, and more.

Limitations & Considerations

Domain Specificity: May require additional fine-tuning for specialized domains.
Factual Accuracy: Primarily focused on reasoning and introspection; not intended as a comprehensive factual knowledge base.
Inference Speed: Enhanced reasoning capabilities may result in slightly longer inference times.
Potential Biases: Output may reflect biases present in the training data.

License

This model is released under the Apache-2.0 license.

Acknowledgments

Special thanks to the Unsloth team for providing an optimized training pipeline and to Hugging Face’s TRL library for enabling advanced fine-tuning techniques.

Downloads last month: 36

GGUF

Model size

3B params

Architecture

qwen2

Hardware compatibility

4-bit

5-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for alpha-ai/qwen2.5-reason-thought-lite-GGUF

Base model

Qwen/Qwen2.5-3B

Finetuned

Qwen/Qwen2.5-3B-Instruct

Quantized

(229)

this model

Dataset used to train alpha-ai/qwen2.5-reason-thought-lite-GGUF

Collection including alpha-ai/qwen2.5-reason-thought-lite-GGUF

Finetunes | SLMs and LLMs

Collection

Various variants of LLMs finetuned using proprietary data. • 27 items • Updated Jan 19 • 4