Instructions to use rubra-ai/Qwen2-7B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use rubra-ai/Qwen2-7B-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="rubra-ai/Qwen2-7B-Instruct-GGUF",
	filename="rubra-qwen2-7b-instruct.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use rubra-ai/Qwen2-7B-Instruct-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M

Use Docker

docker model run hf.co/rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use rubra-ai/Qwen2-7B-Instruct-GGUF with Ollama:
```
ollama run hf.co/rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M
```

Unsloth Studio new

How to use rubra-ai/Qwen2-7B-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rubra-ai/Qwen2-7B-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rubra-ai/Qwen2-7B-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rubra-ai/Qwen2-7B-Instruct-GGUF to start chatting

Docker Model Runner
How to use rubra-ai/Qwen2-7B-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use rubra-ai/Qwen2-7B-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rubra-ai/Qwen2-7B-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen2-7B-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

Qwen2 7B Instruct GGUF

Original model: rubra-ai/Qwen2-7B-Instruct

Model description

The model is the result of further post-training Qwen/Qwen2-7B-Instruct. It is capable of complex multi-turn tool/function calling.

Training

The model was post-trained (freeze tuned & DPO) on a proprietary dataset consisting of diverse function calling, chat, and instruct data.

How to use

Refer to https://docs.rubra.ai/inference/llamacpp for usage. Feel free to ask/open issues up in our Github repo: https://github.com/rubra-ai/rubra

Limitations and Bias

While the model performs well on a wide range of tasks, it may still produce biased or incorrect outputs. Users should exercise caution and critical judgment when using the model in sensitive or high-stakes applications. The model's outputs are influenced by the data it was trained on, which may contain inherent biases.

Ethical Considerations

Users should ensure that the deployment of this model adheres to ethical guidelines and consider the potential societal impact of the generated text. Misuse of the model for generating harmful or misleading content is strongly discouraged.

Acknowledgements

We would like to thank Alibaba Cloud for the model.

Contact Information

For questions or comments about the model, please reach out to the rubra team.

Citation

If you use this work, please cite it as:

@misc {rubra_ai_2024,
    author       = { Sanjay Nadhavajhala and Yingbei Tong },
    title        = { Rubra-Qwen2-7B-Instruct },
    year         = 2024,
    url          = { https://huggingface.co/rubra-ai/Qwen2-7B-Instruct },
    doi          = { 10.57967/hf/2683 },
    publisher    = { Hugging Face }
}

Downloads last month: 102

GGUF

Model size

9B params

Architecture

qwen2

Hardware compatibility

4-bit

5-bit

6-bit

8-bit

View +1 variant

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including rubra-ai/Qwen2-7B-Instruct-GGUF

Rubra v0.1-GGUF

Collection

llama.cpp quants for top LLMs enhanced with function (tool) calling • 7 items • Updated Sep 25, 2024

Evaluation results

5-shot on MMLU
self-reported

68.880
0-shot on GPQA
self-reported

30.360
8-shot, CoT on GSM-8K
self-reported

75.820
4-shot, CoT on MATH
self-reported

28.720
GPT-4 as Judge on MT-bench
self-reported

8.080