Instructions for using prithivMLmods/Marco-o1-GGUF with libraries, inference providers, notebooks, and local apps. The sections below show how to get started with each.
- Libraries
- Transformers
How to use prithivMLmods/Marco-o1-GGUF with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Marco-o1-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("prithivMLmods/Marco-o1-GGUF", dtype="auto")
```

Note: since this repository contains GGUF files, Transformers may require pointing at a specific quantized file, e.g. passing `gguf_file="marco-o1-q4_k_m.gguf"` to `from_pretrained`.

- llama-cpp-python
How to use prithivMLmods/Marco-o1-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="prithivMLmods/Marco-o1-GGUF",
    filename="marco-o1-f16.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use prithivMLmods/Marco-o1-GGUF with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
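Whichever install method you use, once llama-server is running you can call its OpenAI-compatible endpoint directly. A minimal sketch; the port 8080 here is llama-server's usual default and is an assumption about your local setup, not part of the commands above:

```sh
# Query the running llama-server (assumed default port 8080):
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```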
Use Docker
```sh
docker model run hf.co/prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use prithivMLmods/Marco-o1-GGUF with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/Marco-o1-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Marco-o1-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```sh
docker model run hf.co/prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
- SGLang
How to use prithivMLmods/Marco-o1-GGUF with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/Marco-o1-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Marco-o1-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Marco-o1-GGUF" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Marco-o1-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Ollama
How to use prithivMLmods/Marco-o1-GGUF with Ollama:
```sh
ollama run hf.co/prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
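Once the model has been pulled, Ollama also serves it over a local HTTP API. A minimal sketch, assuming Ollama's usual default port 11434 and its OpenAI-compatible endpoint (both are assumptions about your local setup):

```sh
# Query the local Ollama server (assumed default port 11434):
curl -X POST "http://localhost:11434/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "hf.co/prithivMLmods/Marco-o1-GGUF:Q4_K_M",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```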
- Unsloth Studio
How to use prithivMLmods/Marco-o1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/Marco-o1-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/Marco-o1-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for prithivMLmods/Marco-o1-GGUF to start chatting
```
- Docker Model Runner
How to use prithivMLmods/Marco-o1-GGUF with Docker Model Runner:
```sh
docker model run hf.co/prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
- Lemonade
How to use prithivMLmods/Marco-o1-GGUF with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull prithivMLmods/Marco-o1-GGUF:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.Marco-o1-GGUF-Q4_K_M
```
List all available models
```sh
lemonade list
```
MarcoPolo / Marco-o1-GGUF Modelfile
The Marco-o1 model is designed to excel not only in structured disciplines like mathematics, physics, and coding, which traditionally benefit from reinforcement learning (RL), but also in addressing open-ended problems that require creativity, reasoning, and nuanced understanding. This unique capability positions it as a versatile tool for tasks demanding both precision and innovation.
| File Name | Size | Description | Upload Status |
|---|---|---|---|
| .gitattributes | 2.25 kB | Git attributes configuration file | Uploaded |
| README.md | 5.23 kB | Updated README | Uploaded |
| config.json | 29 Bytes | Configuration file | Uploaded |
| marco-o1-f16.gguf | 15.2 GB | MARCO-O1 model (F16 precision) | Uploaded (LFS) |
| marco-o1-q2_k.gguf | 3.02 GB | MARCO-O1 model (Q2_K quantization) | Uploaded (LFS) |
| marco-o1-q3_k_l.gguf | 4.09 GB | MARCO-O1 model (Q3_K_L quantization) | Uploaded (LFS) |
| marco-o1-q3_k_m.gguf | 3.81 GB | MARCO-O1 model (Q3_K_M quantization) | Uploaded (LFS) |
| marco-o1-q3_k_s.gguf | 3.49 GB | MARCO-O1 model (Q3_K_S quantization) | Uploaded (LFS) |
| marco-o1-q4_0.gguf | 4.43 GB | MARCO-O1 model (Q4_0 quantization) | Uploaded (LFS) |
| marco-o1-q4_k_m.gguf | 4.68 GB | MARCO-O1 model (Q4_K_M quantization) | Uploaded (LFS) |
| marco-o1-q4_k_s.gguf | 4.46 GB | MARCO-O1 model (Q4_K_S quantization) | Uploaded (LFS) |
| marco-o1-q5_0.gguf | 5.32 GB | MARCO-O1 model (Q5_0 quantization) | Uploaded (LFS) |
| marco-o1-q5_k_m.gguf | 5.44 GB | MARCO-O1 model (Q5_K_M quantization) | Uploaded (LFS) |
| marco-o1-q5_k_s.gguf | 5.32 GB | MARCO-O1 model (Q5_K_S quantization) | Uploaded (LFS) |
| marco-o1-q6_k.gguf | 6.25 GB | MARCO-O1 model (Q6_K quantization) | Uploaded (LFS) |
| marco-o1-q8_0.gguf | 8.1 GB | MARCO-O1 model (Q8_0 quantization) | Uploaded (LFS) |
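If you only need one quantization, you can fetch a single file instead of cloning the whole repository. A minimal sketch using the Hugging Face CLI; the Q4_K_M file is picked purely as an example from the table above:

```sh
# Install the Hugging Face CLI (assumption: not already installed):
pip install -U "huggingface_hub[cli]"

# Download just the Q4_K_M quantization (~4.68 GB) into the current directory:
huggingface-cli download prithivMLmods/Marco-o1-GGUF marco-o1-q4_k_m.gguf --local-dir .
```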
Key Features:
- Structured Problem Solving: Strong performance in tasks with definitive answers, such as calculations, algorithm design, and logical problem-solving.
- Open-Ended Resolutions: Exceptional capability to generate thoughtful, nuanced responses for abstract or subjective queries, making it ideal for discussions, ideation, and explorative problem-solving.
- Reinforcement Learning Optimization: Uses RL to improve accuracy in structured tasks, while sophisticated datasets enhance performance in creative or subjective domains.
Intended Applications:
- Academic Assistance: Solve complex mathematical or scientific problems and explain concepts with clarity.
- Creative Ideation: Generate innovative ideas, solutions, or approaches to open-ended challenges.
- Coding and Debugging: Provide reliable coding solutions, optimizations, and error debugging in various programming languages.
- Discussion and Debate: Engage in meaningful conversations on subjective or philosophical topics, offering well-reasoned perspectives.
The Marco-o1 model seamlessly blends analytical rigor with creative adaptability, making it an exceptional choice for a wide range of applications in education, research, and beyond.
Run Ollama [ Marco-o1 ]
Ollama is a powerful tool that simplifies running machine learning models, allowing you to manage GGUF models effortlessly. This guide outlines the steps to download, install, and run your models quickly. To get started, download Ollama from https://ollama.com/download and install it on your Windows or Mac system. Once installed, creating a GGUF model involves a few straightforward steps.
First, create a Modelfile and name it appropriately, for example metallama. Inside this file, include a FROM line to specify the base model you want to use, for instance FROM Llama-3.2-1B.F16.gguf. Ensure the specified GGUF file is in the same directory as the Modelfile. Next, open your terminal and run the command ollama create metallama -f ./metallama to build the model from the Modelfile, as sketched below. After the process completes, you can confirm the model was created successfully by running ollama list and checking that metallama appears in the list.
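A minimal sketch of those steps, using the example names from the paragraph above (substitute your own model name and GGUF file):

```sh
# Write a Modelfile pointing at a local GGUF file (example from above):
cat > metallama <<'EOF'
FROM Llama-3.2-1B.F16.gguf
EOF

# Build the Ollama model from the Modelfile:
ollama create metallama -f ./metallama

# Confirm the new model appears in the list:
ollama list
```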
To run your newly created model, use the command ollama run metallama in your terminal. You can then interact with the model directly. For example, asking the model to "write a mini passage about Space X" might generate a response highlighting Space X's revolutionary role in aerospace, its reusable rockets, and its vision for establishing colonies on Mars.
Sample Usage
In the command prompt, you can execute:
```
D:\>ollama run metallama
```
You can interact with the model like this:
```
>>> write a mini passage about space x
Space X, the private aerospace company founded by Elon Musk, is revolutionizing the field of space exploration.
With its ambitious goals to make humanity a multi-planetary species and establish a sustainable human presence in
the cosmos, Space X has become a leading player in the industry. The company's spacecraft, like the Falcon 9, have
demonstrated remarkable capabilities, allowing for the transport of crews and cargo into space with unprecedented
efficiency. As technology continues to advance, the possibility of establishing permanent colonies on Mars becomes
increasingly feasible, thanks in part to the success of reusable rockets that can launch multiple times without
sustaining significant damage. The journey towards becoming a multi-planetary species is underway, and Space X
plays a pivotal role in pushing the boundaries of human exploration and settlement.
```
With these easy steps, Ollama enables you to download, install, and operate custom or pre-trained models seamlessly. Whether you're exploring Llama’s capabilities or working on custom GGUF models, Ollama offers an efficient and user-friendly solution to achieve your machine learning objectives.
Model tree for prithivMLmods/Marco-o1-GGUF
- Base model: AIDC-AI/Marco-o1