Instructions for using nyuuzyou/SmolLM2-360M-Eagle-GGUF with libraries and local apps.

Libraries

How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with llama-cpp-python:

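# Install the bindings first (from_pretrained also needs huggingface-hub):
# pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the BF16 GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="nyuuzyou/SmolLM2-360M-Eagle-GGUF",
    filename="SmolLM2-360M-Eagle-bf16.gguf",
)

# Send a single-turn chat request and print only the reply text.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])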

Local Apps

llama.cpp

How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16
# Run inference directly in the terminal:
./llama-cli -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

Use Docker

docker model run hf.co/nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

vLLM

How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nyuuzyou/SmolLM2-360M-Eagle-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nyuuzyou/SmolLM2-360M-Eagle-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
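
The curl call above can also be issued from Python. The sketch below is a minimal client that uses only the standard library. The helper names `build_chat_request` and `chat` are hypothetical, and the code assumes a server is already listening at the base URL you pass in (`http://localhost:8000` for the `vllm serve` command above).

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-style /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """POST the request to a running server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "nyuuzyou/SmolLM2-360M-Eagle-GGUF", "What is the capital of France?")` would send the same request as the curl command. The same helper should also work against `llama-server`, which exposes an OpenAI-compatible API, provided the base URL matches the host and port it is listening on.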

Use Docker

docker model run hf.co/nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

Ollama
How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with Ollama:
```
ollama run hf.co/nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16
```

Unsloth Studio

How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nyuuzyou/SmolLM2-360M-Eagle-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nyuuzyou/SmolLM2-360M-Eagle-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nyuuzyou/SmolLM2-360M-Eagle-GGUF to start chatting

Docker Model Runner
How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with Docker Model Runner:
```
docker model run hf.co/nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16
```

Lemonade

How to use nyuuzyou/SmolLM2-360M-Eagle-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull nyuuzyou/SmolLM2-360M-Eagle-GGUF:BF16

Run and chat with the model

lemonade run user.SmolLM2-360M-Eagle-GGUF-BF16

List all available models

lemonade list

SmolLM2-360M-Eagle-GGUF

SmolLM2-360M-Eagle-GGUF is a GGUF conversion of the SmolLM2-360M-Eagle model, which is itself a version of SmolLM2-360M fine-tuned on the EagleSFT dataset. The fine-tuning aims to improve performance on both Russian and English language tasks, and the GGUF format is intended for efficient local deployment.

Model Description

SmolLM2-360M-Eagle is a lightweight language model that has been fine-tuned specifically to handle bilingual content. This fine-tuning extends the base model's capabilities to better understand and generate content in Russian while maintaining its English competency.

Base Model

The model is built upon SmolLM2-360M, a compact language model with 360 million parameters that offers a good balance between performance and resource requirements.

Fine-tuning Details

Dataset

The model was fine-tuned on the EagleSFT dataset, which contains 536,231 pairs of human questions and machine-generated responses in both Russian and English. The dataset focuses primarily on educational content but also includes everyday questions and casual conversations.

Environmental Impact

Training duration: 41h 14m total in Saint-Petersburg, Russia
Power consumption: 380 W average
Hardware: 1 x RTX 4090
Carbon emissions: Approximately 5.48 kg CO2eq
- Calculated from the average power consumption and the average grid carbon intensity in this region (350 g CO2eq/kWh)
- Saint-Petersburg: 0.380 kW * 41.23 h * 350 g CO2eq/kWh ≈ 5,484 g ≈ 5.48 kg CO2eq
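
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the figures from the list; the function name `co2_kg` is hypothetical.

```python
def co2_kg(avg_power_w: float, hours: float, grid_g_per_kwh: float) -> float:
    """Estimate emissions in kg CO2eq from average power draw and run time."""
    energy_kwh = avg_power_w / 1000.0 * hours  # W -> kW, then kW * h = kWh
    return energy_kwh * grid_g_per_kwh / 1000.0  # g -> kg


hours = 41 + 14 / 60  # 41h 14m of training
estimate = co2_kg(avg_power_w=380, hours=hours, grid_g_per_kwh=350)
print(f"{estimate:.2f} kg CO2eq")  # prints "5.48 kg CO2eq"
```

Converting watts to kilowatts before multiplying keeps the units consistent: the run consumed roughly 15.7 kWh in total.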

Training Parameters

Training approach: Supervised Fine-Tuning (SFT)
Training epochs: 2
Learning rate: 3.0e-04
Precision: bfloat16

Limitations and Capabilities

Note that this model received no additional pre-training on Russian text; it only underwent SFT on a relatively small number of tokens. As a result, it has far less data to draw on when answering in Russian than when answering in English.

Within these extensive limitations, the model shows minimal improvement in:

Basic recognition of Russian prompts (though with frequent misunderstandings)
Handling simple tasks formatted as "{question in Russian}, answer in English"
Basic translation from Russian to English (though quality remains poor)
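
The "{question in Russian}, answer in English" pattern from the list above can be sketched as a small helper that builds a chat message. The function name is hypothetical and the exact wording of the suffix is an assumption taken from the pattern as written; the returned list has the same shape as the `messages` argument passed to `create_chat_completion` earlier.

```python
def russian_question_english_answer(question_ru: str) -> list[dict]:
    """Wrap a Russian question in the '{question}, answer in English' pattern."""
    question = question_ru.strip()
    if not question:
        raise ValueError("question_ru must not be empty")
    return [{"role": "user", "content": f"{question}, answer in English"}]


messages = russian_question_english_answer("Какая столица Франции?")
```

Given the limitations described in this section, replies to such prompts may still be wrong or inconsistent.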

The model's minimal understanding of Russian comes solely from supervised fine-tuning, without any pre-training on a Russian text corpus, which leaves its Russian capabilities severely limited.

Experimental Capabilities

The model demonstrates some experimental capabilities, but with significant limitations:

Basic Russian text understanding (with frequent errors and misinterpretations)
Limited question answering in Russian (quality significantly lower than English)
Basic Russian to English translation (better than English to Russian)

Limitations

NOT SUITABLE FOR PRODUCTION USE: This model should not be used in production environments in any form
Extremely limited knowledge of Russian, due to the lack of pre-training on Russian text
The tokenizer is not optimized for Russian, which results in inefficient token usage
Output quality in Russian will be unsatisfactory for most use cases
May produce inaccurate, inconsistent, or inappropriate responses, especially in Russian
All limitations of the base SmolLM2-360M model still apply
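
One contributing factor to the tokenizer limitation above can be illustrated without loading the model. Cyrillic letters occupy two bytes each in UTF-8, so a tokenizer that works from bytes and has learned few Russian merges starts from roughly twice as many units per character as it does for English. The sketch below only measures that byte overhead. It assumes the base model's tokenizer is byte-level (the source does not say), and it is a rough proxy: real token counts require running the tokenizer itself. The function name is hypothetical.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Return the average number of UTF-8 bytes per character in text."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


english = "What is the capital of France?"
russian = "Какая столица Франции?"

print(f"English: {utf8_bytes_per_char(english):.2f} bytes/char")
print(f"Russian: {utf8_bytes_per_char(russian):.2f} bytes/char")
```

The Russian sentence comes out close to two bytes per character, while the English one is exactly one.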