Instructions for using SupraLabs/Supra-Mini-v2-0.1M with libraries, notebooks, and local apps.

Libraries

How to use SupraLabs/Supra-Mini-v2-0.1M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SupraLabs/Supra-Mini-v2-0.1M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v2-0.1M")
model = AutoModelForCausalLM.from_pretrained("SupraLabs/Supra-Mini-v2-0.1M")

Local Apps

vLLM

How to use SupraLabs/Supra-Mini-v2-0.1M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SupraLabs/Supra-Mini-v2-0.1M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v2-0.1M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
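
The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name `build_completion_request` is hypothetical and not part of vLLM. It defaults `max_tokens` to 128 rather than the 512 used in the curl call, on the assumption that prompt plus completion must fit within the 256 max position embeddings listed in the model config. Adjust it if your server accepts more.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="SupraLabs/Supra-Mini-v2-0.1M",
                             max_tokens=128, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint.

    Hypothetical helper for illustration; it mirrors the curl call above.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request("Once upon a time,")

# Sending only works while the server started above is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["choices"][0]["text"])
```

For the SGLang server described further down, the same helper should work with `base_url="http://localhost:30000"`.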

Use Docker

docker model run hf.co/SupraLabs/Supra-Mini-v2-0.1M

SGLang

How to use SupraLabs/Supra-Mini-v2-0.1M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SupraLabs/Supra-Mini-v2-0.1M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v2-0.1M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SupraLabs/Supra-Mini-v2-0.1M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v2-0.1M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use SupraLabs/Supra-Mini-v2-0.1M with Docker Model Runner:
```
docker model run hf.co/SupraLabs/Supra-Mini-v2-0.1M
```

A newer version of this model is available: SupraLabs/Supra-Mini-v5-8M

🦅 Supra Mini v2 0.1M

Supra Mini v2 0.1M is a very (and we mean very) small base model trained on 700 million tokens of FineWeb-Edu for 3 epochs. It is the second version of our Supra Mini series.

Model Config

Parameters: 167,760 (0.1M)
Architecture: Llama
Vocab size with custom BPE tokenizer: 2048
Hidden Size: 48
Intermediate Size: 96
Hidden Layers: 3
Attention Heads: 4
Max Position Embeddings: 256
Learning rate: 6e-4
Weight Decay: 0.01
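
The parameter count can be reproduced from the values above. This is a sketch under assumptions the card does not state: a standard Llama layer layout with no bias terms, tied input and output embeddings, and as many key/value heads as attention heads.

```python
# Config values copied from the list above.
vocab_size = 2048
hidden_size = 48
intermediate_size = 96
num_layers = 3

# Assumptions (not stated in the card): no bias terms, tied input/output
# embeddings, and no grouped-query attention.
embedding = vocab_size * hidden_size        # token embedding matrix
attention = 4 * hidden_size * hidden_size   # q, k, v, and o projections
mlp = 3 * hidden_size * intermediate_size   # gate, up, and down projections
norms = 2 * hidden_size                     # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms
final_norm = hidden_size                    # RMSNorm before the output head

total = embedding + num_layers * per_layer + final_norm
print(f"{total:,}")  # 167,760
```

Under these assumptions the total matches the reported 167,760, and the embedding matrix alone accounts for 98,304 of those parameters, more than half of the model.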

Final Loss

This model reached a final train loss of 4.413.
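
To put that number in context, the loss can be converted to a perplexity and compared with uniform guessing over the vocabulary. This sketch assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

final_train_loss = 4.413  # from the model card
vocab_size = 2048         # from the model card

# Assumption: the loss is mean per-token cross-entropy measured in nats.
perplexity = math.exp(final_train_loss)

# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(vocab_size)

print(f"train perplexity: {perplexity:.1f}")
print(f"uniform-guess loss: {uniform_loss:.3f}")
```

Under that assumption the train perplexity is about 82.5, against a loss of about 7.62 (perplexity 2048) for uniform guessing.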

Benchmarks

All benchmarks were executed using lm-eval.

Task	Value	Random level
Arc_Easy	0.2677	0.25 (25%)
Wikitext	7.7940	-
BLiMP	0.5354	0.5 (50%)

Examples

Prompt: "Artificial intelligence is "
Output: "Artificial intelligence is irreciously, and the diet of a battery. These are also known as the following: - Foods, in the most commonly used to be taken by the priority of the South African American Modela, which was nothing for the first time. The federal government has been a wall of the world’s two moisture and the came on the national range of the Great Department of Amazonia, and the Politary Society of the Carli. This is that the Letters were the first third of the Building S"

Prompt: "The main concept of physics is "
Output: "The main concept of physics is utilized in the most commonly used to be achieved. Chotos, an efficient for a dietary pathogene, and that are also known as a source of the molecularly. The failure wastered in the national categories of the California, and the Modela’s brought by the Florida. In the world's this time, it is not only on the sense of the first-metrial gardens, but they can be living in the task of the Political School of History (professed"

Prompt: "Once upon a time, "
Output: "Once upon a time, utilizing the fire. The Samboard is that the lot of the bill. After the day to be a widely money and in the world’s a harmful force. Despite the size of the Bradese, and his owner, and he was noted by the Department of Management, and the Lord of Petersonia and the Council, which had been a destroyed on the tree, but the giving a came from the Hiska. The Great Marius and Jewish Amazon's Rich"

Usage

To run the model, use the following code with Hugging Face Transformers:

from transformers import pipeline
import torch

print("[*] Loading Supra Mini v2 0.1M model from Hugging Face Hub...")
# device_map="auto" relies on the accelerate package being installed.
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v2-0.1M",
    device_map="auto",
    # Use half precision on GPU and full precision on CPU.
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

def generate_text(prompt, max_new_tokens=150):
    """Sample a continuation and return the prompt plus the generated text."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # number of tokens generated after the prompt
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"]

test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))
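
The sampling settings used above (`temperature=0.5`, `top_k=25`, `top_p=0.9`) reshape the next-token distribution before a token is drawn. The sketch below illustrates the idea in plain Python on toy logits. It is a simplified illustration, not the Transformers implementation; the function name `sampling_distribution` is hypothetical and the repetition penalty is left out.

```python
import math


def sampling_distribution(logits, temperature=0.5, top_k=25, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {token_id: p / cumulative for token_id, p in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
dist = sampling_distribution(toy_logits)
print(dist)
```

With these toy logits only the two highest-scoring tokens survive the top-p cutoff, which shows how a low temperature combined with nucleus filtering concentrates sampling on a few candidates.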

Use cases

Educational research
Deployment, testing, or fine-tuning in edge environments
Or, more simply, for fun

Limitations

Cannot reason, chat, or code
Incoherent more often than not
Mostly non-factual

Training guide

We trained Supra Mini v2 0.1M on a single T4 GPU in ~2 hours for 3 epochs.
The full training code is in this repo: run.sh (runs the complete pipeline), train_tokenizer.py (trains the custom BPE tokenizer with a vocab size of 2048), train.py (trains the model), and inference.py (tests the model).
The model was trained on the first 700 million tokens of Sample-10BT from FineWeb-Edu using streaming tokenization.
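
These figures imply a rough training budget, worked out in the sketch below. It assumes each of the 3 epochs passes over the same 700 million tokens and treats "~2 hours" as exactly 2 hours of wall-clock time, so the results are estimates only.

```python
tokens_per_epoch = 700_000_000  # from the model card
epochs = 3                      # from the model card
parameters = 167_760            # from the model card
training_hours = 2              # "~2 hours" on a single T4, treated as exact here

# Assumption: every epoch covers the same 700M tokens.
tokens_seen = tokens_per_epoch * epochs
tokens_per_parameter = tokens_seen / parameters
tokens_per_second = tokens_seen / (training_hours * 3600)

print(f"tokens seen: {tokens_seen:,}")
print(f"tokens per parameter: {tokens_per_parameter:,.0f}")
print(f"approx. throughput: {tokens_per_second:,.0f} tokens/s")
```

Under these assumptions the model sees 2.1 billion tokens in total, which is roughly 12,500 tokens per parameter at a throughput of around 290,000 tokens per second.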

Final thoughts

This is the second version of the Supra Mini series, and we are very proud to release it today!
Stay tuned for more models, and follow us to support our open-source work! 😊

Safetensors

Model size: 168k params
Tensor type: F32
