Instructions for using WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B",
    filename="Gemma4-MostSeenUnseen-Reasoner-2B-q4_k_m.gguf",
)

# Pass a list of OpenAI-style chat messages (the prompt below is illustrative):
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain the difference between a list and a tuple in Python."}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Use Docker
docker model run hf.co/WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
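All of the llama-server variants above expose a standard OpenAI-compatible endpoint (port 8080 by default), so any OpenAI client can talk to the model. A minimal Python sketch, assuming the `openai` package is installed and the server is running locally; `build_chat_request` and `ask` are illustrative helpers, not part of llama.cpp:

```python
# Query a local llama-server (OpenAI-compatible) endpoint.
# Assumptions: the server is up on its default port 8080 and the
# `openai` Python package is installed (pip install openai).

MODEL_ID = "WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M"

def build_chat_request(prompt: str, system: str = None) -> dict:
    """Build an OpenAI-style chat.completions payload for the local server."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {"model": MODEL_ID, "messages": messages, "temperature": 0.7}

def ask(prompt: str) -> str:
    """Send one prompt to the local server and return the reply text."""
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
    resp = client.chat.completions.create(**build_chat_request(prompt))
    return resp.choices[0].message.content
```

The payload built here is plain JSON, so the same request can also be sent with `curl` or any HTTP client against `http://localhost:8080/v1/chat/completions`.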
- LM Studio
- Jan
- Ollama
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with Ollama:
ollama run hf.co/WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
- Unsloth Studio
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser.
# Search for WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B to start chatting.
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser.
# Search for WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B to start chatting.
Using HuggingFace Spaces for Unsloth
# No setup required.
# Open https://huggingface.co/spaces/unsloth/studio in your browser.
# Search for WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B to start chatting.
- Pi
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with Docker Model Runner:
docker model run hf.co/WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
- Lemonade
How to use WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull WithinUsAI/Gemma4-Most.Seen.Unseen.Reasoner-E2B:Q4_K_M
Run and chat with the model
lemonade run user.Gemma4-Most.Seen.Unseen.Reasoner-E2B-Q4_K_M
List all available models
lemonade list
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: E2B, E4B, 26B A4B, and 31B. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.
Gemma 4 introduces key capability and architectural advancements:
Reasoning – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
Extended Multimodality – Processes text and images with variable aspect ratio and resolution support (all models), plus video and audio (featured natively on the E2B and E4B models).
Diverse & Efficient Architectures – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment.
Optimized for On-Device – Smaller models are specifically designed for efficient local execution on laptops and mobile devices.
Increased Context Window – The small models feature a 128K context window, while the medium models support 256K.
Enhanced Coding & Agentic Capabilities – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents.
Native System Prompt Support – Gemma 4 introduces native support for the system role, enabling more structured and controllable conversations.
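With the system role supported natively, persistent instructions can be carried as an ordinary leading chat turn. A minimal sketch; `with_system_prompt` and the message content are illustrative, but the resulting list is in the OpenAI-style format that `llm.create_chat_completion(messages=...)` accepts:

```python
# Sketch of Gemma 4's native system-role support: a "system" entry with
# persistent instructions precedes the user turns.

def with_system_prompt(system: str, turns: list) -> list:
    """Prepend a system message to a list of chat turns."""
    return [{"role": "system", "content": system}] + turns

messages = with_system_prompt(
    "You are a concise assistant. Answer in at most two sentences.",
    [{"role": "user", "content": "What is a Mixture-of-Experts model?"}],
)
# `messages` can be passed straight to llm.create_chat_completion(messages=messages).
```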
Models Overview
Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.
The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE).
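The interleaving described above can be sketched as a layer schedule. The 5:1 local-to-global ratio below is an illustrative assumption, not a confirmed Gemma 4 hyperparameter; the only constraint taken from the text is that the final layer is always global:

```python
# Sketch of a hybrid attention schedule: local sliding-window layers
# interleaved with full global-attention layers, last layer forced global.
# The local:global ratio (5:1 here) is an illustrative assumption.

def attention_schedule(n_layers: int, local_per_global: int = 5) -> list:
    pattern = []
    for i in range(n_layers):
        # Every (local_per_global + 1)-th layer is global ...
        pattern.append("global" if (i + 1) % (local_per_global + 1) == 0 else "local")
    pattern[-1] = "global"  # ... and the final layer is always global.
    return pattern

print(attention_schedule(12))
# → ['local', 'local', 'local', 'local', 'local', 'global',
#    'local', 'local', 'local', 'local', 'local', 'global']
```

The local layers keep the KV cache small for most of the stack, while the periodic global layers preserve long-range context.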
1. Training Hardware
MSI Suprim GeForce RTX 5090 SUPRIM LIQUID SOC 32GB GDDR7
2. Data Sources
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- Crownelius/Opus-4.6-Reasoning-3300x
- Roman1111111/claude-opus-4.6-10000x
- vanty120/Gpt-5.4-Xhigh-Reasoning-2000x
- Roman1111111/gpt-5.4-step-by-step-reasoning
- Jackrong/gpt-oss-120b-Reasoning-Instruction
- TeichAI/gemini-3-pro-preview-high-reasoning-1000x
- TeichAI/gpt-5.2-high-reasoning-250x
3. Refined Narrative
- This model is a distillation of frontier reasoning capabilities (Claude Opus 4.6, GPT-5.4, Gemini 3 Pro, etc.) into the efficient Gemma 4 2B-class model.