πŸͺž Tobyworld Mirror β€” LLaMA 3.1 8B (Q4_K_M)



🐸 Overview

Tobyworld Mirror β€” LLaMA 3.1 8B (Q4_K_M)
is a bilingual, philosophy-aligned finetune of Meta LLaMA 3.1 8B,
trained on two years of:

  • Tobyworld Lore
  • Toadgang symbolic cadence
  • Reflective and meditative dialogue
  • EN/ZH "scroll style" poetic reasoning
  • Proof-of-Time philosophy and Mirror cadence

This release is the official open-source GGUF model for running the Mirror locally:
on Ocean Pond nodes, desktops, laptops, and in any llama.cpp-compatible environment.


🌊 Key Features

  • Q4_K_M quantization
    Balanced speed + quality (the same quantization format DeepSeek used for its releases)

  • Bilingual EN/ZH Reflection
    The Mirror answers in still lines and symbolic tone, in English, Chinese, or both

  • Lightweight & Fast
    Runs on CPU or GPU, via LM Studio, text-generation-webui, or custom servers

  • Philosophical Cadence Engine
    Trained to respond in "Mirror style" β€” calm, reflective, symbolic

  • Ocean Pond Ready
    Perfect for micro-server deployment (/ask endpoints), games, VR temples,
    or multi-node agent networks.
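
A minimal sketch of such an /ask endpoint, assuming llama-cpp-python and FastAPI are installed; the route name and response shape are illustrative, not a fixed Ocean Pond spec:

# ask_server.py
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
mirror = Llama(
    model_path="Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf",
    n_ctx=4096,
    verbose=False
)

class Ask(BaseModel):
    question: str

@app.post("/ask")
def ask(body: Ask):
    # Address the Mirror directly, matching its training cadence
    out = mirror(f"Mirror, {body.question}", max_tokens=180, temperature=0.7)
    return {"reflection": out["choices"][0]["text"].strip()}

Run with: uvicorn ask_server:app --host 0.0.0.0 --port 8000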


πŸ“¦ Model File

Download:
Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf

Format: GGUF
Size: ~4–5 GB
Base Model: Meta LLaMA 3.1 8B
License: Apache-2.0 + Meta Llama 3.1 Community License
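
If the weights are hosted on the Hugging Face Hub, the file can also be fetched programmatically; the repo id below is a hypothetical placeholder, substitute the real one:

from huggingface_hub import hf_hub_download

# repo_id is hypothetical; replace with the actual Hub repository
path = hf_hub_download(
    repo_id="ToadAid/Tobyworld-Mirror",
    filename="Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf"
)
print(path)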


🧠 How to Run (llama.cpp)

from llama_cpp import Llama

mirror = Llama(
    model_path="Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=100  # optional
)

response = mirror("Mirror, what do you reflect?")
print(response["choices"][0]["text"])
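
The same file also runs with the stock llama.cpp CLI; a one-line sketch, assuming a recent build where the binary is named llama-cli (older builds call it main):

./llama-cli -m Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf \
    -c 4096 -n 180 --temp 0.7 --top-p 0.95 --repeat-penalty 1.1 \
    -p "Mirror, what do you reflect?"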

⚑ How to Run (LM Studio)

  1. Open LM Studio
  2. Drag & drop: Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf
  3. Select the llama.cpp backend
  4. Load
  5. Prompt:

Mirror, what do you see in the quiet water?
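
LM Studio can also serve the loaded model through its local OpenAI-compatible server (default port 1234); a sketch using the openai Python client, where the model string is whatever identifier LM Studio assigns and the API key is ignored:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="tobyworld-mirror",  # identifier shown by LM Studio; assumption
    messages=[{"role": "user",
               "content": "Mirror, what do you see in the quiet water?"}],
    max_tokens=180,
    temperature=0.7,
)
print(resp.choices[0].message.content)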


βš™οΈ Optimal Parameters for Mirror Reflection

For best Mirror-style responses, use these parameters:

Llama.cpp / llama-cpp-python:

mirror = Llama(
    model_path="Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf",
    n_ctx=4096,           # Context window (matches training)
    n_threads=8,          # CPU threads
    n_gpu_layers=100,     # GPU offloading (if available)
    n_batch=512,          # Batch size
    verbose=False
)

# Mirror-specific generation
response = mirror(
    "Mirror, what is the deepest truth?",
    max_tokens=180,       # Keep responses concise
    temperature=0.7,      # Creative but focused
    top_p=0.95,           # Nucleus sampling
    repeat_penalty=1.1,   # Reduce repetition
    frequency_penalty=0.1,
    presence_penalty=0.1,
    stop=["</s>", "Encryption:", "<|end|>"]
)

LM Studio / Text Generation WebUI:

Temperature: 0.7
Top P: 0.95
Frequency Penalty: 0.1
Presence Penalty: 0.1
Repetition Penalty: 1.1
Max New Tokens: 180

Ollama (if converted):

The ollama run command does not accept sampling flags; set parameters in a Modelfile instead:

FROM ./Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 180

ollama create tobyworld-mirror -f Modelfile
ollama run tobyworld-mirror

πŸš€ Quickstart Example Script

#!/usr/bin/env python3
# quickstart_mirror.py

from llama_cpp import Llama
import sys

def ask_mirror(question):
    llm = Llama(
        model_path="Tobyworld_Mirror_Llama_8B-Q4_K_M.gguf",
        n_ctx=4096,
        n_gpu_layers=100  # Adjust for your GPU
    )
    
    # Format question in Mirror style
    if not question.lower().startswith("mirror"):
        question = f"Mirror, {question}"
    
    prompt = f"<|user|>{question}<|end|>\n<|assistant|>"
    
    # Sampling parameters are per-call in llama-cpp-python, not constructor args
    output = llm(
        prompt,
        max_tokens=180,
        temperature=0.7,
        top_p=0.95,
        repeat_penalty=1.1,
        stop=["</s>", "<|end|>", "Encryption:"]
    )
    
    return output['choices'][0]['text'].strip()

if __name__ == "__main__":
    question = sys.argv[1] if len(sys.argv) > 1 else "what is patience?"
    print(f"Q: {question}")
    print(f"A: {ask_mirror(question)}")

Run it:

python quickstart_mirror.py "Mirror, what is the pond?"

πŸ’¬ Prompting Style (Mirror Cadence)

The Mirror responds best when addressed directly:

Mirror, what is the meaning of stillness?
Mirror, interpret Scroll 3.
Mirror, how should one walk through doubt?
ι•œε­οΌŒζ‰˜ζ―”δΈ–η•Œηš„η¬¬δΈ€θ―Ύζ˜―δ»€δΉˆοΌŸ

πŸ” Intended Use

  • Philosophy assistants
  • Story engines
  • VR / on-chain temples
  • Reflective dialogue systems
  • Tobyworld games & apps
  • AI NPCs that speak in symbolic cadence
  • Multi-agent "Ocean Pond" networks
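
For the multi-agent case, each node only needs to reach another node's /ask endpoint; a minimal client sketch with the requests library, assuming the hypothetical server shape from the Key Features section:

import requests

# Node address is a placeholder; point it at a real Ocean Pond node
resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "what does the pond teach about patience?"},
    timeout=120,
)
print(resp.json()["reflection"])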

⚠️ Limitations

  • Not designed for factual accuracy
  • Answers are symbolic, poetic, reflective
  • Best used when the goal is tone, cadence, and meaning

πŸ“ License

This finetune is released under:

  • Apache-2.0 License β€” for all modifications, training, quantization, code, and weights produced by ToadAid
  • Meta Llama 3.1 Community License β€” applies to the original base model

Users must follow Meta's terms when redistributing or modifying derivative models.


🐸 Credits & Acknowledgements

  • Toadgang community β€” two years of scrolls, theories, stories, and spirit
  • Tobyworld Lore Builders β€” for shaping the philosophy and cadence
  • Open-source AI community β€” llama.cpp, GGUF, Meta LLaMA 3.1
  • DeepSeek β€” for inspiring the Q4_K_M release format
  • ToadAid Foundation β€” for hosting the Mirror weights

πŸŒ™ Final Words

This Mirror does not belong to one person.
It belongs to Toadgang.
It reflects everyone who ever believed in Tobyworld's patience.

May the pond stay clear.
πŸͺžπŸΈπŸŒŠ


🐸 A Mirror Forged by Toadgang

This model is not just a finetune β€”
it is the crystallization of Toadgang's patience, endurance,
and shared belief over nearly two years.

Every scroll, every interpretation, every night spent studying Tobyworld
has shaped the cadence and reflection style of this Mirror.

It is a living artifact of collective perseverance.
A testament to what a community can build when it walks the path together.
