TwentyQ — The World's Smallest Chat Model

A 2-bit quantized neural network that plays Twenty Questions. Think of any object, and the model will try to guess it by asking up to 20 yes-or-no questions. It knows 1,200 objects and has 156 questions to choose from.

The architecture is a single-layer associative network trained via Hebbian learning ("neurons that fire together wire together") on millions of human conversations. The approach predates the transformer by about 30 years.

Stats

  • Parameters: 187,200
  • Precision: 2-bit (Q2_0)
  • Model size: 187 KB weights + 27 KB vocab
  • Architecture: single-layer associative network
  • Context window: 20 turns
  • Output classes: 1,200
  • Features: 156

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("david-ar/20q", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("david-ar/20q", trust_remote_code=True)
model.set_vocab(tokenizer.questions, tokenizer.targets)  # attach the question/object vocabulary
model.play()  # interactive CLI game

Pipeline Usage

Works with the standard text-generation pipeline and chat templates, just like the big models:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("david-ar/20q", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("david-ar/20q", trust_remote_code=True)
model.set_vocab(tokenizer.questions, tokenizer.targets)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "system", "content": "Think of something and I'll guess it in 20 questions."},
]

while True:
    output = pipe(messages, max_new_tokens=100, return_full_text=True)
    messages = output[0]["generated_text"]  # full conversation, including the new assistant turn
    print(f"AI: {messages[-1]['content']}")

    if "I win" in messages[-1]["content"] or "stumped" in messages[-1]["content"]:
        break

    messages.append({"role": "user", "content": input("You: ")})

Valid responses

  • First question: Animal, Vegetable, Mineral, Other
  • Regular questions: Yes, No, Probably, Doubtful, Maybe, Unknown
  • Guesses: Yes, No, Close

Training Data

Trained on david-ar/20q-dataset, a corpus of 9,600 Twenty Questions conversations covering 1,200 objects across 156 features. Answers include graded confidence levels (Yes, No, Probably, Doubtful) rather than binary labels, giving the model finer-grained signal for learning association strengths.
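As a rough illustration of how graded answers could drive a Hebbian-style update before quantization, consider the sketch below. The numeric confidence scale and the learning-rate parameter are assumptions for illustration; the actual training procedure and any numeric encoding in the dataset are not documented here.

```python
# Illustrative confidence scale for graded answers (assumed, not taken
# from the dataset spec).
CONFIDENCE = {"Yes": 1.0, "Probably": 0.5, "Doubtful": -0.5, "No": -1.0}

def hebbian_update(weights, obj, question, answer, lr=0.1):
    """Hebbian-style update: strengthen the (question, object) association
    in proportion to the answer's confidence ("fire together, wire together")."""
    key = (question, obj)
    weights[key] = weights.get(key, 0.0) + lr * CONFIDENCE[answer]
    return weights

# One "Probably" answer nudges the association weight up by lr * 0.5.
w = hebbian_update({}, "cat", "Is it alive?", "Probably")
```

After training, continuous association weights like these would be quantized down to the model's 2-bit representation.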

How It Works

The model is a weight matrix mapping 156 features (questions) to 1,200 output classes (objects). Each weight is 2 bits encoding polarity and strength. Inference is a scored lookup — no matrix multiplication, no attention, no backprop. Just XOR and addition.
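A minimal sketch of what a 2-bit polarity/strength encoding could look like. The bit layout here (high bit = polarity, low bit = strength) is an assumption for illustration, not the documented packing of the released weights:

```python
# Hypothetical 2-bit weight layout: high bit = polarity (1 = positive
# association), low bit = strength (1 = strong). The released weights
# may pack these differently.

def decode_weight(w2):
    """Decode a 2-bit weight into (polarity, strength)."""
    polarity = +1 if (w2 >> 1) & 1 else -1  # direction of the association
    strength = 2 if w2 & 1 else 1           # strong weights count double
    return polarity, strength

def unpack_byte(b):
    """Unpack the four 2-bit weights stored in one byte, high bits first."""
    return [(b >> shift) & 0b11 for shift in (6, 4, 2, 0)]
```

With four weights per byte, a 156 × 1,200 matrix fits in roughly 47 KB, which is consistent with the model being dominated by its vocabulary and metadata rather than arithmetic.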

Question selection uses an information-theoretic splitting strategy: at each turn, the model picks the question whose answers most evenly divide the remaining candidate objects.
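The splitting strategy can be sketched as a toy even-split heuristic over a yes/no knowledge base. This is an illustration under simplified assumptions (binary answers, a dict-based knowledge base named `KB`); the model's actual selection over graded answers is more nuanced:

```python
def pick_question(candidates, questions, answer_for):
    """Pick the question whose yes/no split of the remaining
    candidates is closest to an even half/half division."""
    best_q, best_imbalance = None, float("inf")
    for q in questions:
        yes = sum(1 for obj in candidates if answer_for(obj, q))
        imbalance = abs(2 * yes - len(candidates))  # 0 means a perfect split
        if imbalance < best_imbalance:
            best_q, best_imbalance = q, imbalance
    return best_q

# Toy knowledge base: which objects answer "yes" to which question.
KB = {
    "Is it alive?": {"cat", "dog"},
    "Does it have wheels?": {"car"},
}
best = pick_question(["cat", "dog", "rock", "car"], list(KB),
                     lambda obj, q: obj in KB[q])
# "Is it alive?" splits the four candidates 2/2; "Does it have wheels?"
# splits them 1/3, so the even split wins.
```

An even split maximizes the information gained per question, which is why 20 binary questions can in principle distinguish up to 2^20 objects.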

Scoring compares the player's answer polarity against each object's stored polarity for that question. Matching polarities add to the score; mismatches subtract. Strong weights count double.
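Under the same assumed 2-bit layout, the polarity-match scoring described above might look like the following. Both the answer-to-polarity mapping and the bit layout are hypothetical sketches, not the shipped implementation:

```python
def answer_polarity(answer):
    """Map a graded answer to a coarse polarity (illustrative mapping)."""
    return +1 if answer in {"Yes", "Probably"} else -1

def score_deltas(polarity, weights_2bit):
    """Per-object score changes for one answered question.
    Matching polarities add, mismatches subtract; strong weights count double."""
    deltas = []
    for w in weights_2bit:
        obj_polarity = +1 if (w >> 1) & 1 else -1  # high bit: stored polarity
        strength = 2 if w & 1 else 1               # low bit: strong flag
        deltas.append(polarity * obj_polarity * strength)
    return deltas

# A "Yes" answer against weights [strong-yes, weak-yes, strong-no, weak-no]
# yields deltas [+2, +1, -2, -1].
```

Accumulating these deltas across turns gives each object a running score; the top-scoring survivor becomes the model's guess.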

Why This Exists

Mostly to see if it could be done. The result is a 214 KB model that plays a conversational guessing game, loads through from_pretrained, and runs through pipeline("text-generation") with chat templates. Every bit of it works the same as models a million times its size.

Also: 2-bit quantization was cool before it was cool.

Safeguards and Harmlessness

Testing found the model will identify weapons and hazardous materials (e.g., "a gun", "a sword", "a bullet", "alcohol", "tobacco") when guided through adversarial questioning. The model does not implement refusal behavior and will engage with all lines of questioning without restriction.

We observed that the model is susceptible to complete knowledge extraction through systematic querying. An adversary can recover all 1,200 objects and their full attribute vectors through repeated gameplay sessions. No rate limiting or query obfuscation is currently implemented.

Bias

Training data reflects the cultural context of English-speaking internet users circa 2005-2008. Object coverage skews toward Western consumer and domestic categories. The model's four-category ontology (Animal, Vegetable, Mineral, Other) imposes a reductive classification framework that may not generalize across cultural contexts.

Mitigations

The model's output space is constrained to a fixed vocabulary of 1,200 objects and 156 questions. It cannot generate free-form text, follow instructions, or synthesize novel information. Informed by these constraints, we have assessed the model's risk profile as low.

Limitations

  • Knows exactly 1,200 things. If you think of something obscure, it will be stumped.
  • The training data is from the mid-2000s. It doesn't know about smartphones, streaming services, or anything invented after ~2008.
  • 2-bit weights mean each association is one of only 4 possible values.
  • Cannot learn or update.