---
license: mit
tags:
- wasm
- interpreter
- hand-compiled
- bytecode-execution
- mechanistic
- not-trained
- sum-attention
- cross-attention
- filesystem
- loops
- control-flow
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# WASM Interpreter Transformer

A **hand-compiled** transformer that executes WebAssembly bytecode via real forward passes.

Every weight was set by a compiler, **not by gradient descent**. No training data, no loss function, no optimizer. Just linear algebra.
## What This Is

This is a complete WASM bytecode interpreter implemented as a transformer neural network.
Given a WASM program as input tokens, it autoregressively generates the execution trace, one output byte at a time,
using real matrix multiplications, real attention, and real feed-forward network computations.

The FFN neurons implement arithmetic, comparisons, and bitwise logic through SwiGLU gating.
The attention heads retrieve operands and stack values through quadratic key matching.
**Stack depth is computed internally** by a cumulative-sum attention head; no precomputed depth values are needed.
The unembedding converts numeric results to token predictions via a quadratic scoring trick.

The transformer supports **filesystem I/O** (open, read, write, close across 4 file descriptors)
and **structured loops** with `br_if` conditional branching, executing loops of up to 256 iterations
via a Continuous Trace with Cycling Positional Encoding mechanism.

**115/115 test programs pass with 100% accuracy.**
## Architecture

| Parameter | Value |
|---|---|
| `d_model` | 100 |
| `n_layers` | 8 |
| `heads_per_layer` | [13, 1, 0, 8, 2, 1, 1, 4] |
| `total_heads` | 30 |
| `d_head` | 2 |
| `d_ffn` | 100 |
| `vocab_size` | 260 (256 byte tokens + 4 special) |
| FFN activation | SwiGLU (ReLU gate) |
| Attention | Hard-max + sum-mode + cross-attention |
| Total parameters | ~316K (all hand-compiled) |

Unlike standard transformers, each layer has a **different number of attention heads** (0 to 13),
tailored to the specific computational role of that layer.
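
The ~316K total can be roughly reproduced from the table. The count below is only a back-of-the-envelope sketch: it assumes four `d_model × d_head` projections per head (Q, K, V, O), three `d_model × d_ffn` matrices per gated FFN, separate embedding and unembedding tables, and no biases. None of these layout details are stated in the model card.

```python
# Back-of-the-envelope parameter count from the architecture table.
# Assumptions (not from the model card): Q/K/V/O projections per head,
# three matrices per gated FFN, untied embedding/unembedding, no biases.
d_model, d_head, d_ffn, vocab_size = 100, 2, 100, 260
heads_per_layer = [13, 1, 0, 8, 2, 1, 1, 4]

total_heads = sum(heads_per_layer)
attn_params = total_heads * 4 * d_model * d_head
ffn_params = len(heads_per_layer) * 3 * d_model * d_ffn
embed_params = 2 * vocab_size * d_model
total_params = attn_params + ffn_params + embed_params

print(total_heads)   # 30
print(total_params)  # 316000
```

Under these assumptions the FFNs dominate (240,000 weights), with attention contributing only 24,000 because each head is just 2-dimensional.
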
## How It Works

### The 8-Layer Pipeline

- **Layer 0** (13 heads): Opcode fetch. Eleven paired attention heads identify 25 opcodes by matching one-hot flags, alongside 1 operand-retrieval head and 1 single-opcode head
- **Layer 1** (1 head): Stack depth accumulation. A single **sum-attention** head computes cumulative stack depth as a running sum of push/pop deltas
- **Layer 2** (0 heads): Depth squaring. This FFN-only layer computes WRITE_DEPTH² for use as a quadratic key in later retrieval heads
- **Layer 3** (8 heads): Bit retrieval. Eight hard-max heads extract individual bits from the stack top and stack second for AND/OR operations
- **Layer 4** (2 heads): Stack top/second retrieval + arithmetic FFN. Two heads retrieve the top two stack values; the FFN computes all arithmetic, comparison, and bitwise operations
- **Layer 5** (1 head): Local variables. One head finds the matching `local.tee`/`local.set`; the FFN gates the retrieved value
- **Layer 6** (1 head): Linear memory. One head finds the matching `i32.store` by address; the FFN gates the retrieved value
- **Layer 7** (4 heads): Filesystem. Four **cross-attention** heads (one per file descriptor) retrieve bytes from an external filesystem key-value store by file offset
### Sum-Attention (Cumulative Sums)

Layer 1 introduces a novel attention variant: instead of selecting the single best-matching key (hard-max), the **sum-mode** head accumulates **all** past value vectors.

Each instruction's value encodes its stack delta: `+1` for pushes (e.g., `i32.const`), `-1` for pops (e.g., `i32.add`), `-2` for `fd_write`. The cumulative sum of all past deltas gives the current stack depth, which is exactly what the quadratic key trick needs to find the correct stack position.
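
As a minimal sketch of the idea (not the model's actual weights), sum-mode attention can be pictured as causal attention in which every position seen so far gets weight 1. The function name `sum_attention` and the choice to include the current instruction's own delta are assumptions for illustration; only the three deltas named above are used.

```python
# Stack deltas named in the text above; other opcodes are omitted on purpose.
STACK_DELTA = {"i32.const": +1, "i32.add": -1, "fd_write": -2}


def sum_attention(values):
    """Causal attention with weight 1 on every position up to and including i."""
    return [sum(values[: i + 1]) for i in range(len(values))]


ops = ["i32.const", "i32.const", "i32.add", "i32.const", "i32.add"]
depths = sum_attention([STACK_DELTA[op] for op in ops])
print(depths)  # [1, 2, 1, 2, 1]
```

Because no softmax or arg-max is involved, the head's output is simply the running total of the deltas.
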
### Cross-Attention (Filesystem)

Layer 7 uses **cross-attention** heads that attend over an external key-value store representing file contents rather than the sequence's own tokens. Each of the 4 heads handles one file descriptor, using the current file offset as a query to retrieve the byte at that position. File contents are updated dynamically during execution as `fd_write` operations modify the filesystem.
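
A rough sketch of the retrieval step for one file descriptor follows. It assumes the filesystem heads use a quadratic key `K = (2j, -j²)` with the file offset as the query; the model card does not say which key encoding these heads use. The `filesystem` dict and `cross_attention_read` are hypothetical names, and the 4 × 32-byte layout follows the limits listed under Filesystem I/O.

```python
def cross_attention_read(file_bytes, offset):
    """Hard-max lookup of the byte stored at `offset` in one file's store."""
    query = (offset, 1)
    # One key per stored byte, using the (assumed) quadratic key K = (2j, -j^2).
    scores = [query[0] * (2 * j) + query[1] * -(j * j) for j in range(len(file_bytes))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return file_bytes[best]


# Hypothetical external store: 4 file descriptors, 32 bytes each.
filesystem = {fd: [0] * 32 for fd in range(4)}
filesystem[1][5] = 0x41  # stand-in for an fd_write at offset 5

print(cross_attention_read(filesystem[1], 5))  # 65
```

The keys and values live outside the token sequence, so a later write only has to update the store for subsequent reads to see it.
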
### SwiGLU Gating

Each neuron is a gated unit with a ReLU gate (the form sometimes called ReGLU) and has two weight vectors:

- **Gate**: Reads the opcode flag (e.g., FETCH_ADD). Only fires when the correct operation is active.
- **Value**: Reads the computation inputs (stack top, stack second, operand, bits).

```
output = max(0, gate · x) × (value · x)
```

Wrong opcode → gate = 0 → output = 0 (silenced). Right opcode → gate = 1 → value passes through.
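
To make the gating concrete, here is a small sketch of one gated neuron. The four-slot residual layout (`FETCH_ADD` flag, a hypothetical `FETCH_SUB` flag, stack top, stack second) and the weight vectors are invented for illustration and are not the model's real layout.

```python
def relu(z):
    return max(0, z)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def gated_neuron(gate_w, value_w, x):
    """output = max(0, gate . x) * (value . x)"""
    return relu(dot(gate_w, x)) * dot(value_w, x)


# Hypothetical layout: [FETCH_ADD flag, FETCH_SUB flag, stack top, stack second]
add_gate = [1, 0, 0, 0]   # fires only when the ADD flag is set
add_value = [0, 0, 1, 1]  # computes top + second

x_add = [1, 0, 5, 3]
x_sub = [0, 1, 5, 3]
print(gated_neuron(add_gate, add_value, x_add))  # 8
print(gated_neuron(add_gate, add_value, x_sub))  # 0
```

The same inputs produce a non-zero output only when the matching opcode flag is active, so many such neurons can share one FFN without interfering.
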
### Loop Execution

Loops use a **Continuous Trace with Cycling Positional Encoding** mechanism:

- `loop` and `end_loop` are structural markers (no-ops in the execution trace)
- `br_if` pops a condition from the stack; if non-zero, execution branches back to the loop body start
- Positional encodings cycle using `virtualIP % loopLength` so the transformer sees correct instruction indices across iterations
- Maximum 256 iterations per loop; nested loops are supported
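
The cycling step can be sketched as a mapping from a position in the unrolled trace back to a static instruction index. The `loop_start` offset and the function name `cycled_index` are assumptions added for illustration; with `loop_start = 0` the expression reduces to the `virtualIP % loopLength` form given above. The sketch covers a single loop that is still iterating and ignores nesting and loop exit.

```python
def cycled_index(virtual_ip, loop_start, loop_length):
    """Map an unrolled trace position to the static instruction index it repeats."""
    if virtual_ip < loop_start:
        return virtual_ip  # before the loop: positions map one-to-one
    return loop_start + (virtual_ip - loop_start) % loop_length


# Two setup instructions, then a 3-instruction loop body repeated in the trace.
indices = [cycled_index(ip, loop_start=2, loop_length=3) for ip in range(10)]
print(indices)  # [0, 1, 2, 3, 4, 2, 3, 4, 2, 3]
```
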
### Key Tricks

- **Multiplication via gating**: For `i32.mul`, the gate equals one operand while the value holds the other. `max(0, TOP) × SECOND = TOP × SECOND` whenever `TOP ≥ 0`.
- **Comparisons via ReLU pairs**: Two neurons with gates `(a-b)` and `(a-b-1)`, whose outputs are subtracted, form a step function. For integers, `max(0, a-b) - max(0, a-b-1)` is 1 when `a > b` and 0 otherwise.
- **Quadratic unembedding**: `logit(t) = 2t·R - t²`, with `R = RESULT`, equals `R² - (t - R)²`, a downward parabola peaking at `t = R`.
- **Quadratic key trick**: `K = (2j, -j²)`, `Q = (i, 1)` → the dot product `2ij - j² = i² - (i - j)²` peaks at `j = i` for exact position matching.
- **Sum-attention for depth**: Instead of precomputing stack depth in the positional encoding (PE), one head sums all past stack deltas.
- **Dynamic filesystem cursors**: File read/write offsets are tracked inline during execution; no reference VM pre-run is needed.
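
The arithmetic behind the first four tricks can be checked in a few lines of plain Python. This is a sketch of the formulas only, with hypothetical function names; it does not reproduce how the weights are laid out in the model.

```python
def relu(z):
    return max(0, z)


def quadratic_match(i, positions):
    """Hard-max over keys K = (2j, -j^2) with query Q = (i, 1)."""
    return max(positions, key=lambda j: i * (2 * j) + 1 * -(j * j))


def quadratic_unembed(result, vocab=range(256)):
    """Arg-max of logit(t) = 2*t*R - t^2 over the byte tokens."""
    return max(vocab, key=lambda t: 2 * t * result - t * t)


def greater_than(a, b):
    """ReLU pair: 1 if a > b else 0, for integer inputs."""
    return relu(a - b) - relu(a - b - 1)


def gated_multiply(top, second):
    """Gate carries one operand, value carries the other (valid for top >= 0)."""
    return relu(top) * second


print(quadratic_match(7, range(32)))  # 7
print(quadratic_unembed(200))         # 200
print(greater_than(5, 3), greater_than(3, 3), greater_than(2, 9))  # 1 0 0
print(gated_multiply(6, 7))           # 42
```

Both quadratic tricks rely on the same identity: `2ij - j²` differs from `-(i - j)²` only by a term that is constant in `j`, so the arg-max lands on the exact match.
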
## Supported WASM Operations

### Arithmetic & Logic

`i32.const`, `i32.add`, `i32.sub`, `i32.mul`, `i32.and`, `i32.or`

### Comparisons

`i32.eq`, `i32.ne`, `i32.lt_s`, `i32.gt_s`, `i32.le_s`, `i32.ge_s`

### Memory & Variables

`i32.load`, `i32.store`, `local.get`, `local.set`, `local.tee`

### Filesystem I/O

`fd_open`, `fd_read`, `fd_write`, `fd_close`
(4 file descriptors, 32 bytes per file)

### Control Flow

`loop`, `end_loop`, `br_if` (up to 256 iterations, nested loops supported)

### Output & Termination

`output`, `halt`
## Compliance Test Suite

115 tests across 24 categories, all passing at 100%:

| Category | Tests |
|---|---|
| Core arithmetic & logic | 24 |
| Comparisons | 16 |
| Memory & variables | 14 |
| Filesystem | 8 |
| Filesystem integration | 3 |
| Limits & bounds | 15 |
| Basic loops | 7 |
| Loop + arithmetic | 7 |
| Loop + locals/memory | 4 |
| Loop + filesystem | 6 |
| Loop edge cases | 4 |
| Combined & output | 4 |
## Positional Encoding Note

This model uses **program-specific positional encodings** computed by a compile-time analysis pass,
but stack depth (WRITE_DEPTH, WRITE_DEPTH_SQ, BEFORE_DEPTH) is **not** part of the PE; it is computed internally by the transformer's sum-attention head in Layer 1.

The remaining PE contains instruction indices, local variable source locations, memory address mappings, and filesystem cursor overrides. This is structural metadata that a trained model would learn.
No runtime values appear in the PEs; all actual computation happens in the transformer's forward passes.

For loops, positional encodings use **virtual IP cycling** (`instIdx % loopBodyLength`) so the transformer receives correct structural metadata across iterations without needing to know the iteration count in advance.
## How to Use

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("eastlondoner/wasm-transformer", trust_remote_code=True)


# Define a simple instruction helper (op, operand)
class WI:
    def __init__(self, op, operand=0):
        self.op = op
        self.operand = operand


# Run a simple program: push 3, push 5, add, output, halt
program = [
    WI(0x00, 3),  # i32.const 3
    WI(0x00, 5),  # i32.const 5
    WI(0x01),     # i32.add
    WI(0xF0),     # output
    WI(0xFF),     # halt
]
outputs = model.run_program(program)
print(outputs)  # [8]
```

The model uses a custom architecture with hard-max, sum-mode, and cross-attention.
A complete TypeScript reference implementation is available in the source repository.
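
For a quick sanity check without downloading the model, the example program can be traced with an ordinary stack machine. The sketch below is not the TypeScript reference implementation: `reference_run` is a hypothetical helper, it covers only the four opcode bytes that appear in the example above, and it assumes `output` pops the value it emits.

```python
# Opcode bytes taken from the example above; all other opcodes are left out.
I32_CONST, I32_ADD, OUTPUT, HALT = 0x00, 0x01, 0xF0, 0xFF


def reference_run(program):
    """Plain-Python stack machine for a list of (opcode, operand) pairs."""
    stack, outputs = [], []
    for op, operand in program:
        if op == I32_CONST:
            stack.append(operand)
        elif op == I32_ADD:
            top, second = stack.pop(), stack.pop()
            stack.append(second + top)
        elif op == OUTPUT:
            outputs.append(stack.pop())  # assumption: output consumes the value
        elif op == HALT:
            break
        else:
            raise ValueError(f"unsupported opcode: {op:#x}")
    return outputs


expected = reference_run([(0x00, 3), (0x00, 5), (0x01, 0), (0xF0, 0), (0xFF, 0)])
print(expected)  # [8]
```

Comparing this list with the result of `model.run_program(program)` gives a simple check that the forward passes and a conventional interpreter agree on the same program.
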
## Live Demos

- **Interactive WASM REPL**: type WASM instructions line-by-line and watch the transformer execute them in real time
- **Transformer X-Ray**: step through execution and see every layer, head, and neuron activate
- **Interactive Article Explorer**: explore the concepts behind this model
- **FFN Interpreter Slide Deck**: 15-slide visual explanation of how the FFN interprets bytecode

## Inspiration

This model is inspired by ["Can LLMs Be Computers?"](https://www.percepta.ai/blog/can-llms-be-computers) by Percepta AI.

## License

MIT