Commit 0028132 by sato2ru · verified · 1 Parent(s): be40c77

Upload README.md with huggingface_hub

README.md ADDED
@@ -0,0 +1,191 @@
---
language: en
tags:
- wordle
- pytorch
- reinforcement-learning
- supervised-learning
- game-ai
- nlp
license: mit
---

# 🟩 Wordle AI Solver

Neural network models for solving Wordle puzzles. This repo contains two models, a supervised baseline and a reinforcement learning variant, both deployable via the [live app](https://wordle-solver-tan.vercel.app).

---

## Files

| File | Description |
|------|-------------|
| `model_weights.pt` | Supervised model (WordleNet) |
| `config.json` | Supervised model config |
| `rl_model_weights.pt` | RL model (REINFORCE-filtered) |
| `rl_config.json` | RL model config |
| `answers.json` | 2,315 valid Wordle answers |
| `allowed.json` | 12,972 valid guess words |

---

## Model Comparison

| | 🧠 Supervised | 🤖 Reinforcement |
|---|---|---|
| **Training method** | CrossEntropy on entropy-optimal games | REINFORCE with elite game filtering |
| **Win rate** | 100% | 98.2% |
| **Avg guesses** | 3.46 | 3.75 |
| **Opener** | CRANE | CRANE |
| **Parameters** | ~3.9M | ~3.9M |

---

## Architecture

Both models share the same encoder:

```
Input:   390-dim binary vector
         (26 letters × 5 positions × 3 states: grey/yellow/green)

Hidden:  Linear(390 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
         Linear(512 → 512) → BatchNorm1d → ReLU → Dropout(0.3)
         Linear(512 → 256) → BatchNorm1d → ReLU

Output:  Linear(256 → 12972)
         logits over all 12,972 allowed guess words
```

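As a quick sanity check on model size, the trainable parameters can be counted directly from the layer sizes listed above. This is a minimal sketch that assumes every `Linear` layer has a bias and every `BatchNorm1d` has a learnable scale and shift (the PyTorch defaults); the helper names are illustrative and not part of the repo.

```python
# Count trainable parameters for the layer sizes listed above.
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix plus bias vector

def batchnorm_params(n_features):
    return 2 * n_features  # learnable scale (gamma) and shift (beta)

HIDDEN_BLOCKS = [(390, 512), (512, 512), (512, 256)]  # each followed by BatchNorm1d
VOCAB = 12972  # one logit per allowed guess word

total = sum(linear_params(i, o) + batchnorm_params(o) for i, o in HIDDEN_BLOCKS)
output_layer = linear_params(256, VOCAB)  # the output layer has no BatchNorm
total += output_layer

print(f"{total:,} parameters, {output_layer / total:.0%} of them in the output layer")
```
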
Board encoding:
```python
# Each letter owns a block of 15 slots (5 positions × 3 states)
vec[letter_index * 15 + position * 3 + state] = 1.0
# letter_index: 0-25 (a-z)
# position:     0-4
# state:        0=grey, 1=yellow, 2=green
```

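The index formula above can be wrapped into a small encoder. The sketch below is only illustrative: it assumes the guess history is a list of `(guess, feedback)` pairs where feedback is a five-character string of `b` (grey), `y` (yellow) and `g` (green), and it returns a plain Python list rather than the tensor the real backend presumably uses.

```python
STATE_INDEX = {"b": 0, "y": 1, "g": 2}  # assumed feedback codes: grey, yellow, green

def encode_board(history):
    """Encode [(guess, feedback), ...] as a 390-dim list of 0.0/1.0 values."""
    vec = [0.0] * (26 * 5 * 3)
    for guess, feedback in history:
        for position, (letter, code) in enumerate(zip(guess.lower(), feedback)):
            letter_index = ord(letter) - ord("a")
            vec[letter_index * 15 + position * 3 + STATE_INDEX[code]] = 1.0
    return vec

# CRANE with a yellow R and a green A sets exactly five slots.
vec = encode_board([("crane", "bygbb")])
active = [i for i, v in enumerate(vec) if v == 1.0]
print(active)  # [8, 30, 72, 204, 259]
```
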
---

## Training

### Supervised Model
Trained on ~10,000 (board_state, best_guess) pairs generated by an entropy-optimal solver that plays all 2,315 Wordle games. At each step the solver picks the guess that maximises expected information gain:

$$E[\text{Info}] = \sum_{p} P(p) \cdot \log_2\left(\frac{1}{P(p)}\right)$$

Here $p$ ranges over the feedback patterns a guess can produce, and $P(p)$ is the probability of seeing pattern $p$ given the answers still possible.

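A minimal sketch of this score, assuming every remaining answer is equally likely so that $P(p)$ is simply the fraction of answers producing pattern $p$. The `feedback` helper and its `g`/`y`/`b` string format are illustrative assumptions; `entropy_score` reuses the name from the inference snippet in this README, but the repo's actual implementation may differ.

```python
import math
from collections import Counter

def feedback(guess, answer):
    """Wordle feedback as a g/y/b string, handling repeated letters."""
    result = ["b"] * 5
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            unmatched[a] += 1  # answer letters still available for yellows
    for i, g in enumerate(guess):
        if result[i] != "g" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)

def entropy_score(guess, possible):
    """Expected information (in bits) of a guess over equally likely answers."""
    counts = Counter(feedback(guess, answer) for answer in possible)
    total = len(possible)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

words = ["crane", "crate", "grace", "trace"]
# CRANE splits these four answers into groups of sizes 1, 1 and 2.
print(entropy_score("crane", words))  # 1.5
```
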
### RL Model
1. **Warm start** from supervised weights
2. **Elite game collection**: greedy rollouts with constraint-filtered action masking, keeping only games solved in ≤3 guesses (~11% hit rate)
3. **REINFORCE training**: supervised loss on elite (state, action) pairs
4. **Benchmark** against all 2,315 answers using constraint-filtered suggestion logic

After the warm start, the RL model learns only from the reward signal (win/lose, guesses used), without access to the entropy oracle used to train the supervised model.

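One way to read steps 2 and 3 is as reward-filtered imitation: the reward acts as a binary filter on whole games, and the surviving (state, action) pairs are trained with an ordinary cross-entropy loss. The sketch below illustrates that reading in plain Python; the `games` structure, the placeholder states and the helper names are hypothetical, and the repo's training loop may differ.

```python
import math

def collect_elite_pairs(games, max_guesses=3):
    """Keep (state, action) pairs only from games won within max_guesses."""
    pairs = []
    for game in games:
        if game["won"] and len(game["steps"]) <= max_guesses:
            pairs.extend(game["steps"])
    return pairs

def cross_entropy(logits, action):
    """Negative log-probability of `action` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[action]

# Toy rollouts: each step is (state, action_index); states are placeholders.
games = [
    {"won": True, "steps": [("s0", 2), ("s1", 0), ("s2", 1)]},             # 3 guesses: kept
    {"won": True, "steps": [("s0", 2), ("s1", 3), ("s2", 1), ("s3", 0)]},  # 4 guesses: dropped
    {"won": False, "steps": [("s0", 2)] * 6},                              # lost: dropped
]
elite = collect_elite_pairs(games)

# With uniform logits over 4 actions, the loss per pair is ln(4).
loss = sum(cross_entropy([0.0, 0.0, 0.0, 0.0], a) for _, a in elite) / len(elite)
print(len(elite), round(loss, 4))
```
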
---

## Inference

The models are not used as raw classifiers. Instead, the backend combines model logits with constraint filtering:

```python
# 1. Get the model's top-20 words.
#    encode_board is assumed to return a float tensor of shape (1, 390);
#    the model must be in eval() mode so BatchNorm accepts a batch of one.
with torch.no_grad():
    logits = model(encode_board(history)).squeeze(0)
model_words = [ALLOWED[i] for i in logits.topk(20).indices.tolist()]

# 2. Filter the answer list to words consistent with all previous guesses
possible = filter_words(ANSWERS, history)

# 3. Score every candidate by entropy against the remaining possible set
candidates = list(dict.fromkeys(model_words + possible))  # de-duplicate, keep order
best = max(candidates, key=lambda w: entropy_score(w, possible))
```

This hybrid approach is why the supervised model achieves a 100% win rate: the neural net narrows the search, and entropy scoring picks the final move.

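The constraint-filtering step can be sketched as follows: a word stays in the possible set only if it would have produced exactly the feedback observed for every earlier guess. `filter_words` reuses the name from the snippet above, but this body, the `feedback` helper and the `(guess, pattern)` history format are assumptions made for illustration.

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle feedback as a g/y/b string, handling repeated letters."""
    result = ["b"] * 5
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] != "g" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)

def filter_words(words, history):
    """Keep the words that reproduce every observed (guess, pattern) pair."""
    return [
        w for w in words
        if all(feedback(guess, w) == pattern for guess, pattern in history)
    ]

# Toy answer list, not the real answers.json
ANSWERS = ["crane", "crate", "grace", "trace", "brace"]
history = [("crane", "yggbg")]  # C is misplaced, R/A/E are correct, N is absent
possible = filter_words(ANSWERS, history)
print(possible)  # ['grace', 'trace', 'brace']
```
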
---

## Usage

```python
import json

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

REPO_ID = "sato2ru/wordle-solver"

def load_json(filename):
    """Download a JSON file from the repo (cached locally) and parse it."""
    with open(hf_hub_download(REPO_ID, filename)) as f:
        return json.load(f)

config = load_json("config.json")
ALLOWED = load_json("allowed.json")

class WordleNet(nn.Module):
    def __init__(self):
        super().__init__()
        h = config["hidden"]
        self.net = nn.Sequential(
            nn.Linear(390, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(h, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 12972),
        )

    def forward(self, x):
        return self.net(x)

# Load supervised model
# (for the RL model, use rl_config.json and rl_model_weights.pt instead)
model = WordleNet()
model.load_state_dict(
    torch.load(hf_hub_download(REPO_ID, "model_weights.pt"), map_location="cpu")
)
model.eval()  # inference mode: disables dropout, uses BatchNorm running stats
```

Or use the live API directly:
```bash
curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=supervised" \
  -H "Content-Type: application/json" \
  -d '{"history": []}'

curl -X POST "https://web-production-ea1d.up.railway.app/suggest?model=rl" \
  -H "Content-Type: application/json" \
  -d '{"history": []}'
```

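The same request can be built from Python with only the standard library. This sketch mirrors the curl commands above and nothing more: it sends an empty history, since the format of non-empty history entries and of the JSON response is not documented here. `build_suggest_request` is an illustrative helper name.

```python
import json
import urllib.request

API_URL = "https://web-production-ea1d.up.railway.app/suggest"

def build_suggest_request(model, history):
    """Build (but do not send) a POST request mirroring the curl commands."""
    return urllib.request.Request(
        f"{API_URL}?model={model}",
        data=json.dumps({"history": history}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_suggest_request("supervised", [])
print(request.get_method(), request.full_url)

# To actually call the service (requires network access):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```
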
---

## Results

### Supervised: all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:   59  ██
3 guesses: 1188  ████████████████████████████████████████████████
4 guesses: 1010  ████████████████████████████████████████
5 guesses:   56  ██
6 guesses:    1
FAILED   :    0  ✅ 100% win rate
```

### RL: all 2,315 answers (greedy + entropy filter)
```
1 guess  :    1
2 guesses:  141  ██████
3 guesses:  810  ████████████████████████████████
4 guesses:  893  ████████████████████████████████████
5 guesses:  343  ██████████████
6 guesses:   86  ███
FAILED   :   41  ✅ 98.2% win rate
```

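The headline numbers can be re-derived from the two histograms above. This short sketch assumes the average guess count is taken over solved games only, which the README does not state explicitly; `summarise` is an illustrative helper.

```python
def summarise(distribution, failed):
    """Return (win rate in %, mean guesses over solved games)."""
    solved = sum(distribution.values())
    total_guesses = sum(n * count for n, count in distribution.items())
    win_rate = 100 * solved / (solved + failed)
    return round(win_rate, 1), round(total_guesses / solved, 3)

# Guess-count histograms copied from the two result blocks above
supervised = {1: 1, 2: 59, 3: 1188, 4: 1010, 5: 56, 6: 1}
rl = {1: 1, 2: 141, 3: 810, 4: 893, 5: 343, 6: 86}

print(summarise(supervised, failed=0))  # (100.0, 3.46)
print(summarise(rl, failed=41))         # (98.2, 3.745)
```
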
---

## Links

- **Live App:** [wordle-solver-tan.vercel.app](https://wordle-solver-tan.vercel.app)
- **GitHub:** [github.com/Jeanwrld/wordle-solver](https://github.com/Jeanwrld/wordle-solver)
- **Backend:** [github.com/Jeanwrld/wordle-api](https://github.com/Jeanwrld/wordle-api)
- **Gradio Demo:** [huggingface.co/spaces/sato2ru/wordle](https://huggingface.co/spaces/sato2ru/wordle)

---

## License

MIT