# 🦒 SmollStories-5M

Part of the SmollStories family, released during the Mayo 2026 Tramo by Tralalabs 🌴

## Specs

| Spec | Value |
|---|---|
| Total params | 4,851,968 (4.85M) |
| Architecture | GPT-style decoder-only |
| Layers | 4 |
| Heads | 4 |
| Hidden dim | 256 |
| Context length | 512 |
| Vocab size | 6144 (custom BPE) |
| Final loss | 2.810 |
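
As a sanity check on the figures above, the parameter total can be rebuilt from the other specs. This is a minimal sketch, not the authors' code: the function name `gpt_param_count` is hypothetical, and the layout it assumes (tied input/output embeddings, learned position embeddings, bias-free linear layers, a 4x MLP width, scale-only norms) is not stated in the card. Under those assumptions the count comes out to 4,851,968, which is consistent with the reported total but does not confirm the actual architecture. The perplexity line additionally assumes the final loss is a per-token cross-entropy in nats.

```python
import math


def gpt_param_count(vocab, ctx, d_model, n_layers, mlp_ratio=4):
    """Count parameters of a GPT-style decoder under the assumptions noted inline."""
    tok_emb = vocab * d_model                 # token embeddings (assumed tied with the output head)
    pos_emb = ctx * d_model                   # assumed learned absolute position embeddings
    attn = 4 * d_model * d_model              # Q, K, V and output projections (assumed bias-free)
    mlp = 2 * mlp_ratio * d_model * d_model   # up and down projections (assumed 4x width, bias-free)
    norms = (2 * n_layers + 1) * d_model      # two norms per layer plus a final one (assumed scale-only)
    return tok_emb + pos_emb + n_layers * (attn + mlp) + norms


# Values taken from the spec table; the head count does not change the total.
total = gpt_param_count(vocab=6144, ctx=512, d_model=256, n_layers=4)
perplexity = math.exp(2.810)  # assumes the loss is natural-log cross-entropy per token

print(f"{total:,}")           # 4,851,968
print(f"{perplexity:.1f}")    # 16.6
```

Roughly two thirds of the total sits in the transformer blocks and one third in the embedding table, which is typical for models this small with a vocabulary of a few thousand tokens.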

## Training data

Mixed 1:1:1 from three children's story datasets:

- 📖 ajibawa-2023/Children-Stories-Collection
- 🎯 SimpleStories/SimpleStories
- 🐣 roneneldan/TinyStories

## Family

The full SmollStories lineup (Mayo 2026):

- 🥔 SmollStories-1K
- 🌱 SmollStories-10K
- 🐣 SmollStories-100K
- 🐥 SmollStories-500K
- 🦆 SmollStories-1M
- 🦢 SmollStories-5M (you are here)
- 🦅 SmollStories-15M

## License

MIT

🌴 Mayo 2026 Tramo Release · Tralalabs

