# PicoLM-6M-Mayo

Pretrained from scratch during the Mayo 2026 Tramo by Tralalabs.
A small GPT-style language model trained from scratch on TinyStories with a custom 8K BPE tokenizer.
Total wall time on Google Colab T4: ~2 minutes ⚡
## Specs
| Spec | Value |
|---|---|
| Total params | ~5.38M |
| Non-embedding | ~3.28M |
| Architecture | GPT-style decoder-only |
| Layers | 4 |
| Heads | 4 |
| Hidden dim | 256 |
| Context length | 512 |
| Vocab size | 8192 (custom 8K BPE) |
| Tokenizer | Trained from scratch on TinyStories (see sketch below the table) |
| Dataset | TinyStories |
| Hardware | Google Colab T4 GPU |
| Training time | ~2 minutes |
| Final loss | ~2.45 |
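
Two quick notes on the numbers. With weight tying, the token embedding matrix (8192 × 256 ≈ 2.10M weights) is the only embedding cost, which matches the gap between the total (~5.38M) and non-embedding (~3.28M) counts. And for anyone curious how a custom 8K BPE tokenizer like this is produced, the sketch below uses the Hugging Face `tokenizers` library; the file path and special token are assumptions, not the repo's actual training script.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE with an 8192-entry vocab, trained from raw text.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
trainer = trainers.BpeTrainer(vocab_size=8192, special_tokens=["<|endoftext|>"])

# "tinystories.txt" is a hypothetical local dump of the TinyStories corpus.
tokenizer.train(files=["tinystories.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```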
## Sample output
Prompt: "Once upon a time" Once upon a time, there was a little girl named Lily. She loved to play outside with her friends and play together. One day, Lily went to a walk by the park. She got scared and didn't know what to do...
## Usage
```python
import torch
from transformers import PreTrainedTokenizerFast

# Load tokenizer
tok = PreTrainedTokenizerFast.from_pretrained("Tralalabs/PicoLM-6M-Mayo")

# Load model. Note: this is a custom architecture; see model.py in the repo.
# For now, use the training script's PicoLM class to instantiate, then load weights:
model.load_state_dict(torch.load("pytorch_model.bin"))
```
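
Until a standard `transformers` model class is wired up, end-to-end loading looks roughly like the sketch below. The `PicoLM` constructor arguments, the `from model import PicoLM` path, and the assumption that the forward pass returns `[batch, seq, vocab]` logits are guesses based on the spec table, not confirmed against the repo; `hf_hub_download` fetches the checkpoint file named in the snippet above.

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import PreTrainedTokenizerFast

from model import PicoLM  # hypothetical import; the class ships in the repo's model.py

tok = PreTrainedTokenizerFast.from_pretrained("Tralalabs/PicoLM-6M-Mayo")

# Constructor arguments mirror the spec table; the real signature may differ.
model = PicoLM(vocab_size=8192, n_layers=4, n_heads=4, d_model=256, context_len=512)
weights = hf_hub_download("Tralalabs/PicoLM-6M-Mayo", "pytorch_model.bin")
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()

# Greedy decoding: feed the growing sequence back in and take the argmax token.
ids = tok("Once upon a time", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(50):
        logits = model(ids)  # assumed shape: [batch, seq, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tok.decode(ids[0]))
```

Greedy decoding is the simplest loop; temperature sampling usually gives livelier TinyStories completions.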
## About
Part of the Tralalabs PicoLM family.
Released during May 2026 as part of the Mayo 2026 Tramo, a personal series of
fast-pretrained models capturing this specific moment in time.
Architecture is intentionally minimal: weight-tied embeddings, no biases,
no dropout, SDPA flash attention. Designed to fit a complete pretrain run
into a single short Colab session.
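
For readers who want to see that recipe concretely, here is a minimal PyTorch sketch of one such block plus the weight tying. All class and argument names are illustrative, not the repo's actual model.py; "SDPA" refers to `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a flash-attention kernel where available.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """Illustrative pre-norm decoder block: no biases, no dropout, SDPA attention."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(d_model, bias=False)  # bias kwarg needs PyTorch >= 2.1
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.ln2 = nn.LayerNorm(d_model, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # [B, T, C] -> [B, heads, T, head_dim] for scaled_dot_product_attention
        q, k, v = (t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # SDPA / flash path
        x = x + self.proj(y.transpose(1, 2).reshape(B, T, C))
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    """Illustrative decoder-only LM; positional encoding omitted for brevity."""
    def __init__(self, vocab_size=8192, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList([Block(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model, bias=False)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # weight tying: one shared 8192x256 matrix

    def forward(self, ids):
        x = self.tok_emb(ids)
        for blk in self.blocks:
            x = blk(x)
        return self.lm_head(self.ln_f(x))  # logits over the 8K vocab
```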
## License
MIT
Mayo 2026 Tramo Release (captured: 2026-04-30)