# 🌴 PicoLM-6M-Mayo

Pretrained from scratch during the Mayo 2026 Tramo by Tralalabs 🌴

A small GPT-style language model trained from scratch on TinyStories with a custom 8K BPE tokenizer. Total wall time on a Google Colab T4: ~2 minutes ⚡

## Specs

| | |
|---|---|
| Total params | ~5.38M |
| Non-embedding params | ~3.28M |
| Architecture | GPT-style decoder-only |
| Layers | 4 |
| Heads | 4 |
| Hidden dim | 256 |
| Context length | 512 |
| Vocab size | 8192 (custom 8K BPE) |
| Tokenizer | Trained from scratch on TinyStories |
| Dataset | TinyStories |
| Hardware | Google Colab T4 GPU |
| Training time | ~2 minutes |
| Final loss | ~2.45 |

## Sample output

Prompt: "Once upon a time"

> Once upon a time, there was a little girl named Lily. She loved to play outside with her friends and play together. One day, Lily went to a walk by the park. She got scared and didn't know what to do...

## Usage

```python
import torch
from transformers import PreTrainedTokenizerFast

# Load the tokenizer
tok = PreTrainedTokenizerFast.from_pretrained("Tralalabs/PicoLM-6M-Mayo")

# Load the model. Note: this is a custom architecture; see model.py in the repo.
# For now, instantiate the PicoLM class from the training script, then load the weights:
# model.load_state_dict(torch.load("pytorch_model.bin"))
```

A hypothetical sketch of a compatible model class is included in the appendix at the end of this card.

## About

Part of the Tralalabs PicoLM family. Released during May 2026 as part of the Mayo 2026 Tramo, a personal series of fast-pretrained models capturing this specific moment in time.

The architecture is intentionally minimal: weight-tied embeddings, no biases, no dropout, and SDPA flash attention. It is designed to fit a complete pretraining run into a single short Colab session.

## License

MIT
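
## Appendix: architecture sketch

The card describes the model as a 4-layer, 4-head, 256-dim decoder with weight-tied embeddings, no biases, no dropout, and SDPA attention. The sketch below is a minimal, unofficial reconstruction from those specs; the pre-norm layout, GELU activation, 4x MLP width, and learned positional embeddings are assumptions and may differ from the repo's actual model.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """Pre-norm transformer block: causal SDPA attention + MLP, no biases, no dropout."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.heads = heads
        self.ln1 = nn.LayerNorm(dim, bias=False)  # bias=False requires PyTorch >= 2.1
        self.ln2 = nn.LayerNorm(dim, bias=False)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim, bias=False),
            nn.GELU(),
            nn.Linear(4 * dim, dim, bias=False),
        )

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # Reshape to the (batch, heads, seq, head_dim) layout expected by SDPA
        q, k, v = (z.view(b, t, self.heads, d // self.heads).transpose(1, 2) for z in (q, k, v))
        att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(att.transpose(1, 2).reshape(b, t, d))
        return x + self.mlp(self.ln2(x))

class PicoLM(nn.Module):
    """~5.4M-parameter GPT-style decoder matching the Specs table (hypothetical layout)."""
    def __init__(self, vocab=8192, dim=256, layers=4, heads=4, ctx=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, dim)
        self.pos_emb = nn.Embedding(ctx, dim)
        self.blocks = nn.ModuleList([Block(dim, heads) for _ in range(layers)])
        self.ln_f = nn.LayerNorm(dim, bias=False)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # weight tying

    def forward(self, idx):
        t = idx.shape[1]
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(t, device=idx.device))
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_f(x))
```

If the checkpoint's state-dict keys match this layout, the Usage snippet above completes with `model = PicoLM()` followed by `model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))`; a simple decoding loop over `tok(prompt, return_tensors="pt")["input_ids"]` then produces samples like the one shown above.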

🌴 Mayo 2026 Tramo Release (captured: 2026-04-30)
