Model Card for MiniGPT Shakespeare
MiniGPT Shakespeare is a small decoder-only Transformer trained from scratch on the complete works of William Shakespeare.
It generates Shakespeare-style dialogue at the character level.
This project demonstrates building and training a GPT-style language model from scratch using PyTorch.
Model Details
Model Description
MiniGPT Shakespeare is a lightweight autoregressive language model trained to predict the next character in Shakespeare’s text corpus.
The model learns formatting patterns such as speaker tags (e.g., ROMEO:, KING RICHARD III:) and generates structured play-like dialogue.
- Developed by: Shreyaj
- Funded by: Self-funded academic project
- Shared by: Shreyaj
- Model type: Decoder-only Transformer (GPT-style), character-level
- Language(s): English
- License: MIT
- Finetuned from model: N/A (trained from scratch)
Model Sources
- Repository: https://github.com/Shreyaj-pseudo/Mini-Shakespeare-Transformer-model-
- Paper: N/A (educational project)
- Demo: Generation script included in repository
Uses
Direct Use
This model is intended for:
- Educational purposes (understanding Transformer architectures)
- Demonstrating GPT-style training from scratch
- Generating Shakespeare-inspired creative text
- Portfolio and research experimentation
Downstream Use
The model can be:
- Extended to larger datasets
- Used as a starting point for experimenting with scaling laws
- Modified to use word-level or BPE tokenization
- Integrated into small creative writing applications
Out-of-Scope Use
This model is not suitable for:
- Production systems
- Safety-critical applications
- Factual question answering
- Long-context reasoning tasks
- Modern conversational AI
It is a small experimental research model.
Bias, Risks, and Limitations
- Trained solely on Shakespeare’s works (16th–17th century English).
- May reflect outdated language, themes, and cultural biases present in the original texts.
- Limited long-range coherence due to small model size.
- Occasional word corruption due to character-level modeling.
Recommendations
Users should:
- Treat outputs as creative text only.
- Avoid relying on outputs for factual or advisory content.
- Be aware that stylistic imitation does not imply understanding.
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluation was performed on held-out segments of the Shakespeare corpus not seen during training.
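The held-out split can be sketched as follows; the 90/10 ratio and the stand-in corpus string are illustrative assumptions, not values taken from the repository.

```python
# Sketch of a held-out evaluation split: reserve the tail of the corpus
# so validation characters are never seen during training.
# The 90/10 ratio and the stand-in corpus are assumptions.
corpus = "ROMEO: What light through yonder window breaks?\n" * 50

n = int(0.9 * len(corpus))                  # 90% of characters for training
train_text, val_text = corpus[:n], corpus[n:]

print(len(train_text), len(val_text))
```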
Factors
Evaluation focuses on:
- Coherence of generated dialogue
- Structural correctness (speaker formatting, punctuation)
- Word-level stability (reduced character corruption)
- Overall qualitative fluency
Metrics
- Cross-entropy loss (primary quantitative metric)
- Qualitative text generation assessment
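As a sketch of how the primary metric is computed, the snippet below evaluates cross-entropy over a batch of next-character predictions; the vocabulary size and random logits are placeholders, not the model's real outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: vocab_size stands in for the Shakespeare
# character vocabulary; logits stand in for real model outputs.
vocab_size = 65
batch, seq_len = 4, 8

logits = torch.randn(batch, seq_len, vocab_size)          # model outputs
targets = torch.randint(0, vocab_size, (batch, seq_len))  # true next chars

# Flatten (B, T, V) -> (B*T, V) so F.cross_entropy can consume it.
loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))

# With random logits the loss sits near ln(vocab_size) ~ 4.17;
# training pushes it down toward the reported ~1.7.
print(loss.item())
```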
Results
- Training loss reduced from ~2.0 to ~1.7
- Model generates structured Shakespeare-style dialogue
- Speaker tags (e.g., "ROMEO:", "KING RICHARD III:") are learned correctly
- Occasional character corruption remains at current scale
Summary
The model demonstrates successful learning of Shakespearean formatting and stylistic structure.
Generation quality improves steadily as training loss decreases, with noticeably better coherence emerging as loss drops below ~1.6.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator:
https://mlco2.github.io/impact#compute
- Hardware Type: Consumer GPU (e.g., RTX-class GPU)
- Hours Used: Estimated < 24 hours total training
- Cloud Provider: Local training
- Compute Region: N/A
- Carbon Emitted: Low (small-scale experiment)
Technical Specifications
Model Architecture and Objective
- Decoder-only Transformer (GPT-style)
- Autoregressive next-token prediction
- Character-level tokenization
- Cross-entropy loss objective
The model predicts the next character given previous context.
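Character-level tokenization amounts to two lookup tables, mirroring the `stoi`/`itos` mappings the repository's dataset module exposes (the sample text here is only an illustration).

```python
# Minimal sketch of character-level tokenization.
text = "ROMEO: But, soft! what light through yonder window breaks?"

chars = sorted(set(text))                       # vocabulary = unique characters
stoi = {ch: i for i, ch in enumerate(chars)}    # char -> integer id
itos = {i: ch for ch, i in stoi.items()}        # integer id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return ''.join(itos[i] for i in ids)

ids = encode("ROMEO:")
print(ids, decode(ids))                         # round-trip is lossless
```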
Compute Infrastructure
Hardware
- Single GPU (or CPU for smaller-scale experiments)
Software
- Python 3.x
- PyTorch
- Hugging Face Hub
Citation
This model is an independent educational implementation inspired by:
Radford et al., Improving Language Understanding by Generative Pre-Training (2018).
More Information
This project was built as a learning exercise to understand:
- Transformer architecture internals
- Training stability and loss dynamics
- Autoregressive language modeling
- Generation behavior under different sampling strategies
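The sampling strategies mentioned above can be illustrated with two common techniques, temperature scaling and top-k filtering; this is a generic sketch over dummy logits, not the repository's `generate` implementation.

```python
import torch

def sample_next(logits, temperature=1.0, top_k=None):
    """Sample one next-token id from a 1-D logits vector."""
    logits = logits / temperature                 # <1.0 sharpens, >1.0 flattens
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)
        logits = logits.masked_fill(logits < v[-1], float('-inf'))  # keep top-k
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

torch.manual_seed(0)
next_id = sample_next(torch.randn(65), temperature=0.8, top_k=10)
print(next_id)
```

With `top_k=1` this degenerates to greedy decoding, since only the highest-scoring id keeps nonzero probability.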
Model Card Authors
Shreyaj
Model Card Contact
For questions, suggestions, or collaboration inquiries, please contact via GitHub.
How to Get Started with the Model
```python
import torch
from huggingface_hub import hf_hub_download

# Local modules from the repository
from model import MiniGPT
from config import device
from dataset import stoi, itos

# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id="Shreyaj-pseudo/shreyaj-mini-gpt-shakespeare",
    filename="MODEL_NAME",          # replace with the exact checkpoint filename
    local_dir="./downloaded_model"
)

# Load weights and switch to inference mode
checkpoint = torch.load(checkpoint_path, map_location=device)
model = MiniGPT().to(device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Generate text from a prompt
def generate(prompt, max_new_tokens=500):  # default token count is illustrative
    context = torch.tensor(
        [stoi[c] for c in prompt], dtype=torch.long
    ).unsqueeze(0).to(device)
    output = model.generate(context, max_new_tokens=max_new_tokens)
    return ''.join(itos[i] for i in output[0].tolist())

print(generate("ROMEO:"))
```