Model Card for MiniGPT Shakespeare

MiniGPT Shakespeare is a small decoder-only Transformer trained from scratch on the complete works of William Shakespeare.
It generates Shakespeare-style dialogue at the character level.

This project demonstrates building and training a GPT-style language model from scratch using PyTorch.


Model Details

Model Description

MiniGPT Shakespeare is a lightweight autoregressive language model trained to predict the next character in Shakespeare’s text corpus.
The model learns formatting patterns such as speaker tags (e.g., ROMEO:, KING RICHARD III:) and generates structured play-like dialogue.

  • Developed by: Shreyaj
  • Funded by: Self-funded academic project
  • Shared by: Shreyaj
  • Model type: Decoder-only Transformer (GPT-style), character-level
  • Language(s): English
  • License: MIT
  • Finetuned from model: Trained from scratch

Uses

Direct Use

This model is intended for:

  • Educational purposes (understanding Transformer architectures)
  • Demonstrating GPT-style training from scratch
  • Generating Shakespeare-inspired creative text
  • Portfolio and research experimentation

Downstream Use

The model can be:

  • Extended to larger datasets
  • Used as a starting point for experimenting with scaling laws
  • Modified to use word-level or BPE tokenization
  • Integrated into small creative writing applications
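To illustrate the tokenization change mentioned above, here is a hypothetical sketch (the text and regex are illustrative, not from this repository) contrasting the model's character-level vocabulary with a word-level split of the same line:

```python
import re

text = "ROMEO: But, soft! what light through yonder window breaks?"

# Character-level vocabulary, as this model uses: one token per unique character
char_vocab = sorted(set(text))

# Word-level alternative: split on word and punctuation boundaries instead
words = re.findall(r"\w+|[^\w\s]", text)
word_vocab = sorted(set(words))

print(len(char_vocab), len(word_vocab))
```

A word-level vocabulary grows much faster with corpus size but removes the risk of mid-word character corruption noted in the limitations below.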

Out-of-Scope Use

This model is not suitable for:

  • Production systems
  • Safety-critical applications
  • Factual question answering
  • Long-context reasoning tasks
  • Modern conversational AI

It is a small experimental research model.


Bias, Risks, and Limitations

  • Trained solely on Shakespeare’s works (16th–17th century English).
  • May reflect outdated language, themes, and cultural biases present in the original texts.
  • Limited long-range coherence due to small model size.
  • Occasional word corruption due to character-level modeling.

Recommendations

Users should:

  • Treat outputs as creative text only.
  • Avoid relying on outputs for factual or advisory content.
  • Be aware that stylistic imitation does not imply understanding.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed on held-out segments of the Shakespeare corpus not seen during training.

Factors

Evaluation focuses on:

  • Coherence of generated dialogue
  • Structural correctness (speaker formatting, punctuation)
  • Word-level stability (reduced character corruption)
  • Overall qualitative fluency

Metrics

  • Cross-entropy loss (primary quantitative metric)
  • Qualitative text generation assessment
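As a sketch of how the quantitative metric can be computed on held-out text (the function name, batch size, and shapes are illustrative assumptions, not the repository's actual evaluation code, assuming `model(idx)` returns next-character logits of shape `(B, T, vocab_size)`):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate_loss(model, data, block_size=128, batches=10):
    """Average cross-entropy over random held-out windows of `data` (a 1-D LongTensor of character ids)."""
    model.eval()
    losses = []
    for _ in range(batches):
        # Sample random starting offsets for a small batch of windows
        ix = torch.randint(len(data) - block_size - 1, (4,))
        x = torch.stack([data[i:i + block_size] for i in ix])
        y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # targets shifted by one
        logits = model(x)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        losses.append(loss.item())
    return sum(losses) / len(losses)
```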

Results

  • Training loss decreased from ~2.0 to ~1.7
  • Model generates structured Shakespeare-style dialogue
  • Speaker tags (e.g., "ROMEO:", "KING RICHARD III:") are learned correctly
  • Occasional character corruption remains at current scale

Summary

The model demonstrates successful learning of Shakespearean formatting and stylistic structure.
Generation quality improves steadily as training loss decreases, with significant coherence emerging below ~1.6 loss.


Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator:
https://mlco2.github.io/impact#compute

  • Hardware Type: Consumer GPU (e.g., RTX-class GPU)
  • Hours Used: Estimated < 24 hours total training
  • Cloud Provider: None (trained locally)
  • Compute Region: N/A
  • Carbon Emitted: Low (small-scale experiment)

Technical Specifications

Model Architecture and Objective

  • Decoder-only Transformer (GPT-style)
  • Autoregressive next-token prediction
  • Character-level tokenization
  • Cross-entropy loss objective

The model predicts the next character given previous context.
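The objective above can be sketched in a few lines (variable names here are hypothetical, not the repository's training code): targets are simply the input sequence shifted by one character, and training minimizes cross-entropy between the model's next-character logits and those targets.

```python
import torch
import torch.nn.functional as F

text = "To be, or not to be"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id

ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
x, y = ids[:-1], ids[1:]  # model sees characters up to t, predicts character t + 1

# With logits of shape (T, vocab_size), the objective is plain cross-entropy:
logits = torch.randn(len(x), len(chars))  # stand-in for model output
loss = F.cross_entropy(logits, y)
```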

Compute Infrastructure

Hardware

  • Single GPU (or CPU for smaller-scale experiments)

Software

  • Python 3.x
  • PyTorch
  • Hugging Face Hub

Citation

This model is an independent educational implementation inspired by:

Radford et al., Improving Language Understanding by Generative Pre-Training.


More Information

This project was built as a learning exercise to understand:

  • Transformer architecture internals
  • Training stability and loss dynamics
  • Autoregressive language modeling
  • Generation behavior under different sampling strategies
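The last point can be made concrete with a short sketch (hypothetical, not the repository's `generate` implementation) of how temperature and top-k filtering reshape the next-character distribution before sampling:

```python
import torch

def sample_next(logits, temperature=1.0, top_k=None):
    """Sample one next-character id from a (vocab_size,) logits vector."""
    logits = logits / temperature  # <1.0 sharpens, >1.0 flattens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[-1]] = float('-inf')  # mask everything outside the top k
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Low temperature with a small top-k yields conservative, repetitive text; higher settings increase variety at the cost of more character corruption.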

Model Card Authors

Shreyaj


Model Card Contact

For questions, suggestions, or collaboration inquiries, please contact via GitHub.


How to Get Started with the Model

import torch
from model import MiniGPT
from config import device
from dataset import stoi, itos
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="Shreyaj-pseudo/shreyaj-mini-gpt-shakespeare",
    filename="MODEL_NAME",  # replace with the exact checkpoint filename
    local_dir="./downloaded_model"
)

checkpoint = torch.load(checkpoint_path, map_location=device)

model = MiniGPT().to(device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Generate text (replace TOKEN_NUMBER with an integer number of characters to sample)
def generate(prompt, max_new_tokens="TOKEN_NUMBER"):
    context = torch.tensor(
        [stoi[c] for c in prompt], dtype=torch.long
    ).unsqueeze(0).to(device)

    output = model.generate(context, max_new_tokens=max_new_tokens)
    return ''.join([itos[i] for i in output[0].tolist()])

print(generate("ROMEO:"))
