Model Card for MiniGPT Shakespeare

MiniGPT Shakespeare is a small decoder-only Transformer trained from scratch on the complete works of William Shakespeare.
It generates Shakespeare-style dialogue at the character level.

This project demonstrates building and training a GPT-style language model from scratch using PyTorch.


Model Details

Model Description

MiniGPT Shakespeare is a lightweight autoregressive language model trained to predict the next character in Shakespeare’s text corpus.
The model learns formatting patterns such as speaker tags (e.g., ROMEO:, KING RICHARD III:) and generates structured play-like dialogue.

  • Developed by: Shreyaj
  • Funded by: Self-funded academic project
  • Shared by: Shreyaj
  • Model type: Decoder-only Transformer (GPT-style), character-level
  • Language(s): English
  • License: MIT
  • Finetuned from model: Trained from scratch

Uses

Direct Use

This model is intended for:

  • Educational purposes (understanding Transformer architectures)
  • Demonstrating GPT-style training from scratch
  • Generating Shakespeare-inspired creative text
  • Portfolio and research experimentation

Downstream Use

The model can be:

  • Extended to larger datasets
  • Used as a starting point for experimenting with scaling laws
  • Modified to use word-level or BPE tokenization
  • Integrated into small creative writing applications
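To illustrate the tokenization change mentioned above, here is a hypothetical sketch (the text and regex are illustrative, not from this repository) contrasting the model's character-level vocabulary with a word-level split of the same line:

```python
import re

text = "ROMEO: But, soft! what light through yonder window breaks?"

# Character-level vocabulary, as this model uses: one token per unique character
char_vocab = sorted(set(text))

# Word-level alternative: split on word and punctuation boundaries instead
words = re.findall(r"\w+|[^\w\s]", text)
word_vocab = sorted(set(words))

print(len(char_vocab), len(word_vocab))
```

A word-level vocabulary grows much faster with corpus size but removes the risk of mid-word character corruption noted in the limitations below.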

Out-of-Scope Use

This model is not suitable for:

  • Production systems
  • Safety-critical applications
  • Factual question answering
  • Long-context reasoning tasks
  • Modern conversational AI

It is a small experimental research model.


Bias, Risks, and Limitations

  • Trained solely on Shakespeare’s works (16th–17th century English).
  • May reflect outdated language, themes, and cultural biases present in the original texts.
  • Limited long-range coherence due to small model size.
  • Occasional word corruption due to character-level modeling.

Recommendations

Users should:

  • Treat outputs as creative text only.
  • Avoid relying on outputs for factual or advisory content.
  • Be aware that stylistic imitation does not imply understanding.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed on held-out segments of the Shakespeare corpus not seen during training.

Factors

Evaluation focuses on:

  • Coherence of generated dialogue
  • Structural correctness (speaker formatting, punctuation)
  • Word-level stability (reduced character corruption)
  • Overall qualitative fluency

Metrics

  • Cross-entropy loss (primary quantitative metric)
  • Qualitative text generation assessment
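As a sketch of how the quantitative metric can be computed on held-out text (the function name, batch size, and shapes are illustrative assumptions, not the repository's actual evaluation code, assuming `model(idx)` returns next-character logits of shape `(B, T, vocab_size)`):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate_loss(model, data, block_size=128, batches=10):
    """Average cross-entropy over random held-out windows of `data` (a 1-D LongTensor of character ids)."""
    model.eval()
    losses = []
    for _ in range(batches):
        # Sample random starting offsets for a small batch of windows
        ix = torch.randint(len(data) - block_size - 1, (4,))
        x = torch.stack([data[i:i + block_size] for i in ix])
        y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # targets shifted by one
        logits = model(x)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        losses.append(loss.item())
    return sum(losses) / len(losses)
```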

Results

  • Training loss decreased from ~2.0 to ~1.7
  • Model generates structured Shakespeare-style dialogue
  • Speaker tags (e.g., "ROMEO:", "KING RICHARD III:") are learned correctly
  • Occasional character corruption remains at current scale

Summary

The model demonstrates successful learning of Shakespearean formatting and stylistic structure.
Generation quality improves steadily as training loss decreases, with significant coherence emerging below ~1.6 loss.


Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator:
https://mlco2.github.io/impact#compute

  • Hardware Type: Consumer GPU (e.g., RTX-class GPU)
  • Hours Used: Estimated < 24 hours total training
  • Cloud Provider: None (trained locally)
  • Compute Region: N/A
  • Carbon Emitted: Low (small-scale experiment)

Technical Specifications

Model Architecture and Objective

  • Decoder-only Transformer (GPT-style)
  • Autoregressive next-token prediction
  • Character-level tokenization
  • Cross-entropy loss objective

The model predicts the next character given previous context.
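The objective above can be sketched in a few lines (variable names here are hypothetical, not the repository's training code): targets are simply the input sequence shifted by one character, and training minimizes cross-entropy between the model's next-character logits and those targets.

```python
import torch
import torch.nn.functional as F

text = "To be, or not to be"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id

ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
x, y = ids[:-1], ids[1:]  # model sees characters up to t, predicts character t + 1

# With logits of shape (T, vocab_size), the objective is plain cross-entropy:
logits = torch.randn(len(x), len(chars))  # stand-in for model output
loss = F.cross_entropy(logits, y)
```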

Compute Infrastructure

Hardware

  • Single GPU (or CPU for smaller-scale experiments)

Software

  • Python 3.x
  • PyTorch
  • Hugging Face Hub

Citation

This model is an independent educational implementation inspired by:

Radford et al., Improving Language Understanding by Generative Pre-Training.


More Information

This project was built as a learning exercise to understand:

  • Transformer architecture internals
  • Training stability and loss dynamics
  • Autoregressive language modeling
  • Generation behavior under different sampling strategies
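The last point can be made concrete with a short sketch (hypothetical, not the repository's `generate` implementation) of how temperature and top-k filtering reshape the next-character distribution before sampling:

```python
import torch

def sample_next(logits, temperature=1.0, top_k=None):
    """Sample one next-character id from a (vocab_size,) logits vector."""
    logits = logits / temperature  # <1.0 sharpens, >1.0 flattens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[-1]] = float('-inf')  # mask everything outside the top k
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Low temperature with a small top-k yields conservative, repetitive text; higher settings increase variety at the cost of more character corruption.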

Model Card Authors

Shreyaj


Model Card Contact

For questions, suggestions, or collaboration inquiries, please contact via GitHub.


How to Get Started with the Model

import torch
from model import MiniGPT
from config import device
from dataset import stoi, itos
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="Shreyaj-pseudo/shreyaj-mini-gpt-shakespeare",
    filename="MODEL_NAME",  # replace with the exact checkpoint filename
    local_dir="./downloaded_model"
)

checkpoint = torch.load(checkpoint_path, map_location=device)

model = MiniGPT().to(device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Generate text (replace TOKEN_NUMBER with an integer number of characters to sample)
def generate(prompt, max_new_tokens="TOKEN_NUMBER"):
    context = torch.tensor(
        [stoi[c] for c in prompt], dtype=torch.long
    ).unsqueeze(0).to(device)

    output = model.generate(context, max_new_tokens=max_new_tokens)
    return ''.join([itos[i] for i in output[0].tolist()])

print(generate("ROMEO:"))
