LightOnOCR-2-1B for Latin (Line-Level)

This model is a fine-tuned version of lightonai/LightOnOCR-2-1B-base, trained specifically for line-level OCR of medieval manuscripts.

It was trained on line-level images from the CATMuS/medieval dataset, which covers diverse European manuscripts.

Model Description

Base Model: lightonai/LightOnOCR-2-1B-base
Training Data: CATMuS/medieval
Task: Line-level text transcription from document images
Language: Latin (la/fr/de/en)
Architecture: Vision-Language Model (1B parameters)

This is a line-level model: it expects cropped line images as input, not full pages. Each image should contain a single line of text.

Evaluation Results

Evaluated on 50 samples from the test set:

Metric	Base Model	Finetuned	Improvement
CER (%)	194.75	43.81	+150.94
WER (%)	218.38	76.56	+141.82
Perfect Matches	0	6	+6

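Lower CER/WER is better; a higher number of perfect matches is better. The Improvement column reports the absolute gain in the better direction (percentage points for CER and WER, count for perfect matches). CER and WER can exceed 100% when a transcription contains many more characters or words than the reference.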
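For readers who want to score their own transcriptions, the sketch below shows one common way to compute CER and WER: the Levenshtein edit distance between prediction and reference, divided by the reference length. It is an illustration only. The evaluation script behind the table above is not published here, so its normalization and averaging choices are unknown, and the function names `edit_distance`, `cer`, and `wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate in percent, using whitespace tokenization (an assumption)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


# Four inserted characters against a two-character reference
print(cer("ab", "abcdef"))  # 200.0
```

Scores computed this way per line may differ from corpus-level scores, which sum edit distances and reference lengths over all samples before dividing.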

Example Outputs

#	Ground Truth	Base Model	Finetuned
1	dos cartas anios: de un deuor Esta carta...		cas casas anias de yera Esa cara fue rre...
2	cl̃igo de yung̃ra. Et yo Maran gañz esc...		dugo de uanima Etc̃ coa de darin ꝑm̃s oͣ...
3	ssegũd q̃ enella diçe. Siño q̃l qͥer ...		s̃ ꝑt̃s q̃ cuella dize. Si no q̃l q̃l q̃...
4	consseiõ ⁊ con otorgamiento dela Reyna ...		cuel̃to ⁊ in cõrgimento della Reyna Stõn...
5	tos ⁊ con todas sus ꝑtenençias ⁊ con to...	তত্ত্ব ন এম সোলার নীতি যুক্তিমানতা ন এম ...	tõs ⁊ en todas sus premençias ⁊ en todas...

✓ = exact match

Usage

Installation

# Requires transformers from source
pip install git+https://github.com/huggingface/transformers
pip install pillow torch
# Needed only for the batch inference example
pip install datasets

Python Usage

import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image

# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-catmus"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)

# Load your line image
image = Image.open("your_image.jpg").convert("RGB")

# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode output
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)

print(transcription)

Batch Inference

# Reuses processor, model, device, and dtype from the Python Usage example
from datasets import load_dataset

# Load dataset
dataset = load_dataset("CATMuS/medieval", split="train[:10]")

# Process batch
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)

inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

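# Generate transcriptions, then strip the prompt tokens from each output
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
prompt_length = inputs["input_ids"].shape[1]
predictions = processor.batch_decode(outputs[:, prompt_length:], skip_special_tokens=True)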

for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()

Training Details

Base Model: lightonai/LightOnOCR-2-1B-base
Training Method: Fine-tuning with frozen language model backbone
Optimizer: AdamW (fused)
Learning Rate: 6e-5 with linear decay
Precision: bfloat16
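
As a rough illustration of the learning-rate setting above, the sketch below computes a linearly decaying rate. It assumes the rate falls from the peak of 6e-5 to zero with no warmup. The source does not state the warmup, the final value, or the number of training steps, so `total_steps` and the function name `linear_decay_lr` are hypothetical.

```python
def linear_decay_lr(step, total_steps, peak_lr=6e-5):
    """Learning rate at a given step, decaying linearly from peak_lr to 0.

    Assumes no warmup and a final rate of zero; both are illustrative guesses.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return peak_lr * (1.0 - step / total_steps)


# Rate at the start, midpoint, and end of a hypothetical 1000-step run
for s in (0, 500, 1000):
    print(s, linear_decay_lr(s, total_steps=1000))
```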

Limitations

This model is trained on line-level images. For full-page transcription, you need to first segment the page into individual lines.
Performance may vary on document styles not represented in the training data.
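
To illustrate the segmentation step mentioned above, the sketch below crops line images out of a page once line bounding boxes are known. The boxes would have to come from a separate layout or line-segmentation tool, which this model card does not provide. The helper names `clamp_box` and `crop_lines`, the padding value, and the example file name are all hypothetical. `page` is expected to be a `PIL.Image.Image` (any object with a `size` attribute and a `crop` method works).

```python
def clamp_box(box, page_size, pad=4):
    """Expand a (left, top, right, bottom) box by pad pixels, clamped to the page."""
    left, top, right, bottom = box
    width, height = page_size
    return (max(left - pad, 0), max(top - pad, 0),
            min(right + pad, width), min(bottom + pad, height))


def crop_lines(page, boxes, pad=4):
    """Return one cropped image per line box, in the order given."""
    return [page.crop(clamp_box(box, page.size, pad)) for box in boxes]


# Example usage with Pillow (file name and coordinates are placeholders):
# from PIL import Image
# page = Image.open("your_page.jpg").convert("RGB")
# line_images = crop_lines(page, [(120, 80, 1500, 140), (120, 150, 1500, 210)])
```

Each cropped image can then be passed to the model in the same way as a single line image in the usage examples.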

Citation

If you use this model, please cite:

@misc{lightonocr2_finetuned_2026,
  title = {LightOnOCR Fine-tuned for Latin},
  author = {William Mattingly},
  year = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-catmus}}
}

And the original LightOnOCR paper:

@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}

Acknowledgments

LightOn AI for the excellent LightOnOCR base model
The creators of the CATMuS/medieval dataset
