thai_trocr_thaigov_v2

Vision Encoder Decoder Models

Uses microsoft/trocr-base-handwritten as the encoder.
Uses airesearch/wangchanberta-base-att-spm-uncased as the decoder.
Fine-tuned on a dataset of 250k synthetic text images built from the ThaiGov V2 Corpus.
The synthetic text images were generated with SynthTIGER.
It is a useful starting point for fine-tuning on any Thai OCR task.

Usage

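from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the processor (image preprocessing + tokenizer) and the fine-tuned model.
processor = TrOCRProcessor.from_pretrained("kkatiz/thai-trocr-thaigov-v2")
model = VisionEncoderDecoderModel.from_pretrained("kkatiz/thai-trocr-thaigov-v2")

# Open the text image and make sure it has three color channels.
image = Image.open("... your image path").convert("RGB")
# Convert the image to a tensor of pixel values and generate output token ids.
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)

# Decode the token ids to text, dropping special tokens.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)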
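
The snippet above handles one image at a time. When several text images need to be recognized, the same calls can be applied to small batches. The following is a minimal sketch, not part of the original model card: the helper name `recognize_batch`, the `batch_size` default, and the pass-through `generate_kwargs` are all assumptions made for illustration. It relies only on the `processor(...)`, `model.generate(...)`, and `processor.batch_decode(...)` calls already used above, and on the processor accepting a list of PIL images.

```python
from typing import Any, List, Sequence


def recognize_batch(
    images: Sequence[Any],
    processor: Any,
    model: Any,
    batch_size: int = 8,
    **generate_kwargs: Any,
) -> List[str]:
    """Run OCR over many PIL images in fixed-size batches (hypothetical helper).

    `processor` and `model` are expected to be the objects loaded in the
    usage snippet. Extra keyword arguments (for example `max_new_tokens`)
    are forwarded unchanged to `model.generate`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    texts: List[str] = []
    for start in range(0, len(images), batch_size):
        # Convert every image in this slice to RGB, as in the single-image example.
        batch = [img.convert("RGB") for img in images[start:start + batch_size]]
        # The processor turns the list of images into one stacked tensor.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values, **generate_kwargs)
        # batch_decode returns one string per image, in input order.
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts


# Example call, assuming `processor` and `model` were loaded as shown above
# and `paths` is your own list of image file paths:
#   images = [Image.open(p) for p in paths]
#   texts = recognize_batch(images, processor, model, batch_size=4)
```

Passing the processor and model in as arguments keeps the helper independent of how they were loaded, so it can be reused with a model that has been fine-tuned further.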

Model size: 0.2B params (Safetensors, tensor type F32)
