arabic-legal-documents-ocr-1.0


Libraries

How to use 0xskrilla/arabic-legal-documents-ocr-1.0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="0xskrilla/arabic-legal-documents-ocr-1.0")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("0xskrilla/arabic-legal-documents-ocr-1.0")
model = AutoModelForImageTextToText.from_pretrained("0xskrilla/arabic-legal-documents-ocr-1.0")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use 0xskrilla/arabic-legal-documents-ocr-1.0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "0xskrilla/arabic-legal-documents-ocr-1.0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "0xskrilla/arabic-legal-documents-ocr-1.0",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/0xskrilla/arabic-legal-documents-ocr-1.0

SGLang

How to use 0xskrilla/arabic-legal-documents-ocr-1.0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "0xskrilla/arabic-legal-documents-ocr-1.0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "0xskrilla/arabic-legal-documents-ocr-1.0",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "0xskrilla/arabic-legal-documents-ocr-1.0" \
        --host 0.0.0.0 \
        --port 30000


Arabic Legal Documents OCR 1.0 (VLM Finetuned)

Watch the Full 3.5-Hour Masterclass on YouTube

This model is a finetuned version of Gemma-3-4B-IT, optimized for extracting structured data from low-quality, scanned Arabic legal documents using Vision Language Model reasoning.

🛠 Installation

Depending on your usage (Local Inference vs. Production Serving), install the required packages:

For Transformers (Local Inference)

pip install transformers==4.57.6 optimum==1.26.0 accelerate==1.8.0 peft==0.17.0 json-repair pillow

For vLLM (High-Performance Serving)

# Notebook syntax (Colab/Kaggle); drop the leading "!" when running in a regular shell.
!pip install -q transformers==4.57.6
!pip install -q optimum==1.26.0
!pip install -q datasets==4.4.0

!pip install -q torch==2.8.0
!pip install -q torchvision==0.23.0
!pip install -q torchaudio==2.8.0

!pip install -q vllm==0.15.0
!pip install -q json-repair
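
Because the pins above are strict, it can help to confirm what is actually installed before debugging model behaviour. The following is a minimal sketch using only the standard library. The `PINS` dictionary simply mirrors a few of the versions listed above, and `find_mismatches` is a hypothetical helper name, not part of any of these packages. `torch` is left out on purpose, since installed builds often report a local suffix (for example a CUDA tag) that would not match an exact string comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Mirrors some of the pins listed above; adjust if you change the install commands.
PINS = {
    "transformers": "4.57.6",
    "optimum": "1.26.0",
    "vllm": "0.15.0",
}

def find_mismatches(pins):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package matches its pin.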

🖼 Mandatory Image Preprocessing

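To achieve the best OCR results, preprocess every image before sending it to the model: convert it to grayscale, downscale it to a maximum width of 1024 pixels (keeping the aspect ratio), and optionally boost the contrast. The utility function below returns either a PIL image (for Transformers) or a Base64 data URL (for the vLLM/OpenAI-compatible API).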

import base64
from io import BytesIO
from PIL import Image, ImageEnhance

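def preprocess_image(image_path, max_width=1024, do_enhance=True, return_base64=False):
    """Grayscale, downscale, and optionally contrast-enhance a scanned page.

    Returns a PIL image, or a Base64 JPEG data URL when return_base64=True.
    """
    image = Image.open(image_path)

    # 1. Convert to grayscale
    gray_image = image.convert('L')
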
    # 2. Resize maintaining aspect ratio
    if gray_image.width > max_width:
        ratio = max_width / float(gray_image.width)
        new_height = int(gray_image.height * ratio)
        gray_image = gray_image.resize((max_width, new_height), Image.LANCZOS)

    # 3. Enhance contrast
    if do_enhance:
        enhancer = ImageEnhance.Contrast(gray_image)
        gray_image = enhancer.enhance(1.5)

    if return_base64:
        buffered = BytesIO()
        gray_image.save(buffered, format="JPEG", optimize=True, quality=95)
        img_str = base64.b64encode(buffered.getvalue()).decode('utf-8')
        return f"data:image/jpeg;base64,{img_str}"
    
    return gray_image
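
To make the resize step concrete, here is a small sketch of the same arithmetic in isolation, without Pillow. `target_size` is a hypothetical helper written for this illustration; it only mirrors step 2 of `preprocess_image` and is not part of the model's tooling.

```python
def target_size(width, height, max_width=1024):
    """Mirror the resize step: shrink to max_width, keep aspect ratio, never upscale."""
    if width <= max_width:
        return width, height
    ratio = max_width / float(width)
    return max_width, int(height * ratio)

# An A4 page scanned at 300 DPI is 2480 x 3508 pixels.
print(target_size(2480, 3508))  # → (1024, 1448)

# A page that is already narrow enough keeps its original size.
print(target_size(800, 1131))   # → (800, 1131)
```

Images narrower than `max_width` are never enlarged, so small scans reach the model at their original resolution.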

🚀 Usage Examples

1. Using Transformers & json-repair

import json_repair
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "bakrianoo/arabic-legal-documents-ocr-1.0"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# Preprocess image first (preprocess_image is defined in the preprocessing section above)
processed_img = preprocess_image("document.jpg", return_base64=False)

messages = [
    {"role": "user", "content": [{"type": "image", "image": processed_img}, {"type": "text", "text": "Extract details to JSON."}]}
]

# tokenize=True and return_dict=True are required to get tensors that generate() accepts
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=2048)

# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[-1]
raw_text = processor.decode(output[0][prompt_len:], skip_special_tokens=True)

# Fix and parse JSON output
json_data = json_repair.loads(raw_text)
print(json_data)
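
Model output sometimes arrives with a sentence before or after the JSON, which is why the example above parses it with `json_repair`. Purely to illustrate the idea, here is a standard-library sketch that tries strict parsing first and then falls back to the outermost `{...}` span. `extract_json` is a hypothetical helper, the sample text and its field name are made up, and the function handles far fewer cases than `json_repair` does.

```python
import json

def extract_json(raw_text):
    """Parse raw_text as JSON, falling back to the outermost {...} span; None on failure."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        pass
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Made-up model output with extra prose around the JSON object.
sample = 'Here is the result:\n{"title": "example"}\nDone.'
print(extract_json(sample))
```

A `None` result is a signal to log the raw text and retry or inspect the page, rather than to pass partial data downstream.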

2. Using vLLM API

Run vLLM server

vllm serve "bakrianoo/arabic-legal-documents-ocr-1.0" \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.8 \
    --enable-chunked-prefill \
    --allowed-local-media-path "/workspace/"
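
The `--allowed-local-media-path` flag lets the server read images from its own file system, so a request can reference a file instead of embedding Base64 data. The sketch below assumes the image has already been preprocessed and saved under `/workspace/` on the server, and that the server accepts `file://` URLs for paths under that directory. `local_image_part` and the example file name are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

ALLOWED_ROOT = PurePosixPath("/workspace")  # assumed to match --allowed-local-media-path

def local_image_part(server_path):
    """Build an image_url content part that points at a file on the vLLM server."""
    path = PurePosixPath(server_path)
    # Client-side sanity check only; the server enforces the real restriction.
    if not path.is_absolute() or ALLOWED_ROOT not in path.parents:
        raise ValueError(f"{server_path} is not under {ALLOWED_ROOT}")
    return {"type": "image_url", "image_url": {"url": "file://" + quote(str(path))}}

print(local_image_part("/workspace/scans/page_001_preprocessed.jpg"))
```

The returned dictionary can take the place of the Base64 `image_url` entry in the `content` list of a chat completion request.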

Inference

from openai import OpenAI
import json_repair

client = OpenAI(api_key="any", base_url="http://localhost:8000/v1")

# Preprocess to Base64
b64_image = preprocess_image("document.jpg", return_base64=True)

response = client.chat.completions.create(
    model="bakrianoo/arabic-legal-documents-ocr-1.0",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": b64_image}},
        {"type": "text", "text": "Extract details to JSON."}
    ]}]
)

# Robust parsing: json_repair tolerates minor JSON syntax errors in the model output
structured_output = json_repair.loads(response.choices[0].message.content)
print(structured_output)
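
For multi-page documents, the request body is the same for every page apart from the image. The following is a small sketch assuming one request per page. `build_messages` is a hypothetical helper, and the placeholder strings stand in for the data URLs that `preprocess_image(..., return_base64=True)` would return.

```python
def build_messages(data_url, prompt="Extract details to JSON."):
    """Return a chat `messages` list with one image and one text instruction."""
    if not data_url.startswith("data:image/"):
        raise ValueError("expected a Base64 data URL such as data:image/jpeg;base64,...")
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": prompt},
        ],
    }]

# Placeholder payloads; real ones come from preprocess_image(..., return_base64=True).
pages = ["data:image/jpeg;base64,AAAA", "data:image/jpeg;base64,BBBB"]
per_page_messages = [build_messages(url) for url in pages]
print(len(per_page_messages))
```

Each element of `per_page_messages` can be passed as the `messages=` argument of `client.chat.completions.create`, and the per-page results parsed with `json_repair.loads` as shown above.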