Arabic Legal Documents OCR 1.0 (VLM Finetuned)
Watch the Full 3.5-Hour Masterclass on YouTube
This model is a finetuned version of Gemma-3-4B-IT, optimized for extracting structured data from low-quality, scanned Arabic legal documents using Vision Language Model reasoning.
🛠 Installation
Depending on your usage (Local Inference vs. Production Serving), install the required packages:
For Transformers (Local Inference)
pip install transformers==4.57.6 optimum==1.26.0 accelerate==1.8.0 peft==0.17.0 json-repair pillow
For vLLM (High-Performance Serving)
!pip install -q transformers==4.57.6
!pip install -q optimum==1.26.0
!pip install -q datasets==4.4.0
!pip install -q torch==2.8.0
!pip install -q torchvision==0.23
!pip install -q torchaudio==2.8.0
!pip install -q vllm==0.15.0
!pip install json-repair
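Pinned installs are easy to get subtly wrong in a shared notebook, so a quick check of what is actually installed can save debugging time. The sketch below uses only the standard library; the `EXPECTED` dictionary and the `check_versions` helper are illustrative names, and the pins are copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above (a subset, for illustration).
EXPECTED = {
    "transformers": "4.57.6",
    "optimum": "1.26.0",
    "vllm": "0.15.0",
}

def check_versions(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, wanted, installed == wanted)
    return report

if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

A mismatch is not necessarily fatal; it only signals that the environment differs from the one these instructions pin.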
🖼 Mandatory Image Preprocessing
To achieve the best OCR results, preprocess every image (convert to grayscale, downscale to a maximum width, and boost contrast) before sending it to the model. The utility function below returns either a PIL image for Transformers or a Base64 data URL for the vLLM/OpenAI API.
import base64
from io import BytesIO
from PIL import Image, ImageEnhance

def preprocess_image(image_path, max_width=1024, do_enhance=True, return_base64=False):
    image = Image.open(image_path)

    # 1. Convert to grayscale
    gray_image = image.convert('L')

    # 2. Resize maintaining aspect ratio
    if gray_image.width > max_width:
        ratio = max_width / float(gray_image.width)
        new_height = int(gray_image.height * ratio)
        gray_image = gray_image.resize((max_width, new_height), Image.LANCZOS)

    # 3. Enhance contrast
    if do_enhance:
        enhancer = ImageEnhance.Contrast(gray_image)
        gray_image = enhancer.enhance(1.5)

    if return_base64:
        buffered = BytesIO()
        gray_image.save(buffered, format="JPEG", optimize=True, quality=95)
        img_str = base64.b64encode(buffered.getvalue()).decode('utf-8')
        return f"data:image/jpeg;base64,{img_str}"

    return gray_image
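To make the resize step concrete, the sketch below reproduces only its arithmetic, without touching any image data. `target_size` is a hypothetical helper written for illustration; it mirrors the `max_width` logic of `preprocess_image` and, like that function, never upscales.

```python
def target_size(width, height, max_width=1024):
    """Mirror the resize step: shrink to max_width, never upscale."""
    if width <= max_width:
        return width, height
    ratio = max_width / float(width)
    return max_width, int(height * ratio)

print(target_size(2480, 3508))  # A4 page scanned at 300 dpi -> (1024, 1448)
print(target_size(800, 1100))   # already narrow enough -> (800, 1100)
```

Because the height is truncated with `int`, the aspect ratio can drift by up to one pixel, which is negligible for document scans.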
🚀 Usage Examples
1. Using Transformers & json-repair
import json_repair
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "bakrianoo/arabic-legal-documents-ocr-1.0"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# Preprocess image first
processed_img = preprocess_image("document.jpg", return_base64=False)

messages = [
    {"role": "user", "content": [{"type": "image", "image": processed_img}, {"type": "text", "text": "Extract details to JSON."}]}
]

# tokenize=True and return_dict=True are required to get model-ready tensors
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=2048)

# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[-1]
raw_text = processor.decode(output[0][prompt_len:], skip_special_tokens=True)

# Fix and parse JSON output
json_data = json_repair.loads(raw_text)
print(json_data)
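`json_repair.loads` is the parsing path this card documents. As an optional complement, a strict standard-library parse can reveal how often the raw output is already valid JSON. The sketch below assumes the model may sometimes wrap its answer in a Markdown code fence; that is an assumption, not something stated above, and `strip_code_fence` and `parse_model_json` are hypothetical helper names.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker, built indirectly for readability

# Matches text that is entirely wrapped in one fence, with an optional "json" tag.
_FENCED = re.compile(
    r"^" + re.escape(FENCE) + r"(?:json)?\s*\n(.*?)\n?" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)

def strip_code_fence(text):
    """Remove one surrounding Markdown code fence, if present."""
    text = text.strip()
    match = _FENCED.match(text)
    return match.group(1).strip() if match else text

def parse_model_json(raw_text):
    """Return parsed JSON, or None when strict parsing fails."""
    try:
        return json.loads(strip_code_fence(raw_text))
    except json.JSONDecodeError:
        return None
```

When `parse_model_json(raw_text)` returns `None`, falling back to `json_repair.loads(raw_text)` keeps the behaviour of the example above while making repaired outputs easy to count.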
2. Using vLLM API
Run vLLM server
vllm serve "bakrianoo/arabic-legal-documents-ocr-1.0" \
    --dtype bfloat16 --gpu-memory-utilization 0.8 \
    --enable-chunked-prefill \
    --allowed-local-media-path "/workspace/"
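Loading the model can take a while, so a client script started right after the server may hit connection errors. The sketch below polls the OpenAI-compatible `/v1/models` route until it answers. It assumes the default port 8000 used in the inference example; `wait_for_server` and its timeout values are illustrative choices.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url="http://localhost:8000/v1/models", timeout_s=300.0, interval_s=2.0):
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling `wait_for_server()` before creating the client lets a script fail with a clear message instead of a connection traceback.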
Inference
from openai import OpenAI
import json_repair

client = OpenAI(api_key="any", base_url="http://localhost:8000/v1")

# Preprocess to Base64
b64_image = preprocess_image("document.jpg", return_base64=True)

response = client.chat.completions.create(
    model="bakrianoo/arabic-legal-documents-ocr-1.0",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": b64_image}},
        {"type": "text", "text": "Extract details to JSON."}
    ]}]
)

# Robust parsing
structured_output = json_repair.loads(response.choices[0].message.content)
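When many pages are processed, it can help to build the request in one place. The sketch below returns keyword arguments for `client.chat.completions.create`. The `build_ocr_request` helper is hypothetical; `temperature=0.0` is an assumption aimed at more repeatable extraction, and `max_tokens=2048` simply mirrors the `max_new_tokens` value from the Transformers example.

```python
def build_ocr_request(
    data_url,
    model="bakrianoo/arabic-legal-documents-ocr-1.0",
    prompt="Extract details to JSON.",
    max_tokens=2048,
):
    """Build chat-completion kwargs for one preprocessed page."""
    if not data_url.startswith("data:image/"):
        # Guard against passing a file path instead of the Base64 data URL.
        raise ValueError("expected a data URL, e.g. from preprocess_image(..., return_base64=True)")
    return {
        "model": model,
        "messages": [{"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": prompt},
        ]}],
        "temperature": 0.0,   # assumption: greedy-style decoding for extraction
        "max_tokens": max_tokens,
    }
```

With the client from the example above, a call would look like `client.chat.completions.create(**build_ocr_request(b64_image))`.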
📺 Full Tutorial
Watch the detailed walkthrough on YouTube to understand the training pipeline: VLM Finetuning for OCR Tasks
Resources
LoRA Adapter: https://huggingface.co/bakrianoo/arabic-legal-documents-ocr-1.0/tree/main/checkpoints
Data: https://huggingface.co/bakrianoo/arabic-legal-documents-ocr-1.0/tree/main/data
Scripts: https://huggingface.co/bakrianoo/arabic-legal-documents-ocr-1.0/tree/main/scripts