|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: mit |
|
|
tags: |
|
|
- handwriting-recognition |
|
|
- ocr |
|
|
- computer-vision |
|
|
- pytorch |
|
|
- crnn |
|
|
- ctc |
|
|
- iam-dataset |
|
|
library_name: pytorch |
|
|
datasets: |
|
|
- Teklia/IAM-line |
|
|
metrics: |
|
|
- cer |
|
|
- wer |
|
|
model-index: |
|
|
- name: handwriting-recognition-iam |
|
|
results: |
|
|
- task: |
|
|
type: image-to-text |
|
|
name: Handwriting Recognition |
|
|
dataset: |
|
|
name: IAM Handwriting Database |
|
|
type: Teklia/IAM-line |
|
|
metrics: |
|
|
- type: cer |
|
|
value: 0.1295 |
|
|
name: Character Error Rate |
|
|
- type: wer |
|
|
value: 0.4247 |
|
|
name: Word Error Rate |
|
|
--- |
|
|
|
|
|
# Handwriting Recognition |
|
|
|
|
|
Complete handwriting recognition system using CNN-BiLSTM-CTC on the IAM dataset. |
|
|
|
|
|
## Files
|
|
|
|
|
### 1. **analysis.ipynb** - Dataset Analysis |
|
|
- Exploratory Data Analysis (EDA) |
|
|
- 5 detailed charts saved to `charts/` folder |
|
|
- Run locally or on Colab (no GPU needed) |
|
|
|
|
|
### 2. **train_colab.ipynb** - Model Training (GPU) |
|
|
- **Google Colab GPU compatible**
|
|
- Full training pipeline |
|
|
- CNN-BiLSTM-CTC model (~9.1M parameters) |
|
|
- Automatic model saving |
|
|
- Download trained model for deployment |
|
|
|
|
|
## Quick Start
|
|
|
|
|
### Option 1: Analyze Dataset (Local/Colab) |
|
|
```bash
jupyter notebook analysis.ipynb
```
|
|
- No GPU needed |
|
|
- Generates 5 EDA charts |
|
|
- Fast (~2 minutes) |
|
|
|
|
|
### Option 2: Train Model (Google Colab GPU) |
|
|
|
|
|
1. **Upload `train_colab.ipynb` to Google Colab** |
|
|
2. **Change runtime to GPU:** |
|
|
   - Runtime → Change runtime type → GPU (T4 recommended)
|
|
3. **Run all cells** |
|
|
4. **Download trained model** (last cell) |
|
|
|
|
|
**Training Time:** ~1-2 hours for 20 epochs on T4 GPU |
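Under the hood, training optimizes CTC loss over the model's per-timestep character distributions. A minimal sketch of one CTC training step (shapes and values here are illustrative, not the notebook's exact code):

```python
import torch
import torch.nn as nn

# CTC expects log-probs shaped (T, B, C) and integer targets without blanks
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

B, T, C = 4, 32, 80          # batch, time steps, classes incl. CTC blank
logits = torch.randn(B, T, C, requires_grad=True)   # stand-in model output
log_probs = logits.log_softmax(2).permute(1, 0, 2)  # (B, T, C) -> (T, B, C)

targets = torch.randint(1, C, (B, 10))              # label indices (1..C-1)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 10, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()              # gradients flow back to the model parameters
```

CTC needs no per-character alignment between image columns and label characters; it marginalizes over all valid alignments, which is what makes line-level training practical.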
|
|
|
|
|
## Charts Generated
|
|
|
|
|
From `analysis.ipynb`: |
|
|
1. `charts/01_sample_images.png` - 10 sample handwritten texts |
|
|
2. `charts/02_text_length_distribution.png` - Text statistics |
|
|
3. `charts/03_image_dimensions.png` - Image analysis |
|
|
4. `charts/04_character_frequency.png` - Character distribution |
|
|
5. `charts/05_summary_statistics.png` - Summary table |
|
|
|
|
|
## Model Details
|
|
|
|
|
**Architecture:** |
|
|
- **CNN**: 7 convolutional blocks (feature extraction) |
|
|
- **BiLSTM**: 2 layers, 256 hidden units (sequence modeling) |
|
|
- **CTC Loss**: Alignment-free training |
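As a rough sketch of how these three pieces fit together in PyTorch (layer widths, pooling choices, and the class name are illustrative assumptions, not the exact `train_colab.ipynb` implementation):

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Illustrative CNN-BiLSTM-CTC model (dimensions are assumptions)."""
    def __init__(self, num_chars, img_height=64):
        super().__init__()
        # CNN feature extractor: shrinks height aggressively, keeps width
        # resolution so each column becomes one time step for the RNN
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),   # halve height only
        )
        feat_h = img_height // 8
        self.rnn = nn.LSTM(256 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_chars + 1)   # +1 for the CTC blank

    def forward(self, x):                  # x: (B, 1, H, W)
        f = self.cnn(x)                    # (B, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # (B, T, C*H')
        out, _ = self.rnn(f)
        return self.fc(out)                # (B, T, num_chars + 1)
```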
|
|
|
|
|
**Dataset:** Teklia/IAM-line (Hugging Face) |
|
|
- Train: 6,482 samples |
|
|
- Validation: 976 samples |
|
|
- Test: 2,915 samples |
|
|
|
|
|
**Metrics:** |
|
|
- **CER** (Character Error Rate) |
|
|
- **WER** (Word Error Rate) |
|
|
|
|
|
## Model Files
|
|
|
|
|
After training in Colab: |
|
|
- `best_model.pth` - Trained model weights |
|
|
- `training_history.png` - Loss/CER/WER plots |
|
|
- `predictions.png` - Sample predictions |
|
|
|
|
|
## Requirements
|
|
|
|
|
```
torch>=2.0.0
datasets>=2.14.0
pillow>=9.5.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.13.0
jupyter>=1.0.0
jiwer>=3.0.0
```
|
|
|
|
|
## Usage
|
|
|
|
|
### Load Trained Model |
|
|
```python
import torch

# Load checkpoint on CPU; weights_only=False is required because the
# checkpoint stores the character-mapper object, not just tensors
checkpoint = torch.load('best_model.pth', map_location='cpu',
                        weights_only=False)
char_mapper = checkpoint['char_mapper']

# Create model: copy the CRNN class definition from train_colab.ipynb
# into your script first (notebooks cannot be imported as modules)
model = CRNN(num_chars=len(char_mapper.chars))
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Predict
# ... (preprocessing + inference)
```
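To turn per-timestep logits into text, the standard CTC greedy decode collapses repeated indices and drops blanks. A small helper, assuming blank index 0 and a plain index-to-character dict (the checkpoint's actual mapper interface may differ):

```python
import torch

def ctc_greedy_decode(logits, idx_to_char, blank=0):
    """Collapse repeated indices and remove blanks (standard CTC decoding)."""
    best = logits.argmax(dim=-1).tolist()   # logits: (T, num_chars + 1)
    decoded, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            decoded.append(idx_to_char[idx])
        prev = idx
    return "".join(decoded)
```

Greedy decoding is fast and usually sufficient here; beam search with a language model can lower CER further at some complexity cost.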
|
|
|
|
|
## Notes
|
|
|
|
|
- **GPU strongly recommended** for training (use Colab T4) |
|
|
- Training on CPU will be extremely slow (~20x slower) |
|
|
- Colab free tier: 12-hour limit, sufficient for 20 epochs |
|
|
- Model checkpoint includes character mapper for deployment |
|
|
|
|
|
## Training Tips
|
|
|
|
|
1. **Start with fewer epochs** (5-10) to test |
|
|
2. **Monitor CER/WER** - stop if not improving |
|
|
3. **Increase epochs** if still improving (up to 50) |
|
|
4. **Save checkpoint** before Colab disconnects |
|
|
5. **Download model immediately** after training |
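For tip 4, bundling the weights and character mapper into one file keeps the checkpoint self-contained; a minimal sketch (the key names mirror the loading snippet in the Usage section, the function name is an assumption):

```python
import torch

def save_checkpoint(model, char_mapper, path="best_model.pth"):
    # Bundle weights and the character mapper so inference needs no
    # extra files beyond the model class definition
    torch.save({
        "model_state_dict": model.state_dict(),
        "char_mapper": char_mapper,
    }, path)
```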
|
|
|
|
|
## License
|
|
|
|
|
Code: MIT License. Dataset: IAM Handwriting Database, available for non-commercial research use.
|
|
|