# Garo ASR - Whisper Small Fine-tuned

A fine-tuned Whisper Small model for automatic speech recognition (ASR) in the Garo language.
## Model Details
- Base Model: openai/whisper-small (244M parameters)
- Language: Garo (Tibeto-Burman language family)
- Training Data: ARTPARK-IISc Vaani dataset
- Training Samples: 26,784
- Test Samples: 3,348
## Performance
| Metric | Score |
|---|---|
| Word Error Rate (WER) | 9.74% |
| Character Error Rate (CER) | 3.82% |
## Baseline Comparison
| Model | WER | CER |
|---|---|---|
| Whisper-small (zero-shot) | 382.7% | - |
| Whisper-base (zero-shot) | - | 203.46% |
| This model (fine-tuned) | 9.74% | 3.82% |
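Both metrics above are edit-distance ratios: WER counts word-level insertions, deletions, and substitutions against the reference word count, and CER does the same at the character level. As a minimal pure-Python sketch (not the evaluation script used for these numbers — a library such as `jiwer` is the more typical choice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that WER can exceed 100% (as in the zero-shot baselines above) when the hypothesis contains many insertions relative to the reference.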
## Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# Load model and processor
processor = WhisperProcessor.from_pretrained("MWirelabs/garo-asr")
model = WhisperForConditionalGeneration.from_pretrained("MWirelabs/garo-asr")
model.eval()

# Load audio (16 kHz mono)
# audio_array = your audio as a numpy array

# Generate transcription
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
## Training Details
- Training Steps: 4,000 (best checkpoint at 2,500)
- Batch Size: 16 per device
- Gradient Accumulation: 2 steps
- Learning Rate: 1e-5
- Warmup Steps: 500
- Precision: FP16
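The hyperparameters above map directly onto Hugging Face `Seq2SeqTrainingArguments`. A sketch of what that configuration might look like — the `output_dir` and the eval/save cadence are illustrative assumptions, not the authors' exact setup:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-garo",  # assumed path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,      # effective batch size of 32 per device
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=True,
    save_steps=500,                     # assumed; best checkpoint at step 2,500
    predict_with_generate=True,
)
```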
## Limitations
- Performance degrades on English loanwords and code-switching
- 432 samples (12.9%) in test set contain annotation noise
- ~43% error rate on code-switched utterances with English words
## Dataset Statistics
- Audio Duration: Mean 4.04s, Median 3.81s (range: 1.78-11.13s)
- Vocabulary: 3,621 unique words
- Type-Token Ratio: 0.148
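The type-token ratio (TTR) is the number of unique word forms (types) divided by the total word count (tokens); with 3,621 types at a TTR of 0.148, the training corpus implies roughly 24,500 tokens. A minimal sketch of the computation:

```python
def type_token_ratio(tokens):
    """Type-token ratio: unique word forms (types) / total word count (tokens)."""
    return len(set(tokens)) / len(tokens)
```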
## Inference Speed
- Average: 0.252s per sample
- Real-time Factor: 0.05x (20x faster than real-time)
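Real-time factor (RTF) is inference time divided by audio duration; values below 1.0 mean faster than real time, and the speedup is simply its reciprocal (an RTF of 0.05x corresponds to 20x real time). As a one-line sketch:

```python
def real_time_factor(inference_seconds, audio_seconds):
    """RTF = processing time / audio duration; < 1.0 means faster than real time."""
    return inference_seconds / audio_seconds
```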
## Citation
If you use this model, please cite:
```bibtex
@misc{garo-asr-2026,
  author    = {MWire Labs},
  title     = {Garo ASR: Fine-tuned Whisper for Garo Language},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/MWirelabs/garo-asr}
}
```
## Acknowledgments

- Dataset: ARTPARK-IISc Vaani project
- Base Model: OpenAI Whisper