# Garo ASR - Whisper Small Fine-tuned

A fine-tuned Whisper Small model for automatic speech recognition (ASR) in the Garo language.
## Model Details
- Base Model: openai/whisper-small (244M parameters)
- Language: Garo (Tibeto-Burman language family)
- Training Data: ARTPARK-IISc Vaani dataset
- Training Samples: 26,784
- Test Samples: 3,348
## Performance
| Metric | Score |
|---|---|
| Word Error Rate (WER) | 9.74% |
| Character Error Rate (CER) | 3.82% |
## Baseline Comparison
| Model | WER | CER |
|---|---|---|
| Whisper-small (zero-shot) | 382.7% | - |
| Whisper-base (zero-shot) | - | 203.46% |
| This model (fine-tuned) | 9.74% | 3.82% |
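Both metrics above are edit-distance ratios: WER counts word-level insertions, deletions, and substitutions against the reference word count, and CER does the same at the character level. As a minimal pure-Python sketch (not the evaluation script used for these numbers — a library such as `jiwer` is the more typical choice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that WER can exceed 100% (as in the zero-shot baselines above) when the hypothesis contains many insertions relative to the reference.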
## Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# Load model and processor
processor = WhisperProcessor.from_pretrained("MWirelabs/garo-asr")
model = WhisperForConditionalGeneration.from_pretrained("MWirelabs/garo-asr")
model.eval()

# Load audio (16 kHz mono)
# audio_array = your audio as a numpy array

# Generate transcription
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
## Training Details
- Training Steps: 4,000 (best checkpoint at 2,500)
- Batch Size: 16 per device
- Gradient Accumulation: 2 steps
- Learning Rate: 1e-5
- Warmup Steps: 500
- Precision: FP16
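The hyperparameters above map directly onto Hugging Face `Seq2SeqTrainingArguments`. A sketch of what that configuration might look like — the `output_dir` and the eval/save cadence are illustrative assumptions, not the authors' exact setup:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-garo",  # assumed path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,      # effective batch size of 32 per device
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=True,
    save_steps=500,                     # assumed; best checkpoint at step 2,500
    predict_with_generate=True,
)
```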
## Limitations
- Performance degrades on English loanwords and code-switching
- 432 samples (12.9%) in test set contain annotation noise
- ~43% error rate on code-switched utterances with English words
## Dataset Statistics
- Audio Duration: Mean 4.04s, Median 3.81s (range: 1.78-11.13s)
- Vocabulary: 3,621 unique words
- Type-Token Ratio: 0.148
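The type-token ratio (TTR) is the number of unique word forms (types) divided by the total word count (tokens); with 3,621 types at a TTR of 0.148, the training corpus implies roughly 24,500 tokens. A minimal sketch of the computation:

```python
def type_token_ratio(tokens):
    """Type-token ratio: unique word forms (types) / total word count (tokens)."""
    return len(set(tokens)) / len(tokens)
```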
## Inference Speed
- Average: 0.252s per sample
- Real-time Factor: 0.05x (20x faster than real-time)
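Real-time factor (RTF) is inference time divided by audio duration; values below 1.0 mean faster than real time, and the speedup is simply its reciprocal (an RTF of 0.05x corresponds to 20x real time). As a one-line sketch:

```python
def real_time_factor(inference_seconds, audio_seconds):
    """RTF = processing time / audio duration; < 1.0 means faster than real time."""
    return inference_seconds / audio_seconds
```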
## Citation
If you use this model, please cite:
```bibtex
@misc{garo-asr-2026,
  author    = {MWire Labs},
  title     = {Garo ASR: Fine-tuned Whisper for Garo Language},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/MWirelabs/garo-asr}
}
```
## Acknowledgments

- Dataset: ARTPARK-IISc Vaani project
- Base Model: OpenAI Whisper