# Svarah-Whisper-v1
This model is a LoRA fine-tuned version of the openai/whisper-small model on the Svarah dataset for Automatic Speech Recognition (ASR). It uses Parameter-Efficient Fine-Tuning (PEFT) to adapt the base Whisper model.
## Model Details
- Base Model: openai/whisper-small
- Adapter Type: LoRA (Low-Rank Adaptation)
- Language: English
- Task: Automatic Speech Recognition (ASR)
- Dataset: Svarah
## Evaluation Results
The model was evaluated on the preprocessed Svarah evaluation set containing 665 samples.
- Word Error Rate (WER): 39.09%
- Word Accuracy Rate (WAR): 60.91%
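These metrics follow the standard definitions: WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words, and WAR is simply 1 − WER. A minimal, self-contained illustration of the computation (in practice a library such as jiwer is typically used; the example strings are hypothetical):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution (0 if words match)
            )
        prev = cur
    return prev[-1] / len(ref)

score = wer("i saw the cat", "i saw a cat")  # 1 substitution / 4 words = 0.25
```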
## Training Details
### Training Procedure
The model was trained with a custom PyTorch training loop using torch.amp mixed precision (BF16) and gradient accumulation, rather than the Transformers Trainer.
### Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 1e-04
- Per-Device Train Batch Size: 8
- Gradient Accumulation Steps: 2
- Effective Training Batch Size: 16
- Warmup Steps: 100
- Max Training Steps: 5000
- Optimizer: AdamW (weight_decay=0.01)
- Learning Rate Scheduler: Linear (with warmup)
- Mixed Precision Training: BF16 (torch.bfloat16)
- Max Gradient Norm: 1.0
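The loop structure implied by these hyperparameters can be sketched as follows. This is a toy stand-in (a small `nn.Linear` model and random data instead of Whisper and the Svarah loader, both hypothetical here); only the hyperparameter values come from the list above:

```python
import torch
from torch import nn

# Hypothetical stand-in for the LoRA-wrapped Whisper model
model = nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
# Linear warmup for 100 steps, then linear decay to zero at step 5000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: min((step + 1) / 100, max(0.0, (5000 - step) / (5000 - 100))),
)
accum_steps = 2       # gradient accumulation: effective batch = 8 * 2 = 16
max_grad_norm = 1.0

w0 = model.weight.detach().clone()
# Stand-in for the data loader: batches of size 8
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]
for step, (x, y) in enumerate(data):
    # BF16 autocast (device_type="cuda" on GPU; "cpu" here so the sketch runs anywhere)
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equal to the gradient of one effective batch of 16.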
### LoRA Configuration (peft)
- Rank (r): 32
- Alpha: 64
- Dropout: 0.05
- Target Modules: q_proj, v_proj
- Bias: none
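This configuration corresponds roughly to the following peft `LoraConfig` (a sketch; the exact `task_type` used during training is not recorded in this card, so it is omitted):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,                                 # rank of the low-rank update matrices
    lora_alpha=64,                        # scaling factor (alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    bias="none",                          # bias terms stay frozen
)

# base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
# model = get_peft_model(base, lora_config)
# model.print_trainable_parameters()
```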
## Framework Versions
- PEFT: 0.18.1
- Transformers
- PyTorch
## How to use
You can load and use this model with the `peft` and `transformers` libraries:
```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

base_model_name = "openai/whisper-small"
peft_model_id = "your-username/whisper-small-svarah-lora-final"  # Replace with your Kaggle/HF path

processor = WhisperProcessor.from_pretrained(base_model_name)
base_model = WhisperForConditionalGeneration.from_pretrained(
    base_model_name,
    low_cpu_mem_usage=True,
)

# Apply the LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
model = model.to("cuda")
model.eval()

# Transcribe 16 kHz mono audio (audio_array is a 1-D float array)
# inputs = processor(audio_array, return_tensors="pt", sampling_rate=16000)
# generated_ids = model.generate(inputs["input_features"].to("cuda"), max_new_tokens=225)
# transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
```