
Svarah-Whisper-v1

This model is a LoRA fine-tuned version of the openai/whisper-small model on the Svarah dataset for Automatic Speech Recognition (ASR). It uses Parameter-Efficient Fine-Tuning (PEFT) to adapt the base Whisper model.

Model Details

  • Base Model: openai/whisper-small
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Language: English
  • Task: Automatic Speech Recognition (ASR)
  • Dataset: Svarah

Evaluation Results

The model was evaluated on the preprocessed Svarah evaluation set containing 665 samples.

  • Word Error Rate (WER): 39.09%
  • Word Accuracy Rate (WAR): 60.91%
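WER here is the standard word-level edit distance divided by the number of reference words, and WAR is simply 1 − WER. A minimal pure-Python sketch of the metric (not the evaluation script used for the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # edit distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1] / len(ref)

# One deleted word out of 6 reference words -> WER of 1/6, WAR of 5/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```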

Training Details

Training Procedure

The model was trained with a custom PyTorch training loop using torch.amp mixed-precision (BF16) and gradient accumulation.
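Gradient accumulation averages the gradients of several micro-batches before each optimizer step, so two micro-batches of 8 behave like one batch of 16. A schematic pure-Python illustration of why this works (a toy quadratic loss, not the actual Whisper training loop):

```python
# Toy loss: mean over samples of (w - x)^2, so grad = mean of 2 * (w - x).
def grad(w, batch):
    return sum(2 * (w - x) for x in batch) / len(batch)

data = [float(i) for i in range(16)]  # one "effective batch" of 16 samples
w = 0.0

# Full-batch gradient over all 16 samples at once
full = grad(w, data)

# Accumulated gradient: 2 micro-batches of 8, each scaled by 1 / accum_steps
accum_steps = 2
acc = 0.0
for k in range(accum_steps):
    micro = data[k * 8:(k + 1) * 8]
    acc += grad(w, micro) / accum_steps  # loss is divided by accum_steps per micro-batch

print(full, acc)  # identical: accumulation reproduces the gradient of the batch of 16
```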

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning Rate: 1e-04
  • Per-Device Train Batch Size: 8
  • Gradient Accumulation Steps: 2
  • Effective Training Batch Size: 16
  • Warmup Steps: 100
  • Max Training Steps: 5000
  • Optimizer: AdamW (weight_decay=0.01)
  • Learning Rate Scheduler: Linear (with warmup)
  • Mixed Precision Training: BF16 (torch.bfloat16)
  • Max Gradient Norm: 1.0
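The linear-with-warmup schedule ramps the learning rate from 0 to 1e-4 over the first 100 steps, then decays linearly to 0 at step 5000. A sketch of that schedule as a plain function (mirroring the shape of transformers' `get_linear_schedule_with_warmup`, not the training code itself):

```python
def linear_warmup_lr(step, base_lr=1e-4, warmup_steps=100, max_steps=5000):
    """Linear warmup to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

print(linear_warmup_lr(0))     # 0.0 at the start
print(linear_warmup_lr(100))   # peak learning rate, 1e-4
print(linear_warmup_lr(5000))  # decayed back to 0.0
```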

LoRA Configuration (peft)

  • Rank (r): 32
  • Alpha: 64
  • Dropout: 0.05
  • Target Modules: q_proj, v_proj
  • Bias: none
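The settings listed above map onto a peft LoraConfig roughly as follows (a sketch reconstructed from the listed values, not the exact training script):

```python
from peft import LoraConfig

# Adapter configuration assembled from the values above
lora_config = LoraConfig(
    r=32,                                  # LoRA rank
    lora_alpha=64,                         # scaling factor (alpha / r = 2)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention query/value projections
    bias="none",
)
```

With rank 32 on only the query and value projections, the trainable adapter weights are a small fraction of Whisper-small's parameters.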

Framework Versions

  • PEFT: 0.18.1
  • Transformers
  • PyTorch

How to use

You can load and use this model with the peft and transformers libraries:

import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

base_model_name = "openai/whisper-small"
peft_model_id = "Akshatkasera007/Svarah-Whisper-v1"  # or your own copy of the adapter

processor = WhisperProcessor.from_pretrained(base_model_name)
base_model = WhisperForConditionalGeneration.from_pretrained(
    base_model_name,
    low_cpu_mem_usage=True
)

# Apply the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Transcribe a 16 kHz mono audio array
# inputs = processor(audio_array, return_tensors="pt", sampling_rate=16000)
# generated_ids = model.generate(inputs["input_features"].to(device), max_new_tokens=225)
# transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)