# Svarah-Whisper-v1
This model is a LoRA fine-tuned version of the openai/whisper-small model on the Svarah dataset for Automatic Speech Recognition (ASR). It uses Parameter-Efficient Fine-Tuning (PEFT) to adapt the base Whisper model.
## Model Details
- Base Model: openai/whisper-small
- Adapter Type: LoRA (Low-Rank Adaptation)
- Language: English
- Task: Automatic Speech Recognition (ASR)
- Dataset: Svarah
## Evaluation Results
The model was evaluated on the preprocessed Svarah evaluation set containing 665 samples.
- Word Error Rate (WER): 39.09%
- Word Accuracy Rate (WAR): 60.91%
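These metrics follow the standard definitions: WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words, and WAR is simply 1 − WER. A minimal, self-contained illustration of the computation (in practice a library such as jiwer is typically used; the example strings are hypothetical):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution (0 if words match)
            )
        prev = cur
    return prev[-1] / len(ref)

score = wer("i saw the cat", "i saw a cat")  # 1 substitution / 4 words = 0.25
```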
## Training Details
### Training Procedure
The model was trained with a custom PyTorch training loop using torch.amp mixed precision (BF16) and gradient accumulation, rather than the Transformers Trainer.
### Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 1e-04
- Per-Device Train Batch Size: 8
- Gradient Accumulation Steps: 2
- Effective Training Batch Size: 16
- Warmup Steps: 100
- Max Training Steps: 5000
- Optimizer: AdamW (weight_decay=0.01)
- Learning Rate Scheduler: Linear (with warmup)
- Mixed Precision Training: BF16 (torch.bfloat16)
- Max Gradient Norm: 1.0
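The loop structure implied by these hyperparameters can be sketched as follows. This is a toy stand-in (a small `nn.Linear` model and random data instead of Whisper and the Svarah loader, both hypothetical here); only the hyperparameter values come from the list above:

```python
import torch
from torch import nn

# Hypothetical stand-in for the LoRA-wrapped Whisper model
model = nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
# Linear warmup for 100 steps, then linear decay to zero at step 5000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: min((step + 1) / 100, max(0.0, (5000 - step) / (5000 - 100))),
)
accum_steps = 2       # gradient accumulation: effective batch = 8 * 2 = 16
max_grad_norm = 1.0

w0 = model.weight.detach().clone()
# Stand-in for the data loader: batches of size 8
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]
for step, (x, y) in enumerate(data):
    # BF16 autocast (device_type="cuda" on GPU; "cpu" here so the sketch runs anywhere)
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equal to the gradient of one effective batch of 16.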
### LoRA Configuration (peft)
- Rank (r): 32
- Alpha: 64
- Dropout: 0.05
- Target Modules: q_proj, v_proj
- Bias: none
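This configuration corresponds roughly to the following peft `LoraConfig` (a sketch; the exact `task_type` used during training is not recorded in this card, so it is omitted):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,                                 # rank of the low-rank update matrices
    lora_alpha=64,                        # scaling factor (alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    bias="none",                          # bias terms stay frozen
)

# base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
# model = get_peft_model(base, lora_config)
# model.print_trainable_parameters()
```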
## Framework Versions
- PEFT: 0.18.1
- Transformers
- PyTorch
## How to use
You can load and use this model with the `peft` and `transformers` libraries:
```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

base_model_name = "openai/whisper-small"
peft_model_id = "your-username/whisper-small-svarah-lora-final"  # Replace with your Kaggle/HF path

processor = WhisperProcessor.from_pretrained(base_model_name)
base_model = WhisperForConditionalGeneration.from_pretrained(
    base_model_name,
    low_cpu_mem_usage=True,
)

# Apply the LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
model = model.to("cuda")
model.eval()

# Transcribe 16 kHz mono audio (audio_array is a 1-D float array)
# inputs = processor(audio_array, return_tensors="pt", sampling_rate=16000)
# generated_ids = model.generate(inputs["input_features"].to("cuda"), max_new_tokens=225)
# transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
```