
metadata
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System

tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - audio
  - transcription
  - ringg
  - real-time

πŸŽ™οΈ Ringg Parrot STT V1 :parrot:

Bilingual Speech-to-Text for English & Hindi


🌟 Overview

Ringg Parrot STT V1 is a high-accuracy speech-to-text system that provides real-time transcription for English and Hindi. In the benchmark below it ranks 2nd among the bilingual ASR models compared, behind IndicWav2Vec. On Indic-normalized WER it outperforms OpenAI Whisper Large-v3 and Large-v2.

📊 Performance Benchmarks

| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|---|---|---|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| Ringg Parrot STT V1 | 21.03% | 66.27% |
| VakyanSh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |

Lower WER (word error rate) indicates better accuracy. Ringg Parrot STT V1 places second on Indic-normalized WER while also supporting bilingual transcription.
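WER is the number of word substitutions (S), deletions (D) and insertions (I) needed to turn a hypothesis into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. The minimal Python sketch below computes this quantity using a word-level Levenshtein alignment. It is illustrative only. `word_error_rate` is a hypothetical helper that splits on whitespace and applies no text normalization. The "Indic Norm" and "Whisper Norm" columns above presumably differ in the normalizer applied before scoring, but the README does not specify either one.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein.

    No normalization (case, punctuation, script) is applied; words are
    simply split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (one rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# Code-mixed example: one substitution (करनी -> करना) over six reference words.
example = word_error_rate("मुझे कल meeting attend करनी है", "मुझे कल meeting attend करना है")
```

In this example, `example` equals 1/6 (about 0.167). A single substitution across six reference words gives this value.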

✨ Features

  • 🌐 Bilingual Support: Native support for English and Hindi speech recognition
  • ⚡ Real-time Streaming: Transcription appears as you speak, with roughly 2-3 seconds of latency
  • 🎯 High Accuracy: 2nd place among the bilingual ASR models in the benchmark (Indic-normalized WER)
  • 📁 File Upload: Support for common audio formats (WAV, MP3, FLAC, M4A, etc.)
  • 🚀 Fast Processing: Optimized for low-latency inference
  • 💬 Code-switching: Handles mixed English-Hindi speech

🎯 Model Details

| Specification | Details |
|---|---|
| Model Name | Ringg Parrot STT V1 |
| Languages | English (EN) & Hindi (HI) |
| Performance | 2nd place among top models |
| Sample Rate | 16 kHz |

🚀 Usage

Real-time Streaming

  1. Go to the "Real-time Streaming" tab
  2. Allow microphone permissions when prompted
  3. Start speaking in English or Hindi
  4. Watch the transcription appear in real time

File Upload

  1. Go to the "File Upload" tab
  2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
  3. Click "Transcribe"
  4. View the transcription result

💡 Tips for Best Results

  • Audio Quality: Use clear audio with minimal background noise
  • Speaking Style: Speak naturally at a moderate pace
  • Sample Rate: 16 kHz or higher recommended
  • Code-switching: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum

📊 Use Cases

  • 🤖 Voice assistants and chatbots
  • 📝 Meeting transcription
  • 🎬 Content creation and subtitling
  • ♿ Accessibility applications
  • 🔍 Voice search and commands
  • 📞 Call center automation
  • 🎓 Educational tools
  • 🌍 Multilingual communication

🔧 Technical Details

Audio Processing

  • Input Format: Mono audio, automatically resampled to 16kHz
  • Processing: Chunked streaming with 3-second buffers
  • Latency: ~2-3 seconds for real-time streaming
  • GPU Acceleration: CUDA-enabled for faster inference
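At 16 kHz, a 3-second buffer holds 3 × 16,000 = 48,000 samples. The sketch below shows one way to collect microphone frames into fixed 3-second chunks before they reach the model. It is an illustration built on assumptions, not the Space's actual code. `ChunkBuffer` is a hypothetical name, and its chunks do not overlap. The README does not document how the app handles a trailing partial chunk, so the sketch simply returns it from `flush()`.

```python
from typing import List, Optional

import numpy as np

SAMPLE_RATE = 16_000                          # model input rate stated in the README
CHUNK_SECONDS = 3                             # buffer length stated in the README
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 48,000 samples per chunk


class ChunkBuffer:
    """Hypothetical helper: accumulate streamed audio, release fixed-size chunks."""

    def __init__(self, chunk_samples: int = CHUNK_SAMPLES) -> None:
        self.chunk_samples = chunk_samples
        self._pending = np.zeros(0, dtype=np.float32)

    def push(self, samples: np.ndarray) -> List[np.ndarray]:
        """Append new samples and return every complete chunk now available."""
        self._pending = np.concatenate([self._pending, samples.astype(np.float32)])
        chunks = []
        while self._pending.size >= self.chunk_samples:
            chunks.append(self._pending[: self.chunk_samples])
            self._pending = self._pending[self.chunk_samples:]
        return chunks

    def flush(self) -> Optional[np.ndarray]:
        """Return the leftover partial chunk (e.g. when the stream ends), if any."""
        if self._pending.size == 0:
            return None
        rest = self._pending
        self._pending = np.zeros(0, dtype=np.float32)
        return rest
```

Frames of any size can be pushed. For example, pushing five 1-second frames yields one complete 48,000-sample chunk, and `flush()` then returns the remaining 32,000 samples.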

Supported Audio Formats

  • WAV (PCM, 16-bit, 24-bit, 32-bit)
  • MP3
  • FLAC
  • M4A
  • OGG
  • OPUS

πŸ“ Limitations

  • Works best with clear audio and minimal background noise
  • Accuracy may vary with strong accents and dialects
  • Code-switching within sentences may occasionally affect accuracy
  • Very long audio files may take longer to process

📈 Performance

  • WER (Word Error Rate): Optimized for conversational speech
  • RTF (Real-Time Factor): < 0.3 on GPU (faster than real-time)
  • Languages: English & Hindi with native support
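Real-time factor is processing time divided by audio duration, so an RTF below 1 means faster than real time. At the stated RTF of < 0.3, a 60-second clip would be transcribed in under 18 seconds. Below is a minimal sketch of how RTF could be measured. `real_time_factor`, `measure_rtf` and the `transcribe` callable are hypothetical stand-ins for whatever inference function the app actually uses.

```python
import time
from typing import Any, Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; RTF < 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[Any], str], audio: Any, audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe function and return its RTF."""
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

For example, `real_time_factor(15.0, 60.0)` returns 0.25. Timing should wrap only the inference call, so that model loading does not inflate the result.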

🔗 Links

👥 Team

Made with ❤️ by the RinggAI Team


Note: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

| Dependency | Version |
|---|---|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |