---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System
tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - audio
  - transcription
  - ringg
  - real-time
---
# Ringg Parrot STT V1 :parrot:
Bilingual Speech-to-Text for English & Hindi
## Overview
Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription for English and Hindi. In the benchmark below it ranks 2nd among top bilingual ASR models on Indic-normalized WER, behind IndicWav2Vec and ahead of VakyanSh Wav2Vec2 and OpenAI Whisper Large-v2 and Large-v3.
## Performance Benchmarks
| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|---|---|---|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| Ringg Parrot STT V1 | 21.03% | 66.27% |
| VakyanSh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |
Lower WER (Word Error Rate) indicates better accuracy. Ringg Parrot STT V1 achieves competitive performance while supporting bilingual transcription.
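To make the metric concrete, here is a minimal sketch of word error rate: the word-level edit distance between a reference transcript and the model output, WER = (S + D + I) / N, where S, D and I count substituted, deleted and inserted words and N is the number of reference words. `word_error_rate` is a hypothetical helper, not the project's scoring code. It splits on whitespace and applies no text normalization, whereas the "Indic Norm" and "Whisper Norm" columns above presumably normalize text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # curr[0]: delete all i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the output "मुझे कल कॉल करना" against the reference "मुझे कल call करना" gives one substitution over four reference words, a WER of 0.25: writing the English word in Devanagari counts as an error, which is one way code-switched speech can inflate raw WER.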
## Features
- Bilingual Support: Native support for English and Hindi speech recognition
- Real-time Streaming: Transcription appears as you speak, with about 2-3 seconds of latency
- High Accuracy: 2nd place among top bilingual ASR models
- File Upload: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
- Fast Processing: Optimized for low-latency inference (real-time factor below 0.3 on GPU)
- Code-switching: Handles mixed English-Hindi speech
## Model Details
| Specification | Details |
|---|---|
| Model Name | Ringg Parrot STT V1 |
| Languages | English (EN) & Hindi (HI) |
| Performance | 2nd place among top models |
| Sample Rate | 16kHz |
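Because the model expects 16 kHz mono input, audio at other rates or with several channels has to be converted first. A minimal NumPy sketch of that step, assuming the audio arrives as an array shaped `(n_samples,)` or `(n_samples, n_channels)` together with its sample rate; `to_mono_16k` is a hypothetical helper, not the Space's own preprocessing. It uses plain linear interpolation, which applies no low-pass filter before downsampling and can alias; a polyphase resampler such as `scipy.signal.resample_poly` is the usual choice when quality matters.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate from the Model Details table


def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz by linear interpolation.

    `audio` is shaped (n_samples,) or (n_samples, n_channels); signed integer
    PCM (e.g. int16) is scaled into [-1.0, 1.0).
    """
    audio = np.asarray(audio)
    if np.issubdtype(audio.dtype, np.signedinteger):
        info = np.iinfo(audio.dtype)
        audio = audio.astype(np.float32) / float(-info.min)  # e.g. divide int16 by 32768
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels into one mono track
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) * TARGET_SR / sr))
    t_in = np.arange(len(audio)) / sr      # input sample times in seconds
    t_out = np.arange(n_out) / TARGET_SR   # output sample times in seconds
    # Linear interpolation only; no anti-aliasing filter is applied.
    return np.interp(t_out, t_in, audio).astype(np.float32)
```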
## Usage
### Real-time Streaming
- Go to the "Real-time Streaming" tab
- Allow microphone permissions when prompted
- Start speaking in English or Hindi
- Watch the transcription appear in real time
### File Upload
- Go to the "File Upload" tab
- Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
- Click "Transcribe"
- View the transcription result
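The same file-upload workflow can also be scripted with `gradio_client` (listed in the dependencies). This is a sketch under stated assumptions: the Space ID `RinggAI/ringg-parrot-stt-v1` and the endpoint name `/transcribe` are hypothetical placeholders, and it assumes the endpoint takes a single audio file. Check the app's "Use via API" page, or call `client.view_api()`, for the real endpoint name and parameters.

```python
def transcribe_file(
    path: str,
    space_id: str = "RinggAI/ringg-parrot-stt-v1",  # hypothetical Space ID
    api_name: str = "/transcribe",                  # hypothetical endpoint name
):
    """Upload a local audio file to the Space and return its prediction."""
    # Imported lazily so this module can be imported without gradio_client installed.
    from gradio_client import Client, handle_file

    client = Client(space_id)  # connects to the hosted Space
    # handle_file() marks the argument as a file to upload rather than a plain string
    return client.predict(handle_file(path), api_name=api_name)
```

Usage would look like `print(transcribe_file("sample.wav"))`; if the endpoint has several outputs, `predict` returns them together as a tuple.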
## Tips for Best Results
- Audio Quality: Use clear audio with minimal background noise
- Speaking Style: Speak naturally at a moderate pace
- Sample Rate: 16 kHz or higher recommended
- Code-switching: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum
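To check a recording against the sample-rate tip before uploading it, Python's standard `wave` module can read the WAV header. A minimal sketch; `check_wav` is a hypothetical helper, and `wave` only reads uncompressed PCM WAV, so MP3, FLAC, M4A and the other formats would need a separate decoder.

```python
import wave
from typing import BinaryIO, List, Union

MIN_SAMPLE_RATE = 16_000  # recommended minimum from the tips above


def check_wav(source: Union[str, BinaryIO]) -> List[str]:
    """Return warnings for a PCM WAV file (path or binary file object); empty if it looks suitable."""
    with wave.open(source, "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
    warnings = []
    if sample_rate < MIN_SAMPLE_RATE:
        warnings.append(f"sample rate {sample_rate} Hz is below the recommended {MIN_SAMPLE_RATE} Hz")
    if channels > 1:
        warnings.append(f"{channels} channels; the model works on mono audio")
    return warnings
```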
## Use Cases
- Voice assistants and chatbots
- Meeting transcription
- Content creation and subtitling
- Accessibility applications
- Voice search and commands
- Call center automation
- Educational tools
- Multilingual communication
## Technical Details
### Audio Processing
- Input Format: Mono audio, automatically resampled to 16 kHz
- Processing: Chunked streaming with 3-second buffers
- Latency: ~2-3 seconds for real-time streaming
- GPU Acceleration: CUDA-enabled for faster inference
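The chunked streaming described above can be sketched as a buffer that collects incoming samples and hands off non-overlapping 3-second windows (48,000 samples at 16 kHz). `ChunkBuffer` is a hypothetical name and this is not the Space's actual code. Waiting for a window to fill is one reason streaming output trails speech by a few seconds, in line with the ~2-3 second latency listed above.

```python
from typing import List, Optional

import numpy as np

SAMPLE_RATE = 16_000
CHUNK_SECONDS = 3.0  # buffer length from the list above


class ChunkBuffer:
    """Collect streamed samples and release fixed-length, non-overlapping chunks."""

    def __init__(self, sample_rate: int = SAMPLE_RATE, chunk_seconds: float = CHUNK_SECONDS):
        self.chunk_len = int(sample_rate * chunk_seconds)  # 48,000 samples by default
        self._pending = np.zeros(0, dtype=np.float32)

    def push(self, samples: np.ndarray) -> List[np.ndarray]:
        """Append new samples and return every complete chunk now available."""
        self._pending = np.concatenate([self._pending, np.asarray(samples, dtype=np.float32)])
        chunks = []
        while len(self._pending) >= self.chunk_len:
            chunks.append(self._pending[: self.chunk_len])
            self._pending = self._pending[self.chunk_len :]
        return chunks

    def flush(self) -> Optional[np.ndarray]:
        """Return any leftover audio shorter than one chunk at the end of a stream."""
        if len(self._pending) == 0:
            return None
        rest = self._pending
        self._pending = np.zeros(0, dtype=np.float32)
        return rest
```

Each returned chunk would then be passed to the model, and `flush()` covers the final partial chunk when the microphone stops. Non-overlapping windows can cut a word in half at a chunk boundary; overlapping windows or voice-activity-based segmentation are common alternatives, at the cost of extra computation.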
### Supported Audio Formats
- WAV (PCM, 16-bit, 24-bit, 32-bit)
- MP3
- FLAC
- M4A
- OGG
- OPUS
## Limitations
- Works best with clear audio and minimal background noise
- Accuracy may vary with strong accents and dialects
- Code-switching within sentences may occasionally affect accuracy
- Very long audio files may take longer to process
## Performance
- WER (Word Error Rate): Optimized for conversational speech; see the benchmark table above for measured WER
- RTF (Real-Time Factor): < 0.3 on GPU (faster than real-time)
- Languages: English & Hindi with native support
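Real-time factor is processing time divided by audio duration, so an RTF below 0.3 means a 60-second recording is transcribed in under 18 seconds. A small sketch for measuring it; `real_time_factor` and `measure_rtf` are hypothetical helpers, and `transcribe` stands in for whatever transcription callable is being timed.

```python
import time
from typing import Any, Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[Any], Any], audio: Any, audio_seconds: float) -> float:
    """Time one call of a transcription function and return its RTF."""
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

For example, `real_time_factor(15.0, 60.0)` returns `0.25`: 15 seconds of processing for a one-minute clip.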
## Links
- Organization: RinggAI on Hugging Face
- TTS Space: Ringg TTS V0
## Team
Made with ❤️ by the RinggAI Team
Note: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.
## Dependencies
| Dependency | Version |
|---|---|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |