harsh2ai

Rebrand to Ringg Parrot STT V1

b672ef4 about 1 month ago

metadata

title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System

tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - audio
  - transcription
  - ringg
  - real-time

🎙️ Ringg Parrot STT V1 🦜

Bilingual Speech-to-Text for English & Hindi

🌟 Overview

Ringg Parrot STT V1 is a state-of-the-art speech-to-text system that provides real-time transcription for English and Hindi. In the benchmark below it ranks 2nd among top bilingual ASR models on Indic-normalized WER, ahead of OpenAI Whisper Large-v3 and Large-v2.

📊 Performance Benchmarks

| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|---|---|---|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| Ringg Parrot STT V1 | 21.03% | 66.27% |
| VakyanSh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |

Lower WER (Word Error Rate) indicates better accuracy. Ringg Parrot STT V1 achieves competitive performance while supporting bilingual transcription.
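
For readers unfamiliar with the metric, the sketch below shows how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. This is an illustrative implementation, not the evaluation script behind the table. The function name `word_error_rate` is hypothetical, and the sketch only splits on whitespace, whereas the "Indic Norm" and "Whisper Norm" columns presumably apply text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.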

✨ Features

🌐 Bilingual Support: Native support for English and Hindi speech recognition
⚡ Real-time Streaming: Instant transcription as you speak
🎯 High Accuracy: 2nd place among top bilingual ASR models
📁 File Upload: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
🚀 Fast Processing: Optimized for low-latency inference
💬 Code-switching: Handles mixed English-Hindi speech

🎯 Model Details

| Specification | Details |
|---|---|
| Model Name | Ringg Parrot STT V1 |
| Languages | English (EN) & Hindi (HI) |
| Performance | 2nd place among top models |
| Sample Rate | 16kHz |

🚀 Usage

Real-time Streaming

Go to the "Real-time Streaming" tab
Allow microphone permissions when prompted
Start speaking in English or Hindi
See real-time transcription appear

File Upload

Go to the "File Upload" tab
Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
Click "Transcribe"
View the transcription result

💡 Tips for Best Results

Audio Quality: Use clear audio with minimal background noise
Speaking Style: Speak naturally at a moderate pace
File Format: 16kHz or higher sample rate recommended
Code-switching: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum

📊 Use Cases

🤖 Voice assistants and chatbots
📝 Meeting transcription
🎬 Content creation and subtitling
♿ Accessibility applications
🔍 Voice search and commands
📞 Call center automation
🎓 Educational tools
🌍 Multilingual communication

🔧 Technical Details

Audio Processing

Input Format: Mono audio, automatically resampled to 16kHz
Processing: Chunked streaming with 3-second buffers
Latency: ~2-3 seconds for real-time streaming
GPU Acceleration: CUDA-enabled for faster inference
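
As a rough illustration of the buffering described above, the sketch below downmixes multi-channel frames to mono and splits a 16 kHz sample stream into 3-second buffers (48,000 samples each). It is not the Space's actual implementation: the helper names `to_mono` and `iter_buffers` are hypothetical, averaging channels is just one common way to downmix, and resampling is omitted.

```python
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000  # target rate stated in this README
BUFFER_SECONDS = 3       # buffer length stated in this README


def to_mono(frames: Sequence[Sequence[float]]) -> list[float]:
    """Average the channels of each frame into one mono sample (assumed downmix)."""
    return [sum(frame) / len(frame) for frame in frames]


def iter_buffers(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE_HZ,
    seconds: int = BUFFER_SECONDS,
) -> Iterator[list[float]]:
    """Yield consecutive fixed-length buffers; the last one may be shorter."""
    size = sample_rate * seconds
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])


if __name__ == "__main__":
    seven_seconds = [0.0] * (SAMPLE_RATE_HZ * 7)
    print([len(buf) for buf in iter_buffers(seven_seconds)])
```

With fixed 3-second buffers, a transcript for a given word cannot appear until its buffer has filled, which is consistent with the ~2-3 second latency listed above.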

Supported Audio Formats

WAV (PCM, 16-bit, 24-bit, 32-bit)
MP3
FLAC
M4A
OGG
OPUS
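
Before uploading a WAV file, it can be useful to check its channel count, bit depth, and sample rate. The sketch below uses only Python's standard `wave` module, which reads PCM WAV files; the other formats listed above would need a third-party decoder. The helper names `describe_wav` and `needs_conversion` are hypothetical. Since the Space resamples automatically, this check is purely informational.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def needs_conversion(info: dict, target_rate_hz: int = 16_000) -> bool:
    """True if the file is not already mono at the target sample rate."""
    return info["channels"] != 1 or info["sample_rate_hz"] != target_rate_hz
```

For example, `needs_conversion(describe_wav("recording.wav"))` reports whether a file (the path here is a placeholder) differs from the mono 16 kHz input the model works with.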

📝 Limitations

Works best with clear audio and minimal background noise
Accuracy may vary with strong accents and dialects
Code-switching within sentences may occasionally affect accuracy
Very long audio files may take longer to process

📈 Performance

WER (Word Error Rate): Optimized for conversational speech
RTF (Real-Time Factor): < 0.3 on GPU (faster than real-time)
Languages: English & Hindi with native support
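
The real-time factor is processing time divided by audio duration, so an RTF of 0.3 means 10 seconds of audio is transcribed in 3 seconds. The sketch below shows one way to measure it. The names `real_time_factor` and `measure_rtf` are hypothetical, and the `transcribe` callable stands in for whatever inference call is being timed; it is not an API exposed by this Space.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time a zero-argument transcription call and return its RTF."""
    start = time.perf_counter()
    transcribe()  # placeholder for the real inference call
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # 3 seconds of compute for 10 seconds of audio.
    print(real_time_factor(3.0, 10.0))
```

Values below 1.0 mean transcription runs faster than the audio plays back.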

🔗 Links

Organization: RinggAI on Hugging Face
TTS Space: Ringg TTS V0

👥 Team

Made with ❤️ by the RinggAI Team

Note: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

Dependency	Version
gradio	5.49.1
gradio-client	1.13.3
pandas	2.3.3
requests	2.32.5