CLSRIL-23: Cross Lingual Speech Representations for Indic Languages
Paper: arXiv 2107.07402
How to use Harveenchadha/wav2vec2-pretrained-clsril-23-10k with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Harveenchadha/wav2vec2-pretrained-clsril-23-10k")

# Load model directly
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("Harveenchadha/wav2vec2-pretrained-clsril-23-10k")
model = AutoModel.from_pretrained("Harveenchadha/wav2vec2-pretrained-clsril-23-10k")
```
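Once `processor` and `model` are loaded, frame-level representations can be taken from the pretrained encoder with a plain forward pass. The helper below is a minimal sketch, not part of the original card: `extract_features` is a hypothetical name, and it assumes the checkpoint expects 16 kHz mono audio (the usual wav2vec 2.0 setting) and that `processor` behaves like a `Wav2Vec2FeatureExtractor`.

```python
import numpy as np
import torch


def extract_features(model, processor, waveform: np.ndarray, sampling_rate: int = 16_000) -> torch.Tensor:
    """Run raw mono audio through a wav2vec 2.0 encoder (hypothetical helper).

    Returns the last transformer layer's output, shaped (1, num_frames, hidden_size).
    """
    # The feature extractor normalizes the waveform and batches it as a tensor;
    # it raises an error if sampling_rate differs from the rate it was configured for.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    model.eval()  # turn off dropout and the training-time time masking
    with torch.no_grad():
        outputs = model(inputs["input_values"])
    return outputs.last_hidden_state
```

With `processor` and `model` loaded as above, one second of audio (16,000 samples) should yield 49 frames, assuming the standard wav2vec 2.0 convolutional feature encoder, whose total stride is 320 samples (20 ms).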
We present CLSRIL-23 (Cross Lingual Speech Representations on Indic Languages), a self-supervised, pre-trained audio model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents that is shared across all languages.
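To make the contrastive task concrete, the sketch below scores a single masked time step: the context vector c_t should be more similar to the true quantized latent q_t than to K distractors drawn from other masked steps, where similarity is cosine similarity divided by a temperature κ, and the loss is the negative log of the softmax probability assigned to q_t. This is a minimal pure-Python illustration of the wav2vec 2.0 loss, not code from CLSRIL-23 or fairseq; the function names and the temperature of 0.1 are assumptions chosen for the example, and the shared codebook that produces q_t is not modeled.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(
    context: Vector,
    target: Vector,
    distractors: Sequence[Vector],
    temperature: float = 0.1,  # illustrative value, an assumption
) -> float:
    """-log softmax probability of the true quantized latent among all candidates."""
    candidates = [target, *distractors]
    logits = [cosine_similarity(context, q) / temperature for q in candidates]
    # log-sum-exp with the max subtracted, for numerical stability
    peak = max(logits)
    log_normalizer = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_normalizer - logits[0]
```

When the context points the same way as its target and opposite to the only distractor, as in `contrastive_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])`, the loss is log(1 + e^-20), close to zero; swapping target and distractor pushes it to about 20.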
The original repository contains the models in fairseq format.

Hours of pretraining audio per language:

| Language | Data (hours) |
|---|---|
| Assamese | 254.9 |
| Bengali | 331.3 |
| Bodo | 26.9 |
| Dogri | 17.1 |
| English | 819.7 |
| Gujarati | 336.7 |
| Hindi | 4563.7 |
| Kannada | 451.8 |
| Kashmiri | 67.8 |
| Konkani | 36.8 |
| Maithili | 113.8 |
| Malayalam | 297.7 |
| Manipuri | 171.9 |
| Marathi | 458.2 |
| Nepali | 31.6 |
| Odia | 131.4 |
| Punjabi | 486.05 |
| Sanskrit | 58.8 |
| Santali | 6.56 |
| Sindhi | 16 |
| Tamil | 542.6 |
| Telugu | 302.8 |
| Urdu | 259.68 |
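The table is heavily skewed toward a few languages. As a quick check, the sketch below sums the per-language hours copied from the table and reports Hindi's share and the languages under 50 hours; the 50-hour cutoff is an arbitrary threshold chosen for illustration.

```python
# Hours of pretraining audio per language, copied from the table above.
HOURS = {
    "Assamese": 254.9, "Bengali": 331.3, "Bodo": 26.9, "Dogri": 17.1,
    "English": 819.7, "Gujarati": 336.7, "Hindi": 4563.7, "Kannada": 451.8,
    "Kashmiri": 67.8, "Konkani": 36.8, "Maithili": 113.8, "Malayalam": 297.7,
    "Manipuri": 171.9, "Marathi": 458.2, "Nepali": 31.6, "Odia": 131.4,
    "Punjabi": 486.05, "Sanskrit": 58.8, "Santali": 6.56, "Sindhi": 16,
    "Tamil": 542.6, "Telugu": 302.8, "Urdu": 259.68,
}

total = sum(HOURS.values())
hindi_share = HOURS["Hindi"] / total
# 50 h is an arbitrary illustrative cutoff for "low-resource"
low_resource = sorted(lang for lang, hours in HOURS.items() if hours < 50)

print(f"{len(HOURS)} languages, {total:.2f} h in total")
print(f"Hindi: {hindi_share:.1%} of all audio")
print("Under 50 h:", ", ".join(low_resource))
```

This reports 23 languages and 9783.79 hours in total, with Hindi alone at about 46.6% and six languages (Bodo, Dogri, Konkani, Nepali, Santali, Sindhi) under 50 hours each.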
The original repository is an experimentation platform built on top of fairseq.