---
language:
- am
- ar
- tw
- bm
- fr
- lg
- ha
- ig
- rw
- kg
- ln
- lu
- mg
- nso
- ny
- om
- pt
- sn
- so
- st
- sw
- ss
- ti
- ts
- tn
- ak
- ve
- wo
- xh
- yo
- zu
- tzm
- sg
- din
- ee
- fon
- luo
- mos
- umb
license: cc-by-4.0
tags:
- automatic-speech-recognition
- audio
- speech
- african-languages
- multilingual
- simba
- low-resource
- speech-recognition
- asr
datasets:
- UBC-NLP/SimbaBench
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

[Paper](https://aclanthology.org/2025.emnlp-main.559/)
[Website](https://africa.dlnlp.ai/simba/)
[Demo](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
[GitHub](https://github.com/UBC-NLP/simba)
[Models](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
[Dataset](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state-of-the-art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.

### Simba-ASR
> **The New Standard for African Speech-to-Text**

**Task:** `Automatic Speech Recognition` – powering high-accuracy transcription across the continent.

**Language Coverage (43 African languages)**
> **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).

**Base Architectures**

- **Simba-S** (SeamlessM4T-v2-MT) – *Top Performer*
- **Simba-W** (Whisper-v3-large)
- **Simba-X** (Wav2Vec2-XLS-R-2b)
- **Simba-M** (MMS-1b-all)
- **Simba-H** (AfriHuBERT)

**Explore the Frontier**

| **ASR Models** | **Architecture** | **#Parameters** | **🤗 Hugging Face Model Card** | **Status** |
|---------|:------------------:|:------------------:|:------------------:|:------------------:|
| **Simba-S** | SeamlessM4T-v2 | 2.3B | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
| **Simba-W** | Whisper | 1.5B | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
| **Simba-X** | Wav2Vec2 | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
| **Simba-M** | MMS | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
| **Simba-H** | HuBERT | 94M | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |

* **Simba-S** emerged as the best-performing ASR model overall.

**Usage Example**

You can easily run inference using the Hugging Face `transformers` library.

```python
from transformers import pipeline

# Load Simba-S for ASR; other checkpoints: `UBC-NLP/Simba-W`,
# `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, `UBC-NLP/Simba-M`
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="UBC-NLP/Simba-S"
)

# Load the multilingual African adapter (uncomment only for `UBC-NLP/Simba-M`)
# asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an in-memory array (16 kHz mono floats)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000
})
print(result["text"])
```
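The array-based call above assumes an `audio_array` that the snippet never defines. In practice you would load it with a library such as `librosa` or `soundfile`; as a dependency-free sketch using only the Python standard library (assuming a 16-bit mono WAV already sampled at 16 kHz), `load_wav_as_float_array` below is a hypothetical helper, not part of the Simba API:

```python
import struct
import wave

def load_wav_as_float_array(path):
    """Read a 16-bit mono WAV file into a list of floats in [-1.0, 1.0]."""
    with wave.open(path, "rb") as f:
        assert f.getnchannels() == 1, "expected mono audio"
        assert f.getsampwidth() == 2, "expected 16-bit PCM samples"
        frames = f.readframes(f.getnframes())
        samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
        return [s / 32768.0 for s in samples], f.getframerate()

# audio_array, sampling_rate = load_wav_as_float_array("clip.wav")
# result = asr_pipeline({"array": audio_array, "sampling_rate": sampling_rate})
```

If your source audio is not 16 kHz mono, resample it first (e.g. with `librosa.load(path, sr=16000, mono=True)`), since the models expect 16 kHz input.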

#### Example Outputs

Using the same audio file with different Simba models:

```python
# Simba-S
{'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
```

```python
# Simba-W
{'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
```

```python
# Simba-X
{'text': 'fator fr on ar taamsodr is'}
```

```python
# Simba-M
{'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
```

```python
# Simba-H
{'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
```

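The transcripts above differ from one another by only a handful of words and characters, which is exactly what the card's WER and CER metrics quantify. A minimal sketch of both metrics via edit distance (real evaluations typically use a library such as `jiwer` or Hugging Face `evaluate`; the example strings in the usage note are illustrative, not SimbaBench references):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("in ons binneste", "in ons binniste")` is 1/3 (one of three words substituted), while the corresponding CER is only 1/15, which is why both metrics are reported.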
Get started with Simba models in minutes using our interactive Colab notebook: [Open in Colab](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)

## Citation

If you use the Simba models or the SimbaBench benchmark in your scientific publication, or if you find these resources useful, please cite our paper.

```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```