Dataset: openslr/librispeech_asr
This is a multilingual automatic speech recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it can also mark intents and emotions and capture voice bio information while streaming.
You can use this model directly with the NeMo toolkit for inference.
```python
import nemo.collections.asr as nemo_asr

# Load the model from the Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/STT-meta-1B")

# Transcribe an audio file (one result per input path)
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
```
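For more than one file, a small wrapper can check the audio format first and return plain strings. The sketch below is an assumption-laden helper, not part of the model's tooling: it assumes the model expects 16 kHz mono WAV input, as its base model nvidia/parakeet-ctc-0.6b is documented to, and it treats mismatches only as warnings because NeMo's loader may convert such files depending on the version. `check_wav`, `to_text`, and `transcribe_files` are hypothetical names. `to_text` handles both return styles of `transcribe()`: plain strings, or `Hypothesis` objects carrying a `.text` attribute.

```python
import wave
from typing import Iterable, List

# Assumption: 16 kHz mono input, as documented for the parakeet base model.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1


def check_wav(path: str) -> List[str]:
    """Return a list of format problems for a WAV file (empty list = looks OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(
                f"sample rate {wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}"
            )
    return problems


def to_text(results: Iterable) -> List[str]:
    """Normalize transcribe() output to plain strings.

    Depending on the NeMo version, results are either strings or
    Hypothesis objects that hold the transcript in `.text`.
    """
    return [r.text if hasattr(r, "text") else str(r) for r in results]


def transcribe_files(paths: List[str], batch_size: int = 4) -> List[str]:
    """Warn about unexpected audio formats, then transcribe all files."""
    # Imported here so the helpers above can be used without NeMo installed.
    import nemo.collections.asr as nemo_asr

    for p in paths:
        for problem in check_wav(p):
            print(f"warning: {p}: {problem}")
    asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/STT-meta-1B")
    return to_text(asr_model.transcribe(paths, batch_size=batch_size))
```

Usage: `print(transcribe_files(["/path/to/a.wav", "/path/to/b.wav"]))`. Loading the model inside the function keeps the sketch short; in a long-running process you would load it once and reuse it.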
This model can also be used with the inference server provided in the PromptingNemo repository.
For fine-tuning and inference scripts, see this folder: https://github.com/WhissleAI/PromptingNemo/scripts/asr/meta-asr
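NeMo ASR fine-tuning usually reads training data from a JSON-lines manifest, where each line has `audio_filepath`, `duration` (seconds), and `text`. The sketch below writes such a manifest from (WAV path, transcript) pairs using only the standard library. It assumes the meta-asr scripts accept this standard format. How intent, emotion, or voice bio labels are encoded in `text` is not described here, so check the repository before using it. `wav_duration` and `write_manifest` are hypothetical helper names.

```python
import json
import wave
from typing import Iterable, Tuple


def wav_duration(path: str) -> float:
    """Duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write a NeMo-style JSON-lines manifest and return the number of entries."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in pairs:
            entry = {
                "audio_filepath": audio_path,
                "duration": wav_duration(audio_path),
                "text": text,  # assumption: any tags go inline in the text
            }
            # ensure_ascii=False keeps non-English transcripts readable
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Writing one JSON object per line lets NeMo stream large manifests without loading the whole file, and storing `duration` lets the data loader filter clips by length.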
Base model: nvidia/parakeet-ctc-0.6b