parakeet-ctc-0.6b-with-meta

This is a multilingual Automatic Speech Recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it can mark intents, voice bio, and emotions alongside the transcript, in streaming mode.
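As a rough illustration of what consuming such output might look like, the sketch below separates meta tags from the spoken words. The tag format is an assumption, not something this card specifies: it supposes tags arrive inline as upper-case tokens containing an underscore (for example a hypothetical `INTENT_COMMAND`). The helper name `split_tags` is also made up for this example. Check the model's real output and adjust the pattern accordingly.

```python
import re

# ASSUMPTION: meta tags are inline, upper-case tokens with an underscore,
# e.g. "INTENT_COMMAND". This format is hypothetical; verify it against
# the model's actual output before relying on it.
TAG_PATTERN = re.compile(r"^[A-Z]+_[A-Z0-9_]+$")


def split_tags(transcript: str) -> tuple[str, list[str]]:
    """Split a tagged transcript into (plain text, list of tag tokens)."""
    words, tags = [], []
    for token in transcript.split():
        if TAG_PATTERN.match(token):
            tags.append(token)
        else:
            words.append(token)
    return " ".join(words), tags


# Example with a made-up tagged transcript
text, tags = split_tags("turn on the lights INTENT_COMMAND EMOTION_NEUTRAL")
print(text)
print(tags)
```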

How to Use

You can use this model directly with the NeMo toolkit for inference.

import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/STT-meta-1B")

# Transcribe an audio file
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
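
Depending on the installed NeMo version, `transcribe` may return plain strings or hypothesis objects that carry the transcript in a `text` attribute. The helper below is a minimal sketch for handling both cases; the name `to_text` is hypothetical and not part of NeMo.

```python
def to_text(result) -> str:
    """Return the transcript string from one item of a transcribe() result.

    Accepts either a plain string or an object exposing a `text` attribute
    (assumed here to be how hypothesis objects carry the transcript).
    """
    if isinstance(result, str):
        return result
    text = getattr(result, "text", None)
    if text is None:
        raise TypeError(f"Cannot extract text from {type(result).__name__}")
    return text


# Usage with the result from the snippet above:
# texts = [to_text(item) for item in transcriptions]
```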

This model can also be used with the inference server provided in the PromptingNemo repository. For fine-tuning and inference scripts, see https://github.com/WhissleAI/PromptingNemo/scripts/asr/meta-asr.

Model tree for WhissleAI/STT-meta-1B

Base model: nvidia/parakeet-ctc-0.6b
