Paper: Zipformer: A faster and better encoder for automatic speech recognition (arXiv:2310.11230)
This repository is forked from https://huggingface.co/reazon-research/reazonspeech-k2-v2.
reazonspeech-k2-v2 is an automatic speech recognition (ASR) model
trained on the ReazonSpeech v2.0 corpus.
This model provides end-to-end Japanese speech recognition based on Next-gen Kaldi.
It is a character-based RNN-T model with 159.34M parameters in total.
This model utilizes an enhanced Transformer architecture called Zipformer.
The training recipe is available on k2-fsa/icefall.
Note that this model can process Japanese audio clips up to ~30 seconds.
We recommend using this model through our reazonspeech library:
```python
from reazonspeech.k2.asr import load_model, transcribe, audio_from_path

# Load the audio file and the pretrained model
audio = audio_from_path("speech.wav")
model = load_model()

# Run recognition and print the transcription
ret = transcribe(model, audio)
print(ret.text)
```
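Because the model only handles clips up to ~30 seconds, longer recordings need to be split before transcription. Below is a minimal sketch of one way to do this; `split_audio` is a hypothetical helper (not part of the reazonspeech library) that cuts a raw waveform into fixed-length chunks, which you would then pass to `transcribe` one at a time. In practice you would likely want to cut at silences rather than at fixed offsets to avoid splitting words.

```python
import numpy as np

def split_audio(samples, sample_rate, max_seconds=30.0):
    """Split a 1-D waveform into consecutive chunks of at most max_seconds.

    This is an illustrative helper, not part of the reazonspeech API.
    """
    max_len = int(max_seconds * sample_rate)
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]

# Example: a 70-second recording at 16 kHz yields three chunks (30 s + 30 s + 10 s).
waveform = np.zeros(16000 * 70, dtype=np.float32)
chunks = split_audio(waveform, 16000)
print(len(chunks))  # → 3
```

Each chunk can then be wrapped into the library's audio type and transcribed separately, concatenating the resulting texts.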