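This project provides a Zipformer automatic speech recognition (ASR, audio-to-text) inference demo for the Axera platform. It recognizes Chinese and English audio and runs in both on-board and compute-card environments.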
.
|-- README.md
|-- ax_pretrained_infer.py
|-- inputs
|   |-- axmodels_630C    # model directory
|   |-- axmodels_650N
|   |-- lang_char_bpe    # tokens.txt
|   `-- test_wavs        # audio directory
|-- requirements.txt     # dependencies
|-- zipformer_infer_demo_AX630.sh    # inference scripts
`-- zipformer_infer_demo_AX650.sh
# Create and activate a virtual environment
conda create -n zipformer python=3.10
conda activate zipformer
# Download the project
hf download AXERA-TECH/Zipformer.axera --local-dir Zipformer.axera
# Install project dependencies
1. Install the k2 library: download the wheel from https://k2-fsa.github.io/k2/cpu.html, then install it with pip:
e.g. pip install k2-1.24.4.dev20250807%2Bcpu.torch2.8.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
2. Install the kaldifeat library: download the wheel from https://csukuangfj.github.io/kaldifeat/cpu.html, then install it with pip:
e.g. pip install kaldifeat-1.25.5.dev20250807%2Bcpu.torch2.8.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Note: the k2 and kaldifeat wheels you choose must match your torch and Python versions.
3. cd Zipformer.axera
pip install -r requirements.txt
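The version-matching requirement for the k2 and kaldifeat wheels can be checked before installing. The following is a minimal sketch that reads the torch version and CPython tag out of a wheel filename of the form shown in the examples above. The helper names `parse_wheel_tags` and `wheel_matches` are hypothetical and are not part of this project, k2, or kaldifeat; the regular expression assumes the `torch<version>-cp<tag>-` naming pattern seen in those filenames.

```python
import re
import sys

# Matches the "torch<version>-cp<tag>-" part of wheel names such as
# k2-1.24.4.dev20250807%2Bcpu.torch2.8.0-cp310-cp310-....whl
# (assumption: other wheels on the download pages follow the same pattern).
WHEEL_RE = re.compile(r"torch(\d+(?:\.\d+)*)-(cp\d+)-")


def parse_wheel_tags(filename):
    """Return (torch_version, python_tag) parsed from a wheel filename."""
    match = WHEEL_RE.search(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.group(1), match.group(2)


def wheel_matches(filename, torch_version, python_version=None):
    """Check a wheel name against a torch version and a (major, minor) Python version."""
    if python_version is None:
        python_version = sys.version_info[:2]
    wheel_torch, wheel_py = parse_wheel_tags(filename)
    expected_py = f"cp{python_version[0]}{python_version[1]}"
    # Drop local suffixes such as "+cpu" before comparing torch versions.
    return wheel_torch == torch_version.split("+")[0] and wheel_py == expected_py
```

In practice `torch_version` could be taken from `torch.__version__` in the activated environment, and `python_version` can be left unset to use the running interpreter.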
# Install axengine (if not already installed)
hf download AXERA-TECH/PyAXEngine --local-dir PyAXEngine
cd PyAXEngine
pip install axengine-0.1.3-py3-none-any.whl
AX650N platform:
sh zipformer_infer_demo_AX650.sh
AX630C platform:
sh zipformer_infer_demo_AX630.sh
| Parameter | Description |
|---|---|
| --encoder-model-filename | Path to the encoder onnx model |
| --decoder-model-filename | Path to the decoder onnx model |
| --joiner-model-filename | Path to the joiner onnx model |
| --tokens | Path to tokens.txt |
| --sound-dir | Path to test audio directory |
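To illustrate how the flags in the table fit together, here is a minimal `argparse` sketch that accepts the same five options. It is not taken from `ax_pretrained_infer.py`, whose actual argument handling may differ. The model file names (`encoder.axmodel`, `decoder.axmodel`, `joiner.axmodel`) are hypothetical placeholders; only the directory names come from the project tree.

```python
import argparse


def build_parser():
    """Build a parser with the five flags listed in the parameter table."""
    parser = argparse.ArgumentParser(description="Zipformer ASR demo arguments (sketch)")
    parser.add_argument("--encoder-model-filename", required=True,
                        help="Path to the encoder model")
    parser.add_argument("--decoder-model-filename", required=True,
                        help="Path to the decoder model")
    parser.add_argument("--joiner-model-filename", required=True,
                        help="Path to the joiner model")
    parser.add_argument("--tokens", required=True, help="Path to tokens.txt")
    parser.add_argument("--sound-dir", required=True,
                        help="Path to test audio directory")
    return parser


# Hypothetical model file names; replace them with the files actually
# present under inputs/axmodels_650N or inputs/axmodels_630C.
args = build_parser().parse_args([
    "--encoder-model-filename", "inputs/axmodels_650N/encoder.axmodel",
    "--decoder-model-filename", "inputs/axmodels_650N/decoder.axmodel",
    "--joiner-model-filename", "inputs/axmodels_650N/joiner.axmodel",
    "--tokens", "inputs/lang_char_bpe/tokens.txt",
    "--sound-dir", "inputs/test_wavs",
])
```

`argparse` converts the dashes in each flag to underscores, so the values are available as `args.encoder_model_filename`, `args.sound_dir`, and so on.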
The console prints the recognized text and timing information.
Latency: AX650N RTF: 0.173; AX630C RTF: 0.330
[1/10] 0.wav Audio loading time: 0.014 s
Audio processing time: 1.737 s
2025-12-16 19:04:39,975 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/0.wav
2025-12-16 19:04:39,976 INFO [ax_pretrained_infer.py:598] 昨天是 MONDAY TODAY IS TOMORROW
Audio duration: 10.053 s
Total (load+process) time: 1.750 s
RTF (total_time/audio_duration): 0.174
[2/10] 002.mp3 Audio loading time: 0.041 s
Audio processing time: 5.057 s
2025-12-16 19:04:45,128 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/002.mp3
2025-12-16 19:04:45,129 INFO [ax_pretrained_infer.py:598] 那么一般是非本人医院的嗯单位厅堡的时候如果是给您妻子填写检测停保原因或者说如果单位听报原因停止了导致你妻子身孕育今天这个申请试验金申请的这个原因就本人医院医院中间这个原因不符合的那么需要提供身身体湿液晶面提供一个单位开具的解除劳动关系证明鉴明具体解除劳动关系原因证明是对本人医院的
Audio duration: 29.952 s
Total (load+process) time: 5.098 s
RTF (total_time/audio_duration): 0.170
[3/10] 1.wav Audio loading time: 0.004 s
Audio processing time: 0.848 s
2025-12-16 19:04:46,041 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/1.wav
2025-12-16 19:04:46,042 INFO [ax_pretrained_infer.py:598] 这是第一种第二种叫呃与 ALWAYS什么
Audio duration: 5.100 s
Total (load+process) time: 0.852 s
RTF (total_time/audio_duration): 0.167
[4/10] 2.wav Audio loading time: 0.004 s
Audio processing time: 0.796 s
2025-12-16 19:04:46,893 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/2.wav
2025-12-16 19:04:46,894 INFO [ax_pretrained_infer.py:598] 这个是频繁的啊不认识记下来 FREQUENTLY平凡的
Audio duration: 4.690 s
Total (load+process) time: 0.799 s
RTF (total_time/audio_duration): 0.170
[5/10] 3.wav Audio loading time: 0.005 s
Audio processing time: 1.564 s
2025-12-16 19:04:48,517 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/3.wav
2025-12-16 19:04:48,517 INFO [ax_pretrained_infer.py:598] 第一句是个什么时态加了 YES是一般现在时对后面它时态系形状
Audio duration: 8.830 s
Total (load+process) time: 1.569 s
RTF (total_time/audio_duration): 0.178
[6/10] 4.wav Audio loading time: 0.008 s
Audio processing time: 2.951 s
2025-12-16 19:04:51,530 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/4.wav
2025-12-16 19:04:51,531 INFO [ax_pretrained_infer.py:598] 嗯 ON TIME要准时 IN TIME是及时叫他总是准时教他的作业那用一般现在时是没有什么感情色彩呢陈述一个事实下一句话为什么要用现在进行时态的意思并不是说他现在正在教
Audio duration: 17.640 s
Total (load+process) time: 2.959 s
RTF (total_time/audio_duration): 0.168
[7/10] 46.wav Audio loading time: 0.003 s
Audio processing time: 0.666 s
2025-12-16 19:04:52,257 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/46.wav
2025-12-16 19:04:52,257 INFO [ax_pretrained_infer.py:598] 你好石头把厨房扫脱下
Audio duration: 3.901 s
Total (load+process) time: 0.670 s
RTF (total_time/audio_duration): 0.172
[8/10] demo.wav Audio loading time: 0.003 s
Audio processing time: 0.692 s
2025-12-16 19:04:53,002 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/demo.wav
2025-12-16 19:04:53,003 INFO [ax_pretrained_infer.py:598] 甚至出现交易几乎停滞的情况
Audio duration: 4.204 s
Total (load+process) time: 0.695 s
RTF (total_time/audio_duration): 0.165
[9/10] fileid_144.wav Audio loading time: 0.003 s
Audio processing time: 0.512 s
2025-12-16 19:04:53,569 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/fileid_144.wav
2025-12-16 19:04:53,569 INFO [ax_pretrained_infer.py:598] HE GOES ABOUT BEGGING FROM HOUSE TO HOUSE AND HAS NE
Audio duration: 3.000 s
Total (load+process) time: 0.515 s
RTF (total_time/audio_duration): 0.172
[10/10] fileid_249.wav Audio loading time: 0.003 s
Audio processing time: 0.587 s
2025-12-16 19:04:54,209 INFO [ax_pretrained_infer.py:597] ./inputs/test_wavs/fileid_249.wav
2025-12-16 19:04:54,210 INFO [ax_pretrained_infer.py:598] SUCH A DASH
Audio duration: 3.000 s
Total (load+process) time: 0.590 s
RTF (total_time/audio_duration): 0.197
Average RTF over 10 files: 0.173
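The real-time factor (RTF) reported in the log is the total time (load plus process) divided by the audio duration, so values below 1 mean faster than real time. As a rough sketch, the per-file RTFs can be recomputed from captured console output as shown below. The function name `rtfs_from_log` is hypothetical, and the regular expressions assume the exact line wording shown in the log above.

```python
import re

# Patterns for the two timing lines printed per file (wording assumed from the log).
DURATION_RE = re.compile(r"Audio duration:\s*([\d.]+)\s*s")
TOTAL_RE = re.compile(r"Total \(load\+process\) time:\s*([\d.]+)\s*s")


def rtfs_from_log(log_text):
    """Return one RTF (total_time / audio_duration) per file found in the log."""
    durations = [float(x) for x in DURATION_RE.findall(log_text)]
    totals = [float(x) for x in TOTAL_RE.findall(log_text)]
    if len(durations) != len(totals):
        raise ValueError("mismatched duration/total line counts in log")
    return [total / duration for total, duration in zip(totals, durations)]


# Timing lines for the first two files, copied from the log above.
sample_log = """
Audio duration: 10.053 s
Total (load+process) time: 1.750 s
Audio duration: 29.952 s
Total (load+process) time: 5.098 s
"""

rtfs = rtfs_from_log(sample_log)
average_rtf = sum(rtfs) / len(rtfs)
```

Feeding the full console output to the same function yields ten values whose mean corresponds to the "Average RTF" line.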