---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---
# Whisper
- [English](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README_EN.md)
- [中文](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README.md)
OpenAI Whisper on Axera
- 目前支持 C++ 和 Python 两种语言
- 预编译模型下载
- [Huggingface](https://huggingface.co/AXERA-TECH/Whisper)
- 如需自行转换请参考[模型转换](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)
## Update
- 2026/01/14: 更简单的模型结构,现在只需要encoder和decoder,去掉原来的decoder_main和decoder_loop;支持来自HuggingFace的模型导出
## 支持平台
- [x] AX650N
- [x] AX630C
## 模型转换
目前支持的模型规模:
- tiny
- base
- small
- medium
- turbo
目前测试过的语言:
- English
- Chinese
- Japanese
- Korean
- Malaysian
[模型转换](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)
## 上板部署
- 基于 AX650N、AX630C 的设备已预装 Ubuntu22.04
- 链接互联网,确保设备能正常执行 `apt install`, `pip install` 等指令
- 已验证设备:
- [爱芯派Pro(AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card(AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- [爱芯派2(AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
- [Module-LLM(AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
- [LLM630 Compute Kit(AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
- 支持编程语言:
- [Python](#Python)
- [C++](#CPP)
Python
#### Requirements
推荐在板上安装Miniconda管理虚拟环境,安装方法如下:
```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate
conda init --all
```
安装Whisper依赖
```
cd python
conda create -n whisper python=3.12
conda activate whisper
pip3 install -r requirements.txt
```
#### 安装pyaxenigne
参考 https://github.com/AXERA-TECH/pyaxengine 安装 NPU Python API
在0.1.3rc2上测试通过,可通过
```
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
```
安装,或把版本号更改为你想使用的版本
#### 运行
登陆开发板后
输入命令
```
cd python
conda activate whisper
python3 main.py --model_type small --model_path ../models-ax650 --wav ../demo.wav --language zh
```
输出结果
```
(whisper) root@ax650:/mnt/data/Github/whisper.axera/python# python whisper_cli.py -t tiny -w ../demo.wav
[INFO] Available providers: ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.11406774537746188
```
运行参数说明:
| 参数名称 | 说明 | 默认值 |
| --- | --- | --- |
| --wav | 输入音频文件 | |
| --model_type/-t | 模型类型, tiny/base/small | |
| --model_path/-p | 模型所在目录 | ../models |
| --language/-l | 识别语言 | zh |
##### 服务端
```
(whisper) root@ax650:/mnt/data/Github/whisper.axera/python# python whisper_svr.py
[INFO] Available providers: ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000
```
测试服务端
```
python test_svr.py
```
CPP
#### 运行
在 AX650N 设备上执行
```
cd cpp
./whisper_cli -w ../demo.wav -t tiny
```
或
```
cd cpp
./whisper_cli --model_type small -w ../demo.wav
```
输出结果
```
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny
wav_file: ../../demo.wav
model_path: ../../models-ax650
model_type: tiny
language: zh
Init whisper success, take 0.3540seconds
Result: 甚至出现交易几乎停止的情况
RTF: 0.0968
```
### 服务端
```
cd cpp/ax650
./whisper_srv --model_type tiny --language zh --port 8080
```
### 客户端
curl命令行测试(请自行替换IP和端口):
```
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
-H "Content-Type: application/octet-stream" \
--data-binary @-
```
## 模型性能
### Latency
RTF: Real-Time Factor
CPP:
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.08 | |
| Whisper-Base | 0.11 | 0.35 |
| Whisper-Small | 0.24 | |
| Whisper-Turbo | 0.48 | |
Python:
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.12 | |
| Whisper-Base | 0.16 | 0.35 |
| Whisper-Small | 0.50 | |
| Whisper-Turbo | 0.60 | |
### Word Error Rate(Test on AIShell dataset)
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.24 | |
| Whisper-Base | 0.18 | |
| Whisper-Small | 0.11 | |
| Whisper-Turbo | 0.06 | |
若要复现测试结果,请按照以下步骤:
解压数据集:
```
unzip datasets.zip
```
运行测试脚本:
```
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../datasets/ground_truth.txt --model_type tiny
```
### MEM Usage
* CMM Stands for Physical memory used by Axera modules like VDEC(Video decoder), VENC(Video encoder), NPU, etc.
Python:
| Models | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny | 332 | 512 |
| Whisper-Base | 533 | 644 |
| Whisper-Small | 1106 | 906 |
| Whisper-Turbo | 2065 | 2084 |
C++:
| Models | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny | 332 | 31 |
| Whisper-Base | 533 | 54 |
| Whisper-Small | 1106 | 146 |
| Whisper-Turbo | 2065 | 86 |
## 技术讨论
- Github issues
- QQ 群: 139953715