SenseVoice

SenseVoice is a speech recognition and understanding model that can efficiently and accurately convert spoken content into text, supporting multilingual and multi-scenario speech processing.

  1. Manually download the model and upload it to the Raspberry Pi 5, or clone the model repository with the following command.
Note
If Git LFS is not installed, install it first by following the Git LFS installation instructions.
git clone https://huggingface.co/AXERA-TECH/SenseVoice

Repository contents:

m5stack@raspberrypi:~/rsp/SenseVoice $ ls -lh
total 464K
-rw-rw-r-- 1 m5stack m5stack  11K Aug 12 16:38 am.mvn
-rw-rw-r-- 1 m5stack m5stack 369K Aug 12 16:38 chn_jpn_yue_eng_ko_spectok.bpe.model
-rw-rw-r-- 1 m5stack m5stack    0 Aug 12 16:38 config.json
-rw-rw-r-- 1 m5stack m5stack  108 Aug 12 16:38 download_dataset.sh
-rw-rw-r-- 1 m5stack m5stack  893 Aug 12 16:38 download_utils.py
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 16:38 embeddings
-rw-rw-r-- 1 m5stack m5stack  17K Aug 12 16:38 frontend.py
-rw-rw-r-- 1 m5stack m5stack 1.1K Aug 12 16:38 LICENSE
-rw-rw-r-- 1 m5stack m5stack 1.6K Aug 12 16:38 main.py
-rw-rw-r-- 1 m5stack m5stack 3.2K Aug 12 16:38 print_utils.py
-rw-rw-r-- 1 m5stack m5stack 1.5K Aug 12 16:38 README.md
-rw-rw-r-- 1 m5stack m5stack   71 Aug 12 16:38 requirements.txt
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 16:38 sensevoice_ax650
-rw-rw-r-- 1 m5stack m5stack 9.1K Aug 12 16:38 SenseVoiceAx.py
-rw-rw-r-- 1 m5stack m5stack 2.5K Aug 12 16:38 test_wer.py
-rw-rw-r-- 1 m5stack m5stack 4.7K Aug 12 16:38 tokenizer.py
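
If the repository was cloned without Git LFS, large files can end up as small text stubs (LFS pointers) instead of real content. The following is a minimal sketch of a check for that, assuming the standard Git LFS pointer format, whose first line begins with `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and are not part of the SenseVoice repository.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an LFS pointer stub, not real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_dir):
    """List files under repo_dir that are still LFS pointers (the .git folder is skipped)."""
    repo = Path(repo_dir)
    pointers = []
    for p in sorted(repo.rglob("*")):
        if ".git" in p.relative_to(repo).parts or not p.is_file():
            continue
        if is_lfs_pointer(p):
            pointers.append(p)
    return pointers
```

Calling `find_lfs_pointers(".")` from inside the cloned directory should return an empty list once the real files are present. If it does not, running `git lfs pull` in the repository fetches them.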
  2. Create a virtual environment
python -m venv sensevoice
  3. Activate the virtual environment
source sensevoice/bin/activate
  4. Install dependencies
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install -r requirements.txt
  5. Run
python main.py -i test.mp3
| Parameter Name | Description | Default |
| --- | --- | --- |
| --input/-i | Input audio file | - |
| --language/-l | Recognition language; supports auto, zh, en, yue, ja, ko | auto |
Note
The model will be automatically downloaded on the first run.
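
To make the command-line parameters above concrete, here is a minimal `argparse` sketch that mirrors the two documented flags. It is illustrative only and is not taken from `main.py`. Treating `--input` as required and restricting `--language` to the listed codes are assumptions, as is the helper name `build_parser`.

```python
import argparse

# Language codes listed in the parameter description.
LANGUAGES = ["auto", "zh", "en", "yue", "ja", "ko"]


def build_parser():
    """Build a parser mirroring the documented flags (a sketch, not main.py itself)."""
    parser = argparse.ArgumentParser(description="SenseVoice demo arguments (sketch)")
    # Assumption: the input file is mandatory, since no default is documented.
    parser.add_argument("--input", "-i", required=True, help="Input audio file")
    # Assumption: only the documented language codes are accepted.
    parser.add_argument("--language", "-l", default="auto", choices=LANGUAGES,
                        help="Recognition language")
    return parser


# Equivalent to passing: -i test.mp3 -l en
args = build_parser().parse_args(["-i", "test.mp3", "-l", "en"])
print(args.input, args.language)
```

When `-l` is omitted, `args.language` falls back to `auto`, matching the documented default.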

Execution result example:

(sensevoice) m5stack@raspberrypi:~/rsp/SenseVoice $ python main.py -i test_en.mp3
[INFO] Available providers:  ['AXCLRTExecutionProvider']
input_audio: test_en.mp3
language: auto
use_itn: True
model_path: /home/m5stack/rsp/SenseVoice/models/SenseVoice/sensevoice_ax650/sensevoice.axmodel
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 4.0 156de6f7
RTF: 0.015400904924311537    Latency: 0.463259220123291s  Total length: 30.08s
['You want to be a nurse or an archi', "A lawyer or a member of our military. you''re going to need a", 'Eduducation for every single one of those caree', 'Not drop out of school and just drop into a good j', "You''ve got to train for it", "And learn for it. And this isn't just impo", 'lifeIn your own future. what you make', 'Will decide nothing less than the future of this country..']
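
The RTF (real-time factor) in the log is consistent with the latency divided by the audio length, so a value below 1.0 means the audio is processed faster than it plays. A short worked example using the numbers from the log above (the function name `real_time_factor` is hypothetical):

```python
def real_time_factor(latency_s, audio_length_s):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_length_s <= 0:
        raise ValueError("audio length must be positive")
    return latency_s / audio_length_s


# Values copied from the example log.
latency = 0.463259220123291   # seconds spent on inference
total_length = 30.08          # seconds of audio
rtf = real_time_factor(latency, total_length)
print(f"RTF: {rtf:.4f}, speed-up over real time: {1 / rtf:.1f}x")
```

For this run, about 30 seconds of audio were transcribed in under half a second, roughly 65 times faster than real time.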