3D-Speaker-MT 是一种多任务学习的说话人识别模型,能够同时进行说话人验证、辨认等任务,在多场景下实现高精度的语音身份识别。
git clone https://huggingface.co/AXERA-TECH/3D-Speaker-MT.axera文件说明:
m5stack@raspberrypi:~/rsp/3D-Speaker-MT.axera $ ls -lh
total 64K
-rwxrwxr-x 1 m5stack m5stack 7.7K Sep 29 14:39 ax_meeting_transc_demo.py
drwxrwxr-x 4 m5stack m5stack 4.0K Sep 29 14:39 ax_model
-rwxrwxr-x 1 m5stack m5stack    0 Sep 29 14:39 config.json
-rwxrwxr-x 1 m5stack m5stack  33K Sep 29 14:39 model.py
-rwxrwxr-x 1 m5stack m5stack 3.4K Sep 29 14:39 README.md
-rwxrwxr-x 1 m5stack m5stack   74 Sep 29 14:39 requirements.txt
drwxrwxr-x 5 m5stack m5stack 4.0K Sep 29 14:39 utils
drwxrwxr-x 2 m5stack m5stack 4.0K Sep 29 14:39 wavpython -m venv speakersource speaker/bin/activatepip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
pip install -r requirements.txtpython3 ax_meeting_transc_demo.py --output_dir output_dir --wav_file wav/vad_example.wav运行结果:
(esrgan) m5stack@raspberrypi:~/rsp/Real-ESRGAN $ python3 main.py --input test_256.jpeg --output test_256_20e.jpeg --model ax650/realesrgan-x4-256.axmodel
[INFO] Available providers:  ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 3dfd5692
input.1 [1, 256, 256, 3] uint8
1895 [1, 1024, 1024, 3] float32
Original Image Shape: (243, 243, 3)
Preprocessed Image Shape: (1, 256, 256, 3)
Inference Time: 454.03 ms
Output Shape: (1, 1024, 1024, 3)
Final Output Image Shape: (1024, 1024, 3)