3D-Speaker-MT is a multi-task speaker recognition model. It performs speaker verification and speaker identification simultaneously, and targets accurate voice identity recognition across multiple scenarios.
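To make the two tasks concrete, here is a minimal sketch of how verification and identification are commonly scored once a model has produced speaker embeddings. It does not call 3D-Speaker-MT itself: the toy 3-dimensional vectors, the function names, and the `0.5` threshold are all illustrative assumptions, and the repository may score embeddings differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.5):
    """Verification: do two embeddings belong to the same speaker?
    The threshold is a hypothetical value; tune it on real data."""
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(probe, enrolled):
    """Identification: return the enrolled speaker closest to the probe."""
    return max(enrolled, key=lambda name: cosine_similarity(probe, enrolled[name]))

# Toy embeddings standing in for real model output (assumption).
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]

print(identify(probe, enrolled))         # closest enrolled speaker
print(verify(probe, enrolled["alice"]))  # same-speaker decision
```

Verification is a one-to-one decision against a threshold, while identification is a one-to-many search over enrolled speakers; both reuse the same similarity score.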
Clone the model repository from Hugging Face:
git clone https://huggingface.co/AXERA-TECH/3D-Speaker-MT.axera
File Description:
m5stack@raspberrypi:~/rsp/3D-Speaker-MT.axera $ ls -lh
total 64K
-rwxrwxr-x 1 m5stack m5stack 7.7K Sep 29 14:39 ax_meeting_transc_demo.py
drwxrwxr-x 4 m5stack m5stack 4.0K Sep 29 14:39 ax_model
-rwxrwxr-x 1 m5stack m5stack 0 Sep 29 14:39 config.json
-rwxrwxr-x 1 m5stack m5stack 33K Sep 29 14:39 model.py
-rwxrwxr-x 1 m5stack m5stack 3.4K Sep 29 14:39 README.md
-rwxrwxr-x 1 m5stack m5stack 74 Sep 29 14:39 requirements.txt
drwxrwxr-x 5 m5stack m5stack 4.0K Sep 29 14:39 utils
drwxrwxr-x 2 m5stack m5stack 4.0K Sep 29 14:39 wav
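As an optional sanity check after cloning, the listing above can be compared against the working directory. This is a small sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the expected names are copied from the listing above.

```python
from pathlib import Path

# Entries taken from the directory listing above.
EXPECTED = [
    "ax_meeting_transc_demo.py",
    "ax_model",
    "config.json",
    "model.py",
    "README.md",
    "requirements.txt",
    "utils",
    "wav",
]

def missing_entries(repo_dir, expected=EXPECTED):
    """Return the expected files or directories that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).exists()]

# Run from inside the cloned 3D-Speaker-MT.axera directory.
missing = missing_entries(".")
print("all expected entries present" if not missing else f"missing: {missing}")
```

An empty result only confirms that the names exist; it does not validate the model files under `ax_model`.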
Create a Python virtual environment and install the dependencies:
python -m venv speaker
source speaker/bin/activate
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install -r requirements.txt
Run the demo on the bundled example recording:
python3 ax_meeting_transc_demo.py --output_dir output_dir --wav_file wav/vad_example.wav
Run result:
(esrgan) m5stack@raspberrypi:~/rsp/Real-ESRGAN $ python3 main.py --input test_256.jpeg --output test_256_20e.jpeg --model ax650/realesrgan-x4-256.axmodel
[INFO] Available providers: ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 3dfd5692
input.1 [1, 256, 256, 3] uint8
1895 [1, 1024, 1024, 3] float32
Original Image Shape: (243, 243, 3)
Preprocessed Image Shape: (1, 256, 256, 3)
Inference Time: 454.03 ms
Output Shape: (1, 1024, 1024, 3)
Final Output Image Shape: (1024, 1024, 3)
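To process several recordings, the `ax_meeting_transc_demo.py` command shown earlier can be wrapped in a short script. The sketch below relies only on the `--output_dir` and `--wav_file` flags that appear in that command. The helper names, the one-subdirectory-per-file layout, and the choice to create output directories in advance are assumptions, not behavior documented by the repository.

```python
import subprocess
import sys
from pathlib import Path

DEMO_SCRIPT = "ax_meeting_transc_demo.py"

def build_demo_command(wav_file, output_dir):
    """Build the demo command line using the two flags shown in this guide."""
    return [
        sys.executable,  # interpreter of the active virtual environment
        DEMO_SCRIPT,
        "--output_dir", str(output_dir),
        "--wav_file", str(wav_file),
    ]

def run_demo_on_folder(wav_dir, output_root, repo_dir="."):
    """Run the demo once per .wav file, writing to output_root/<file stem>/ (assumed layout)."""
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        out_dir = (Path(output_root) / wav_path.stem).resolve()
        out_dir.mkdir(parents=True, exist_ok=True)
        # Absolute paths keep the call valid when repo_dir differs from the caller's cwd.
        subprocess.run(
            build_demo_command(wav_path.resolve(), out_dir),
            cwd=repo_dir,
            check=True,  # stop at the first failing run
        )

# Example, from inside the cloned repository with the venv activated:
# run_demo_on_folder("wav", "output_dir")
```

Using `sys.executable` rather than a hard-coded `python3` keeps each run inside the `speaker` environment where `axengine` and the requirements were installed.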