NPU Benchmark

Benchmarking is the best way to understand how fast network models run on a hardware platform. The following data comes from tests conducted with a Raspberry Pi 5 host. It is provided for community reference only and does not represent the final performance of commercial deliveries.

Working Conditions

  • Update date: 2024.11.22
  • Toolchain version: Pulsar2 3.2-patch2
  • Test tool: axcl_run_model
  • Batch Size: 1 or 8
  • Unit: IPS (images per second)

Because memory-copy and PCIe performance differ between hosts, axcl_run_model counts only the inference time of the network model on the device.
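
Since the results are reported in IPS, it can help to convert them to an average time per image. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and the batch formula assumes that IPS equals the batch size divided by the device-side time of one batched inference, which the source does not state explicitly.

```python
def ms_per_image(ips: float) -> float:
    """Average device-side time per image, in milliseconds."""
    if ips <= 0:
        raise ValueError("ips must be positive")
    return 1000.0 / ips


def ips_from_batch_latency(batch_size: int, latency_ms: float) -> float:
    """Throughput in images per second for one batched inference.

    Assumption (not confirmed by the source): IPS is the batch size
    divided by the latency of a single batched run.
    """
    if batch_size <= 0 or latency_ms <= 0:
        raise ValueError("batch_size and latency_ms must be positive")
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    # Illustrative inputs only, not measured values.
    print(f"{ms_per_image(1000):.2f} ms per image at 1000 IPS")
    print(f"{ips_from_batch_latency(8, 4.0):.0f} IPS for a batch of 8 in 4 ms")
```

Host-side costs such as PCIe transfers are excluded from these figures, so end-to-end throughput on a given host may be lower.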

Vision Model

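| Models | Input Size | Batch 1 (IPS) | Batch 8 (IPS) |
|---|---|---|---|
| Inceptionv1 | 224 | 1073 | 2494 |
| Inceptionv3 | 224 | 478 | 702 |
| MobileNetv1 | 224 | 1508 | 4854 |
| MobileNetv2 | 224 | 1366 | 5073 |
| ResNet18 | 224 | 1066 | 2254 |
| ResNet50 | 224 | 576 | 1045 |
| SqueezeNet11 | 224 | 1560 | 5961 |
| Swin-T | 224 | 342 | 507 |
| ViT-B/16 | 224 | 162 | 207 |
| YOLOv5s | 640 | 326 | 394 |
| YOLOv6s | 640 | 282 | 322 |
| YOLOv8s | 640 | 248 | 279 |
| YOLOv9s | 640 | 237 | - |
| YOLOv10s | 640 | 298 | - |
| YOLOv11n | 640 | 860 | - |
| YOLOv11s | 640 | 305 | - |
| YOLOv11m | 640 | 114 | - |
| YOLOv11l | 640 | 87 | - |
| YOLOv11x | 640 | 41 | - |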
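
The gain from batching varies a lot between models in the table above. As a rough illustration, the sketch below computes the batch 8 to batch 1 throughput ratio for a few rows copied from the table. The `RESULTS` dictionary and the `batch_speedup` helper are hypothetical names introduced here, and `None` stands for a batch 8 value the table does not report.

```python
from typing import Optional

# (batch 1 IPS, batch 8 IPS) copied from the Vision Model table.
# None marks a batch 8 value that the table does not report.
RESULTS = {
    "MobileNetv2": (1366, 5073),
    "SqueezeNet11": (1560, 5961),
    "ViT-B/16": (162, 207),
    "YOLOv8s": (248, 279),
    "YOLOv9s": (237, None),
}


def batch_speedup(batch1_ips: float, batch8_ips: Optional[float]) -> Optional[float]:
    """Ratio of batch 8 throughput to batch 1 throughput, or None if unknown."""
    if batch8_ips is None:
        return None
    return batch8_ips / batch1_ips


if __name__ == "__main__":
    for name, (b1, b8) in RESULTS.items():
        ratio = batch_speedup(b1, b8)
        shown = "n/a" if ratio is None else f"{ratio:.2f}x"
        print(f"{name:<14} {shown}")
```

In these rows, the small classification networks gain far more from batching than the transformer and detection models do.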

Audio Model

| Models | RTF |
|---|---|
| Whisper-Tiny | 0.03 |
| Whisper-Small | 0.18 |
| MeloTTS | 0.04 |
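
The source does not define RTF. Assuming it is the usual real-time factor (processing time divided by audio duration, so lower is faster), the sketch below shows how to read the values. The function names are hypothetical.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated processing time for a clip, assuming RTF = processing time / audio duration."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


def realtime_speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # RTF values taken from the Audio Model table; the 30 s clip length is illustrative.
    for model, rtf in [("Whisper-Tiny", 0.03), ("Whisper-Small", 0.18), ("MeloTTS", 0.04)]:
        print(f"{model:<14} {processing_seconds(30.0, rtf):.2f} s for 30 s of audio, "
              f"{realtime_speedup(rtf):.1f}x real time")
```

For a text-to-speech model such as MeloTTS, the audio duration in this definition would be the length of the generated speech.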

LLM

| Models | Prompt length (tokens) | TTFT (ms) | Generate (tokens/s) |
|---|---|---|---|
| Qwen2.5-0.5B | 128 | 188 | 28 |
| Qwen2.5-1.5B | 128 | 407.75 | 9.05 |
| Qwen2.5-1.5B-Int4 | 128 | 407.75 | 9.05 |
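
TTFT (time to first token) and generation speed can be combined into a rough estimate of how long a full reply takes. The sketch below is a simplification under stated assumptions: the first token arrives at TTFT, every later token arrives at the steady generation rate, and the prompt is about 128 tokens as in the table. The function name is hypothetical.

```python
def response_seconds(ttft_ms: float, tokens_per_s: float, new_tokens: int) -> float:
    """Rough time to produce `new_tokens` tokens.

    Assumption: the first token arrives after `ttft_ms`, and each remaining
    token takes 1 / tokens_per_s seconds.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    if ttft_ms < 0 or tokens_per_s <= 0:
        raise ValueError("ttft_ms must be >= 0 and tokens_per_s must be > 0")
    return ttft_ms / 1000.0 + (new_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    # TTFT and generation speed taken from the LLM table; 64 output tokens is illustrative.
    for model, ttft_ms, rate in [("Qwen2.5-0.5B", 188, 28), ("Qwen2.5-1.5B", 407.75, 9.05)]:
        print(f"{model:<13} {response_seconds(ttft_ms, rate, 64):.2f} s for 64 tokens")
```

Longer prompts would raise TTFT, so the estimate only applies near the tested prompt length.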

VLM

| Models | Input Image | Image Encoder (ms) | Prompt length (tokens) | TTFT (ms) | Generate (tokens/s) |
|---|---|---|---|---|---|
| InternVL2-1B | 448*448 | 4200 | 320 | 425 | 29 |