NPU Benchmark
Benchmarking is the most direct way to see how fast network models run on a hardware platform. The following figures were measured with a Raspberry Pi 5 as the host and are provided for community reference only; they do not represent final performance in commercial deployments.
Working Conditions
- Update date: 2024.11.22
- Toolchain version: Pulsar2 3.2-patch2
- Test tool: axcl_run_model
- Batch Size: 1 or 8
- Unit: IPS (images per second)
Because memory-copy and PCIe performance differ from host to host, axcl_run_model counts only the inference time of the network model on the device.
Vision Model
| Models | Input Size | Batch 1 (IPS) | Batch 8 (IPS) |
|---|---|---|---|
| Inceptionv1 | 224 | 1073 | 2494 |
| Inceptionv3 | 224 | 478 | 702 |
| MobileNetv1 | 224 | 1508 | 4854 |
| MobileNetv2 | 224 | 1366 | 5073 |
| ResNet18 | 224 | 1066 | 2254 |
| ResNet50 | 224 | 576 | 1045 |
| SqueezeNet11 | 224 | 1560 | 5961 |
| Swin-T | 224 | 342 | 507 |
| ViT-B/16 | 224 | 162 | 207 |
| YOLOv5s | 640 | 326 | 394 |
| YOLOv6s | 640 | 282 | 322 |
| YOLOv8s | 640 | 248 | 279 |
| YOLOv9s | 640 | 237 | |
| YOLOv10s | 640 | 298 | |
| YOLOv11n | 640 | 860 | |
| YOLOv11s | 640 | 305 | |
| YOLOv11m | 640 | 114 | |
| YOLOv11l | 640 | 87 | |
| YOLOv11x | 640 | 41 | |
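
To read the table in latency terms, the IPS figures can be converted into time per image and time per batch. The following is a minimal sketch, assuming IPS counts images completed per second of on-device inference time, so that one batch takes `batch_size / ips` seconds. The helper names are hypothetical and are not part of axcl_run_model.

```python
def ms_per_image(ips: float) -> float:
    """Average on-device time per image, in milliseconds."""
    return 1000.0 / ips


def ms_per_batch(ips: float, batch_size: int) -> float:
    """On-device time for one whole batch, in milliseconds (assumes IPS counts images)."""
    return 1000.0 * batch_size / ips


def batch_speedup(ips_batch1: float, ips_batch8: float) -> float:
    """Throughput gain of batch 8 over batch 1."""
    return ips_batch8 / ips_batch1


# ResNet50 row from the table above: 576 IPS at batch 1, 1045 IPS at batch 8.
print(f"{ms_per_image(576):.2f} ms per image at batch 1")    # 1.74
print(f"{ms_per_batch(1045, 8):.2f} ms per batch at batch 8")  # 7.66
print(f"{batch_speedup(576, 1045):.2f}x throughput gain")      # 1.81
```

Higher throughput at batch 8 comes with a longer wait for each batch to complete, which matters for latency-sensitive pipelines.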
Audio Model
| Models | RTF |
|---|---|
| Whisper-Tiny | 0.03 |
| Whisper-Small | 0.18 |
| MeloTTS | 0.04 |
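
RTF (real-time factor) is commonly defined as processing time divided by the duration of the audio, so values below 1 mean faster than real time. The source does not state its definition, so the sketch below assumes this common one; the function names are hypothetical.

```python
def processing_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimated processing time, assuming RTF = processing time / audio duration."""
    return rtf * audio_seconds


def realtime_multiple(rtf: float) -> float:
    """How many seconds of audio are handled per second of processing."""
    return 1.0 / rtf


# Whisper-Small row from the table above (RTF 0.18) on a 30-second clip.
print(f"{processing_seconds(0.18, 30.0):.2f} s to process 30 s of audio")  # 5.40
print(f"{realtime_multiple(0.18):.1f}x faster than real time")              # 5.6
```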
LLM
| Models | Prompt length (tokens) | TTFT (ms) | Generate (tokens/s) |
|---|---|---|---|
| Qwen2.5-0.5B | 128 | 188 | 28 |
| Qwen2.5-1.5B | 128 | 407.75 | 9.05 |
| Qwen2.5-1.5B-Int4 | 128 | 407.75 | 9.05 |
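
The two LLM columns can be combined into a rough estimate of how long a complete reply takes. This is a simplified sketch, assuming the first token arrives at TTFT and every later token is produced at the steady generation rate; real runs may vary with prompt length and sampling settings. The function name is hypothetical.

```python
def response_seconds(ttft_ms: float, tokens_per_s: float, new_tokens: int) -> float:
    """Estimated time to produce `new_tokens` tokens.

    Assumes the first token is ready at TTFT and the remaining
    (new_tokens - 1) tokens are generated at `tokens_per_s`.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return ttft_ms / 1000.0 + (new_tokens - 1) / tokens_per_s


# Rows from the table above, for a 100-token reply to a 128-token prompt.
print(f"Qwen2.5-0.5B: {response_seconds(188, 28, 100):.2f} s")       # 3.72
print(f"Qwen2.5-1.5B: {response_seconds(407.75, 9.05, 100):.2f} s")  # 11.35
```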
VLM
| Models | Input Image | Image Encoder (ms) | Prompt length (tokens) | TTFT (ms) | Generate (tokens/s) |
|---|---|---|---|---|---|
| InternVL2-1B | 448*448 | 4200 | 320 | 425 | 29 |
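
For a VLM, image encoding adds a third stage before the first token. The sketch below breaks an end-to-end request into stages, assuming the stages run one after another and that TTFT excludes image encoding (the table reports a TTFT shorter than the encoder time, which suggests this, but the source does not confirm it). The function name is hypothetical.

```python
def vlm_stage_seconds(encoder_ms: float, ttft_ms: float,
                      tokens_per_s: float, new_tokens: int) -> dict:
    """Per-stage time estimate for one image-plus-prompt request.

    Assumes sequential stages: image encoding, then prefill up to the
    first token (TTFT), then steady decoding of the remaining tokens.
    """
    stages = {
        "image_encoder": encoder_ms / 1000.0,
        "prefill": ttft_ms / 1000.0,
        "decode": (new_tokens - 1) / tokens_per_s,
    }
    stages["total"] = sum(stages.values())
    return stages


# InternVL2-1B row from the table above, for a 64-token reply.
for stage, seconds in vlm_stage_seconds(4200, 425, 29, 64).items():
    print(f"{stage}: {seconds:.2f} s")
```

Under these assumptions the image encoder accounts for most of the time on short replies, so decode speed alone understates the wait for a first answer.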