Qwen3-VL is the most powerful vision-language model in the Qwen series. This generation brings comprehensive upgrades: stronger text understanding and generation, deeper visual perception and reasoning, a longer context length, enhanced spatial understanding and comprehension of video dynamics, and more capable agent interaction.
Available NPU Models
INT4 Quantized Model
qwen3-vl-2b-int4-ax650
Context window: 1152 tokens
Maximum output: 2048 tokens
Supported platform: AI Pyramid
Time to first token (TTFT): approximately 159.79 ms
Average generation speed: approximately 11.93 tokens/s
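As a rough illustration, the TTFT and generation-speed figures above can be combined into a back-of-envelope estimate of how long a response takes. The sketch below assumes a constant decode rate and models total latency as TTFT plus output tokens divided by the average speed; real timings will vary with prompt length and image input. `estimate_latency_s` is a hypothetical helper written for this example, not part of any AI Pyramid or model runtime API.

```python
# Published (approximate) figures for qwen3-vl-2b-int4-ax650 on AI Pyramid.
TTFT_S = 159.79 / 1000        # time to first token, converted from ms to s
DECODE_TOKENS_PER_S = 11.93   # average generation speed
MAX_OUTPUT_TOKENS = 2048      # maximum output length


def estimate_latency_s(output_tokens: int) -> float:
    """Rough end-to-end latency in seconds (hypothetical helper).

    Assumption: every output token costs 1 / DECODE_TOKENS_PER_S seconds
    on top of TTFT, i.e. the decode rate is treated as constant.
    """
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"output_tokens must be in [0, {MAX_OUTPUT_TOKENS}]")
    return TTFT_S + output_tokens / DECODE_TOKENS_PER_S


if __name__ == "__main__":
    for n in (128, 512, MAX_OUTPUT_TOKENS):
        print(f"{n:5d} tokens ~ {estimate_latency_s(n):6.1f} s")
```

Under these assumptions, generating the full 2048-token output works out to roughly 172 s, while a 128-token reply takes on the order of 11 s.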