Qwen3-VL-2B-Instruct

Introduction

Qwen3-VL is the most capable vision-language model in the Qwen series to date. Compared with earlier generations, it improves text understanding and generation, visual perception and reasoning, context length, understanding of spatial relationships and video dynamics, and agent interaction.

Available NPU Models

INT4 Quantized Model

qwen3-vl-2b-int4-ax650

  • Context window: 1152 tokens
  • Maximum output: 2048 tokens
  • Supported platform: AI Pyramid
  • Time to first token (TTFT): approximately 159.79 ms
  • Average generation speed: approximately 11.93 tokens/s
  • Image encoding resolution: 384 × 384
  • Image encoding time: approximately 190.73 ms
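
The figures above can be combined into a rough estimate of how long a single reply takes. The sketch below is illustrative only: the function name `estimate_latency_s` is hypothetical, and it assumes that image encoding time and TTFT add up (the page does not say whether TTFT already includes image encoding) and that decoding proceeds at the average speed for the whole reply.

```python
# Figures copied from the model description above.
IMAGE_ENCODE_MS = 190.73   # image encoding time at 384 x 384
TTFT_MS = 159.79           # time to first token
TOKENS_PER_S = 11.93       # average generation speed
MAX_OUTPUT_TOKENS = 2048   # maximum output length


def estimate_latency_s(output_tokens: int, with_image: bool = True) -> float:
    """Return a rough end-to-end latency in seconds for one reply.

    Assumption (not stated in the source): image encoding and TTFT are
    additive, so this is closer to an upper bound if TTFT already
    includes the encoding step.
    """
    if not 1 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"output_tokens must be in 1..{MAX_OUTPUT_TOKENS}")

    # Time until the first token appears.
    prefill_ms = TTFT_MS + (IMAGE_ENCODE_MS if with_image else 0.0)

    # The remaining tokens are generated at the average decode speed.
    decode_s = (output_tokens - 1) / TOKENS_PER_S

    return prefill_ms / 1000.0 + decode_s


if __name__ == "__main__":
    for n in (32, 100, 512):
        print(f"{n:4d} tokens: ~{estimate_latency_s(n):.1f} s")
```

Under these assumptions, a 100-token reply to an image prompt takes about 8.6 seconds, almost all of it spent in token generation rather than in image encoding or prefill.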

Installation

apt install llm-model-qwen3-vl-2b-int4-ax650