Qwen3-VL-2B-Instruct

Introduction

Qwen3-VL is the most powerful vision-language model in the Qwen series to date. Compared with earlier generations, it brings stronger text understanding and generation, deeper visual perception and reasoning, a longer context length, better understanding of spatial relationships and video dynamics, and more capable agent interaction.

Available NPU Models

INT4 Quantized Model

qwen3-vl-2b-int4-ax650

Context window: 1152 tokens
Maximum output: 2048 tokens
Supported platform: AI Pyramid
Time to first token (TTFT): approximately 159.79 ms
Average generation speed: approximately 11.93 tokens/s
Image encoding resolution: 384 × 384
Image encoding time: 190.73 ms
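
The figures above can be combined into a rough latency estimate for a single image request. The sketch below is illustrative only: the function name `estimate_latency_s` is hypothetical, and it assumes (the page does not say this) that image encoding, time to first token, and token generation run one after another without overlap, with the first token covered by the TTFT figure.

```python
# Figures quoted in the spec list for qwen3-vl-2b-int4-ax650.
IMAGE_ENCODE_MS = 190.73  # image encoding time at 384 x 384
TTFT_MS = 159.79          # time to first token
TOKENS_PER_S = 11.93      # average generation speed


def estimate_latency_s(output_tokens: int, with_image: bool = True) -> float:
    """Rough end-to-end latency in seconds for one request.

    Assumption (not stated in the source): the three stages are strictly
    sequential, and the first output token is covered by TTFT.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    encode_s = IMAGE_ENCODE_MS / 1000 if with_image else 0.0
    # Tokens after the first are paced by the average generation speed.
    decode_s = (output_tokens - 1) / TOKENS_PER_S
    return encode_s + TTFT_MS / 1000 + decode_s


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(f"{n} tokens: about {estimate_latency_s(n):.2f} s")
```

At roughly 12 tokens/s, generation time dominates for anything longer than a few tokens, so the image encoding and TTFT costs matter mostly for short answers.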

Installation

apt install llm-model-qwen3-vl-2b-int4-ax650

Download llm-model-qwen3-vl-2b-int4-ax650
