Qwen2.5-3B-Instruct

Introduction

Qwen2.5-3B-Instruct is an instruction-tuned language model in the Qwen2.5 series, with approximately 3.09 billion parameters. The main features of this model include:

  • Model Type: Causal Language Model
  • Training Stages: Pre-training and post-training
  • Architecture: Transformer, adopting RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 3.09 billion (2.77 billion non-embedding parameters)
  • Number of Layers: 36 layers
  • Number of Attention Heads (GQA): 16 query heads, 2 key-value heads
  • Context Length: Full context of 32,768 tokens, with a maximum generation length of 8,192 tokens

This model shows significant improvements in instruction understanding, long-text generation, and structured data comprehension, and supports multilingual capabilities across 29 languages including English, Chinese, and French.
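
For reference, the original (non-quantized) checkpoint can also be run on a host machine with the Hugging Face transformers library. The sketch below is a minimal example assuming the public Qwen/Qwen2.5-3B-Instruct checkpoint, a recent transformers release, and enough memory for a 3B-parameter model; it is separate from the NPU deployment described in the next section.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B-Instruct"

# Load the instruction-tuned model and its tokenizer from Hugging Face.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt with the model's own chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens (the model supports up to 8,192).
output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens before decoding the reply.
reply_ids = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(reply_ids, skip_special_tokens=True))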

Available NPU Models

INT4 Quantized Model

qwen2.5-3B-Int4-ax650

  • Faster inference speed compared to the base model
  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platform: AI Pyramid Pro only
  • TTFT (time to first token): 550.30 ms
  • Average generation speed: 9.46 token/s (see the latency estimate below)
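
The TTFT and generation-speed figures above give a rough way to estimate end-to-end response time for this NPU build. The short sketch below simply applies latency ≈ TTFT + tokens / generation speed using the published numbers; actual latency will vary with prompt length and system load.

# Rough response-time estimate for qwen2.5-3B-Int4-ax650,
# based on the published TTFT and average generation speed.
TTFT_S = 0.5503          # time to first token, in seconds
TOKENS_PER_S = 9.46      # average generation speed

def estimated_latency(new_tokens: int) -> float:
    """Approximate seconds to produce `new_tokens` tokens after the prompt."""
    return TTFT_S + new_tokens / TOKENS_PER_S

for n in (64, 256, 1024):
    print(f"{n:5d} tokens -> ~{estimated_latency(n):.1f} s")
# e.g. the 1024-token maximum output takes roughly 109 s.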

Installation

apt install llm-model-qwen2.5-3b-int4-ax650
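
After installation, the model should be selectable by name through the StackFlow LLM unit. The following is only a hypothetical sketch of such a setup request, modeled on the StackFlow JSON message format; the endpoint, message framing, and every field value shown are assumptions and should be checked against the StackFlow API reference for this platform.

import json
import socket

# Assumed StackFlow TCP endpoint on the device; verify host and port
# against the StackFlow documentation for your platform.
HOST, PORT = "127.0.0.1", 10001

setup_request = {
    "request_id": "llm_setup_1",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-3B-Int4-ax650",   # model package installed above
        "response_format": "llm.utf-8.stream",
        "input": "llm.utf-8.stream",
        "enoutput": True,
        "max_token_len": 1023,              # assumed; within the 1024-token output limit
        "prompt": "You are a helpful assistant.",
    },
}

with socket.create_connection((HOST, PORT)) as sock:
    # Send the request as a single JSON line (framing is an assumption).
    sock.sendall((json.dumps(setup_request) + "\n").encode("utf-8"))
    print(sock.recv(4096).decode("utf-8"))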