
Qwen2.5-1.5B-Instruct

Introduction

Qwen2.5-1.5B-Instruct is an instruction-tuned language model in the Qwen2.5 series, with approximately 1.54 billion parameters. The main features of this model include:

  • Model Type: Causal Language Model
  • Training Stages: Pre-training and post-training
  • Architecture: Transformer, adopting RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 1.54 billion (1.31 billion non-embedding parameters)
  • Number of Layers: 28 layers
  • Number of Attention Heads (GQA): 12 query heads, 2 key-value heads
  • Context Length: Full 32,768-token context, with a maximum generation length of 8,192 tokens

This model shows significant improvements in instruction understanding, long-text generation, and structured data comprehension, and supports multilingual capabilities across 29 languages, including English, Chinese, and French.
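The GQA configuration above (12 query heads sharing 2 key-value heads across 28 layers) directly determines the KV-cache footprint at long context lengths. A minimal sketch of that arithmetic, assuming a head dimension of 128 and an fp16 cache (both are assumptions; the head dimension is not stated on this page):

```python
# KV-cache size estimate for Qwen2.5-1.5B-Instruct.
# Figures from this page: 28 layers, 2 key-value heads (GQA).
# Assumptions (not stated here): head_dim = 128, fp16 cache (2 bytes/value).
LAYERS = 28
KV_HEADS = 2
HEAD_DIM = 128          # assumed
BYTES_PER_VALUE = 2     # assumed fp16

def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return context_tokens * per_token

print(f"{kv_cache_bytes(1):,} bytes per token")                       # 28,672
print(f"{kv_cache_bytes(32_768) / 2**30:.2f} GiB at the full context")  # 0.88
```

Caching only 2 KV heads instead of all 12 makes the cache 6x smaller than full multi-head attention would need, which is what keeps long contexts feasible on small devices.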

Available NPU Models

Base Model

qwen2.5-1.5B-ax630c

  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (time to first token): 1029.41 ms
  • Average generation speed: 3.59 token/s
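The two performance figures give a rough end-to-end latency estimate: total time ≈ TTFT + output tokens / generation speed. A quick sketch of that arithmetic using the base model's numbers (it ignores prompt-length effects on TTFT):

```python
# Rough end-to-end latency from the base-model figures above:
# total ≈ ttft + output_tokens / generation_speed.
TTFT_S = 1029.41 / 1000   # time to first token, in seconds
TOKENS_PER_S = 3.59       # average generation speed

def estimated_latency(output_tokens: int) -> float:
    """Estimated seconds to produce `output_tokens` tokens."""
    return TTFT_S + output_tokens / TOKENS_PER_S

print(f"{estimated_latency(100):.1f} s for 100 tokens")               # 28.9 s
print(f"{estimated_latency(1024):.1f} s for the 1024-token maximum")  # 286.3 s
```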

Installation

apt install llm-model-qwen2.5-1.5b-ax630c
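Once installed, the model is served by the StackFlow LLM unit, which accepts JSON requests on the device. A hedged sketch of building a setup request that selects this model — the field names and values below follow common StackFlow protocol examples but are assumptions here; verify them against the StackFlow protocol documentation for your firmware version:

```python
import json

# Hypothetical StackFlow setup request selecting the installed model.
# All field names and values are assumptions -- check the StackFlow
# protocol docs before relying on them.
setup_request = {
    "request_id": "llm_001",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-1.5B-ax630c",
        "response_format": "llm.utf-8.stream",
        "input": "llm.utf-8.stream",
        "prompt": "You are a helpful assistant.",
    },
}

# Serialized payload, ready to send over the device's TCP or UART channel.
print(json.dumps(setup_request))
```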

Long Context Model

qwen2.5-1.5B-p256-ax630c

  • Supports a longer context window than the base model: 256 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (time to first token): 3056.54 ms
  • Average generation speed: 3.57 token/s

Installation

apt install llm-model-qwen2.5-1.5b-p256-ax630c

INT4 Quantized Models

qwen2.5-1.5B-Int4-ax630c

  • Faster inference than the base model, thanks to INT4 quantization
  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (time to first token): 1219.54 ms
  • Average generation speed: 4.63 token/s

Installation

apt install llm-model-qwen2.5-1.5b-int4-ax630c

qwen2.5-1.5B-Int4-ax650

  • Faster inference than the base model, thanks to INT4 quantization
  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: AI Pyramid
  • TTFT (time to first token): 289.06 ms
  • Average generation speed: 16.77 token/s

Installation

apt install llm-model-qwen2.5-1.5b-int4-ax650
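The four variants trade context length against speed. A small sketch comparing their estimated time to produce a 256-token reply, using the TTFT and generation-speed figures listed on this page:

```python
# Estimated time to generate a 256-token reply for each variant,
# using the (ttft_ms, tokens_per_s) figures listed on this page.
MODELS = {
    "qwen2.5-1.5B-ax630c":      (1029.41, 3.59),
    "qwen2.5-1.5B-p256-ax630c": (3056.54, 3.57),
    "qwen2.5-1.5B-Int4-ax630c": (1219.54, 4.63),
    "qwen2.5-1.5B-Int4-ax650":  (289.06, 16.77),
}

for name, (ttft_ms, tok_s) in MODELS.items():
    total_s = ttft_ms / 1000 + 256 / tok_s  # ttft + generation time
    print(f"{name:28s} ~{total_s:5.1f} s")
```

As expected, the ax650 INT4 build is fastest by a wide margin, while the long-context p256 build pays for its larger window with the highest TTFT.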