Qwen2.5-0.5B-Instruct

Introduction

Qwen2.5-0.5B-Instruct is an instruction-tuned language model in the Qwen2.5 series, with approximately 500 million parameters. The main features of this model include:

  • Model Type: Causal Language Model
  • Training Stages: Pre-training and post-training
  • Architecture: Transformer, using RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 490 million (360 million non-embedding parameters)
  • Number of Layers: 24 layers
  • Number of Attention Heads (GQA): 14 query heads, 2 key-value heads
  • Context Length: Supports a full 32,768-token context, with a maximum generation length of 8,192 tokens

This model shows significant improvements in instruction understanding, long-text generation, and structured data comprehension, and supports multilingual capabilities across 29 languages including English, Chinese, and French.
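The GQA configuration above (14 query heads but only 2 key-value heads) matters mostly for memory: only the key-value heads are cached during generation. A rough sketch of the saving at the full 32,768-token context, assuming a head dimension of 64 (896 hidden size / 14 query heads — an assumption, not a figure from this page):

```python
# Rough KV-cache size estimate for Qwen2.5-0.5B at full 32,768-token context.
# HEAD_DIM = 64 is assumed (896 hidden / 14 query heads), not stated here.
LAYERS = 24
KV_HEADS = 2        # GQA: only the 2 key-value heads are cached
QUERY_HEADS = 14
HEAD_DIM = 64       # assumed
SEQ_LEN = 32768
BYTES_FP16 = 2

def kv_cache_bytes(heads: int) -> int:
    """Bytes to cache keys and values (factor 2) for one full-length sequence."""
    return 2 * LAYERS * heads * HEAD_DIM * SEQ_LEN * BYTES_FP16

gqa = kv_cache_bytes(KV_HEADS)
mha = kv_cache_bytes(QUERY_HEADS)  # hypothetical: caching all 14 heads
print(f"GQA cache: {gqa / 2**20:.0f} MiB vs full MHA: {mha / 2**20:.0f} MiB")
# → GQA cache: 384 MiB vs full MHA: 2688 MiB
```

The 7× reduction (2 cached heads instead of 14) is what makes the long context practical on memory-constrained NPUs.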

Available NPU Models

Base Model

qwen2.5-0.5B-prefill-20e

  • Supports a 128-token context window
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (Time to First Token): 359.8ms
  • Average generation speed: 10.32 token/s

Installation

apt install llm-model-qwen2.5-0.5b-prefill-20e
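The TTFT and generation-speed figures above translate directly into an end-to-end latency estimate: total time ≈ TTFT + tokens / speed. A minimal sketch using the base model's published numbers:

```python
def estimated_latency_s(ttft_ms: float, tokens_per_s: float, n_tokens: int) -> float:
    """Approximate wall-clock time: time-to-first-token plus decode time."""
    return ttft_ms / 1000 + n_tokens / tokens_per_s

# Base model figures from this page: TTFT 359.8 ms, 10.32 token/s.
# Generating a 100-token reply:
t = estimated_latency_s(359.8, 10.32, 100)
print(f"~{t:.1f} s")  # → ~10.0 s
```

This is only an estimate: real latency also depends on prompt length, which drives the prefill (TTFT) portion up for longer inputs.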

Long-Context Model

qwen2.5-0.5B-p256-ax630c

  • Compared to the base model, supports a longer, 256-token context window
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT: 1126.19ms
  • Average generation speed: 10.30 token/s

Installation

apt install llm-model-qwen2.5-0.5b-p256-ax630c

INT4 Quantized Models

qwen2.5-0.5B-Int4-ax630c

  • Compared to the base model, provides faster inference speed
  • Supports a 128-token context window
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT: 442.95ms
  • Average generation speed: 12.52 token/s

Installation

apt install llm-model-qwen2.5-0.5b-int4-ax630c

qwen2.5-0.5b-int4-ax650

  • Compared to the base model, provides faster inference speed
  • Supports a 128-token context window
  • Maximum output of 1024 tokens
  • Supported platforms: AI Pyramid
  • TTFT: 140.17ms
  • Average generation speed: 37.11 token/s

Installation

apt install llm-model-qwen2.5-0.5b-int4-ax650
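To choose between the four builds, it can help to rank them by estimated time for a fixed-length reply, using only the TTFT and token/s figures listed above (a rough comparison, not a benchmark — actual latency also varies with prompt length):

```python
# Estimated wall-clock time for a 256-token reply, computed from the
# (TTFT ms, token/s) figures listed on this page for each NPU build.
MODELS = {
    "qwen2.5-0.5B-prefill-20e": (359.8, 10.32),
    "qwen2.5-0.5B-p256-ax630c": (1126.19, 10.30),
    "qwen2.5-0.5B-Int4-ax630c": (442.95, 12.52),
    "qwen2.5-0.5b-int4-ax650": (140.17, 37.11),
}

def reply_time_s(ttft_ms: float, tok_s: float, n_tokens: int = 256) -> float:
    """TTFT plus decode time for n_tokens generated tokens."""
    return ttft_ms / 1000 + n_tokens / tok_s

ranked = sorted(MODELS.items(), key=lambda kv: reply_time_s(*kv[1]))
for name, (ttft, speed) in ranked:
    print(f"{name}: ~{reply_time_s(ttft, speed):.1f} s")
```

On these figures the AX650 INT4 build is fastest by a wide margin (~7 s vs ~21–26 s for the AX630C builds), while the long-context p256 build trades the slowest TTFT for its larger context window.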