
Qwen2.5-1.5B-Instruct

Introduction

Qwen2.5-1.5B-Instruct is an instruction-tuned language model in the Qwen2.5 series, with approximately 1.54 billion parameters. The main features of this model include:

  • Model Type: Causal Language Model
  • Training Stages: Pre-training and post-training
  • Architecture: Transformer, adopting RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 1.54 billion (1.31 billion non-embedding parameters)
  • Number of Layers: 28 layers
  • Number of Attention Heads (GQA): 12 query heads, 2 key-value heads
  • Context Length: Full 32,768-token context, with a maximum generation length of 8,192 tokens

This model shows significant improvements in instruction understanding, long-text generation, and structured data comprehension, and supports multilingual capabilities across 29 languages, including English, Chinese, and French.
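The GQA configuration above (12 query heads sharing 2 key-value heads across 28 layers) directly determines the KV-cache footprint at long context lengths. A minimal sketch of that arithmetic, assuming a head dimension of 128 and an fp16 cache (both are assumptions; the head dimension is not stated on this page):

```python
# KV-cache size estimate for Qwen2.5-1.5B-Instruct.
# Figures from this page: 28 layers, 2 key-value heads (GQA).
# Assumptions (not stated here): head_dim = 128, fp16 cache (2 bytes/value).
LAYERS = 28
KV_HEADS = 2
HEAD_DIM = 128          # assumed
BYTES_PER_VALUE = 2     # assumed fp16

def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return context_tokens * per_token

print(f"{kv_cache_bytes(1):,} bytes per token")                       # 28,672
print(f"{kv_cache_bytes(32_768) / 2**30:.2f} GiB at the full context")  # 0.88
```

Caching only 2 KV heads instead of all 12 makes the cache 6x smaller than full multi-head attention would need, which is what keeps long contexts feasible on small devices.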

Available NPU Models

Base Model

qwen2.5-1.5B-ax630c

  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (time to first token): 1029.41 ms
  • Average generation speed: 3.59 token/s
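The two performance figures give a rough end-to-end latency estimate: total time ≈ TTFT + output tokens / generation speed. A quick sketch of that arithmetic using the base model's numbers (it ignores prompt-length effects on TTFT):

```python
# Rough end-to-end latency from the base-model figures above:
# total ≈ ttft + output_tokens / generation_speed.
TTFT_S = 1029.41 / 1000   # time to first token, in seconds
TOKENS_PER_S = 3.59       # average generation speed

def estimated_latency(output_tokens: int) -> float:
    """Estimated seconds to produce `output_tokens` tokens."""
    return TTFT_S + output_tokens / TOKENS_PER_S

print(f"{estimated_latency(100):.1f} s for 100 tokens")               # 28.9 s
print(f"{estimated_latency(1024):.1f} s for the 1024-token maximum")  # 286.3 s
```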

Installation

apt install llm-model-qwen2.5-1.5b-ax630c
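Once installed, the model is served by the StackFlow LLM unit, which accepts JSON requests on the device. A hedged sketch of building a setup request that selects this model — the field names and values below follow common StackFlow protocol examples but are assumptions here; verify them against the StackFlow protocol documentation for your firmware version:

```python
import json

# Hypothetical StackFlow setup request selecting the installed model.
# All field names and values are assumptions -- check the StackFlow
# protocol docs before relying on them.
setup_request = {
    "request_id": "llm_001",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-1.5B-ax630c",
        "response_format": "llm.utf-8.stream",
        "input": "llm.utf-8.stream",
        "prompt": "You are a helpful assistant.",
    },
}

# Serialized payload, ready to send over the device's TCP or UART channel.
print(json.dumps(setup_request))
```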

Long Context Model

qwen2.5-1.5B-p256-ax630c

  • Supports a longer context window than the base model: 256 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (time to first token): 3056.54 ms
  • Average generation speed: 3.57 token/s

Installation

apt install llm-model-qwen2.5-1.5b-p256-ax630c

INT4 Quantized Models

qwen2.5-1.5B-Int4-ax630c

  • Faster inference than the base model, thanks to INT4 quantization
  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: LLM630 Computing Kit, Module LLM, and Module LLM Kit
  • TTFT (time to first token): 1219.54 ms
  • Average generation speed: 4.63 token/s

Installation

apt install llm-model-qwen2.5-1.5b-int4-ax630c

qwen2.5-1.5B-Int4-ax650

  • Faster inference than the base model, thanks to INT4 quantization
  • Supports a context window of 128 tokens
  • Maximum output of 1024 tokens
  • Supported platforms: AI Pyramid
  • TTFT (time to first token): 289.06 ms
  • Average generation speed: 16.77 token/s

Installation

apt install llm-model-qwen2.5-1.5b-int4-ax650
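The four variants trade context length against speed. A small sketch comparing their estimated time to produce a 256-token reply, using the TTFT and generation-speed figures listed on this page:

```python
# Estimated time to generate a 256-token reply for each variant,
# using the (ttft_ms, tokens_per_s) figures listed on this page.
MODELS = {
    "qwen2.5-1.5B-ax630c":      (1029.41, 3.59),
    "qwen2.5-1.5B-p256-ax630c": (3056.54, 3.57),
    "qwen2.5-1.5B-Int4-ax630c": (1219.54, 4.63),
    "qwen2.5-1.5B-Int4-ax650":  (289.06, 16.77),
}

for name, (ttft_ms, tok_s) in MODELS.items():
    total_s = ttft_ms / 1000 + 256 / tok_s  # ttft + generation time
    print(f"{name:28s} ~{total_s:5.1f} s")
```

As expected, the ax650 INT4 build is fastest by a wide margin, while the long-context p256 build pays for its larger window with the highest TTFT.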