git clone https://huggingface.co/AXERA-TECH/Qwen2.5-0.5B-Instruct-GPTQ-Int4
File Description
m5stack@raspberrypi:~/rsp/Qwen2.5-0.5B-Instruct-GPTQ-Int4 $ ls -lh
total 2.9M
-rw-rw-r-- 1 m5stack m5stack 0 Aug 12 10:30 config.json
-rw-rw-r-- 1 m5stack m5stack 976K Aug 12 10:30 main_axcl_aarch64
-rw-rw-r-- 1 m5stack m5stack 999K Aug 12 10:30 main_axcl_x86
-rw-rw-r-- 1 m5stack m5stack 932K Aug 12 10:30 main_prefill
-rw-rw-r-- 1 m5stack m5stack 277 Aug 12 10:30 post_config.json
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 10:30 qwen2.5-0.5b-gptq-int4-ax650
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 10:30 qwen2.5_tokenizer
-rw-rw-r-- 1 m5stack m5stack 4.2K Aug 12 10:30 qwen2.5_tokenizer.py
-rw-rw-r-- 1 m5stack m5stack 7.3K Aug 12 10:30 README.md
-rw-rw-r-- 1 m5stack m5stack 520 Aug 12 10:30 run_qwen2.5_0.5b_gptq_int4_ax650.sh
-rw-rw-r-- 1 m5stack m5stack 525 Aug 12 10:30 run_qwen2.5_0.5b_gptq_int4_axcl_aarch64.sh
-rw-rw-r-- 1 m5stack m5stack 521 Aug 12 10:30 run_qwen2.5_0.5b_gptq_int4_axcl_x86.sh
Create a Python virtual environment and install the dependencies required by the tokenizer script:
python -m venv qwen
source qwen/bin/activate
pip install transformers jinja2
Start the tokenizer service:
python qwen2.5_tokenizer.py --port 12345
Here the service address is localhost, and the port number is set to 12345. After running, the output is as follows:
(qwen) m5stack@raspberrypi:~/rsp/Qwen2.5-0.5B-Instruct-GPTQ-Int4 $ python qwen2.5_tokenizer.py --port 12345
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None None 151645 <|im_end|>
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
hello world<|im_end|>
<|im_start|>assistant
[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 14990, 1879, 151645, 198, 151644, 77091, 198]
The tokenizer service is now available at http://localhost:12345.
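The token IDs printed above are the result of applying the Qwen2.5 chat template to the system and user messages. Below is a minimal sketch of the same encoding with the transformers library, assuming the files in qwen2.5_tokenizer/ form a standard Hugging Face tokenizer; the snippet is illustrative and is not the repository's own qwen2.5_tokenizer.py:

from transformers import AutoTokenizer

# Load the tokenizer shipped with the repository (directory name taken from the listing above).
tokenizer = AutoTokenizer.from_pretrained("qwen2.5_tokenizer")

# The same system + user conversation shown in the service output.
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "hello world"},
]

# apply_chat_template inserts the <|im_start|>/<|im_end|> markers and the trailing assistant prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
token_ids = tokenizer(prompt).input_ids

print(prompt)
print(token_ids)  # should match the ID list printed by qwen2.5_tokenizer.py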
Grant execute permission to the binary and the run script, then start it:
chmod +x main_axcl_aarch64 run_qwen2.5_0.5b_gptq_int4_axcl_aarch64.sh
./run_qwen2.5_0.5b_gptq_int4_axcl_aarch64.sh
After it starts successfully, the output is as follows:
m5stack@raspberrypi:~/rsp/Qwen2.5-0.5B-Instruct-GPTQ-Int4$ ./run_qwen2.5_0.5b_gptq_int4_axcl_aarch64.sh
build time: Feb 13 2025 15:44:57
[I][ Init][ 111]: LLM init start
bos_id: -1, eos_id: 151645
3% | ██ | 1 / 27 [0.00s<0.05s, 500.00 count/s] tokenizer init
96% | ███████████████████████████████ | 26 / 27 [8.35s<8.67s, 3.11 count/s] init 23 axmodel ok
100% | ████████████████████████████████ | 27 / 27 [11.63s<11.63s, 2.32 count/s] init post axmodel ok
[I][ Init][ 226]: max_token_len : 1024
[I][ Init][ 231]: kv_cache_size : 128, kv_cache_num: 1024
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}
[I][ Init][ 288]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> hello
Hello! How can I help you today? Let me know if you would like me to assist with anything else.
[N][ Run][ 610]: hit eos,avg 27.05 token/s
>>
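The sampling parameters echoed during load_config come from post_config.json. As a rough sketch, assuming the file stores exactly the keys shown in that printout, they can be adjusted before restarting the run script, for example to switch from top-k to top-p sampling:

import json

# post_config.json sits next to the run scripts (see the file listing above).
# Assumption: it contains the same keys echoed by load_config during init.
with open("post_config.json") as f:
    cfg = json.load(f)

cfg["enable_top_p_sampling"] = True   # enable nucleus (top-p) sampling
cfg["enable_top_k_sampling"] = False  # and disable top-k sampling
cfg["temperature"] = 0.7              # slightly less random output

with open("post_config.json", "w") as f:
    json.dump(cfg, f, indent=2)

print(json.dumps(cfg, indent=2))
# Restart ./run_qwen2.5_0.5b_gptq_int4_axcl_aarch64.sh to pick up the new settings.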