Qwen3-1.7B

モデルを手動でダウンロードして raspberrypi5 にアップロードするか、以下のコマンドでモデルリポジトリを取得します。

ヒント

git lfs がインストールされていない場合は、まずgit lfs インストール手順を参照してインストールしてください。

git clone https://huggingface.co/AXERA-TECH/Qwen3-1.7B

ファイル説明

m5stack@raspberrypi:~/rsp/Qwen3-1.7B$ ls -lh
total 4.4M
-rw-rw-r-- 1 m5stack m5stack    0 Aug 12 09:07 config.json
-rw-rw-r-- 1 m5stack m5stack 959K Aug 12 09:10 main_ax650
-rw-rw-r-- 1 m5stack m5stack 1.7M Aug 12 09:10 main_axcl_aarch64
-rw-rw-r-- 1 m5stack m5stack 1.8M Aug 12 09:10 main_axcl_x86
-rw-rw-r-- 1 m5stack m5stack  277 Aug 12 09:07 post_config.json
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:07 qwen2.5_tokenizer
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:10 qwen3-1.7b-ax650
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:10 qwen3_tokenizer
-rw-rw-r-- 1 m5stack m5stack 7.6K Aug 12 09:07 qwen3_tokenizer_uid.py
-rw-rw-r-- 1 m5stack m5stack  12K Aug 12 09:07 README.md
-rw-rw-r-- 1 m5stack m5stack 2.5K Aug 12 09:07 run_qwen3_1.7b_int8_ctx_ax650.sh
-rw-rw-r-- 1 m5stack m5stack 2.5K Aug 12 09:07 run_qwen3_1.7b_int8_ctx_axcl_aarch64.sh
-rw-rw-r-- 1 m5stack m5stack 2.5K Aug 12 09:07 run_qwen3_1.7b_int8_ctx_axcl_x86.sh

ヒント

以前に qwen の仮想環境を作成済みの場合は、再作成は不要で、有効化するだけで構いません。

仮想環境を作成

python -m venv qwen

仮想環境を有効化

source qwen/bin/activate

依存パッケージをインストール

pip install transformers jinja2 torch

tokenizer パーサーを起動

python qwen3_tokenizer_uid.py --port 12345

tokenizer サービスを実行（Host IP はデフォルトで localhost、ポート番号は 12345）。実行後の表示例：

(qwen) m5stack@raspberrypi:~/Qwen3-0.6B $ python qwen3_tokenizer_uid.py --port 12345
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Server running at http://0.0.0.0:12345

ヒント

以下の操作は、新しい raspberrypi ターミナルで実行してください。

実行権限を付与

chmod +x main_axcl_aarch64 run_qwen3_1.7b_int8_ctx_axcl_aarch64.sh

Qwen3 モデル推論サービスを起動

./run_qwen3_1.7b_int8_ctx_axcl_aarch64.sh

起動成功時の表示例：

m5stack@raspberrypi:~/rsp/Qwen3-1.7B$ ./run_qwen3_1.7b_int8_ctx_axcl_aarch64.sh
[I][                            Init][ 136]: LLM init start
[I][                            Init][  34]: connect http://127.0.0.1:12345 ok
[I][                            Init][  57]: uid: 95e7d5f3-fc8d-48ea-b489-1de9f37924d1
bos_id: -1, eos_id: 151645
  3% | ██                                |   1 /  31 [1.08s<33.54s, 0.92 count/s] tokenizer init ok[I][                            Init][  45]: LLaMaEmbedSelector use mmap
  6% | ███                               |   2 /  31 [1.08s<16.77s, 1.85 count/s] embed_selector init ok
[I][                             run][  30]: AXCLWorker start with devid 0
  100% | ████████████████████████████████ |  31 /  31 [64.75s<64.75s, 0.48 count/s] init post axmodel ok,remain_cmm(3788 MB)
[I][                            Init][ 237]: max_token_len : 2559
[I][                            Init][ 240]: kv_cache_size : 1024, kv_cache_num: 2559
[I][                            Init][ 248]: prefill_token_num : 128
[I][                            Init][ 252]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 252]: grp: 2, prefill_max_token_num : 512
[I][                            Init][ 252]: grp: 3, prefill_max_token_num : 1024
[I][                            Init][ 252]: grp: 4, prefill_max_token_num : 1536
[I][                            Init][ 252]: grp: 5, prefill_max_token_num : 2048
[I][                            Init][ 256]: prefill_max_token_num : 2048
________________________
|    ID| remain cmm(MB)|
========================
|     0|           3788|
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": false,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 1,
    "top_p": 0.8
}

[I][                            Init][ 279]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
[I][          GenerateKVCachePrefill][ 335]: input token num : 21, prefill_split_num : 1 prefill_grpid : 2
[I][          GenerateKVCachePrefill][ 372]: input_num_token:21
[I][                            main][ 236]: precompute_len: 21
[I][                            main][ 237]: system_prompt: You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
prompt >> hello
[I][                      SetKVCache][ 628]: prefill_grpid:2 kv_cache_num:512 precompute_len:21 input_num_token:12
[I][                      SetKVCache][ 631]: current prefill_max_token_num:1920
[I][                             Run][ 869]: input token num : 12, prefill_split_num : 1
[I][                             Run][ 901]: input_num_token:12
[I][                             Run][1030]: ttft: 796.38 ms
<think>

</think>

Hello! How can I assist you today?

[N][                             Run][1182]: hit eos,avg 7.38 token/s

[I][                      GetKVCache][ 597]: precompute_len:46, remaining:2002
prompt >> 

Next Overview

Linux PC

CM4Stack

CoreMP135

AI Accelerator Card

LLM-8850 Card

Quick Start

Vision Models

Large Language Models

Multimodal Models

Audio Models

Generative Models

Advanced Usage

LLM

リアルタイム音声アシスタント

OpenAI ボイスアシスタント

XiaoZhi ボイスアシスタント

XiaoLing ボイスアシスタント

AtomS3R-M12 Volcengine Kit

オフライン音声認識

Unit ASR

Home Assistant

Home Assistant OS

ESPHome

Industrial Control

StamPLC

IoT Measuring Instruments

Air Quality

Module13.2 PPS

VAMeter

T-Lite

Ezdata

Ethernet Camera

PoECAM

Wi-Fi Camera

TimerCAM

Unit CamS3/-5MP

AI Camera

UnitV2

M5StickV/UnitV

LoRa & LoRaWAN

TTN (The Things Network)

Meshtastic

Motor Control

Unit Roller485/CAN

Develop Tools

Hobby Kit

ファームウェアの初期化