SmolVLM2-500M-Video-Instruct

SmolVLM2-500M-Video-Instruct is a lightweight multimodal video-language model. It can understand video content, generate text based on that content, and respond to instruction-style prompts.

  1. Download the model manually and upload it to the Raspberry Pi 5, or fetch the model repository with the following command.
Tip
If git lfs is not installed, install it first by following the git lfs installation instructions.
git clone https://huggingface.co/AXERA-TECH/SmolVLM2-500M-Video-Instruct

File description:

m5stack@raspberrypi:~/rsp/SmolVLM2-500M-Video-Instruct $ ls -lh
total 40K
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:12 assets
-rw-rw-r-- 1 m5stack m5stack    0 Aug 12 09:12 config.json
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:12 embeds
-rw-rw-r-- 1 m5stack m5stack  10K Aug 12 09:12 infer_axmodel.py
-rw-rw-r-- 1 m5stack m5stack 2.5K Aug 12 09:12 README.md
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:12 smolvlm2_axmodel
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:12 smolvlm2_tokenizer
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:12 utils
drwxrwxr-x 2 m5stack m5stack 4.0K Aug 12 09:13 vit_model
  2. Create a virtual environment
python -m venv smolvlm
  3. Activate the virtual environment
source smolvlm/bin/activate
  4. Install the dependencies
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install transformers torch torchvision tqdm pillow num2words onnx onnxruntime
  5. Run inference
python infer_axmodel.py --axmodel_path smolvlm2_axmodel/ --vit_model vit_model/vision_model.axmodel --hf_model smolvlm2_tokenizer/ -i assets/bee.jpg
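
Before launching the script, it can help to confirm that every path the command references is present, for example after an incomplete clone. The following is a minimal sketch and not part of the repository: the helper names `missing_paths`, `build_command`, and `run_inference` are hypothetical, while the paths and flags are copied from the command above.

```python
import subprocess
import sys
from pathlib import Path

# Paths referenced by the run command above, relative to the repository root.
REQUIRED_PATHS = [
    "infer_axmodel.py",
    "smolvlm2_axmodel",
    "vit_model/vision_model.axmodel",
    "smolvlm2_tokenizer",
]


def missing_paths(repo_dir, image="assets/bee.jpg"):
    """Return the required paths (plus the input image) missing under repo_dir."""
    root = Path(repo_dir)
    return [p for p in REQUIRED_PATHS + [image] if not (root / p).exists()]


def build_command(image="assets/bee.jpg", python=sys.executable):
    """Build the argument list for the inference command shown above."""
    return [
        python,
        "infer_axmodel.py",
        "--axmodel_path", "smolvlm2_axmodel/",
        "--vit_model", "vit_model/vision_model.axmodel",
        "--hf_model", "smolvlm2_tokenizer/",
        "-i", image,
    ]


def run_inference(repo_dir=".", image="assets/bee.jpg"):
    """Check the files first, then run the script from inside repo_dir."""
    missing = missing_paths(repo_dir, image)
    if missing:
        raise FileNotFoundError("Missing: " + ", ".join(missing))
    return subprocess.run(build_command(image), cwd=repo_dir, check=True)
```

Calling `run_inference(".")` from the repository root with the virtual environment active should behave like the command above, because `sys.executable` then points at the environment's interpreter.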

Test image:

Output:

(smolvlm) m5stack@raspberrypi:~/rsp/SmolVLM2-500M-Video-Instruct $ python infer_axmodel.py --axmodel_path smolvlm2_axmodel/ --vit_model vit_model/vision_model.axmodel --hf_model smolvlm2_tokenizer/ -i assets/bee.jpg
[INFO] Available providers:  ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 4.1-patch1-dirty 59be8f11-dirty
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
Init InferenceSession:   0%|                                                                                       | 0/32 [00:00<?, ?it/s][INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 4.1-patch1-dirty 59be8f11-dirty
Init InferenceSession: 100%|██████████████████████████████████████████████████████████████████████████████| 32/32 [00:10<00:00,  2.98it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 4.1-patch1-dirty 59be8f11-dirty
Model loaded successfully!
slice_indices: [0, 1, 2, 3, 4, 5, 6, 7, 8]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
Slice prefill done: 3
Slice prefill done: 4
Slice prefill done: 5
Slice prefill done: 6
Slice prefill done: 7
Slice prefill done: 8
answer >>  The image depicts a close-up view of a pink flower with a bee on it. The bee, which appears to be a bumblebee, is perched on the flower's center, which is surrounded by a cluster of other flowers. The bee is in the process of collecting nectar from the flower, which is a common behavior for bees. The flower itself has a yellow center with a cluster of yellow stamens surrounding it. The petals of the flower are a vibrant shade of pink, and the bee is positioned very close to the camera, making it the focal point of the image. The background is slightly blurred, but it appears to be a garden or a field with other flowers and plants, contributing to the overall natural setting of the image.
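
When the script is driven from another program, the generated text has to be separated from the provider and prefill log lines. The sketch below assumes the output keeps the format shown in the log above: an `answer >>` prefix with the answer on a single line, and `Slice prefill done: N` progress lines. `parse_output` is a hypothetical helper, not part of the repository.

```python
import re

ANSWER_PREFIX = "answer >>"
PREFILL_RE = re.compile(r"^Slice prefill done: (\d+)$")


def parse_output(log_text):
    """Return (answer, slices): the answer text and the finished prefill slice indices."""
    answer = None
    slices = []
    for line in log_text.splitlines():
        line = line.strip()
        match = PREFILL_RE.match(line)
        if match:
            slices.append(int(match.group(1)))
        elif line.startswith(ANSWER_PREFIX):
            # Assumes the whole answer is printed on one line, as in the log above.
            answer = line[len(ANSWER_PREFIX):].strip()
    return answer, slices
```

If no `answer >>` line is found, the function returns `None` for the answer, which a caller can treat as a failed run.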