InternVL3-1B

Manually download the model and upload it to the Raspberry Pi 5, or clone the model repository with the following command:

Note

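If Git LFS is not installed, install it first by following the Git LFS installation instructions. Without Git LFS, the large model files are cloned as small pointer files instead of the actual weights.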

git clone https://huggingface.co/AXERA-TECH/InternVL3-1B
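If the clone ran without Git LFS, the large files land on disk as short text pointer files that begin with the Git LFS spec line. The minimal Python sketch below walks the cloned folder and lists any file that is still a pointer. The helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of the repository; the default folder name `InternVL3-1B` assumes the clone command above was run unchanged.

```python
import os
import sys

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like an un-fetched Git LFS pointer."""
    # Pointer files are under 1024 bytes; anything larger is real content.
    if os.path.getsize(path) >= 1024:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """Walk `root` (skipping .git) and return a sorted list of pointer files."""
    pointers = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and is_lfs_pointer(path):
                pointers.append(path)
    return sorted(pointers)


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "InternVL3-1B"
    missing = find_lfs_pointers(repo)
    if missing:
        print("These files are still Git LFS pointers; run `git lfs pull` inside the repo:")
        for p in missing:
            print("  " + p)
    else:
        print("No Git LFS pointer files found.")
```

If any files are reported, installing Git LFS and running `git lfs pull` in the cloned folder fetches the real content.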

File Description

(axcl) m5stack@raspberrypi5:~/InternVL3-1B $ ls -lh
total 17M
-rw-r--r-- 1 m5stack m5stack 3.8K Jul 25 16:04 config.json
-rw-r--r-- 1 m5stack m5stack 3.9K Jul 25 16:04 gradio_demo.py
drwxr-xr-x 2 m5stack m5stack 4.0K Jul 25 16:06 internvl3_1b_ax650
drwxr-xr-x 2 m5stack m5stack 4.0K Jul 25 16:04 internvl3_tokenizer
-rw-r--r-- 1 m5stack m5stack 6.6K Jul 25 16:04 internvl3_tokenizer.py
-rw-r--r-- 1 m5stack m5stack 6.4M Jul 25 16:05 main_api_ax650
-rw-r--r-- 1 m5stack m5stack 1.9M Jul 25 16:05 main_api_axcl_x86
-rw-r--r-- 1 m5stack m5stack 6.3M Jul 25 16:05 main_ax650
-rw-r--r-- 1 m5stack m5stack 1.8M Jul 25 16:05 main_axcl_x86
-rw-r--r-- 1 m5stack m5stack  277 Jul 25 16:04 post_config.json
-rw-r--r-- 1 m5stack m5stack 6.0K Jul 25 16:04 README.md
-rw-r--r-- 1 m5stack m5stack  495 Jul 25 16:04 run_internvl_3_1b_448_api_ax650.sh
-rw-r--r-- 1 m5stack m5stack  516 Jul 25 16:04 run_internvl_3_1b_448_api_axcl_x86.sh
-rw-r--r-- 1 m5stack m5stack  506 Jul 25 16:04 run_internvl_3_1b_448_ax650.sh
-rw-r--r-- 1 m5stack m5stack  527 Jul 25 16:04 run_internvl_3_1b_448_axcl_x86.sh
-rw-r--r-- 1 m5stack m5stack  50K Jul 25 16:04 ssd_car.jpg

Start the tokenizer service

python internvl3_tokenizer.py --port 12345

This starts the tokenizer service on port 12345; the host defaults to the local machine. Once it is running, output similar to the following appears:

(axcl) m5stack@raspberrypi5:~/InternVL3-1B $ python internvl3_tokenizer.py --port 12345
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None None 151645 <|im_end|> 151665 151667
context_len is  256
prompt is <|im_start|>system
You are ShuSheng·WanXiang, English name InternVL, a multimodal large language model jointly developed by Shanghai Artificial Intelligence Laboratory, Tsinghua University, and multiple partners.<|im_end|>
<|im_start|>user
Hello
<img></img>
<|im_end|>
<|im_start|>assistant
47
http://0.0.0.0:12345
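The model runtime connects to this tokenizer service at startup, so it helps to confirm the port is open before launching it. The sketch below only tests whether something accepts TCP connections on the port; it does not call any tokenizer endpoint. The function name `tokenizer_service_up` is hypothetical, and it assumes the service is reachable on 127.0.0.1, which holds when it listens on 0.0.0.0 as the log above shows.

```python
import socket
import sys


def tokenizer_service_up(host="127.0.0.1", port=12345, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 12345
    if tokenizer_service_up(port=port):
        print(f"Tokenizer service is listening on port {port}.")
    else:
        print(f"Nothing is listening on port {port}; start internvl3_tokenizer.py first.")
```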

Run InternVL3-1B

Grant execute permission to the runtime binary and its launch script:

m5stack@raspberrypi5:~/InternVL3-1B $ chmod +x main_axcl_aarch64 run_internvl_3_1b_448_axcl_aarch64.sh

Test image

Output

m5stack@raspberrypi5:~/InternVL3-1B $ ./run_internvl_3_1b_448_axcl_aarch64.sh
[I][                            Init][ 160]: LLM init start
[I][                            Init][  34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
input size: 1
    name:    image [unknown] [unknown]
        1 x 3 x 448 x 448 size:2408448


output size: 1
    name:   output
        1 x 256 x 896 size:917504

[I][                            Init][ 265]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][                            Init][ 290]: image encoder input nchw@float32
[I][                            Init][ 320]: image encoder output float32

[I][                            Init][ 330]: image_encoder_height : 448, image_encoder_width: 448
[I][                            Init][ 332]: max_token_len : 2047
[I][                            Init][ 335]: kv_cache_size : 128, kv_cache_num: 2047
[I][                            Init][ 343]: prefill_token_num : 128
[I][                            Init][ 347]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 347]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 347]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 347]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 347]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 347]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 347]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 347]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 347]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 351]: prefill_max_token_num : 1024
________________________
|    ID| remain cmm(MB)|
========================
|     0|           5764|
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 448]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Please describe this image 
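The `size:` fields in the init log are byte counts: both image-encoder tensors are float32 (4 bytes per element), so 1 x 3 x 448 x 448 gives 2408448 bytes and 1 x 256 x 896 gives 917504 bytes, i.e. each image becomes 256 embedding vectors of width 896. The sketch below reproduces that arithmetic. It also includes a hypothetical `pick_prefill_group` helper that chooses the smallest prefill group whose `prefill_max_token_num` covers a prompt; that selection rule is an assumption inferred from the group table in the log, not documented runtime behavior.

```python
import math

# The log reports the image encoder as nchw@float32 in and float32 out.
FLOAT32_BYTES = 4


def tensor_bytes(shape, elem_bytes=FLOAT32_BYTES):
    """Size in bytes of a dense tensor with the given shape."""
    return math.prod(shape) * elem_bytes


# Shapes copied from the init log.
ENCODER_INPUT = (1, 3, 448, 448)  # one RGB image, 448 x 448, NCHW
ENCODER_OUTPUT = (1, 256, 896)    # 256 image tokens, 896-dim embeddings

# prefill_max_token_num for grp 1..9, as printed in the log.
PREFILL_GROUPS = [1, 128, 256, 384, 512, 640, 768, 896, 1024]


def pick_prefill_group(num_tokens):
    """Hypothetical: 1-based index of the smallest group that fits num_tokens."""
    for grp, cap in enumerate(PREFILL_GROUPS, start=1):
        if num_tokens <= cap:
            return grp
    raise ValueError(
        f"{num_tokens} tokens exceeds prefill_max_token_num {PREFILL_GROUPS[-1]}"
    )


if __name__ == "__main__":
    print(tensor_bytes(ENCODER_INPUT))   # 2408448
    print(tensor_bytes(ENCODER_OUTPUT))  # 917504
    print(pick_prefill_group(300))       # 4 (cap 384)
```

Because one image alone contributes 256 tokens, any prompt with an image needs at least the 256-token group under this assumed rule.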

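The `load_config` block in the log (from post_config.json) enables temperature scaling (0.9) and top-k sampling (k = 10), and disables top-p and the repetition penalty. The sketch below shows the standard form of these steps in plain Python so the parameters are easier to reason about. It is an illustration of the common technique, not the AXCL runtime's code: the order of operations and the CTRL-style repetition penalty (divide positive scores, multiply negative ones) are assumptions, and `sample_next_token` is a hypothetical name.

```python
import math
import random

# Values from the load_config output above.
CONFIG = {
    "enable_repetition_penalty": False,
    "enable_temperature": True,
    "enable_top_k_sampling": True,
    "enable_top_p_sampling": False,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8,
}


def sample_next_token(logits, cfg=CONFIG, recent_tokens=(), rng=random):
    """Pick the next token id from raw logits using the enabled sampling steps."""
    scores = [float(s) for s in logits]

    # Assumed CTRL-style repetition penalty on tokens from the last penalty_window.
    if cfg["enable_repetition_penalty"]:
        window = list(recent_tokens)[-cfg["penalty_window"]:]
        for tok in set(window):
            s = scores[tok]
            scores[tok] = s / cfg["repetition_penalty"] if s > 0 else s * cfg["repetition_penalty"]

    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    if cfg["enable_temperature"]:
        scores = [s / cfg["temperature"] for s in scores]

    # Rank token ids by score, highest first.
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Top-k: keep only the k highest-scoring tokens.
    if cfg["enable_top_k_sampling"]:
        candidates = candidates[: cfg["top_k"]]

    # Softmax over the surviving candidates (subtract the max for stability).
    top = scores[candidates[0]]
    weights = [math.exp(scores[i] - top) for i in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if cfg["enable_top_p_sampling"]:
        cum = 0.0
        for cut, p in enumerate(probs, start=1):
            cum += p
            if cum >= cfg["top_p"]:
                break
        candidates, probs = candidates[:cut], probs[:cut]
        total = sum(probs)
        probs = [p / total for p in probs]

    # Draw one token id in proportion to its probability.
    return rng.choices(candidates, weights=probs, k=1)[0]
```

With this configuration, each step samples only among the 10 highest-scoring tokens, and the 0.9 temperature slightly favors the top candidates; setting `top_k` to 1 would make generation greedy.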