Module LLM API

Version	Update Date	Remarks
v1.0.0	2024.10.24	/

Overview

The Module LLM integrates several functional units: KWS (Keyword Spotting), ASR (Automatic Speech Recognition), LLM (Large Language Model), and TTS (Text-to-Speech). Each unit can run on its own or be configured to pass its output to another unit, forming a data workflow for more capable interactive applications. The module communicates with a host device over UART using JSON-formatted data packets.

Built-in Functional Units

Unit	Unit Name	Unit Capability
sys	System	Set module parameters, retrieve module status
kws	Keyword Detection	Detect the presence of keywords in audio
asr	Speech-to-Text	Convert audio to text
llm	Generative Model	Generate new text based on input text
tts	Text-to-Speech	Convert text to audio
audio	System Audio Interface	Access microphone audio and playback audio

Usage Process

1. Stack the module on an M5Stack main controller (Basic, M5Core2, M5Core3, etc.), or wire its TX/RX and power pins to a USB-TTL converter. The module lights up green once it has started successfully.
2. Initialize the UART interface in your program at 115200bps 8N1, with the pins set according to the connected device.
3. Following the usage examples below, send an initialization (setup) frame to activate the desired unit service.

Communication Interface

The Module LLM’s UART interface is configured by default as 115200bps 8N1.
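Because UART delivers a plain byte stream, the host has to cut it back into JSON frames. The following is a minimal sketch of one way to do that in Python, assuming frames arrive as back-to-back JSON objects and that any non-JSON bytes (such as the V0EUEURS startup string) can be skipped. The `FrameReader` class is hypothetical and not part of any official SDK; reading bytes from the serial port itself (for example with a library such as pySerial) is not shown.

```python
import codecs
import json


class FrameReader:
    """Accumulates raw UART bytes and returns complete JSON frames."""

    def __init__(self):
        self._buf = ""
        # Incremental decoder copes with UTF-8 characters split across reads.
        self._utf8 = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._json = json.JSONDecoder()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every frame completed so far."""
        self._buf += self._utf8.decode(chunk)
        frames = []
        while True:
            start = self._buf.find("{")
            if start < 0:
                self._buf = ""  # nothing here can start a frame
                break
            try:
                frame, end = self._json.raw_decode(self._buf, start)
            except json.JSONDecodeError:
                # Assumed to be an incomplete frame: keep it and wait for more.
                self._buf = self._buf[start:]
                break
            frames.append(frame)
            self._buf = self._buf[end:]
        return frames
```

One limitation of this sketch: a stray `{` that never becomes valid JSON would stall the buffer, so a production reader would also want a timeout or a size limit.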

Data Packet Format

Basic Structure of Sending Frame

{
    "request_id": "001", 
    "work_id": "llm.1001",
    "action": "taskinfo",
    "object": "None",
    "data":"None"
} 

request_id:
- Session ID used to match a response to the request that produced it.
work_id:
- When calling a unit that has already been set up, enter the unit keyword plus the ID, e.g., llm.xxxx (where xxxx is the ID).
- When initializing a unit with setup, enter only the unit keyword without an ID, e.g., llm.
action:
- The method to call on the unit. See the method table of each unit below.
object:
- Names the structure of the parameters passed in data. See the parameter descriptions of each unit. Can be omitted if there are no parameters.
data:
- The parameters to transmit. Can be omitted if there are no parameters.
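The field rules above can be sketched as a small Python helper that builds a sending frame and leaves out object and data when there are no parameters. The function name `build_request` is hypothetical, and writing non-ASCII text as raw UTF-8 (`ensure_ascii=False`) is an assumption rather than a documented requirement.

```python
import json


def build_request(request_id, work_id, action, obj=None, data=None):
    """Serialize one sending frame; object/data are omitted when not given."""
    frame = {"request_id": request_id, "work_id": work_id, "action": action}
    if obj is not None:
        frame["object"] = obj
    if data is not None:
        frame["data"] = data
    # Assumption: the module accepts UTF-8 text without \u escapes.
    return json.dumps(frame, ensure_ascii=False)


# A parameterless call to the sys unit, and an LLM inference call.
ping = build_request("001", "sys", "ping")
ask = build_request("4", "llm.1003", "inference", "llm.utf-8", "What's ur name?")
```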

Basic Structure of Response Frame

{
  "request_id": "002",
  "work_id": "kws.1002",
  "created": 30952,
  "object": "None",
  "data":"None",
  "error":{"code":0, "message":""}
} 

created:
- The time when the operation was completed, in Unix timestamp format (seconds).
error:
- Status information for determining whether the service call succeeded or failed. For more error code information, please see the list below.
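As an illustration, a host program might decode a response frame and turn a non-zero error code into an exception. This is only a sketch: `ModuleError` and `parse_response` are hypothetical names, and treating a frame without an error field as successful is an assumption made for frames that carry only data.

```python
import json


class ModuleError(Exception):
    """Raised when a response frame reports a non-zero error code."""

    def __init__(self, code, message):
        super().__init__(f"error {code}: {message}")
        self.code = code
        self.message = message


def parse_response(raw):
    """Decode a response frame and raise ModuleError if the call failed."""
    frame = json.loads(raw)
    # Assumption: a missing "error" field means success.
    error = frame.get("error", {"code": 0, "message": ""})
    if error.get("code", 0) != 0:
        raise ModuleError(error["code"], error.get("message", ""))
    return frame
```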

Streaming Data Sending Frame Structure

{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "inference",
    "object": "llm.utf-8.stream",
    "data": {
        "delta": "What's ur name?",
        "index": 0,
        "finish": true
    }
} 

Streaming Data Response Frame Structure

{
    "created": 1692664605,
    "data": {
        "delta": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
        "finish": true,
        "index": 0
    },
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.utf-8.stream",
    "request_id": "4",
    "work_id": "llm.1003"
} 

index:
- Index for identifying the data segment
delta:
- Data segment
finish:
- Indicates the final packet if set to true
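A host that receives streaming responses needs to join the delta segments back into one string. The sketch below shows one possible approach, assuming that index starts at 0 and increases by 1 per packet, which the samples in this document suggest but do not state. `StreamAssembler` is a hypothetical helper name.

```python
class StreamAssembler:
    """Collects streaming data packets and joins them once all have arrived."""

    def __init__(self):
        self._parts = {}
        self._last_index = None

    def add(self, data):
        """Store one "data" object; return the full text when complete, else None."""
        index = int(data["index"])  # some sample frames report index as a string
        self._parts[index] = data["delta"]
        if data.get("finish"):
            self._last_index = index
        if self._last_index is None:
            return None
        expected = range(self._last_index + 1)
        if any(i not in self._parts for i in expected):
            return None  # a segment is still missing
        return "".join(self._parts[i] for i in expected)
```

Keying the segments by index rather than appending them in arrival order keeps the result correct even if packets were handled out of order.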

Error Codes

Every response carries an error field; its code indicates whether the call succeeded or why it failed:

Error Code	Description	Message	Notes
0	Operation Successful!	Operation Successful!
-1	Communication channel receive state machine reset warning!	reace reset	Continuously sending “}” will trigger this error, used to reset the JSON receive state machine.
-2	JSON parsing error	JSON format error
-3	`sys action` match error	action match false
-4	Inference data push error	inference data push false
-5	Model loading failed	Model loading failed.
-6	Unit does not exist	Unit Does Not Exist
-7	Unknown operation	Unknown Operation
-8	Unit resource allocation failed	Unit Resource Allocation Failed
-9	Unit call failed	unit call false
-10	Model initialization failed	Model init failed.
-11	Model run error	Model run failed.
-12	Module not initialized	Module has not been initialised.
-13	Module is already working	Module already working.
-14	Module is not working	Module is not working.
-19	Unit resource release failed	Unit Resource Free Failed
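For logging or debugging on the host side, the table above can be kept as a lookup. The descriptions are copied from the table; the names `ERROR_CODES` and `describe_error` are hypothetical.

```python
# Error codes and descriptions as listed in the table above.
ERROR_CODES = {
    0: "Operation successful",
    -1: "Communication channel receive state machine reset warning",
    -2: "JSON parsing error",
    -3: "sys action match error",
    -4: "Inference data push error",
    -5: "Model loading failed",
    -6: "Unit does not exist",
    -7: "Unknown operation",
    -8: "Unit resource allocation failed",
    -9: "Unit call failed",
    -10: "Model initialization failed",
    -11: "Model run error",
    -12: "Module not initialized",
    -13: "Module is already working",
    -14: "Module is not working",
    -19: "Unit resource release failed",
}


def describe_error(code):
    """Return the documented description, or a fallback for unlisted codes."""
    return ERROR_CODES.get(code, f"Unknown error code {code}")
```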

SYS

The SYS unit is used to set module working parameters and retrieve module operation information.

Method	Function	Input Type	Output Type
lsmode	Retrieve available models	None	sys.lsmode
hwinfo	Retrieve CPU load, memory load, chip temperature	None	sys.hwinfo
reset	Reset the unit	None	Returns reset completion JSON
reboot	Reboot the system	None	None
ping	Check if the system is available	None	None

lsmode

Retrieve available models

{
    "request_id": "001", 
    "work_id": "sys",
    "action": "lsmode"
} 

Retrieve available models response

{
    "created": 1692652687,
    "data": [
        {
            "capabilities": [
                "Automatic_Speech_Recognition"
            ],
            "input_type": [
                "sys.pcm"
            ],
            "model": "sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23",
            "output_type": [
                "asr.utf-8"
            ],
            "type": "asr"
        },
        {
            "capabilities": [
                "Automatic_Speech_Recognition"
            ],
            "input_type": [
                "sys.pcm"
            ],
            "model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
            "output_type": [
                "asr.utf-8"
            ],
            "type": "asr"
        },
        {
            "capabilities": [
                "Keyword_spotting"
            ],
            "input_type": [
                "sys.pcm"
            ],
            "model": "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01",
            "output_type": [
                "kws.bool"
            ],
            "type": "kws"
        },
        {
            "capabilities": [
                "Keyword_spotting"
            ],
            "input_type": [
                "sys.pcm"
            ],
            "model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
            "output_type": [
                "kws.bool"
            ],
            "type": "kws"
        },
        {
            "capabilities": [
                "text_generation",
                "chat"
            ],
            "input_type": "utf-8",
            "model": "qwen2.5-0.5b",
            "output_type": "utf-8",
            "type": "llm"
        },
        {
            "capabilities": [
                "Text_to_speech"
            ],
            "input_type": [
                "sys.utf-8",
                "llm.utf-8"
            ],
            "model": "single_speaker_fast",
            "output_type": [
                "tts.wav"
            ],
            "type": "tts"
        },
        {
            "capabilities": [
                "Text_to_speech"
            ],
            "input_type": [
                "sys.utf-8",
                "llm.utf-8"
            ],
            "model": "single_speaker_english_fast",
            "output_type": [
                "tts.wav"
            ],
            "type": "tts"
        }
    ],
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "sys.lsmode",
    "request_id": "001",
    "work_id": "sys"
} 

hwinfo

Retrieve CPU load, memory load, and chip temperature

{
    "request_id": "001", 
    "work_id": "sys",
    "action": "hwinfo"
} 

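Response for retrieving CPU load, memory load, and chip temperature. In this sample, cpu_loadavg is 0%, mem is 18%, and temperature is 46350, i.e., about 46°C (the raw value appears to be in thousandths of a degree Celsius).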

{
    "created": 1692652642,
    "data": {
        "cpu_loadavg": 0,
        "mem": 18,
        "temperature": 46350
    },
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "sys.hwinfo",
    "request_id": "001",
    "work_id": "sys"
} 
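A small sketch of how a host might make the hwinfo payload readable. It assumes the temperature field is reported in thousandths of a degree Celsius, which is inferred from the sample (46350 described as 46°C) rather than stated outright. `summarize_hwinfo` is a hypothetical helper.

```python
def summarize_hwinfo(frame):
    """Convert a sys.hwinfo response into plain percentage and Celsius values."""
    data = frame["data"]
    return {
        "cpu_percent": data["cpu_loadavg"],
        "mem_percent": data["mem"],
        # Assumption: raw temperature is in millidegrees Celsius.
        "temperature_c": data["temperature"] / 1000,
    }


sample = {"data": {"cpu_loadavg": 0, "mem": 18, "temperature": 46350}}
info = summarize_hwinfo(sample)
```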

reset

System reset command.

{
    "request_id": "001", 
    "work_id": "sys",
    "action": "reset"
} 

System reset initiated.

{
    "created": 1692652712,
    "error": {
        "code": 0,
        "message": "llm server restarting ..."
    },
    "request_id": "001",
    "work_id": "sys"
} 

System reset completion response.

{
    "request_id": "0",
    "work_id": "sys",
    "created": 1692652723,
    "error": {
        "code": 0,
        "message": "reset over"
    }
} 

reboot

Full system reboot command.

{
    "request_id": "001", 
    "work_id": "sys",
    "action": "reboot"
} 

Full system reboot response.

{
    "created": 1692652822,
    "error": {
        "code": 0,
        "message": "rebooting ..."
    },
    "request_id": "001",
    "work_id": "sys"
} 

Note: After the response message, the system will reboot. During the reboot, a string V0EUEURS will be sent, which is the system startup string and can be ignored.

ping

System service communication test, useful for checking communication status after module power-up.

{
    "request_id": "001", 
    "work_id": "sys",
    "action": "ping"
} 

System service communication test response

{
    "created": 1692652310,
    "error": {
        "code": 0,
        "message": ""
    },
    "request_id": "001",
    "work_id": "sys"
} 

AUDIO

The AUDIO unit controls the system sound card, captures microphone audio, and plays back sound. It supplies audio input to the Keyword Spotting (KWS) and Automatic Speech Recognition (ASR) units and provides audio output for the Text-to-Speech (TTS) unit. The AUDIO unit must be initialized before using the KWS and ASR units.

Method	Function	Input Type	Output Type
setup	Configure audio unit	audio.setup	None (the returned result includes the successful `work_id`)
exit	End the work of `work_id`	None	None
pause	Pause task operation	None	None
work	Resume task operation	None	None
taskinfo	Retrieve all task instance information		audio.taskinfo

setup

Initialize the Audio unit and configure playback volume and sound card slot number (capcard, playcard use defaults)

Parameter Description

Parameter	Description	Input Value
capcard	Microphone sound card index	Default system sound card: 0
capdevice	Microphone device index	Onboard silicon mic: 0
capVolume	Input volume	0.0～10.0 (volume > 1 will amplify, default is 0.5)
playcard	Speaker sound card index	Default system sound card: 0
playdevice	Speaker device index	Onboard speaker: 1
playVolume	Output volume	0.0～10.0 (volume > 1 will amplify, default is 0.5)

{
    "request_id": "1",
    "work_id": "audio",
    "action": "setup",
    "object": "audio.setup",
    "data": {
        "capcard": 0,
        "capdevice": 0,
        "capVolume": 0.5,
        "playcard": 0,
        "playdevice": 1,
        "playVolume": 0.5
    }
} 

Response for Audio unit initialization

{
    "created": 1692659008,
    "error": {
        "code": 0,
        "message": "audio setup successful"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 
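The setup frame above can be generated with a helper that checks the documented 0.0 to 10.0 volume range before sending. This is a sketch only: `audio_setup_frame` is a hypothetical name, and the card and device indexes are the defaults from the parameter table.

```python
import json


def audio_setup_frame(request_id, cap_volume=0.5, play_volume=0.5):
    """Build an audio.setup frame, rejecting volumes outside 0.0 to 10.0."""
    for name, value in (("capVolume", cap_volume), ("playVolume", play_volume)):
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0.0 and 10.0, got {value}")
    return json.dumps({
        "request_id": request_id,
        "work_id": "audio",
        "action": "setup",
        "object": "audio.setup",
        "data": {
            "capcard": 0,      # default system sound card
            "capdevice": 0,    # onboard silicon mic
            "capVolume": cap_volume,
            "playcard": 0,     # default system sound card
            "playdevice": 1,   # onboard speaker
            "playVolume": play_volume,
        },
    })
```

The work_id returned in the setup response (audio.1000 in the sample) is what later pause, work, exit, and taskinfo calls should address.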

pause

Pause Audio unit command

{
    "request_id": "1",
    "work_id": "audio.1000",
    "action": "pause"
} 

Response for Audio unit pause command

{
    "created": 1692659049,
    "error": {
        "code": 0,
        "message": "audio pause"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 

work

Start Audio unit command

{
    "request_id": "1",
    "work_id": "audio.1000",
    "action": "work",
    "object": "audio.setup",
    "data": {
        "capcard": 0,
        "capdevice": 0,
        "capVolume": 0.5,
        "playcard": 0,
        "playdevice": 1,
        "playVolume": 0.25
    }
} 

Response for Audio unit start command

{
    "created": 1692659297,
    "error": {
        "code": 0,
        "message": "audio work start"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 

exit

End and release the Audio unit

{
    "request_id": "1",
    "work_id": "audio.1000",
    "action": "exit"
} 

Response for ending and releasing the Audio unit

{
    "created": 1692659370,
    "error": {
        "code": 0,
        "message": "audio exit"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 

taskinfo

Query Audio unit status

// Sending data
{
    "request_id": "1",
    "work_id": "audio.1000",
    "action": "taskinfo"
} 

Response when the Audio unit is running

{
    "created": 1692659454,
    "data": "running",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "audio.state",
    "request_id": "1",
    "work_id": "audio.1000"
} 

Response when the Audio unit is stopped

{
    "created": 1692659499,
    "data": "stopped",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "audio.state",
    "request_id": "1",
    "work_id": "audio.1000"
} 

Response when the Audio unit is released

{
    "created": 1692659403,
    "data": "deinit",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "audio.state",
    "request_id": "1",
    "work_id": "audio.1000"
} 
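The three states shown above (running, stopped, deinit) can be modeled on the host as an enumeration. The sketch below assumes these are the only values taskinfo returns, which is what the samples show; `UnitState` and `unit_state` are hypothetical names.

```python
from enum import Enum


class UnitState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    DEINIT = "deinit"


def unit_state(frame):
    """Extract the state from a taskinfo response such as "audio.state"."""
    unit = frame["work_id"].split(".")[0]
    if frame.get("object") != f"{unit}.state":
        raise ValueError(f"not a state response: {frame.get('object')!r}")
    # Raises ValueError if the module reports a state not listed here.
    return UnitState(frame["data"])
```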

KWS

The KWS unit is used for keyword detection.

Method	Function	Input Type	Output Type
setup	Configure KWS unit	kws.setup	None (the returned result includes the successful `work_id`)
pause	Pause task operation	None	None
work	Resume task operation	None	None
exit	End the work of `work_id`	None	None
taskinfo	Retrieve all task instance information		kws.taskinfo

setup

Initialize the KWS unit and configure for Chinese/English recognition model. (Note: KWS keyword field does not allow a mix of Chinese/English)

Parameter Description

Parameter	Description	Input Value
model	Conversion model	English model: "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01" Chinese model: "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01"
kws	KWS keyword text setup	Mixing Chinese/English is not allowed, English should be in all uppercase
enoutput	Enable UART output	Enable: true Disable: false
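The keyword rules in the table (no mixing of Chinese and English, English in uppercase) can be checked on the host before sending setup. The following is a rough sketch: `normalize_keyword` is a hypothetical helper, and detecting Chinese via the basic CJK Unified Ideographs range is an approximation.

```python
import re

_CJK = re.compile(r"[\u4e00-\u9fff]")  # basic CJK Unified Ideographs block
_LATIN = re.compile(r"[A-Za-z]")


def normalize_keyword(text):
    """Reject mixed Chinese/English keywords; uppercase English ones."""
    has_cjk = bool(_CJK.search(text))
    has_latin = bool(_LATIN.search(text))
    if has_cjk and has_latin:
        raise ValueError("KWS keyword must not mix Chinese and English")
    return text.upper() if has_latin else text
```

The result also tells the caller which model to pick: an English keyword goes with the gigaspeech model and a Chinese keyword with the wenetspeech model.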

KWS Setup

Initialize the KWS unit and configure for the English recognition model.

{
    "request_id": "2",
    "work_id": "kws",
    "action": "setup",
    "object": "kws.setup",
    "data": {
        "model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
        "response_format": "kws.bool",
        "input": "sys.pcm",
        "enoutput": true,
        "kws": "HELLO"
    }
} 

KWS unit initialization response (Note: the setup process takes approximately 9 seconds)

{
    "created": 1692660576,
    "error": {
        "code": 0,
        "message": "kws setup successful"
    },
    "request_id": "2",
    "work_id": "kws.1001"
} 

KWS response after keyword trigger

{
    "created": 1692660576,
    "error": {
        "code": 0,
        "message": "kws setup successful"
    },
    "request_id": "2",
    "work_id": "kws.1001"
} 

pause

Pause KWS unit command

{
    "request_id": "2",
    "work_id": "kws.1001",
    "action": "pause"
} 

Response for pausing KWS unit command

{
    "created": 1692660626,
    "error": {
        "code": 0,
        "message": "kws pause"
    },
    "request_id": "2",
    "work_id": "kws.1001"
} 

work

Start KWS unit command

{
    "request_id": "2",
    "work_id": "kws.1001",
    "action": "work"
} 

Response for starting KWS unit command

{
    "created": 1692660651,
    "error": {
        "code": 0,
        "message": "kws work"
    },
    "request_id": "2",
    "work_id": "kws.1001"
} 

exit

End and release the KWS unit

{
  "request_id": "2",
  "work_id": "kws.1001",
  "action": "exit"
} 

Response for ending and releasing the KWS unit

{
  "created": 1692654383,
  "error": {
    "code": 0,
    "message": "kws exit"
  },
  "request_id": "2",
  "work_id": "kws.1001"
} 

taskinfo

Query KWS unit status

{
  "request_id": "2",
  "work_id": "kws.1001",
  "action": "taskinfo"
} 

Response when the KWS unit is running

{
  "created": 1692654305,
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "kws.state",
  "data": "running",
  "request_id": "2",
  "work_id": "kws.1001"
} 

Response when the KWS unit is stopped

{
  "created": 1692654535,
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "kws.state",
  "data": "stopped",
  "request_id": "2",
  "work_id": "kws.1001"
} 

KWS unit release response

{
  "created": 1692654452,
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "kws.state",
  "data": "deinit",
  "request_id": "2",
  "work_id": "kws.0"
} 

ASR

The ASR unit is used for converting speech to text.

Method	Function	Input Type	Output Type
setup	Configure ASR unit	asr.setup	None (the returned result includes the successful `work_id`)
pause	Pause task operation	None	None
work	Resume task operation	None	None
exit	End the work of `work_id`	None	None
taskinfo	Retrieve all task instance information		asr.taskinfo

setup

Initialize the ASR unit and configure for Chinese/English model.

Parameter Description

Parameter	Description	Input Value
model	Conversion model	English model: "sherpa-ncnn-streaming-zipformer-20M-2023-02-17" Chinese model: "sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23"
response_format	Output format	Standard output: "asr.utf-8" Streaming output: "asr.utf-8.stream"
input	Input	System audio input: "sys.pcm" (the value used in the setup example below and the input type reported for ASR models by lsmode)
enkws	Enable KWS-based activation	Activation via KWS, followed by ASR: true No KWS activation, ASR will operate continuously: false
rule1	Timeout from activation to unrecognized content	Unit: seconds
rule2	Maximum interval time for recognition	Unit: seconds
rule3	Maximum recognition timeout	Unit: seconds
enoutput	Enable UART output	Enable: true Disable: false

ASR Setup

Initialize the ASR unit and configure for the English model.

{
    "request_id": "3",
    "work_id": "asr",
    "action": "setup",
    "object": "asr.setup",
    "data": {
        "model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
        "response_format": "asr.utf-8",
        "input": "sys.pcm",
        "enoutput": true,
        "enkws": true,
        "rule1": 2.4,
        "rule2": 1.2,
        "rule3": 30
    }
} 

ASR unit initialization response

{
    "created": 1692667736,
    "error": {
        "code": 0,
        "message": "asr setup successful"
    },
    "request_id": "3",
    "work_id": "asr.1002"
} 

ASR trigger response

{
    "created": 1692655176,
    "data": {
        "delta": " hello",
        "index": "0"
    },
    "object": "asr.stream",
    "request_id": "004",
    "work_id": "asr.1003"
} 

pause

Pause ASR unit command

{
    "request_id": "3",
    "work_id": "asr.1002",
    "action": "pause"
} 

Pause ASR unit command response

{
    "created": 1692670174,
    "error": {
        "code": 0,
        "message": "asr pause"
    },
    "request_id": "3",
    "work_id": "asr.1002"
} 

work

Start ASR unit command

{
    "request_id": "3",
    "work_id": "asr.1002",
    "action": "work"
} 

Start ASR unit command response

{
    "created": 1692670213,
    "error": {
        "code": 0,
        "message": "asr work"
    },
    "request_id": "3",
    "work_id": "asr.1002"
} 

exit

End and release the ASR unit

{
    "request_id": "3",
    "work_id": "asr.1002",
    "action": "exit"
} 

ASR unit release response

{
    "created": 1692670254,
    "error": {
        "code": 0,
        "message": "asr exit"
    },
    "request_id": "3",
    "work_id": "asr.1002"
} 

taskinfo

Query ASR unit status

{
    "request_id": "3",
    "work_id": "asr.1002",
    "action": "taskinfo"
} 

Response when ASR unit is running

{
    "created": 1692669923,
    "data": "running",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "asr.state",
    "request_id": "3",
    "work_id": "asr.1002"
} 

Response when ASR unit is stopped

{
  "created": 1692653792,
  "data": "stopped",
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "asr.state",
  "request_id": "3",
  "work_id": "asr.1002"
} 

Response when ASR unit is released

{
    "created": 1692669874,
    "data": "deinit",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "asr.state",
    "request_id": "3",
    "work_id": "asr.0"
} 

LLM

The LLM (Large Language Model) unit can generate responses based on input text.

Method	Function	Input Type	Output Type
setup	Configure LLM unit	llm.setup	None (the returned result includes the successful `work_id`)
inference	Perform inference	Typical: llm.utf-8 (model difference can be checked via `sys.lsmode`)	None (returns only data submission result; final inference result will depend on configuration)
pause	Pause task operation	None	None
work	Resume task operation	None	None
exit	End the work of `work_id`	None	None
taskinfo	Retrieve all task instance information		llm.taskinfo

setup

Initialize the LLM unit and configure a specified model. Current pre-installed model:
- qwen2.5-0.5b

Parameter Description

Parameter	Description	Input Value
model	Conversion model	Pre-installed model "qwen2.5-0.5b"
response_format	Output format	Standard output: "llm.utf-8" Streaming output: "llm.utf-8.stream"
input	Input	ASR input: "asr.xxx" (input work_id of the ASR unit) UART input: "llm.utf-8" UART streaming input: "llm.utf-8.stream"
enkws	KWS interruption of ongoing process	Interrupt with KWS: true Do not interrupt with KWS: false
max_token_len	Maximum output token length	Maximum: 1024, recommended: 127
prompt	Model initialization prompt
enoutput	Enable UART output	Enable: true Disable: false
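The parameters above can be assembled into a setup frame with a helper like the one below. It is a sketch under a few assumptions: the key name max_token_len follows the example frames in this section, the lower bound of 1 token is a guess, and `llm_setup_frame` is a hypothetical name.

```python
import json

MAX_TOKEN_LIMIT = 1024  # maximum stated in the parameter table


def llm_setup_frame(request_id, input_source="llm.utf-8", streaming_output=False,
                    max_token_len=127, prompt="", enkws=True, enoutput=True):
    """Build an llm.setup frame for the pre-installed qwen2.5-0.5b model."""
    if not 1 <= max_token_len <= MAX_TOKEN_LIMIT:
        raise ValueError(f"max_token_len must be 1..{MAX_TOKEN_LIMIT}")
    return json.dumps({
        "request_id": request_id,
        "work_id": "llm",  # unit keyword only; the ID is assigned by setup
        "action": "setup",
        "object": "llm.setup",
        "data": {
            "model": "qwen2.5-0.5b",
            "response_format": "llm.utf-8.stream" if streaming_output else "llm.utf-8",
            "input": input_source,  # e.g. "llm.utf-8" or an ASR work_id
            "enoutput": enoutput,
            "enkws": enkws,
            "max_token_len": max_token_len,
            "prompt": prompt,
        },
    }, ensure_ascii=False)
```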

LLM Input From ASR

Initialize LLM unit and configure ASR (speech-to-text) as input data

// Input from ASR
{
    "request_id": "4",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-0.5b",
        "response_format": "llm.utf-8.stream",
        "input": "asr.1001",
        "enoutput": true,
        "enkws": true,
        "max_token_len": 127,
        "prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."       
    }
} 

LLM Input From UART

Initialize LLM unit and configure UART interface as input data

// Input from UART
{
    "request_id": "4",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-0.5b",
        "response_format": "llm.utf-8",
        "input": "llm.utf-8.stream",
        "enoutput": true,
        "enkws": true,
        "max_token_len": 127,
        "prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
    }
} 

LLM unit initialization response

{
    "created": 1692664107,
    "data": "None",
    "error": {
        "code": 0,
        "message": "llm setup successful"
    },
    "object": "None",
    "request_id": "4",
    "work_id": "llm.1003"
} 

inference

UART inference

Submit inference data via UART

// Streaming Input
{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "inference",
    "object": "llm.utf-8.stream",
    "data": {
        "delta": "What's ur name?",
        "index": 0,
        "finish": true
    }
}
// Non-Streaming Input
{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "inference",
    "object": "llm.utf-8",
    "data": "What's ur name?"
} 
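For longer prompts, the streaming input form could be produced by cutting the text into several packets. The sketch below assumes the module joins packets in index order until it sees finish set to true; the samples in this document only show a single-packet stream, so this behavior is not confirmed. `streaming_inference_frames` and the chunk size are hypothetical.

```python
import json


def streaming_inference_frames(request_id, work_id, text, chunk_size=32):
    """Split text into llm.utf-8.stream inference frames."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)] or [""]
    frames = []
    for index, chunk in enumerate(chunks):
        frames.append(json.dumps({
            "request_id": request_id,
            "work_id": work_id,
            "action": "inference",
            "object": "llm.utf-8.stream",
            "data": {
                "delta": chunk,
                "index": index,
                "finish": index == len(chunks) - 1,  # true only on the last packet
            },
        }, ensure_ascii=False))
    return frames
```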

Inference response data

{
    "created": 1692664605,
    "data": {
        "delta": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
        "finish": true,
        "index": 0
    },
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.utf-8.stream",
    "request_id": "4",
    "work_id": "llm.1003"
} 

pause

Pause LLM unit command

{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "pause"
} 

LLM unit pause command response

{
    "created": 1692664941,
    "error": {
        "code": 0,
        "message": "llm pause"
    },
    "request_id": "4",
    "work_id": "llm.1003"
} 

work

Start LLM unit command

{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "work"
} 

LLM unit start command response

{
    "created": 1692664972,
    "error": {
        "code": 0,
        "message": "llm work"
    },
    "request_id": "4",
    "work_id": "llm.1003"
} 

exit

End and release LLM unit

{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "exit"
} 

LLM unit release response

{
    "created": 1692664858,
    "data": "None",
    "error": {
        "code": 0,
        "message": "llm exit"
    },
    "object": "None",
    "request_id": "4",
    "work_id": "llm.1003"
} 

taskinfo

Query LLM unit status

{
    "request_id": "4",
    "work_id": "llm.1003",
    "action": "taskinfo"
} 

LLM unit running response

{
    "created": 1692664730,
    "data": "running",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.state",
    "request_id": "4",
    "work_id": "llm.1003"
} 

LLM unit stopped response

{
    "created": 1692664823,
    "data": "stopped",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.state",
    "request_id": "4",
    "work_id": "llm.1003"
} 

LLM unit release response

{
    "created": 1692664881,
    "data": "deinit",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.state",
    "request_id": "4",
    "work_id": "llm.1003"
} 

TTS

The TTS unit is used for converting text to speech.

Method	Function	Input Type	Output Type
setup	Configure TTS unit	tts.setup	None (the returned result includes the successful `work_id`)
inference	Perform inference	Typical: tts.utf-8 (model difference can be checked via `sys.lsmode`)	None (returns only data submission result; final inference result will depend on configuration)
pause	Pause task operation	None	None
work	Resume task operation	None	None
exit	End the work of `work_id`	None	None
taskinfo	Retrieve all task instance information		tts.taskinfo

setup

Initialize the TTS unit and configure for Chinese/English model.

Parameter Description

Parameter	Description	Input Value
model	Conversion model	English model: "single_speaker_english_fast" Chinese model: "single_speaker_fast"
input	Input	LLM input: "llm.xxx" (input work_id of the llm unit) UART input: "tts.utf-8" UART streaming input: "tts.utf-8.stream"
enkws	KWS interruption of process	Interrupt with KWS: true Do not interrupt with KWS: false
enoutput	Enable UART output	Enable: true Disable: false

TTS Input From LLM

Initialize the TTS unit, configure for English text-to-speech model, and set LLM inference results as input.

// Input from LLM
{
    "request_id": "5",
    "work_id": "tts",
    "action": "setup",
    "object": "tts.setup",
    "data": {
        "model": "single_speaker_english_fast", 
        "response_format": "tts.base64.wav",
        "input": "llm.1004",
        "enoutput": true,
        "enkws": true
    }
} 

TTS Input From UART

Initialize the TTS unit and configure for the English text-to-speech model, with input configured for UART command streaming input.

// Input from UART
{
  "request_id": "5",
  "work_id": "tts",
  "action": "setup",
  "object": "tts.setup",
  "data": {
        "model": "single_speaker_english_fast", 
        "response_format": "tts.base64.wav",
        "input": "tts.utf-8.stream",
        "enoutput": true,
        "enkws": true
  }
} 

TTS unit initialization response

{
    "created": 1692668824,
    "error": {
        "code": 0,
        "message": "tts setup successful"
    },
    "request_id": "5",
    "work_id": "tts.1004"
} 

inference

UART inference

Submit the text to convert via UART. Each model supports only one language; to convert text in a different language, release the TTS unit with exit and reinitialize it with setup.
Note: Text for conversion must end with a period:
- For English text, use an English period . (half-width symbol)
- For Chinese text, use a Chinese period 。 (full-width symbol)
- Sentence delimiters should use , (half-width symbol)
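The period rule above is easy to forget, so a host program might enforce it before sending. This is a minimal sketch: `terminate_for_tts` is a hypothetical helper, and the `language` argument ("en" or "zh") is an assumption used to choose between the half-width period and the full-width period (U+3002).

```python
def terminate_for_tts(text, language="en"):
    """Make sure the text ends with the period required for the language."""
    period = "." if language == "en" else "\u3002"  # U+3002 is the full-width period
    stripped = text.rstrip()
    if stripped.endswith(period):
        return stripped
    return stripped + period
```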

// Streaming Input
{
    "request_id": "4", 
    "work_id": "tts.1004",
    "action": "inference",
    "object": "tts.utf-8.stream",
    "data": {
        "delta": "I don't know what your name.",
        "index": 0,
        "finish": true
    }
}

// Non-Streaming Input
{
    "request_id": "4", 
    "work_id": "tts.1004",
    "action": "inference",
    "object": "tts.utf-8",
    "data": "I don't know what your name."
} 

pause

Pause TTS unit command

{
    "request_id": "5",
    "work_id": "tts.1004",
    "action": "pause"
} 

Pause TTS unit command response

{
    "created": 1692668916,
    "error": {
        "code": 0,
        "message": "tts pause"
    },
    "request_id": "5",
    "work_id": "tts.1004"
} 

work

Start TTS unit command

{
    "request_id": "5",
    "work_id": "tts.1004",
    "action": "work"
} 

Start TTS unit command response

{
    "created": 1692668944,
    "error": {
        "code": 0,
        "message": "tts work"
    },
    "request_id": "5",
    "work_id": "tts.1004"
} 

exit

End and release the TTS unit

{
    "request_id": "5",
    "work_id": "tts.1004",
    "action": "exit"
} 

TTS unit release response

{
    "created": 1692669052,
    "error": {
        "code": 0,
        "message": "tts exit"
    },
    "request_id": "5",
    "work_id": "tts.1004"
} 

taskinfo

Query TTS unit status

{
    "request_id": "5",
    "work_id": "tts.1004",
    "action": "taskinfo"
} 

TTS unit running response

{
    "created": 1692668878,
    "data": "running",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "tts.state",
    "request_id": "5",
    "work_id": "tts.1004"
} 

TTS unit stopped response

{
    "created": 1692668968,
    "data": "stopped",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "tts.state",
    "request_id": "5",
    "work_id": "tts.1004"
} 

TTS unit release response

{
    "created": 1692669081,
    "data": "deinit",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "tts.state",
    "request_id": "5",
    "work_id": "tts.1004"
} 

Applications

Text To Speech

Convert text to speech via the TTS unit. (TTS)

1. Initialize Audio unit

{
    "request_id": "1",
    "work_id": "audio",
    "action": "setup",
    "object": "audio.setup",
    "data": {
        "capcard": 0,
        "capdevice": 0,
        "capVolume": 0.5,
        "playcard": 0,
        "playdevice": 1,
        "playVolume": 0.5
    }
} 

Audio unit initialization response

{
    "created": 1692652475,
    "error": {
        "code": 0,
        "message": "audio setup successful"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 

2. Initialize the TTS unit and configure for English text-to-speech model, with input configured for UART command input.

// Input from UART
{
  "request_id": "5",
  "work_id": "tts",
  "action": "setup",
  "object": "tts.setup",
  "data": {
        "model": "single_speaker_english_fast", 
        "response_format": "tts.base64.wav",
        "input": "tts.utf-8",
        "enoutput": true,
        "enkws": true
  }
} 

TTS unit initialization response

{
    "created": 1692652569,
    "error": {
        "code": 0,
        "message": "tts setup successful"
    },
    "request_id": "5",
    "work_id": "tts.1001"
} 

3. Input text to start TTS conversion.

{
    "request_id": "4", 
    "work_id": "tts.1001",
    "action": "inference",
    "object": "tts.utf-8",
    "data": "Hello My Friend."
} 
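The three steps of this application can be sketched as one host-side function. The `send` and `receive` callables are hypothetical stand-ins for whatever UART transport the host uses (write one frame, read one response frame); the request IDs and setup values are copied from the frames above. The inference call is sent without waiting for a reply, since this section shows no response for it.

```python
def speak(send, receive, text):
    """Set up audio and TTS, then submit text for speech output."""

    def setup(request_id, unit, data):
        send({"request_id": request_id, "work_id": unit, "action": "setup",
              "object": f"{unit}.setup", "data": data})
        response = receive()
        if response["error"]["code"] != 0:
            raise RuntimeError(response["error"]["message"])
        return response["work_id"]  # e.g. "tts.1001"

    setup("1", "audio", {"capcard": 0, "capdevice": 0, "capVolume": 0.5,
                         "playcard": 0, "playdevice": 1, "playVolume": 0.5})
    tts_id = setup("5", "tts", {"model": "single_speaker_english_fast",
                                "response_format": "tts.base64.wav",
                                "input": "tts.utf-8",
                                "enoutput": True, "enkws": True})
    # Address the inference call to the work_id assigned during setup.
    send({"request_id": "4", "work_id": tts_id, "action": "inference",
          "object": "tts.utf-8", "data": text})
    return tts_id
```

Using the work_id from the setup response, rather than hard-coding tts.1001, keeps the flow correct when the module assigns a different ID.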

Text Assistant

Send text to the LLM unit for inference and play the result back as speech. (LLM+TTS)

1. Initialize Audio unit

{
    "request_id": "1",
    "work_id": "audio",
    "action": "setup",
    "object": "audio.setup",
    "data": {
        "capcard": 0,
        "capdevice": 0,
        "capVolume": 0.5,
        "playcard": 0,
        "playdevice": 1,
        "playVolume": 0.5
    }
} 

Audio unit initialization response

{
    "created": 1692652330,
    "error": {
        "code": 0,
        "message": "audio setup successful"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 

2. Initialize the LLM unit and set the UART interface as its input source.

// Input from UART
{
    "request_id": "4",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-0.5b",
        "response_format": "llm.utf-8",
        "input": "llm.utf-8",
        "enoutput": true,
        "enkws": true,
        "max_token_len": 127,
        "prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
    }
} 

LLM unit initialization response

{
    "created": 1692652323,
    "error": {
        "code": 0,
        "message": "llm setup successful"
    },
    "request_id": "4",
    "work_id": "llm.1001"
} 

3. Initialize the TTS unit with the English text-to-speech model and set its input to the LLM inference results.

// Input from LLM
{
    "request_id": "5",
    "work_id": "tts",
    "action": "setup",
    "object": "tts.setup",
    "data": {
        "model": "single_speaker_english_fast", 
        "response_format": "tts.base64.wav",
        "input": "llm.1001",
        "enoutput": true,
        "enkws": true
    }
} 

TTS unit initialization response

{
    "created": 1692652354,
    "error": {
        "code": 0,
        "message": "tts setup successful"
    },
    "request_id": "5",
    "work_id": "tts.1002"
} 

4. Submit inference data via UART.

// Non-Streaming Input
{
    "request_id": "4",
    "work_id": "llm.1001",
    "action": "inference",
    "object": "llm.utf-8",
    "data": "What's ur name?"
} 

5. The module returns the inference response and plays the result as audio.

{
    "created": 1692652407,
    "data": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.utf-8",
    "request_id": "4",
    "work_id": "llm.1001"
} 
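
Comparing this walkthrough with the Text To Speech one, the TTS setup frames differ only in the input field: "tts.utf-8" for UART input, or the work_id of the LLM unit for chained input. The Python sketch below makes that explicit and also extracts the reply text from the non-streaming response above. The helper names build_tts_setup and reply_text are hypothetical.

```python
import json


def build_tts_setup(request_id: str, input_source: str) -> str:
    """Return a TTS setup frame whose text input comes from input_source."""
    return json.dumps({
        "request_id": request_id,
        "work_id": "tts",  # unit name without an ID, as used for setup
        "action": "setup",
        "object": "tts.setup",
        "data": {
            "model": "single_speaker_english_fast",
            "response_format": "tts.base64.wav",
            "input": input_source,
            "enoutput": True,
            "enkws": True,
        },
    })


def reply_text(raw_response: str) -> str:
    """Extract the generated text from a non-streaming LLM inference response."""
    frame = json.loads(raw_response)
    if frame["error"]["code"] != 0:
        raise RuntimeError(frame["error"]["message"])
    return frame["data"].strip()


llm_setup_response = json.loads(
    '{"created": 1692652323, "error": {"code": 0, "message": "llm setup successful"},'
    ' "request_id": "4", "work_id": "llm.1001"}'
)
tts_from_uart = build_tts_setup("5", "tts.utf-8")
tts_from_llm = build_tts_setup("5", llm_setup_response["work_id"])
print(json.loads(tts_from_llm)["data"]["input"])  # prints: llm.1001

inference_response = r"""
{
    "created": 1692652407,
    "data": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
    "error": {"code": 0, "message": ""},
    "object": "llm.utf-8",
    "request_id": "4",
    "work_id": "llm.1001"
}
"""
print(reply_text(inference_response))
```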

Voice Assistant

Use KWS for activation -> trigger ASR for speech-to-text -> use converted content as LLM input for inference -> finally output the inference result as speech via TTS. (KWS+ASR+LLM+TTS)

1. Initialize the Audio unit.

{
    "request_id": "1",
    "work_id": "audio",
    "action": "setup",
    "object": "audio.setup",
    "data": {
        "capcard": 0,
        "capdevice": 0,
        "capVolume": 0.5,
        "playcard": 0,
        "playdevice": 1,
        "playVolume": 0.5
    }
} 

Audio unit initialization response

{
    "created": 1692652330,
    "error": {
        "code": 0,
        "message": "audio setup successful"
    },
    "request_id": "1",
    "work_id": "audio.1000"
} 

2. Initialize the KWS unit with the English recognition model and the wake word "HELLO".

{
    "request_id": "2",
    "work_id": "kws",
    "action": "setup",
    "object": "kws.setup",
    "data": {
        "model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
        "response_format": "kws.bool",
        "input": "sys.pcm",
        "enoutput": true,
        "kws": "HELLO"
    }
} 

KWS initialization response (Note: the setup process takes approximately 9 seconds)

{
    "created": 1692652559,
    "error": {
        "code": 0,
        "message": "kws setup successful"
    },
    "request_id": "2",
    "work_id": "kws.1001"
} 

3. Initialize the ASR unit with the English speech recognition model and set KWS to trigger ASR.

{
    "request_id": "3",
    "work_id": "asr",
    "action": "setup",
    "object": "asr.setup",
    "data": {
        "model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
        "response_format": "asr.utf-8",
        "input": "sys.pcm",
        "enoutput": true,
        "enkws": true,
        "rule1": 2.4,
        "rule2": 1.2,
        "rule3": 30
    }
} 

ASR initialization response

{
    "created": 1692652705,
    "error": {
        "code": 0,
        "message": "asr setup successful"
    },
    "request_id": "3",
    "work_id": "asr.1002"
} 

4. Initialize the LLM unit and set the ASR (speech-to-text) output as its input source.

// Input from ASR
{
    "request_id": "4",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-0.5b",
        "response_format": "llm.utf-8.stream",
        "input": "asr.1002",
        "enoutput": true,
        "enkws": true,
        "max_token_len": 127,
        "prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
    }
} 

LLM initialization response

{
    "created": 1692653061,
    "error": {
        "code": 0,
        "message": "llm setup successful"
    },
    "request_id": "4",
    "work_id": "llm.1003"
} 

5. Initialize the TTS unit with the English text-to-speech model and set the LLM inference results as its input.

// Input from LLM
{
    "request_id": "5",
    "work_id": "tts",
    "action": "setup",
    "object": "tts.setup",
    "data": {
        "model": "single_speaker_english_fast", 
        "response_format": "tts.base64.wav",
        "input": "llm.1003",
        "enoutput": true,
        "enkws": true
    }
} 

TTS unit initialization response

{
    "created": 1692653109,
    "error": {
        "code": 0,
        "message": "tts setup successful"
    },
    "request_id": "5",
    "work_id": "tts.1004"
} 

6. Wake the module with the keyword "HELLO", then proceed with voice interaction.
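
The setup sequence above depends on ordering: the LLM unit takes the work_id returned by the ASR setup as its input, and the TTS unit takes the one returned by the LLM setup. The Python sketch below wires the units together that way. The send callable is hypothetical and stands for "write a frame over UART and return the parsed response". The fake_send function only simulates the module for illustration by handing out IDs counting up from 1000, as in the sample responses here; a real module may assign different IDs.

```python
from typing import Callable, Dict, List

PROMPT = ("You are a knowledgeable assistant capable of answering various "
          "questions and providing information.")


def setup_pipeline(send: Callable[[dict], dict]) -> Dict[str, str]:
    """Set up audio, kws, asr, llm, and tts in order; return unit name -> work_id."""
    ids: Dict[str, str] = {}

    def setup(request_id: str, unit: str, data: dict) -> str:
        response = send({
            "request_id": request_id,
            "work_id": unit,
            "action": "setup",
            "object": f"{unit}.setup",
            "data": data,
        })
        if response["error"]["code"] != 0:
            raise RuntimeError(f"{unit} setup failed: {response['error']['message']}")
        ids[unit] = response["work_id"]
        return ids[unit]

    setup("1", "audio", {"capcard": 0, "capdevice": 0, "capVolume": 0.5,
                         "playcard": 0, "playdevice": 1, "playVolume": 0.5})
    setup("2", "kws", {"model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
                       "response_format": "kws.bool", "input": "sys.pcm",
                       "enoutput": True, "kws": "HELLO"})
    asr_id = setup("3", "asr", {"model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
                                "response_format": "asr.utf-8", "input": "sys.pcm",
                                "enoutput": True, "enkws": True,
                                "rule1": 2.4, "rule2": 1.2, "rule3": 30})
    # The LLM unit reads from the ASR unit created just above.
    llm_id = setup("4", "llm", {"model": "qwen2.5-0.5b",
                                "response_format": "llm.utf-8.stream", "input": asr_id,
                                "enoutput": True, "enkws": True,
                                "max_token_len": 127, "prompt": PROMPT})
    # The TTS unit reads from the LLM unit created just above.
    setup("5", "tts", {"model": "single_speaker_english_fast",
                       "response_format": "tts.base64.wav", "input": llm_id,
                       "enoutput": True, "enkws": True})
    return ids


# Simulation only: stands in for the UART link so the sketch runs without hardware.
sent: List[dict] = []
_next_id = iter(range(1000, 2000))


def fake_send(frame: dict) -> dict:
    sent.append(frame)
    unit = frame["work_id"]
    return {"created": 0,
            "error": {"code": 0, "message": f"{unit} setup successful"},
            "request_id": frame["request_id"],
            "work_id": f"{unit}.{next(_next_id)}"}


ids = setup_pipeline(fake_send)
print(ids)
```

Passing the transport in as a function keeps the wiring logic testable on a desktop machine. On a real host, send would write the serialized frame to the UART and wait for the matching response; the KWS setup in particular is noted above as taking about 9 seconds, so a generous read timeout is likely needed.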
