Version | Update Date | Remarks |
---|---|---|
v1.0.0 | 2024.10.24 | / |
The Module LLM integrates functional units such as KWS (Keyword Spotting), ASR (Automatic Speech Recognition), LLM (Large Language Model), and TTS (Text-to-Speech). Each unit can operate independently as a standalone module or be configured into a data workflow with other units, enabling more intelligent interactive applications. The module communicates with a host device over UART using JSON-formatted data packets.
Unit | Unit Name | Unit Capability |
---|---|---|
sys | System | Set module parameters, retrieve module status |
kws | Keyword Detection | Detect the presence of keywords in audio |
asr | Speech-to-Text | Convert audio to text |
llm | Generative Model | Generate new text based on input text |
tts | Text-to-Speech | Convert text to audio |
audio | System Audio Interface | Capture microphone audio and play back audio |
Configure the UART interface in the program (pin configuration depends on the connected device; the interface configuration is 115200bps 8N1).
{
"request_id": "001",
"work_id": "llm.1001",
"action": "taskinfo",
"object": "None",
"data":"None"
}
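A request like the one above can be assembled on the host before it is written to the UART. The following is a minimal Python sketch, not part of the module's API: the helper name build_request is hypothetical, and it assumes that object and data may simply be left out when a request carries no parameters.

```python
import json


def build_request(request_id, work_id, action, obj=None, data=None):
    """Serialize one request packet as a JSON string.

    Hypothetical helper: "object" and "data" are only included when given.
    """
    packet = {"request_id": request_id, "work_id": work_id, "action": action}
    if obj is not None:
        packet["object"] = obj
    if data is not None:
        packet["data"] = data
    return json.dumps(packet)


# Same request as the taskinfo example, without the placeholder "None" fields.
print(build_request("001", "llm.1001", "taskinfo"))
```

Keeping packet construction in one place makes it harder to misspell a field name when many different requests are sent.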
Request fields: request_id, work_id, action, object, data. Refer to the parameter structure list for all parameter structures. If a request takes no parameters, the parameter fields (object and data) can be omitted.
{
"request_id": "002",
"work_id": "kws.1002",
"created": 30952,
"object": "None",
"data":"None",
"error":{"code":0, "message":""}
}
Response-specific fields: created, error.
{
"request_id": "4",
"work_id": "llm.1003",
"action": "inference",
"object": "llm.utf-8.stream",
"data": {
"delta": "What's ur name?",
"index": 0,
"finish": true
}
}
{
"created": 1692664605,
"data": {
"delta": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
"finish": true,
"index": 0
},
"error": {
"code": 0,
"message": ""
},
"object": "llm.utf-8.stream",
"request_id": "4",
"work_id": "llm.1003"
}
Streaming data fields: index, delta, finish.
Error codes are included in the error field of the response and indicate the result of the request:
Error Code | Description | Message | Notes |
---|---|---|---|
0 | Operation Successful! | Operation Successful! | |
-1 | Communication channel receive state machine reset warning! | reace reset | Continuously sending “}” will trigger this error, used to reset the JSON receive state machine. |
-2 | JSON parsing error | JSON format error | |
-3 | sys action match error | action match false | |
-4 | Inference data push error | inference data push false | |
-5 | Model loading failed | Model loading failed. | |
-6 | Unit does not exist | Unit Does Not Exist | |
-7 | Unknown operation | Unknown Operation | |
-8 | Unit resource allocation failed | Unit Resource Allocation Failed | |
-9 | Unit call failed | unit call false | |
-10 | Model initialization failed | Model init failed. | |
-11 | Model run error | Model run failed. | |
-12 | Module not initialized | Module has not been initialised. | |
-13 | Module is already working | Module already working. | |
-14 | Module is not working | Module is not working. | |
-19 | Unit resource release failed | Unit Resource Free Failed | |
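On the host side, the error field can be checked once per response instead of at every call site. This is a minimal sketch assuming responses arrive as complete JSON strings. ModuleError and check_response are hypothetical names; the descriptions are copied from the table above.

```python
import json

# Descriptions copied from the error code table.
ERROR_DESCRIPTIONS = {
    0: "Operation Successful!",
    -1: "Communication channel receive state machine reset warning!",
    -2: "JSON parsing error",
    -3: "sys action match error",
    -4: "Inference data push error",
    -5: "Model loading failed",
    -6: "Unit does not exist",
    -7: "Unknown operation",
    -8: "Unit resource allocation failed",
    -9: "Unit call failed",
    -10: "Model initialization failed",
    -11: "Model run error",
    -12: "Module not initialized",
    -13: "Module is already working",
    -14: "Module is not working",
    -19: "Unit resource release failed",
}


class ModuleError(Exception):
    """Hypothetical exception type raised for non-zero error codes."""

    def __init__(self, code, message):
        description = ERROR_DESCRIPTIONS.get(code, "Unknown error code")
        super().__init__(f"{code}: {description} ({message})")
        self.code = code
        self.message = message


def check_response(raw):
    """Parse a response string and raise ModuleError unless error.code is 0."""
    response = json.loads(raw)
    # Some output packets carry no error field; treat that as success.
    error = response.get("error") or {}
    code = error.get("code", 0)
    if code != 0:
        raise ModuleError(code, error.get("message", ""))
    return response
```

A caller can then wrap each exchange in a try/except block and branch on exc.code, for example retrying a setup after code -8.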
The SYS unit is used to set module working parameters and retrieve module operation information.
Method | Function | Input Type | Output Type |
---|---|---|---|
lsmode | Retrieve available models | None | sys.lsmode |
hwinfo | Retrieve CPU load, memory load, chip temperature | None | sys.hwinfo |
reset | Reset the unit | None | Returns reset completion JSON |
reboot | Reboot the system | None | None |
ping | Check if the system is available | None | None |
{
"request_id": "001",
"work_id": "sys",
"action": "lsmode"
}
{
"created": 1692652687,
"data": [
{
"capabilities": [
"Automatic_Speech_Recognition"
],
"input_type": [
"sys.pcm"
],
"model": "sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23",
"output_type": [
"asr.utf-8"
],
"type": "asr"
},
{
"capabilities": [
"Automatic_Speech_Recognition"
],
"input_type": [
"sys.pcm"
],
"model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
"output_type": [
"asr.utf-8"
],
"type": "asr"
},
{
"capabilities": [
"Keyword_spotting"
],
"input_type": [
"sys.pcm"
],
"model": "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01",
"output_type": [
"kws.bool"
],
"type": "kws"
},
{
"capabilities": [
"Keyword_spotting"
],
"input_type": [
"sys.pcm"
],
"model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
"output_type": [
"kws.bool"
],
"type": "kws"
},
{
"capabilities": [
"text_generation",
"chat"
],
"input_type": "utf-8",
"model": "qwen2.5-0.5b",
"output_type": "utf-8",
"type": "llm"
},
{
"capabilities": [
"Text_to_speech"
],
"input_type": [
"sys.utf-8",
"llm.utf-8"
],
"model": "single_speaker_fast",
"output_type": [
"tts.wav"
],
"type": "tts"
},
{
"capabilities": [
"Text_to_speech"
],
"input_type": [
"sys.utf-8",
"llm.utf-8"
],
"model": "single_speaker_english_fast",
"output_type": [
"tts.wav"
],
"type": "tts"
}
],
"error": {
"code": 0,
"message": ""
},
"object": "sys.lsmode",
"request_id": "001",
"work_id": "sys"
}
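The lsmode response above can be reduced to a lookup of model names per unit type before any setup call is made. The sketch below assumes the response has already been parsed into a Python dict. The function names are hypothetical. In the sample, input_type is a list for most models but a bare string for the llm entry, so the sketch normalizes both forms.

```python
from collections import defaultdict


def as_list(value):
    """input_type/output_type appear both as a list and as a bare string."""
    return value if isinstance(value, list) else [value]


def models_by_type(lsmode_response):
    """Group model names from a parsed sys.lsmode response by unit type."""
    grouped = defaultdict(list)
    for entry in lsmode_response.get("data", []):
        grouped[entry["type"]].append(entry["model"])
    return dict(grouped)


def models_accepting(lsmode_response, input_type):
    """List models whose input_type includes the given value."""
    return [
        entry["model"]
        for entry in lsmode_response.get("data", [])
        if input_type in as_list(entry["input_type"])
    ]
```

For the response shown above, models_by_type would map "llm" to a single-element list containing "qwen2.5-0.5b".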
{
"request_id": "001",
"work_id": "sys",
"action": "hwinfo"
}
{
"created": 1692652642,
"data": {
"cpu_loadavg": 0,
"mem": 18,
"temperature": 46350
},
"error": {
"code": 0,
"message": ""
},
"object": "sys.hwinfo",
"request_id": "001",
"work_id": "sys"
}
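The hwinfo values can be turned into a more readable form on the host. The sketch below rests on an assumption the source does not state: the temperature is taken to be in millidegrees Celsius, which the sample value 46350 suggests. The units of cpu_loadavg and mem are not given either, so they are passed through unchanged. The function name is hypothetical.

```python
def summarize_hwinfo(response):
    """Summarize a parsed sys.hwinfo response.

    Assumption (not confirmed by the source): temperature is reported in
    millidegrees Celsius.
    """
    data = response["data"]
    return {
        "cpu_loadavg": data["cpu_loadavg"],
        "mem": data["mem"],
        "temperature_c": data["temperature"] / 1000.0,
    }


print(summarize_hwinfo({"data": {"cpu_loadavg": 0, "mem": 18, "temperature": 46350}}))
```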
{
"request_id": "001",
"work_id": "sys",
"action": "reset"
}
{
"created": 1692652712,
"error": {
"code": 0,
"message": "llm server restarting ..."
},
"request_id": "001",
"work_id": "sys"
}
{
"request_id": "0",
"work_id": "sys",
"created": 1692652723,
"error": {
"code": 0,
"message": "reset over"
}
}
{
"request_id": "001",
"work_id": "sys",
"action": "reboot"
}
{
"created": 1692652822,
"error": {
"code": 0,
"message": "rebooting ..."
},
"request_id": "001",
"work_id": "sys"
}
After the reboot, the string V0EUEURS will be sent; it is the system startup string and can be ignored.
{
"request_id": "001",
"work_id": "sys",
"action": "ping"
}
{
"created": 1692652310,
"error": {
"code": 0,
"message": ""
},
"request_id": "001",
"work_id": "sys"
}
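Because responses arrive over a serial stream, a host may read several JSON objects at once, half of an object, or non-JSON text such as the startup string. The sketch below shows one way to split an accumulated text buffer into complete packets using the standard library's json.JSONDecoder.raw_decode. The function name is hypothetical. It assumes any text outside a JSON object can be discarded, and it makes no attempt to recover from a malformed object.

```python
import json


def extract_packets(buffer):
    """Split a text buffer into parsed JSON packets.

    Returns (packets, remainder); remainder holds an incomplete trailing
    object that should be prepended to the next chunk read from the UART.
    """
    decoder = json.JSONDecoder()
    packets = []
    pos = 0
    while True:
        start = buffer.find("{", pos)
        if start == -1:
            # Nothing but non-JSON text (e.g. the startup string) is left.
            return packets, ""
        try:
            obj, end = decoder.raw_decode(buffer, start)
        except json.JSONDecodeError:
            # Treated as an incomplete object: keep it for the next read.
            return packets, buffer[start:]
        packets.append(obj)
        pos = end


packets, rest = extract_packets('V0EUEURS{"work_id": "sys"}{"work_id"')
print(packets, repr(rest))
```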
The AUDIO unit controls the system sound card, capturing microphone audio and playing back sound. It provides system audio input and output, supplying audio input for the Keyword Spotting (KWS) and Automatic Speech Recognition (ASR) units and audio output for the Text-to-Speech (TTS) unit. The AUDIO unit must be initialized before the KWS and ASR units are used.
Method | Function | Input Type | Output Type |
---|---|---|---|
setup | Configure audio unit | audio.setup | None (the returned result includes the successful work_id ) |
exit | End the work of work_id | None | None |
pause | Pause task operation | None | None |
work | Resume task operation | None | None |
taskinfo | Retrieve all task instance information | None | audio.taskinfo |
Parameter | Description | Input Value |
---|---|---|
capcard | Microphone sound card index | Default system sound card: 0 |
capdevice | Microphone device index | Onboard silicon mic: 0 |
capVolume | Input volume | 0.0~10.0 (volume > 1 will amplify, default is 0.5) |
playcard | Speaker sound card index | Default system sound card: 0 |
playdevice | Speaker device index | Onboard speaker: 1 |
playVolume | Output volume | 0.0~10.0 (volume > 1 will amplify, default is 0.5) |
{
"request_id": "1",
"work_id": "audio",
"action": "setup",
"object": "audio.setup",
"data": {
"capcard": 0,
"capdevice": 0,
"capVolume": 0.5,
"playcard": 0,
"playdevice": 1,
"playVolume": 0.5
}
}
{
"created": 1692659008,
"error": {
"code": 0,
"message": "audio setup successful"
},
"request_id": "1",
"work_id": "audio.1000"
}
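The volume limits from the parameter table can be enforced on the host before the setup request is sent. This is a minimal sketch with a hypothetical function name. It keeps the card and device indices from the example above and only parameterizes the two volumes.

```python
import json


def audio_setup_request(request_id, cap_volume=0.5, play_volume=0.5):
    """Build an audio setup request, rejecting volumes outside 0.0~10.0."""
    for name, value in (("capVolume", cap_volume), ("playVolume", play_volume)):
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be within 0.0~10.0, got {value}")
    return json.dumps({
        "request_id": request_id,
        "work_id": "audio",
        "action": "setup",
        "object": "audio.setup",
        "data": {
            "capcard": 0, "capdevice": 0, "capVolume": cap_volume,
            "playcard": 0, "playdevice": 1, "playVolume": play_volume,
        },
    })


print(audio_setup_request("1", play_volume=0.25))
```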
{
"request_id": "1",
"work_id": "audio.1000",
"action": "pause"
}
{
"created": 1692659049,
"error": {
"code": 0,
"message": "audio pause"
},
"request_id": "1",
"work_id": "audio.1000"
}
{
"request_id": "1",
"work_id": "audio.1000",
"action": "work",
"object": "audio.setup",
"data": {
"capcard": 0,
"capdevice": 0,
"capVolume": 0.5,
"playcard": 0,
"playdevice": 1,
"playVolume": 0.25
}
}
{
"created": 1692659297,
"error": {
"code": 0,
"message": "audio work start"
},
"request_id": "1",
"work_id": "audio.1000"
}
{
"request_id": "1",
"work_id": "audio.1000",
"action": "exit"
}
{
"created": 1692659370,
"error": {
"code": 0,
"message": "audio exit"
},
"request_id": "1",
"work_id": "audio.1000"
}
// Sending data
{
"request_id": "1",
"work_id": "audio.1000",
"action": "taskinfo"
}
{
"created": 1692659454,
"data": "running",
"error": {
"code": 0,
"message": ""
},
"object": "audio.state",
"request_id": "1",
"work_id": "audio.1000"
}
{
"created": 1692659499,
"data": "stopped",
"error": {
"code": 0,
"message": ""
},
"object": "audio.state",
"request_id": "1",
"work_id": "audio.1000"
}
{
"created": 1692659403,
"data": "deinit",
"error": {
"code": 0,
"message": ""
},
"object": "audio.state",
"request_id": "1",
"work_id": "audio.1000"
}
The KWS unit is used for keyword detection.
Method | Function | Input Type | Output Type |
---|---|---|---|
setup | Configure KWS unit | kws.setup | None (the returned result includes the successful work_id ) |
pause | Pause task operation | None | None |
work | Resume task operation | None | None |
exit | End the work of work_id | None | None |
taskinfo | Retrieve all task instance information | None | kws.taskinfo |
Parameter | Description | Input Value |
---|---|---|
model | Conversion model | English model: "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01" Chinese model: "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01" |
kws | KWS keyword text setup | Mixing Chinese/English is not allowed, English should be in all uppercase |
enoutput | Enable UART output | Enable: true Disable: false |
{
"request_id": "2",
"work_id": "kws",
"action": "setup",
"object": "kws.setup",
"data": {
"model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
"response_format": "kws.bool",
"input": "sys.pcm",
"enoutput": true,
"kws": "HELLO"
}
}
{
"created": 1692660576,
"error": {
"code": 0,
"message": "kws setup successful"
},
"request_id": "2",
"work_id": "kws.1001"
}
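The keyword rules in the parameter table (no mixing of Chinese and English, English in uppercase) can be applied when the setup data is built. The sketch below uses a hypothetical function name and an approximation: any non-ASCII character is treated as Chinese, which is a simplification rather than a real language check.

```python
ENGLISH_MODEL = "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01"
CHINESE_MODEL = "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01"


def kws_setup_data(keyword):
    """Return the data object for a kws setup request.

    Approximation: non-ASCII characters are assumed to be Chinese.
    """
    if not keyword.strip():
        raise ValueError("keyword must not be empty")
    has_ascii_letters = any(ch.isascii() and ch.isalpha() for ch in keyword)
    has_non_ascii = any(not ch.isascii() for ch in keyword)
    if has_ascii_letters and has_non_ascii:
        raise ValueError("keyword must not mix Chinese and English")
    if has_non_ascii:
        model, text = CHINESE_MODEL, keyword
    else:
        # English keywords are sent in all uppercase.
        model, text = ENGLISH_MODEL, keyword.upper()
    return {
        "model": model,
        "response_format": "kws.bool",
        "input": "sys.pcm",
        "enoutput": True,
        "kws": text,
    }


print(kws_setup_data("hello")["kws"])
```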
{
"request_id": "2",
"work_id": "kws.1001",
"action": "pause"
}
{
"created": 1692660626,
"error": {
"code": 0,
"message": "kws pause"
},
"request_id": "2",
"work_id": "kws.1001"
}
{
"request_id": "2",
"work_id": "kws.1001",
"action": "work"
}
{
"created": 1692660651,
"error": {
"code": 0,
"message": "kws work"
},
"request_id": "2",
"work_id": "kws.1001"
}
{
"request_id": "2",
"work_id": "kws.1001",
"action": "exit"
}
{
"created": 1692654383,
"error": {
"code": 0,
"message": "kws exit"
},
"request_id": "2",
"work_id": "kws.1001"
}
{
"created": 1692654305,
"error": {
"code": 0,
"message": ""
},
"object": "kws.state",
"data": "running",
"request_id": "2",
"work_id": "kws.1001"
}
{
"created": 1692654535,
"error": {
"code": 0,
"message": ""
},
"object": "kws.state",
"data": "stopped",
"request_id": "2",
"work_id": "kws.1001"
}
{
"created": 1692654452,
"error": {
"code": 0,
"message": ""
},
"object": "kws.state",
"data": "deinit",
"request_id": "2",
"work_id": "kws.0"
}
The ASR unit is used for converting speech to text.
Method | Function | Input Type | Output Type |
---|---|---|---|
setup | Configure ASR unit | asr.setup | None (the returned result includes the successful work_id ) |
pause | Pause task operation | None | None |
work | Resume task operation | None | None |
exit | End the work of work_id | None | None |
taskinfo | Retrieve all task instance information | None | asr.taskinfo |
Parameter | Description | Input Value |
---|---|---|
model | Conversion model | English model: "sherpa-ncnn-streaming-zipformer-20M-2023-02-17" Chinese model: "sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23" |
response_format | Output format | Standard output: "asr.utf-8" Streaming output: "asr.utf-8.stream" |
input | Input | System audio input: "sys.pcm" |
enkws | Enable KWS-based activation | Activation via KWS, followed by ASR: true No KWS activation, ASR will operate continuously: false |
rule1 | Timeout from activation to unrecognized content | Unit: seconds |
rule2 | Maximum interval time for recognition | Unit: seconds |
rule3 | Maximum recognition timeout | Unit: seconds |
enoutput | Enable UART output | Enable: true Disable: false |
{
"request_id": "3",
"work_id": "asr",
"action": "setup",
"object": "asr.setup",
"data": {
"model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
"response_format": "asr.utf-8",
"input": "sys.pcm",
"enoutput": true,
"enkws": true,
"rule1": 2.4,
"rule2": 1.2,
"rule3": 30
}
}
{
"created": 1692667736,
"error": {
"code": 0,
"message": "asr setup successful"
},
"request_id": "3",
"work_id": "asr.1002"
}
{
"created": 1692655176,
"data": {
"delta": " hello",
"index": "0"
},
"object": "asr.stream",
"request_id": "004",
"work_id": "asr.1003"
}
{
"request_id": "3",
"work_id": "asr.1002",
"action": "pause"
}
{
"created": 1692670174,
"error": {
"code": 0,
"message": "asr pause"
},
"request_id": "3",
"work_id": "asr.1002"
}
{
"request_id": "3",
"work_id": "asr.1002",
"action": "work"
}
{
"created": 1692670213,
"error": {
"code": 0,
"message": "asr work"
},
"request_id": "3",
"work_id": "asr.1002"
}
{
"request_id": "3",
"work_id": "asr.1002",
"action": "exit"
}
{
"created": 1692670254,
"error": {
"code": 0,
"message": "asr exit"
},
"request_id": "3",
"work_id": "asr.1002"
}
{
"request_id": "3",
"work_id": "asr.1002",
"action": "taskinfo"
}
{
"created": 1692669923,
"data": "running",
"error": {
"code": 0,
"message": ""
},
"object": "asr.state",
"request_id": "3",
"work_id": "asr.1002"
}
{
"created": 1692653792,
"data": "stopped",
"error": {
"code": 0,
"message": ""
},
"object": "asr.state",
"request_id": "3",
"work_id": "asr.1002"
}
{
"created": 1692669874,
"data": "deinit",
"error": {
"code": 0,
"message": ""
},
"object": "asr.state",
"request_id": "3",
"work_id": "asr.0"
}
The LLM (Large Language Model) unit can generate responses based on input text.
Method | Function | Input Type | Output Type |
---|---|---|---|
setup | Configure LLM unit | llm.setup | None (the returned result includes the successful work_id ) |
inference | Perform inference | Typical: llm.utf-8 (model difference can be checked via sys.lsmode ) | None (returns only data submission result; final inference result will depend on configuration) |
pause | Pause task operation | None | None |
work | Resume task operation | None | None |
exit | End the work of work_id | None | None |
taskinfo | Retrieve all task instance information | None | llm.taskinfo |
qwen2.5-0.5b
Parameter | Description | Input Value |
---|---|---|
model | Conversion model | Pre-installed model "qwen2.5-0.5b" |
response_format | Output format | Standard output: "llm.utf-8" Streaming output: "llm.utf-8.stream" |
input | Input | ASR input: "asr.xxx" (input work_id of the ASR unit) UART input: "llm.utf-8" UART streaming input: "llm.utf-8.stream" |
enkws | KWS interruption of ongoing process | Interrupt with KWS: true Do not interrupt with KWS: false |
max_token_len | Configure max output token length | Maximum: 1024, recommended: 127 |
prompt | Model initialization prompt | |
enoutput | Enable UART output | Enable: true Disable: false |
// Input from ASR
{
"request_id": "4",
"work_id": "llm",
"action": "setup",
"object": "llm.setup",
"data": {
"model": "qwen2.5-0.5b",
"response_format": "llm.utf-8.stream",
"input": "asr.1001",
"enoutput": true,
"enkws": true,
"max_token_len": 127,
"prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
}
}
// Input from UART
{
"request_id": "4",
"work_id": "llm",
"action": "setup",
"object": "llm.setup",
"data": {
"model": "qwen2.5-0.5b",
"response_format": "llm.utf-8",
"input": "llm.utf-8.stream",
"enoutput": true,
"enkws": true,
"max_token_len": 127,
"prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
}
}
{
"created": 1692664107,
"data": "None",
"error": {
"code": 0,
"message": "llm setup successful"
},
"object": "None",
"request_id": "4",
"work_id": "llm.1003"
}
// Streaming Input
{
"request_id": "4",
"work_id": "llm.1003",
"action": "inference",
"object": "llm.utf-8.stream",
"data": {
"delta": "What's ur name?",
"index": 0,
"finish": true
}
}
// Non-Streaming Input
{
"request_id": "4",
"work_id": "llm.1003",
"action": "inference",
"object": "llm.utf-8",
"data": "What's ur name?"
}
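For streaming input, a long prompt could be split into several inference packets. The sketch below is an assumption about how the index and finish fields are meant to be used: the source only shows a single packet with index 0 and finish true, so the reading that index counts up per packet and finish is true only on the last one is inferred from the field names. The function name and chunk_size parameter are hypothetical.

```python
import json


def streaming_requests(request_id, work_id, text, chunk_size=16):
    """Yield llm.utf-8.stream inference requests covering the whole text.

    Assumption: index increments per packet and finish marks the last one.
    """
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)] or [""]
    last = len(chunks) - 1
    for index, chunk in enumerate(chunks):
        yield json.dumps({
            "request_id": request_id,
            "work_id": work_id,
            "action": "inference",
            "object": "llm.utf-8.stream",
            "data": {"delta": chunk, "index": index, "finish": index == last},
        })


for line in streaming_requests("4", "llm.1003", "What's ur name?", chunk_size=8):
    print(line)
```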
{
"created": 1692664605,
"data": {
"delta": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
"finish": true,
"index": 0
},
"error": {
"code": 0,
"message": ""
},
"object": "llm.utf-8.stream",
"request_id": "4",
"work_id": "llm.1003"
}
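When the response format is llm.utf-8.stream, the reply may be spread over several packets like the one above. The sketch below joins the delta fields until a packet with finish set to true is seen. It assumes packets have already been parsed into dicts and arrive in order. The function name is hypothetical. The index is converted with int() because one ASR sample in this document reports it as a string.

```python
def collect_stream(packets):
    """Join the delta fields of streamed packets up to the finish packet."""
    parts = {}
    for packet in packets:
        data = packet["data"]
        parts[int(data["index"])] = data["delta"]
        if data.get("finish"):
            break
    return "".join(parts[i] for i in sorted(parts))


reply = collect_stream([
    {"data": {"delta": "I'm here ", "index": 0, "finish": False}},
    {"data": {"delta": "to help.\n", "index": 1, "finish": True}},
])
print(reply)
```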
{
"request_id": "4",
"work_id": "llm.1003",
"action": "pause"
}
{
"created": 1692664941,
"error": {
"code": 0,
"message": "llm pause"
},
"request_id": "4",
"work_id": "llm.1003"
}
{
"request_id": "4",
"work_id": "llm.1003",
"action": "work"
}
{
"created": 1692664972,
"error": {
"code": 0,
"message": "llm work"
},
"request_id": "4",
"work_id": "llm.1003"
}
{
"request_id": "4",
"work_id": "llm.1003",
"action": "exit"
}
{
"created": 1692664858,
"data": "None",
"error": {
"code": 0,
"message": "llm exit"
},
"object": "None",
"request_id": "4",
"work_id": "llm.1003"
}
{
"request_id": "4",
"work_id": "llm.1003",
"action": "taskinfo"
}
{
"created": 1692664730,
"data": "running",
"error": {
"code": 0,
"message": ""
},
"object": "llm.state",
"request_id": "4",
"work_id": "llm.1003"
}
{
"created": 1692664823,
"data": "stopped",
"error": {
"code": 0,
"message": ""
},
"object": "llm.state",
"request_id": "4",
"work_id": "llm.1003"
}
{
"created": 1692664881,
"data": "deinit",
"error": {
"code": 0,
"message": ""
},
"object": "llm.state",
"request_id": "4",
"work_id": "llm.1003"
}
The TTS unit is used for converting text to speech.
Method | Function | Input Type | Output Type |
---|---|---|---|
setup | Configure TTS unit | tts.setup | None (the returned result includes the successful work_id ) |
inference | Perform inference | Typical: tts.utf-8 (model difference can be checked via sys.lsmode ) | None (returns only data submission result; final inference result will depend on configuration) |
pause | Pause task operation | None | None |
work | Resume task operation | None | None |
exit | End the work of work_id | None | None |
taskinfo | Retrieve all task instance information | None | tts.taskinfo |
Parameter | Description | Input Value |
---|---|---|
model | Conversion model | English model: "single_speaker_english_fast" Chinese model: "single_speaker_fast" |
input | Input | LLM input: "llm.xxx" (input work_id of the llm unit) UART input: "tts.utf-8" UART streaming input: "tts.utf-8.stream" |
enkws | KWS interruption of process | Interrupt with KWS: true Do not interrupt with KWS: false |
enoutput | Enable UART output | Enable: true Disable: false |
// Input from LLM
{
"request_id": "5",
"work_id": "tts",
"action": "setup",
"object": "tts.setup",
"data": {
"model": "single_speaker_english_fast",
"response_format": "tts.base64.wav",
"input": "llm.1004",
"enoutput": true,
"enkws": true
}
}
// Input from UART
{
"request_id": "5",
"work_id": "tts",
"action": "setup",
"object": "tts.setup",
"data": {
"model": "single_speaker_english_fast",
"response_format": "tts.base64.wav",
"input": "tts.utf-8.stream",
"enoutput": true,
"enkws": true
}
}
{
"created": 1692668824,
"error": {
"code": 0,
"message": "tts setup successful"
},
"request_id": "5",
"work_id": "tts.1004"
}
Submit the text to convert via UART. Each model supports only one language at a time; to convert a different language, use exit to release the TTS unit and reinitialize it with setup.
Note: Text for conversion must end with a period "." (half-width symbol), a full-width period (full-width symbol), or a comma "," (half-width symbol).
// Streaming Input
{
"request_id": "4",
"work_id": "tts.1004",
"action": "inference",
"object": "tts.utf-8.stream",
"data": {
"delta": "I don't know what your name.",
"index": 0,
"finish": true
}
}
// Non-Streaming Input
{
"request_id": "4",
"work_id": "tts.1004",
"action": "inference",
"object": "tts.utf-8",
"data": "I don't know what your name."
}
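Since text submitted to the TTS unit has to end with an accepted punctuation mark, the host can append one when it is missing. The sketch below uses hypothetical function names and only lists the two half-width endings; the full-width period would need to be added to the tuple for Chinese text.

```python
import json

# Half-width endings only; add the full-width period here when using the
# Chinese model.
ACCEPTED_ENDINGS = (".", ",")


def ensure_terminated(text, terminator="."):
    """Return text with trailing whitespace removed and an accepted ending."""
    stripped = text.rstrip()
    if stripped.endswith(ACCEPTED_ENDINGS):
        return stripped
    return stripped + terminator


def tts_inference_request(request_id, work_id, text):
    """Build a non-streaming TTS inference request with terminated text."""
    return json.dumps({
        "request_id": request_id,
        "work_id": work_id,
        "action": "inference",
        "object": "tts.utf-8",
        "data": ensure_terminated(text),
    })


print(tts_inference_request("4", "tts.1004", "Hello My Friend"))
```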
{
"request_id": "5",
"work_id": "tts.1004",
"action": "pause"
}
{
"created": 1692668916,
"error": {
"code": 0,
"message": "tts pause"
},
"request_id": "5",
"work_id": "tts.1004"
}
{
"request_id": "5",
"work_id": "tts.1004",
"action": "work"
}
{
"created": 1692668944,
"error": {
"code": 0,
"message": "tts work"
},
"request_id": "5",
"work_id": "tts.1004"
}
{
"request_id": "5",
"work_id": "tts.1004",
"action": "exit"
}
{
"created": 1692669052,
"error": {
"code": 0,
"message": "tts exit"
},
"request_id": "5",
"work_id": "tts.1004"
}
{
"request_id": "5",
"work_id": "tts.1004",
"action": "taskinfo"
}
{
"created": 1692668878,
"data": "running",
"error": {
"code": 0,
"message": ""
},
"object": "tts.state",
"request_id": "5",
"work_id": "tts.1004"
}
{
"created": 1692668968,
"data": "stopped",
"error": {
"code": 0,
"message": ""
},
"object": "tts.state",
"request_id": "5",
"work_id": "tts.1004"
}
{
"created": 1692669081,
"data": "deinit",
"error": {
"code": 0,
"message": ""
},
"object": "tts.state",
"request_id": "5",
"work_id": "tts.1004"
}
Convert text to speech via the TTS unit. (TTS)
{
"request_id": "1",
"work_id": "audio",
"action": "setup",
"object": "audio.setup",
"data": {
"capcard": 0,
"capdevice": 0,
"capVolume": 0.5,
"playcard": 0,
"playdevice": 1,
"playVolume": 0.5
}
}
{
"created": 1692652475,
"error": {
"code": 0,
"message": "audio setup successful"
},
"request_id": "1",
"work_id": "audio.1000"
}
// Input from UART
{
"request_id": "5",
"work_id": "tts",
"action": "setup",
"object": "tts.setup",
"data": {
"model": "single_speaker_english_fast",
"response_format": "tts.base64.wav",
"input": "tts.utf-8",
"enoutput": true,
"enkws": true
}
}
{
"created": 1692652569,
"error": {
"code": 0,
"message": "tts setup successful"
},
"request_id": "5",
"work_id": "tts.1001"
}
{
"request_id": "4",
"work_id": "tts.1001",
"action": "inference",
"object": "tts.utf-8",
"data": "Hello My Friend."
}
Send text to the LLM model, run inference, and play the result back as speech. (LLM+TTS)
{
"request_id": "1",
"work_id": "audio",
"action": "setup",
"object": "audio.setup",
"data": {
"capcard": 0,
"capdevice": 0,
"capVolume": 0.5,
"playcard": 0,
"playdevice": 1,
"playVolume": 0.5
}
}
{
"created": 1692652330,
"error": {
"code": 0,
"message": "audio setup successful"
},
"request_id": "1",
"work_id": "audio.1000"
}
// Input from UART
{
"request_id": "4",
"work_id": "llm",
"action": "setup",
"object": "llm.setup",
"data": {
"model": "qwen2.5-0.5b",
"response_format": "llm.utf-8",
"input": "llm.utf-8",
"enoutput": true,
"enkws": true,
"max_token_len": 127,
"prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
}
}
{
"created": 1692652323,
"error": {
"code": 0,
"message": "llm setup successful"
},
"request_id": "4",
"work_id": "llm.1001"
}
// Input from LLM
{
"request_id": "5",
"work_id": "tts",
"action": "setup",
"object": "tts.setup",
"data": {
"model": "single_speaker_english_fast",
"response_format": "tts.base64.wav",
"input": "llm.1001",
"enoutput": true,
"enkws": true
}
}
{
"created": 1692652354,
"error": {
"code": 0,
"message": "tts setup successful"
},
"request_id": "5",
"work_id": "tts.1002"
}
// Non-Streaming Input
{
"request_id": "4",
"work_id": "llm.1001",
"action": "inference",
"object": "llm.utf-8",
"data": "What's ur name?"
}
{
"created": 1692652407,
"data": "I'm not a person, but I'm here to help with any questions you may have. How can I assist you today?\n",
"error": {
"code": 0,
"message": ""
},
"object": "llm.utf-8",
"request_id": "4",
"work_id": "llm.1001"
}
Use KWS for activation -> trigger ASR for speech-to-text -> use converted content as LLM input for inference -> finally output the inference result as speech via TTS. (KWS+ASR+LLM+TTS)
{
"request_id": "1",
"work_id": "audio",
"action": "setup",
"object": "audio.setup",
"data": {
"capcard": 0,
"capdevice": 0,
"capVolume": 0.5,
"playcard": 0,
"playdevice": 1,
"playVolume": 0.5
}
}
{
"created": 1692652330,
"error": {
"code": 0,
"message": "audio setup successful"
},
"request_id": "1",
"work_id": "audio.1000"
}
{
"request_id": "2",
"work_id": "kws",
"action": "setup",
"object": "kws.setup",
"data": {
"model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
"response_format": "kws.bool",
"input": "sys.pcm",
"enoutput": true,
"kws": "HELLO"
}
}
{
"created": 1692652559,
"error": {
"code": 0,
"message": "kws setup successful"
},
"request_id": "2",
"work_id": "kws.1001"
}
{
"request_id": "3",
"work_id": "asr",
"action": "setup",
"object": "asr.setup",
"data": {
"model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
"response_format": "asr.utf-8",
"input": "sys.pcm",
"enoutput": true,
"enkws": true,
"rule1": 2.4,
"rule2": 1.2,
"rule3": 30
}
}
{
"created": 1692652705,
"error": {
"code": 0,
"message": "asr setup successful"
},
"request_id": "3",
"work_id": "asr.1002"
}
// Input from ASR
{
"request_id": "4",
"work_id": "llm",
"action": "setup",
"object": "llm.setup",
"data": {
"model": "qwen2.5-0.5b",
"response_format": "llm.utf-8.stream",
"input": "asr.1002",
"enoutput": true,
"enkws": true,
"max_token_len": 127,
"prompt": "You are a knowledgeable assistant capable of answering various questions and providing information."
}
}
{
"created": 1692653061,
"error": {
"code": 0,
"message": "llm setup successful"
},
"request_id": "4",
"work_id": "llm.1003"
}
// Input from LLM
{
"request_id": "5",
"work_id": "tts",
"action": "setup",
"object": "tts.setup",
"data": {
"model": "single_speaker_english_fast",
"response_format": "tts.base64.wav",
"input": "llm.1003",
"enoutput": true,
"enkws": true
}
}
{
"created": 1692653109,
"error": {
"code": 0,
"message": "tts setup successful"
},
"request_id": "5",
"work_id": "tts.1004"
}
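The setup sequence above depends on each unit receiving the work_id returned by the previous one: the LLM takes the ASR work_id as input, and the TTS takes the LLM work_id. The sketch below wires this up in Python. It assumes a transport callable named send that takes a request dict and returns the parsed response dict. Both send and the fake transport used for the demonstration are hypothetical stand-ins for real UART code.

```python
def setup_chain(send):
    """Run the five setup steps in order and return each unit's work_id."""
    work_ids = {}

    def setup(unit, request_id, data):
        response = send({
            "request_id": request_id,
            "work_id": unit,
            "action": "setup",
            "object": f"{unit}.setup",
            "data": data,
        })
        if response["error"]["code"] != 0:
            raise RuntimeError(f"{unit} setup failed: {response['error']}")
        work_ids[unit] = response["work_id"]
        return response["work_id"]

    setup("audio", "1", {"capcard": 0, "capdevice": 0, "capVolume": 0.5,
                         "playcard": 0, "playdevice": 1, "playVolume": 0.5})
    setup("kws", "2", {"model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
                       "response_format": "kws.bool", "input": "sys.pcm",
                       "enoutput": True, "kws": "HELLO"})
    asr_id = setup("asr", "3", {"model": "sherpa-ncnn-streaming-zipformer-20M-2023-02-17",
                                "response_format": "asr.utf-8", "input": "sys.pcm",
                                "enoutput": True, "enkws": True,
                                "rule1": 2.4, "rule2": 1.2, "rule3": 30})
    # The LLM reads from the ASR work_id, and the TTS reads from the LLM one.
    llm_id = setup("llm", "4", {"model": "qwen2.5-0.5b",
                                "response_format": "llm.utf-8.stream", "input": asr_id,
                                "enoutput": True, "enkws": True, "max_token_len": 127,
                                "prompt": "You are a knowledgeable assistant."})
    setup("tts", "5", {"model": "single_speaker_english_fast",
                       "response_format": "tts.base64.wav", "input": llm_id,
                       "enoutput": True, "enkws": True})
    return work_ids


def make_fake_send():
    """Hypothetical stand-in transport that hands out ids 1000, 1001, ..."""
    sent = []

    def send(request):
        work_id = f"{request['work_id']}.{1000 + len(sent)}"
        sent.append(request)
        return {"request_id": request["request_id"], "work_id": work_id,
                "error": {"code": 0, "message": ""}}

    return send, sent


send, sent = make_fake_send()
print(setup_chain(send))
```

Reading the work_id from each response, rather than hard-coding values such as asr.1002, keeps the chain valid if the module assigns different numbers.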