Speech-to-Text

Convert speech to text by sending an audio file to the device through the OpenAI API interface.

Preparation

The example program requires the corresponding model package to be installed on the device. Refer to Model List for the model package installation tutorial.

Complete the following steps on the LLM device before running the example program:

Install the llm-model-whisper-tiny model package using the apt package management tool.

apt install llm-model-whisper-tiny

Install the ffmpeg tool.

apt install ffmpeg

After installation, restart the OpenAI service so that the new model takes effect.

systemctl restart llm-openai-api
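
Once the service has restarted, it can be useful to confirm from the PC that the model is being served. The following is a minimal sketch, assuming the device's OpenAI service implements the standard model-listing endpoint (this page does not confirm that it does). `model_is_available` is a hypothetical helper name, not part of the device software.

```python
def model_is_available(client, model_id="whisper-tiny"):
    """Return True if model_id appears in the server's model list.

    `client` is expected to be an openai.OpenAI instance pointed at the device.
    Assumption: the device service answers GET /v1/models.
    """
    return any(model.id == model_id for model in client.models.list())
```

With a client configured as in the Example section, `model_is_available(client)` should return True if the restart picked up the new package and the endpoint is supported.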

Example

On the PC, use the OpenAI API to send an audio file for speech-to-text conversion. Before running the example program, replace the IP address in base_url below with the actual IP address of the device.

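from openai import OpenAI

# Point the client at the OpenAI service running on the device.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1",  # replace with the device's IP address
)

# Open the audio in binary mode; the with block closes the file afterwards.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-tiny",
        language="en",
        file=audio_file,
    )

print(transcript)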
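
Because base_url has to be edited for each device, building it from the IP address can catch typos before any request is sent. This is a sketch only: `device_base_url` is a hypothetical helper, and the port 8000 and /v1 path are taken from the example above.

```python
import ipaddress


def device_base_url(ip, port=8000):
    """Build the base_url for the OpenAI service on the device."""
    # ip_address() raises ValueError if the address is malformed.
    address = ipaddress.ip_address(ip)
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"http://{host}:{port}/v1"


print(device_base_url("192.168.20.186"))  # http://192.168.20.186:8000/v1
```

The returned string can be passed directly as the base_url argument of OpenAI(...).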

Request Parameters

Parameter Name	Type	Required	Example Value	Description
file	file	yes	–	The audio file object to be transcribed (not the file name). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
model	string	yes	whisper-base	The ID of the speech recognition model to use. Options: `whisper-tiny`, `whisper-base`, `whisper-small`
language	string	yes	en	The language of the input audio as an ISO 639-1 code (e.g., `en`). Specifying it improves recognition accuracy and speed
response_format	string	no	json	The return format. Currently only `json` is supported. Default is `json`
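
The constraints in the table can be checked on the PC before the audio file is uploaded. The sketch below is illustrative: `build_request_params` is a hypothetical helper, and the allowed values are copied from the table above rather than queried from the device.

```python
from pathlib import Path

# Values copied from the Request Parameters table.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
SUPPORTED_MODELS = {"whisper-tiny", "whisper-base", "whisper-small"}


def build_request_params(path, model="whisper-tiny", language="en"):
    """Validate the arguments and return keyword arguments for the request."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {suffix!r}")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # ISO 639-1 codes are two ASCII letters, e.g. "en".
    if len(language) != 2 or not (language.isascii() and language.isalpha()):
        raise ValueError(f"not an ISO 639-1 code: {language!r}")
    return {"model": model, "language": language.lower(), "response_format": "json"}
```

The returned dictionary can be unpacked into the call from the Example section, for instance client.audio.transcriptions.create(file=audio_file, **build_request_params("speech.mp3")).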

Response Example

Transcription(text=' Thank you. Thank you everybody. All right everybody go ahead and have a seat. How\'s everybody doing today? .....', logprobs=None, task='transcribe', language='en', duration=334.234, segments=12, sample_rate=16000, channels=1, bit_depth=16)
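
In most programs only a few fields of the returned object are needed. The sketch below reads the text, language, and duration attributes shown in the response above. `describe_transcription` is a hypothetical helper, and since the extra fields may not be present with every model or library version, it falls back to the text alone.

```python
def describe_transcription(transcript):
    """Format a transcription result as a single line of text."""
    text = transcript.text.strip()  # the sample text starts with a space
    language = getattr(transcript, "language", None)
    duration = getattr(transcript, "duration", None)
    if language is None or duration is None:
        return text
    return f"[{language}, {duration:.1f} s] {text}"
```

Calling print(describe_transcription(transcript)) after the request in the Example section prints the recognized text prefixed with the language and audio length.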
