Speech-to-Text

Convert speech in an audio file to text through the device's OpenAI API interface.

Preparation

Before running the example program, install the corresponding model package on the device. Refer to Model List for the model package installation tutorial.

Make sure the following steps have been completed on the LLM device:

  1. Install the llm-model-whisper-tiny model package using the apt package manager.
     apt install llm-model-whisper-tiny
  2. Install the ffmpeg tool.
     apt install ffmpeg
  3. After installation, restart the OpenAI service so that the new model takes effect.
     systemctl restart llm-openai-api

Example

On the PC, use the OpenAI Python client to send an audio file to the device for transcription. Before running the example program, replace the IP address in base_url below with the actual IP address of the device.

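from openai import OpenAI

# Point the client at the OpenAI service running on the device.
# Replace the IP address with the actual address of your device.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1"
)

# Open the audio file in binary mode; the with block closes it after the request.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-tiny",
        language="en",
        file=audio_file
    )

print(transcript)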
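
Because the only part of base_url that changes between setups is the device IP, it can help to build the URL from the IP and catch typos before any request is sent. The following is a minimal sketch, not part of the official example: build_base_url is a hypothetical helper name, and the port 8000 and the /v1 path are taken from the example above.

```python
import ipaddress


def build_base_url(device_ip: str, port: int = 8000) -> str:
    """Return the base URL for the OpenAI service on the device.

    Hypothetical helper: the default port and the /v1 path mirror the
    example above and are assumptions for any other setup.
    """
    # ip_address() raises ValueError for malformed addresses such as "192.168.20".
    addr = ipaddress.ip_address(device_ip)
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"http://{host}:{port}/v1"
```

The returned string can then be passed as base_url when constructing the OpenAI client, for example base_url=build_base_url("192.168.20.186").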

Request Parameters

| Parameter Name | Type | Required | Example Value | Description |
| --- | --- | --- | --- | --- |
| file | file | yes | | The audio file object to be transcribed (not the filename). Supported formats include flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| model | string | yes | whisper-base | The speech recognition model ID to use. Options include: whisper-tiny, whisper-base, whisper-small |
| language | string | yes | en | The language of the input audio, using ISO-639-1 encoding (e.g., en). Improves recognition accuracy and speed |
| response_format | string | no | json | The return format. Currently only json is supported. Default is json |
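
The parameter descriptions above can be turned into a small client-side check that runs before the request is sent. This is only a sketch under stated assumptions: check_request is a hypothetical helper, it looks at the file extension rather than the actual audio content, and it cannot tell whether the chosen model package is installed on the device.

```python
from pathlib import Path

# Values copied from the request parameter descriptions above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
SUPPORTED_MODELS = {"whisper-tiny", "whisper-base", "whisper-small"}


def check_request(path: str, model: str, language: str) -> list[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []

    # Compare the lowercase file extension (without the dot) to the supported formats.
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {ext or '(none)'}")

    if model not in SUPPORTED_MODELS:
        problems.append(f"unknown model: {model}")

    # ISO-639-1 codes are two lowercase ASCII letters, e.g. "en".
    if not (len(language) == 2 and language.isascii()
            and language.isalpha() and language.islower()):
        problems.append(f"language should be a two-letter ISO-639-1 code: {language}")

    return problems
```

Calling check_request("speech.mp3", "whisper-tiny", "en") returns an empty list, so the request from the example would pass this check.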

Response Example

Transcription(text=' Thank you. Thank you everybody. All right everybody go ahead and have a seat. How\'s everybody doing today? .....', 
logprobs=None, task='transcribe', language='en', duration=334.234, segments=12, sample_rate=16000, channels=1, bit_depth=16)
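
The response prints as an object with named fields, so individual values such as text, language and duration can presumably be read as attributes. The sketch below assumes those attribute names from the printed example; summarize is a hypothetical helper, and the sample object is a stand-in built with SimpleNamespace rather than a real response from the device.

```python
from types import SimpleNamespace


def summarize(transcript) -> str:
    """Format a one-line summary from a transcription-like object."""
    # getattr with a default keeps this working if a field is absent.
    text = getattr(transcript, "text", "").strip()
    language = getattr(transcript, "language", None)
    duration = getattr(transcript, "duration", None)

    parts = []
    if language is not None:
        parts.append(f"language={language}")
    if duration is not None:
        parts.append(f"duration={duration:.1f}s")
    header = ", ".join(parts) or "no metadata"
    return f"[{header}] {text}"


# Stand-in object mirroring a few fields of the response example above.
sample = SimpleNamespace(
    text=" Thank you. Thank you everybody.",
    language="en",
    duration=334.234,
)
print(summarize(sample))
```

With a real response, the same function could be called as summarize(transcript) after the transcription request returns.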