Speech to Text

Convert input audio into text output through an API interface.

Preparation

Before running the example program, install the required model package on the device. For installation instructions, refer to the Model List section; for detailed model descriptions, refer to the Model Introduction section.

Complete the following steps on the LLM device:

  1. Use the apt package manager to install the SenseVoice model package.
     apt install llm-model-sense-voice-small-10s-ax650
  2. Install the ffmpeg tool.
     apt install ffmpeg
  3. After installation is complete, restart the OpenAI service to make the new model take effect.
     systemctl restart llm-openai-api
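
As a quick check after the restart, the following sketch queries the service's model list from the PC and looks for the newly installed model. It assumes the device exposes the OpenAI-style GET /v1/models endpoint and returns the usual {"data": [{"id": ...}]} body, which this page does not confirm. The helper names model_ids and fetch_model_ids are hypothetical.

```python
import json
from urllib.request import urlopen

MODEL = "sense-voice-small-10s-ax650"


def model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style model list response body."""
    data = json.loads(payload).get("data", [])
    return [entry["id"] for entry in data if "id" in entry]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """Fetch the model list from the device (assumed endpoint: <base_url>/models)."""
    with urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as resp:
        return model_ids(resp.read().decode("utf-8"))
```

To use it, call fetch_model_ids("http://192.168.20.186:8000/v1") with the device's actual IP address and check whether MODEL appears in the returned list.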

Example Program

On the PC, use the OpenAI API to send an audio file to the device for speech-to-text conversion. Before running the example program, change the IP address in base_url below to the actual IP address of the device.

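from openai import OpenAI

# Point the client at the device's OpenAI-compatible service.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1",  # replace with the device's IP address
)

# Open the audio in binary mode; the with block closes the file afterwards.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="sense-voice-small-10s-ax650",
        file=audio_file,
    )

print(transcript)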
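
Because base_url must be edited by hand for each device, a typo in the IP address is an easy mistake to make. The following is a minimal sketch of a helper that builds the URL from an IPv4 address and rejects malformed input. The name build_base_url is hypothetical, and the port 8000 and /v1 path are taken from the example above.

```python
import ipaddress


def build_base_url(device_ip: str, port: int = 8000) -> str:
    """Return the OpenAI-compatible base URL for a device IPv4 address."""
    # ip_address() raises ValueError for malformed input such as "192.168.20".
    ip = ipaddress.ip_address(device_ip.strip())
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    return f"http://{ip}:{port}/v1"


print(build_base_url("192.168.20.186"))  # http://192.168.20.186:8000/v1
```

The returned string can be passed as base_url when constructing the OpenAI client.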

Request Parameters

| Parameter Name | Type | Required | Example Value | Description |
|---|---|---|---|---|
| file | file | Yes | - | Audio file object to be transcribed (not the file name). Supported formats include flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| model | string | Yes | sense-voice-small-10s-ax650 | SenseVoice models support automatic multilingual recognition, including Chinese, English, Japanese, Cantonese, Korean, etc. |
| language | string | No | - | Language is automatically detected by the model internally |
| response_format | string | No | json | Response format. Currently, only json is supported. The default value is json. |
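
Since the file parameter accepts only the listed formats, it can help to check a file on the PC before uploading it. The following sketch checks the file extension only and does not inspect the audio content. The names SUPPORTED_FORMATS and is_supported_audio are hypothetical.

```python
from pathlib import Path

# Formats listed for the file parameter in the request parameters above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS


print(is_supported_audio("speech.mp3"))   # True
print(is_supported_audio("speech.aiff"))  # False
```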

Response Example

Transcription(text=' Thank you. Thank you everybody. All right everybody go ahead and have a seat. How\'s everybody doing today? .....',
logprobs=None, task='transcribe', language='en', duration=334.234, segments=12, sample_rate=16000, channels=1, bit_depth=16)
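
In most programs only a few fields of the returned object are needed. The following sketch collects the recognized text and two metadata fields shown in the response example. The helper name summarize is hypothetical, and the sample object is a stand-in built to mirror the example above, not a real API response.

```python
from types import SimpleNamespace


def summarize(transcript) -> dict:
    """Collect the text plus optional metadata from a transcription result."""
    return {
        "text": transcript.text.strip(),
        # Metadata fields may be absent on other servers, so fall back to None.
        "language": getattr(transcript, "language", None),
        "duration": getattr(transcript, "duration", None),
    }


# Stand-in object mirroring the response example; not a real API response.
sample = SimpleNamespace(text=" Thank you. Thank you everybody.", language="en", duration=334.234)
print(summarize(sample))
```

With a real response, pass the transcript object returned by client.audio.transcriptions.create in place of sample.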