Convert input speech to output text through the API interface.
Before running the example program, the corresponding model package must be installed on the device. Refer to Model List for the model package installation tutorial.
Before running this example program, please ensure the following preparations have been completed on the LLM device:
- Install the `llm-model-whisper-tiny` model package using the apt package management tool: `apt install llm-model-whisper-tiny`
- Install the `ffmpeg` tool: `apt install ffmpeg`
- Restart the `llm-openai-api` service: `systemctl restart llm-openai-api`
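After restarting, you can optionally verify from the PC that the device's OpenAI-compatible service is reachable. This is a minimal sketch, assuming the `llm-openai-api` service exposes the standard `/v1/models` endpoint; the IP address is the example device address used below:

```python
from openai import OpenAI

# Connectivity check: list the models exposed by the device-side service.
# Assumes the llm-openai-api service implements the standard /v1/models
# endpoint; replace the IP address with your device's actual address.
client = OpenAI(api_key="sk-", base_url="http://192.168.20.186:8000/v1")
for model in client.models.list():
    print(model.id)
```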
On the PC side, use the OpenAI API to send an audio file to the device for speech-to-text conversion. Before running the example program, change the IP address in the `base_url` below to the device's actual IP address.
from openai import OpenAI

# Point the client at the device's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="sk-",  # placeholder key for the local service
    base_url="http://192.168.20.186:8000/v1"  # replace with the device's actual IP
)

# Open the audio file in binary mode and request a transcription.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-tiny",
        language="en",
        file=audio_file
    )

print(transcript)
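The call returns a `Transcription` object. If you only need the recognized text rather than the whole object, read its `text` field:

```python
# Print only the recognized text from the Transcription object.
print(transcript.text)
```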
| Parameter Name | Type | Required | Example Value | Description |
|---|---|---|---|---|
| file | file | Yes | – | The audio file object to be transcribed (not the filename). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| model | string | Yes | whisper-base | The speech recognition model ID to use. Options: `whisper-tiny`, `whisper-base`, `whisper-small` |
| language | string | Yes | en | The language of the input audio, as an ISO 639-1 code (e.g., `en`). Improves recognition accuracy and speed |
| response_format | string | No | json | The response format. Currently only `json` is supported; defaults to `json` |
Example output:

Transcription(text=' Thank you. Thank you everybody. All right everybody go ahead and have a seat. How\'s everybody doing today? .....', logprobs=None, task='transcribe', language='en', duration=334.234, segments=12, sample_rate=16000, channels=1, bit_depth=16)
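As a further illustration of the parameters above, the sketch below selects a different model and passes `response_format` explicitly. It assumes the corresponding `llm-model-whisper-base` package is installed on the device; that package name is inferred by analogy with `llm-model-whisper-tiny` and may differ:

```python
from openai import OpenAI

client = OpenAI(api_key="sk-", base_url="http://192.168.20.186:8000/v1")

# Sketch: transcribe with whisper-base and an explicit response format.
# Assumes the llm-model-whisper-base model package is installed on the
# device (package name inferred by analogy; verify against the Model List).
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-base",
        language="en",
        response_format="json",
        file=audio_file,
    )
print(transcript.text)
```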