Convert input audio into text output through an API interface.
Before running the example program, you need to install the corresponding model package on the device. For model package installation instructions, refer to the Model List section. For detailed model descriptions, refer to the Model Introduction section.
Before running this example program, ensure that the following preparations have been completed on the LLM device. Install the model package and the ffmpeg tool, then restart the llm-openai-api service:

```bash
apt install llm-model-sense-voice-small-10s-ax650
apt install ffmpeg
systemctl restart llm-openai-api
```

On the PC side, use the OpenAI API to pass an audio file and implement the speech-to-text function. Before running the example program, change the IP portion of `base_url` below to the actual IP address of the device.
```python
from openai import OpenAI

# Point the client at the device; replace the IP with your device's actual address.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1"
)

# Open the audio file in binary mode and request a transcription.
audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="sense-voice-small-10s-ax650",
    file=audio_file
)
print(transcript)
```

Request parameters:

| Parameter Name | Type | Required | Example Value | Description |
|---|---|---|---|---|
| file | file | Yes | - | Audio file object to be transcribed (not the file name). Supported formats include flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| model | string | Yes | sense-voice-small-10s-ax650 | SenseVoice models support automatic multilingual recognition, including Chinese, English, Japanese, Cantonese, Korean, etc. |
| language | string | No | - | Language is automatically detected by the model internally |
| response_format | string | No | json | Response format. Currently, only json is supported. The default value is json. |
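Since the model detects the language automatically, only `file` and `model` are required in practice. The sketch below is a minimal illustration that also passes the optional `response_format` parameter explicitly; it assumes the same device address as the example above.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1"  # same device address as above
)

# A context manager ensures the file handle is closed after the request.
# response_format is optional; json is both the default and the only
# supported value, so passing it here is purely illustrative.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="sense-voice-small-10s-ax650",
        file=audio_file,
        response_format="json",
    )
print(transcript)
```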
Example response:

```
Transcription(text=' Thank you. Thank you everybody. All right everybody go ahead and have a seat. How\'s everybody doing today? .....',
logprobs=None, task='transcribe', language='en', duration=334.234, segments=12, sample_rate=16000, channels=1, bit_depth=16)
```
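The recognized text is available in the `text` field of the returned `Transcription` object. A short follow-up sketch, assuming the server returns the extra fields shown in the sample response above:

```python
# Fields as shown in the sample response; availability may vary by server.
print(transcript.text)      # the recognized text
print(transcript.language)  # detected language, e.g. 'en'
print(transcript.duration)  # audio length in seconds
```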