Text to Speech

Convert input text into a speech audio file through the API.

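Preparation

Before running the example program, install the corresponding model package on the device. For installation instructions, see the Model List chapter. For details on each model, see the Model Introduction chapter.

Tip

AI Pyramid has its own text-to-speech with voice cloning; see the CosyVoice chapter.

Before running this example program, make sure the following steps have been completed on the LLM device:

Use the apt package manager to install the llm-model-melotts-en-us model package.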

apt install llm-model-melotts-en-us

Install the ffmpeg tool.

apt install ffmpeg

After installation, restart the OpenAI service so that the new model takes effect.

systemctl restart llm-openai-api
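
After the restart, it can be useful to confirm from the PC that the new model is visible before sending a synthesis request. The sketch below is a minimal illustration, not part of the original example. It assumes the device serves the standard OpenAI "list models" route at `<base_url>/models`, which this page does not confirm. The helper names `model_ids` and `fetch_model_ids` are hypothetical.

```python
import json
import urllib.request


def model_ids(models_response: dict) -> list:
    """Extract model IDs from an OpenAI-style model list response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> list:
    """Query <base_url>/models and return the model IDs it reports.

    Assumption: the device exposes the standard OpenAI "list models" route.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

If the assumption holds, `"melotts-en-us" in fetch_model_ids("http://192.168.20.186:8000/v1")` should be true once the service has restarted.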

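Example Program

On the PC, pass text through the OpenAI API to convert it to speech. Before running the example program, change the IP portion of base_url below to the device's actual IP address.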

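from pathlib import Path
from openai import OpenAI

# Point the client at the OpenAI service running on the device.
# Replace the IP address with the device's actual IP.
client = OpenAI(
    api_key="sk-",
    base_url="http://192.168.20.186:8000/v1"
)

# Save the generated audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"
with client.audio.speech.with_streaming_response.create(
    model="melotts-en-us",
    # The client requires a voice argument; MeloTTS does not support voice style selection.
    voice="alloy",
    input="The quick brown fox jumped over the lazy dog."
) as response:
    # Stream the audio response into the output file.
    response.stream_to_file(speech_file_path)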
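
Since the only value that has to change per device is the IP address in base_url, one option is to build the URL from the IP instead of editing the string by hand. The following is a small sketch under that assumption. The helper name `build_base_url` is hypothetical, and the default port 8000 and the `/v1` path are taken from the example above.

```python
import ipaddress


def build_base_url(device_ip: str, port: int = 8000) -> str:
    """Return the base URL for the OpenAI service on the device.

    Raises ValueError if device_ip is not a valid IPv4 address.
    """
    ipaddress.IPv4Address(device_ip)  # validation only; the result is discarded
    return f"http://{device_ip}:{port}/v1"
```

The result can be passed directly as the `base_url` argument of `OpenAI(...)`.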

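Request Parameters

Parameter	Type	Required	Example	Description
input	string	Yes	"你好，欢迎使用系统"	Text to convert to audio; maximum length 1024 characters
model	string	Yes	melotts-zh-cn	TTS model to use; available models include `melotts-ja-jp`, `melotts-zh-cn`, and `melotts-en-us`
voice	-	No	-	MeloTTS models do not support voice style selection
response_format	string	No	mp3	Audio output format; supported formats include `mp3`, `opus`, `aac`, `flac`, `wav`, and `pcm`
speed	number	No	1.0	Speed of the generated speech, from 0.25 to 2.0; default 1.0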
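
The limits in the table can be checked on the PC before a request is sent. The sketch below is one possible way to do that, not the service's own validation. The helper name `build_speech_request` is hypothetical, and the format set lists only the values named in the table, which may not be exhaustive.

```python
# Formats named in the parameter table; the service may accept others.
KNOWN_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text: str, model: str,
                         response_format: str = "mp3",
                         speed: float = 1.0) -> dict:
    """Validate the parameters against the table and return them as a dict."""
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > 1024:
        raise ValueError("input exceeds 1024 characters")
    if response_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if not 0.25 <= speed <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")
    return {
        "input": text,
        "model": model,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be unpacked into the client call, for example `client.audio.speech.with_streaming_response.create(voice="alloy", **params)`. The `voice` argument is still passed because the OpenAI Python client requires it.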

Response Example

The audio data is saved to the path given by speech_file_path in the example program.
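
To confirm that the saved file holds audio in the expected container, its first bytes can be compared with well-known file signatures. This is a heuristic sketch only and not something the service provides. The helper name `sniff_audio_format` is hypothetical, and formats without a recognizable header (such as raw `pcm`) are reported as "unknown".

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"  # Opus audio is commonly stored in an Ogg container
    if header[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        # MPEG audio frame sync; layer bits 00 are reserved in MPEG audio
        # (and used by AAC ADTS), so require a non-zero layer.
        if (header[1] >> 1) & 0x03 != 0:
            return "mp3"
    return "unknown"
```

For the example program, `sniff_audio_format(speech_file_path.read_bytes()[:12])` would be expected to return "mp3" when the default output format is used.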
