CosyVoice2 is a high-quality speech synthesis system based on large language models, capable of generating natural and fluent speech. This document describes how to call it through an OpenAI-compatible API. Users can get started quickly by installing the corresponding StackFlow software packages.
Refer to AI Pyramid Software Package Update to complete the installation of the following dependency packages and models:
Install core dependency packages:
apt install lib-llm llm-sys llm-cosy-voice llm-openai-api
Install the CosyVoice2 model:
apt install llm-model-cosyvoice2-0.5b-ax650
Run the systemctl restart llm-openai-api command to update the model list.

curl http://127.0.0.1:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
  "model": "CosyVoice2-0.5B-ax650",
  "response_format": "wav",
  "input": "But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee."
}' \
-o output.wav

from pathlib import Path

from openai import OpenAI

# Point the OpenAI client at the local OpenAI-compatible endpoint.
client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)

# Write the synthesized audio to output.wav next to this script.
speech_file_path = Path(__file__).parent / "output.wav"
with client.audio.speech.with_streaming_response.create(
    model="CosyVoice2-0.5B-ax650",
    voice="prompt_data",
    response_format="wav",
    input="But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee.",
) as response:
    response.stream_to_file(speech_file_path)

Choose one of the following methods to obtain the CosyVoice2 cloning scripts:
Method 1: Manual Download
Visit the CosyVoice2 Script Repository to download the scripts, then upload them to the AI Pyramid device.
Method 2: Command-Line Clone
git clone --recurse-submodules https://huggingface.co/M5Stack/CosyVoice2-scripts
After cloning is complete, the directory structure is as follows:
root@m5stack-AI-Pyramid:~/CosyVoice2-scripts# ls -lh
total 28K
drwxr-xr-x 2 root root 4.0K Jan 9 10:26 asset
drwxr-xr-x 2 root root 4.0K Jan 9 10:26 CosyVoice-BlankEN
drwxr-xr-x 2 root root 4.0K Jan 9 10:27 frontend-onnx
drwxr-xr-x 3 root root 4.0K Jan 9 10:26 pengzhendong
-rw-r--r-- 1 root root 24 Jan 9 10:26 README.md
-rw-r--r-- 1 root root 103 Jan 9 10:26 requirements.txt
drwxr-xr-x 3 root root 4.0K Jan 9 10:26 scripts

Create a Python virtual environment and install the dependencies. Creating the virtual environment for the first time requires the venv package, which can be installed with apt install python3.10-venv.
python3 -m venv cosyvoice
source cosyvoice/bin/activate
pip install -r requirements.txt
Run the voice processing script to generate voice feature files:
python3 scripts/process_prompt.py --prompt_text asset/zh_woman1.txt --prompt_speech asset/zh_woman1.wav --output zh_woman1
Example output after successful script execution:
(cosyvoice) root@m5stack-AI-Pyramid:~/CosyVoice2-scripts# python3 scripts/process_prompt.py --prompt_text asset/zh_woman1.txt --prompt_speech asset/zh_woman1.wav --output zh_woman1
2026-01-09 10:41:18.655905428 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
prompt_text 希望你以后能够做的比我还好呦。
fmax 8000
prompt speech token size: torch.Size([1, 87])

Copy the processed voice feature files to the model data directory:
cp -r zh_woman1 /opt/m5stack/data/CosyVoice2-0.5B-ax650/
Restart the model service to load the new voice configuration:
systemctl restart llm-sys
To change the default voice, set the prompt_dir field in the /opt/m5stack/data/models/mode_CosyVoice2-0.5B-ax650.json file to the new voice directory. Each time the voice is replaced, the model service needs to be reinitialized.

curl http://127.0.0.1:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
  "model": "CosyVoice2-0.5B-ax650",
  "voice": "zh_woman1",
  "response_format": "wav",
  "input": "But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee."
}' \
-o output.wav

from pathlib import Path

from openai import OpenAI

# Point the OpenAI client at the local OpenAI-compatible endpoint.
client = OpenAI(
    api_key="sk-",
    base_url="http://127.0.0.1:8000/v1"
)

# Write the audio synthesized with the cloned voice to output.wav next to this script.
speech_file_path = Path(__file__).parent / "output.wav"
with client.audio.speech.with_streaming_response.create(
    model="CosyVoice2-0.5B-ax650",
    voice="zh_woman1",
    response_format="wav",
    input="But thy eternal summer shall not fade, Nor lose possession of that fair thou ow’st; Nor shall Death brag thou wander’st in his shade, When in eternal lines to time thou grow’st; So long as men can breathe or eyes can see, So long lives this, and this gives life to thee.",
) as response:
    response.stream_to_file(speech_file_path)
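
After a synthesis request finishes, it can be useful to confirm that output.wav is a playable audio file and not, for example, an error message saved under a .wav name. The following is a minimal sketch that uses only the Python standard library. The helper name describe_wav is hypothetical, and the sketch assumes the service returns uncompressed PCM WAV data with a correct header, which this document does not state.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dict.

    Hypothetical helper for illustration; it assumes an uncompressed
    PCM WAV file whose header reports the correct frame count.
    """
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    wav_path = Path("output.wav")  # file name used by the examples above
    if wav_path.exists():
        print(describe_wav(wav_path))
    else:
        print(f"{wav_path} not found; run one of the synthesis examples first")
```

If wave.open raises wave.Error, the file is not a WAV file that the standard library can parse, and the request and the service status are worth checking before replaying the audio.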