Stream Text to Speech

Convert text to speech and stream the audio response as it is generated, for real-time playback.

Prerequisites

You must have at least one cloned voice. Use Create Voice (POST /v1/voices/add) to create a voice, then retrieve its voice_id from List Voices (GET /v2/voices).

Authenticate using the PersoPlatform-APIKey header. The ElevenLabs-compatible xi-api-key header is also accepted:

--header 'xi-api-key: <your-api-key>'
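As a minimal sketch of the two header options above, the helper below builds the request headers under either name. The function name auth_headers and its elevenlabs_compatible flag are hypothetical, introduced only for illustration; only the two header names come from this page.

```python
def auth_headers(api_key, elevenlabs_compatible=False):
    """Return request headers carrying the API key under one of the two accepted names.

    Hypothetical helper: the header names are from this page, the function is not.
    """
    name = "xi-api-key" if elevenlabs_compatible else "PersoPlatform-APIKey"
    return {name: api_key, "Content-Type": "application/json"}
```

Either dictionary can be passed as the headers of the POST request described below.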

URL Path Parameters

Replace {voice_id} in the URL with your voice ID. Retrieve available voice IDs from List Voices (GET /v2/voices).

POST /v1/text-to-speech/{voice_id}/stream

Request Parameters

Parameter   Required   Description
text        Yes        Text to convert to speech
model_id    No         Default: perso_multilingual_v1

Query Parameters

Parameter       Required   Description
output_format   Yes        Audio format for streaming. Default: wav_24000 (see formats below)

How It Works

Audio chunks are streamed as they are generated, enabling:

  • Lower latency for first audio
  • Real-time playback during generation
  • Reduced memory usage for long text

Example


curl -X POST "https://platform.perso.ai/api/speech/v1/text-to-speech/{voice_id}/stream?output_format=mp3_44100_192" \
  -H "PersoPlatform-APIKey: <your-api-key>" \
  -H "Content-Type: application/json" \
  -H "Accept: */*" \
  -d '{"text": "Hello, this is a streaming test."}' \
  --output audio.mp3