Configuring Voice Settings

How to adjust the generated audio

Voice Settings

Voices in LiveAvatar can be powered by several text-to-speech (TTS) providers. Different providers offer different tradeoffs in quality, latency, and language support. We intend to support additional voice providers in LiveAvatar over time.

Voice settings are configured under avatar_persona.voice_settings when creating a session token. Settings are split into shared options that apply regardless of TTS provider, and provider-specific options for ElevenLabs and Fish Audio.
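As a minimal sketch of where these settings sit, the snippet below nests a settings dictionary under avatar_persona.voice_settings. The helper name build_session_payload is hypothetical, and the endpoint and any other fields a session token request needs are not shown here; only the nesting path and the speed field come from this page.

```python
import json


def build_session_payload(voice_settings: dict) -> dict:
    """Nest voice settings under avatar_persona.voice_settings.

    Hypothetical helper: a real session token request will likely need
    additional fields that are not covered on this page.
    """
    return {"avatar_persona": {"voice_settings": dict(voice_settings)}}


# Shared setting only: speak slightly faster than the default of 1.
payload = build_session_payload({"speed": 1.1})
print(json.dumps(payload, indent=2))
```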

We currently support the following providers: ElevenLabs and Fish Audio (Beta).


Note: Fish Audio support is currently in Beta. We are actively working on improving its quality, stability, and feature parity. Expect changes as it matures.


Shared Settings

These settings apply across all TTS providers.

| Field | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| speed | number | 0.8–1.2 | 1 | Controls the speaking speed. Values below 1 slow down speech; values above 1 speed it up. |
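The documented range can be checked on the client before a request is sent. This is a sketch only: validate_speed is a hypothetical helper, and whether the API itself rejects or clamps out-of-range values is not stated on this page.

```python
SPEED_MIN = 0.8
SPEED_MAX = 1.2
SPEED_DEFAULT = 1.0


def validate_speed(speed=None) -> float:
    """Return a speed within the documented 0.8-1.2 range.

    Falls back to the documented default of 1 when no value is given.
    Raising on bad input is a choice made for this sketch, not
    documented API behavior.
    """
    if speed is None:
        return SPEED_DEFAULT
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise TypeError("speed must be a number")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")
    return float(speed)
```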

ElevenLabs-Specific Settings

These settings apply when using ElevenLabs as your TTS provider.

| Field | Type | Options | Default | Description |
| --- | --- | --- | --- | --- |
| stability | number | 0–1 | 0.75 | Controls consistency of the voice. Higher values produce a more stable, predictable output; lower values allow more expressive variation. |
| similarity_boost | number | 0–1 | 0.75 | How closely the output should match the original voice. Higher values increase likeness but may reduce naturalness. |
| style | number | 0–1 | 0 | Controls the intensity of the voice's style. Higher values exaggerate style characteristics. |
| use_speaker_boost | boolean | | true | Enhances the similarity and clarity of the speaker's voice. Recommended to leave enabled. |
| model | string | eleven_flash_v2_5, eleven_multilingual_v2 | | The ElevenLabs TTS model to use. eleven_flash_v2_5 is optimized for low latency; eleven_multilingual_v2 supports a broader range of languages. |
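One way to apply the documented defaults and ranges is to merge caller overrides into a defaults dictionary and validate the result. This is a sketch under assumptions: elevenlabs_settings is a hypothetical helper, it returns a flat dictionary of the fields above, and how those fields are arranged inside voice_settings is not specified on this page. No default is listed for model, so the sketch includes it only when the caller passes one.

```python
ELEVENLABS_DEFAULTS = {
    "stability": 0.75,
    "similarity_boost": 0.75,
    "style": 0.0,
    "use_speaker_boost": True,
}
ELEVENLABS_MODELS = {"eleven_flash_v2_5", "eleven_multilingual_v2"}


def elevenlabs_settings(**overrides) -> dict:
    """Merge overrides into the documented defaults and validate them."""
    unknown = set(overrides) - set(ELEVENLABS_DEFAULTS) - {"model"}
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")

    settings = {**ELEVENLABS_DEFAULTS, **overrides}

    # The three numeric fields share the documented 0-1 range.
    for name in ("stability", "similarity_boost", "style"):
        value = settings[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            raise ValueError(f"{name} must be a number between 0 and 1")

    if not isinstance(settings["use_speaker_boost"], bool):
        raise TypeError("use_speaker_boost must be a boolean")

    if "model" in settings and settings["model"] not in ELEVENLABS_MODELS:
        raise ValueError(f"model must be one of {sorted(ELEVENLABS_MODELS)}")

    return settings


# A more expressive voice on the low-latency model.
expressive = elevenlabs_settings(stability=0.4, style=0.3, model="eleven_flash_v2_5")
```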

Fish Audio-Specific Settings

These settings apply when using Fish Audio as your TTS provider.

Note: Fish Audio currently supports only English, Chinese, and Japanese.

| Field | Type | Options | Default | Description |
| --- | --- | --- | --- | --- |
| latency_mode | string | balanced, normal | balanced | Controls how fast the response is generated. balanced tends to be faster; normal prioritizes quality over speed. |
| model | string | s1, s2 | s2 | The Fish Audio model to use for speech generation. Different models may vary in voice quality, language support, and speed. |
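The two Fish Audio options can be modeled as a small dataclass that carries the documented defaults and rejects values outside the listed options. FishAudioSettings is a hypothetical name used for illustration; how these fields are arranged inside voice_settings is not specified on this page.

```python
from dataclasses import asdict, dataclass

LATENCY_MODES = ("balanced", "normal")
FISH_AUDIO_MODELS = ("s1", "s2")


@dataclass(frozen=True)
class FishAudioSettings:
    """Fish Audio options with the documented defaults."""

    latency_mode: str = "balanced"
    model: str = "s2"

    def __post_init__(self):
        # Reject anything outside the options listed in the table.
        if self.latency_mode not in LATENCY_MODES:
            raise ValueError(f"latency_mode must be one of {LATENCY_MODES}")
        if self.model not in FISH_AUDIO_MODELS:
            raise ValueError(f"model must be one of {FISH_AUDIO_MODELS}")


# Prefer quality over speed, keeping the default model.
quality_first = asdict(FishAudioSettings(latency_mode="normal"))
print(quality_first)  # {'latency_mode': 'normal', 'model': 's2'}
```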