Configuring Voice Settings

Voice Settings

Voices in LiveAvatar can be powered by several text-to-speech (TTS) providers. Each provider offers different tradeoffs in quality, latency, and language support. We intend to support additional voice providers over time.

Voice settings are configured under `avatar_persona.voice_settings` when creating a session token. Settings are split into shared options, which apply regardless of TTS provider, and provider-specific options for ElevenLabs and Fish Audio.
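
As a rough illustration of where these settings live, the sketch below nests a settings dictionary under `avatar_persona.voice_settings`. Only that nesting path comes from this page. The helper name `build_token_request_body` is hypothetical, and a real session token request will likely need other fields that are not shown here.

```python
import json
from typing import Any


def build_token_request_body(voice_settings: dict[str, Any]) -> dict[str, Any]:
    """Nest voice settings under avatar_persona.voice_settings.

    Hypothetical helper: only the nesting path is taken from the docs.
    Other request fields (avatar, voice, and so on) are omitted.
    """
    # Copy the input so later changes by the caller do not affect the body.
    return {"avatar_persona": {"voice_settings": dict(voice_settings)}}


# Example: set only the shared `speed` option.
body = build_token_request_body({"speed": 1.1})
print(json.dumps(body, indent=2))
```

The resulting dictionary would be merged into whatever request body your session token call already sends.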

We currently support the following providers.

Provider	Documentation
ElevenLabs	ElevenLabs Docs
Fish Audio	Fish Audio Docs

Note: Fish Audio support is currently in Beta. We are actively working on improving its quality, stability, and feature parity. Expect changes as it matures.

Shared Settings

These settings apply across all TTS providers.

Field	Type	Range	Default	Description
`speed`	number	0.8–1.2	`1`	Controls the speaking speed. Values below `1` slow down speech; values above `1` speed it up.
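
The range in the table can be checked on the client before a request is sent. The following is a minimal sketch: the helper names are hypothetical, and this page does not say whether the service rejects or clamps out-of-range values, so both behaviors are shown as options.

```python
# Range and default for `speed`, taken from the table above.
SPEED_MIN = 0.8
SPEED_MAX = 1.2
SPEED_DEFAULT = 1.0


def validate_speed(speed: float = SPEED_DEFAULT) -> float:
    """Return speed unchanged, or raise if it is outside 0.8 to 1.2."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(
            f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}"
        )
    return speed


def clamp_speed(speed: float) -> float:
    """Pull an out-of-range speed back to the nearest allowed bound."""
    return max(SPEED_MIN, min(SPEED_MAX, speed))
```

Use `validate_speed` when a bad value should surface as an error, and `clamp_speed` when the value comes from a user-facing control and should be corrected silently.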

ElevenLabs-Specific Settings

These settings apply when using ElevenLabs as your TTS provider.

Field	Type	Range / Options	Default	Description
`stability`	number	0–1	`0.75`	Controls consistency of the voice. Higher values produce a more stable, predictable output; lower values allow more expressive variation.
`similarity_boost`	number	0–1	`0.75`	How closely the output should match the original voice. Higher values increase likeness but may reduce naturalness.
`style`	number	0–1	`0`	Controls the intensity of the voice's style. Higher values exaggerate style characteristics.
`use_speaker_boost`	boolean	—	`true`	Enhances the similarity and clarity of the speaker's voice. Recommended to leave enabled.
`model`	string	`eleven_flash_v2_5`, `eleven_multilingual_v2`	—	The ElevenLabs TTS model to use. `eleven_flash_v2_5` is optimized for low latency; `eleven_multilingual_v2` supports a broader range of languages.
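
One way to apply the defaults and ranges above is a small builder that merges overrides onto the documented defaults and validates the result. This is a sketch under assumptions: the function name `elevenlabs_settings` is hypothetical, and because the table lists no default for `model`, the field is simply left out unless you pass it.

```python
from typing import Any

# Defaults and allowed values, taken from the table above.
ELEVENLABS_DEFAULTS: dict[str, Any] = {
    "stability": 0.75,
    "similarity_boost": 0.75,
    "style": 0.0,
    "use_speaker_boost": True,
}
ELEVENLABS_MODELS = ("eleven_flash_v2_5", "eleven_multilingual_v2")
UNIT_RANGE_FIELDS = ("stability", "similarity_boost", "style")


def elevenlabs_settings(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults and validate them."""
    allowed = set(ELEVENLABS_DEFAULTS) | {"model"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown ElevenLabs fields: {sorted(unknown)}")

    settings = {**ELEVENLABS_DEFAULTS, **overrides}

    # stability, similarity_boost, and style are numbers in the range 0 to 1.
    for name in UNIT_RANGE_FIELDS:
        value = settings[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    if not isinstance(settings["use_speaker_boost"], bool):
        raise TypeError("use_speaker_boost must be a boolean")

    # `model` has no documented default, so it is only checked when present.
    if "model" in settings and settings["model"] not in ELEVENLABS_MODELS:
        raise ValueError(f"model must be one of {ELEVENLABS_MODELS}")

    return settings


# Example: a more expressive voice on the low-latency model.
expressive = elevenlabs_settings(stability=0.4, model="eleven_flash_v2_5")
```

How these fields are combined with the shared settings inside `voice_settings` is not specified on this page, so check the API reference before sending them.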

Fish Audio-Specific Settings

These settings apply when using Fish Audio as your TTS provider.

Note: Fish Audio currently supports only English, Chinese, and Japanese.

Field	Type	Options	Default	Description
`latency_mode`	string	`balanced`, `normal`	`balanced`	Controls how fast the response is generated. `balanced` tends to be faster; `normal` prioritizes quality over speed.
`model`	string	`s1`, `s2`	`s2`	The Fish Audio model to use for speech generation. Different models may vary in voice quality, language support, and speed.
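
The two Fish Audio fields can be modeled as a small validated record. The sketch below uses a dataclass whose name, `FishAudioSettings`, is hypothetical. The allowed values and defaults are taken from the table above.

```python
from dataclasses import asdict, dataclass

# Allowed values, taken from the table above.
FISH_LATENCY_MODES = ("balanced", "normal")
FISH_MODELS = ("s1", "s2")


@dataclass(frozen=True)
class FishAudioSettings:
    """Fish Audio options with the documented defaults."""

    latency_mode: str = "balanced"
    model: str = "s2"

    def __post_init__(self) -> None:
        # Reject values that are not listed in the table.
        if self.latency_mode not in FISH_LATENCY_MODES:
            raise ValueError(f"latency_mode must be one of {FISH_LATENCY_MODES}")
        if self.model not in FISH_MODELS:
            raise ValueError(f"model must be one of {FISH_MODELS}")


# Example: favor quality over speed while keeping the default model.
quality_first = asdict(FishAudioSettings(latency_mode="normal"))
```

Calling `asdict` yields a plain dictionary that can be placed alongside the other voice settings. The exact placement within `voice_settings` is not described on this page.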