Configuring Voice Settings
How to adjust the generated audio
Voice Settings
Voices in LiveAvatar can be powered by several text-to-speech (TTS) providers. Each provider offers different tradeoffs in quality, latency, and language support. We intend to support additional voice providers in LiveAvatar over time.
Voice settings are configured under avatar_persona.voice_settings when creating a session token. Settings are split into shared options that apply regardless of TTS provider, and provider-specific options for ElevenLabs and Fish Audio.
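As a rough sketch of where these settings sit in a request body, the Python below nests a settings object under avatar_persona.voice_settings. Only that nesting comes from this page. The helper name build_session_payload is hypothetical, and any other fields a session token request requires are left out.

```python
import json


def build_session_payload(voice_settings: dict) -> dict:
    """Nest voice settings under avatar_persona.voice_settings.

    Hypothetical helper: only the avatar_persona.voice_settings path is
    taken from the documentation; other request fields are omitted.
    """
    # Copy the input so later changes to the caller's dict do not leak in.
    return {"avatar_persona": {"voice_settings": dict(voice_settings)}}


payload = build_session_payload({"speed": 1.1})
print(json.dumps(payload, indent=2))
```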
We currently support the following providers.
| Provider | Documentation |
|---|---|
| ElevenLabs | ElevenLabs Docs |
| Fish Audio | Fish Audio Docs |
Note: Fish Audio support is currently in Beta. We are actively working on improving its quality, stability, and feature parity. Expect changes as it matures.
Shared Settings
These settings apply across all TTS providers.
| Field | Type | Range | Default | Description |
|---|---|---|---|---|
| speed | number | 0.8–1.2 | 1 | Controls the speaking speed. Values below 1 slow down speech; values above 1 speed it up. |
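
A client may want to keep speed inside the documented 0.8–1.2 range before sending it. The sketch below clamps the value on the client side. This page does not say whether out-of-range values are rejected or clamped by the service, so clamping here is an assumption, and normalize_speed is a hypothetical helper name.

```python
SPEED_MIN = 0.8      # lower bound from the Shared Settings table
SPEED_MAX = 1.2      # upper bound from the Shared Settings table
SPEED_DEFAULT = 1.0  # documented default


def normalize_speed(speed=None) -> float:
    """Return a speed within the documented range.

    Assumption: clamping is a client-side choice, not documented
    server behavior.
    """
    if speed is None:
        return SPEED_DEFAULT
    return max(SPEED_MIN, min(SPEED_MAX, float(speed)))


print(normalize_speed(1.5))
```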
ElevenLabs-Specific Settings
These settings apply when using ElevenLabs as your TTS provider.
| Field | Type | Range / Options | Default | Description |
|---|---|---|---|---|
| stability | number | 0–1 | 0.75 | Controls consistency of the voice. Higher values produce a more stable, predictable output; lower values allow more expressive variation. |
| similarity_boost | number | 0–1 | 0.75 | How closely the output should match the original voice. Higher values increase likeness but may reduce naturalness. |
| style | number | 0–1 | 0 | Controls the intensity of the voice's style. Higher values exaggerate style characteristics. |
| use_speaker_boost | boolean | — | true | Enhances the similarity and clarity of the speaker's voice. Recommended to leave enabled. |
| model | string | eleven_flash_v2_5, eleven_multilingual_v2 | — | The ElevenLabs TTS model to use. eleven_flash_v2_5 is optimized for low latency; eleven_multilingual_v2 supports a broader range of languages. |
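
The ranges, defaults, and model names in the table can be checked on the client before a request is made. The sketch below does that. The helper elevenlabs_settings is hypothetical, and returning the fields as one flat dictionary is an assumption, since this page does not show how provider-specific fields are nested inside voice_settings. Because the table lists no default for model, the sketch leaves it out unless one is given.

```python
ELEVENLABS_DEFAULTS = {
    "stability": 0.75,
    "similarity_boost": 0.75,
    "style": 0.0,
    "use_speaker_boost": True,
}
ELEVENLABS_MODELS = {"eleven_flash_v2_5", "eleven_multilingual_v2"}


def elevenlabs_settings(model=None, **overrides) -> dict:
    """Merge overrides onto the documented defaults and validate them.

    Assumption: a flat dict is used for illustration only.
    """
    unknown = set(overrides) - set(ELEVENLABS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    settings = {**ELEVENLABS_DEFAULTS, **overrides}

    # Numeric fields must lie in the documented 0-1 range.
    for field in ("stability", "similarity_boost", "style"):
        value = settings[field]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            raise ValueError(f"{field} must be a number between 0 and 1")

    if not isinstance(settings["use_speaker_boost"], bool):
        raise ValueError("use_speaker_boost must be a boolean")

    # No default model is documented, so include it only when provided.
    if model is not None:
        if model not in ELEVENLABS_MODELS:
            raise ValueError(f"unsupported model: {model}")
        settings["model"] = model
    return settings


print(elevenlabs_settings(model="eleven_flash_v2_5", stability=0.5))
```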
Fish Audio-Specific Settings
These settings apply when using Fish Audio as your TTS provider.
Note: Fish Audio currently supports only English, Chinese, and Japanese.
| Field | Type | Options | Default | Description |
|---|---|---|---|---|
| latency_mode | string | balanced, normal | balanced | Controls how fast the response is generated. balanced tends to be faster; normal prioritizes quality over speed. |
| model | string | s1, s2 | s2 | The Fish Audio model to use for speech generation. Different models may vary in voice quality, language support, and speed. |
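
The two Fish Audio fields accept a small set of values, so a client-side check is short. The sketch below applies the documented defaults and rejects anything outside the listed options. fish_audio_settings is a hypothetical helper, and the flat dictionary it returns is an assumption about shape, not something this page specifies.

```python
FISH_LATENCY_MODES = ("balanced", "normal")
FISH_MODELS = ("s1", "s2")


def fish_audio_settings(latency_mode="balanced", model="s2") -> dict:
    """Validate Fish Audio options against the documented values.

    Defaults (balanced, s2) are taken from the settings table.
    """
    if latency_mode not in FISH_LATENCY_MODES:
        raise ValueError(f"latency_mode must be one of {FISH_LATENCY_MODES}")
    if model not in FISH_MODELS:
        raise ValueError(f"model must be one of {FISH_MODELS}")
    return {"latency_mode": latency_mode, "model": model}


print(fish_audio_settings(latency_mode="normal"))
```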