Configuration - LiveAvatar Documentation

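FULL Mode provides three configurable layers for your LiveAvatar session: the avatar (visual layer), the voice agent (persona layer), and the interactivity type (conversational layer).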

Avatar (Visual Layer)

Defines what the avatar looks like. Choose from a wide selection of avatars, each with unique styles, appearances, and expressions. Each avatar has a unique avatar_id. Browse available avatars through the List Public Avatars or List User Avatars endpoints.

Video Settings

Video settings tune the rendered output. They apply to both FULL and LITE modes.

Quality

The quality parameter controls output resolution:

Value	Resolution
`very_high`	1080p
`high` (default)	720p
`medium`	480p
`low`	360p

Higher resolution increases streaming latency. Use high or medium for most real-time applications.
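As a rough illustration of the trade-off above, the sketch below picks the highest quality value whose resolution fits a target display height. The value-to-resolution mapping comes from the table; the `pick_quality` helper and its fallback to `low` are hypothetical and not part of the LiveAvatar API.

```python
# Mapping copied from the quality table; everything else here is illustrative.
QUALITY_TO_HEIGHT = {
    "very_high": 1080,
    "high": 720,
    "medium": 480,
    "low": 360,
}


def pick_quality(max_height: int) -> str:
    """Return the highest quality whose resolution fits within max_height.

    Falls back to "low" when even 360p exceeds the target (an assumption
    made for this sketch, not documented behavior).
    """
    fitting = [(height, name) for name, height in QUALITY_TO_HEIGHT.items()
               if height <= max_height]
    if not fitting:
        return "low"
    # Heights are unique, so max() on (height, name) selects by height.
    return max(fitting)[1]
```

For example, a 720-pixel-tall player would get `high`, while a 500-pixel-tall embed would get `medium`.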

Encoding

The encoding parameter controls the video codec: VP8 or H264.
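The two video parameters can be validated together before a session is requested. This is a minimal sketch under stated assumptions: the `quality` and `encoding` parameter names and their values come from this page, but the `build_video_settings` helper, the dictionary shape, and the exact casing of the codec strings on the wire are assumptions. The default encoding is not stated here, so the sketch leaves `encoding` out unless one is chosen explicitly.

```python
from typing import Dict, Optional

VALID_QUALITIES = {"very_high", "high", "medium", "low"}
# Spelled as in the text above; the casing the API expects is an assumption.
VALID_ENCODINGS = {"VP8", "H264"}


def build_video_settings(quality: str = "high",
                         encoding: Optional[str] = None) -> Dict[str, str]:
    """Build a hypothetical video-settings dict, rejecting unknown values."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"unknown quality: {quality!r}")
    settings = {"quality": quality}
    if encoding is not None:
        if encoding not in VALID_ENCODINGS:
            raise ValueError(f"unknown encoding: {encoding!r}")
        settings["encoding"] = encoding
    return settings
```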

Voice Agent (Persona Layer)

Defines how the avatar converses — both how it sounds and how it thinks. This is the voice agent layer: a voice paired with a context. You can configure it two ways:

Reference a stored voice agent with voice_agent (by id). The voice and context are resolved from the stored agent. See Voice Agents.
Configure inline with avatar_persona, supplying voice_id, context_id, and language directly.

voice_agent and avatar_persona are mutually exclusive.
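The mutual-exclusion rule is easy to enforce on the client before sending a request. The sketch below assumes a hypothetical `build_voice_layer` helper and treats `voice_agent` as an opaque value, since the exact shape of the reference (plain id or object) is not specified here. Only the field names `voice_agent`, `avatar_persona`, `voice_id`, `context_id`, and `language` come from this page.

```python
from typing import Any, Dict, Optional


def build_voice_layer(voice_agent: Optional[Any] = None,
                      avatar_persona: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the voice-agent portion of a hypothetical session config.

    Raises ValueError if both options are supplied, because
    voice_agent and avatar_persona are mutually exclusive.
    """
    if voice_agent is not None and avatar_persona is not None:
        raise ValueError("voice_agent and avatar_persona are mutually exclusive")
    layer: Dict[str, Any] = {}
    if voice_agent is not None:
        layer["voice_agent"] = voice_agent  # stored agent, referenced by id
    if avatar_persona is not None:
        layer["avatar_persona"] = dict(avatar_persona)  # inline configuration
    return layer


# Inline configuration with placeholder ids (not real identifiers).
inline = build_voice_layer(
    avatar_persona={"voice_id": "voice-123", "context_id": "ctx-456", "language": "en"}
)
```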

Voice

Defines what the avatar sounds like. Voices vary by gender, age, tone, and accent. The voice_settings parameter enables fine-grained audio control for speed, style, and stability. See Configuring Voice Settings for provider-specific options.

Context

Defines how the avatar thinks and responds. The context layer controls:

Available information and knowledge
Response constraints and guardrails
Personality traits
Opening text (spoken at session start)
Instructions for response generation

If no context is supplied, the avatar operates in restricted mode — it will not respond to user input. User transcripts are still emitted, but the avatar can only repeat pre-set phrases.
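When configuring inline, a quick client-side check can flag a persona that would start in restricted mode. This sketch assumes the context is supplied through `context_id` inside `avatar_persona`, as described above; the `runs_in_restricted_mode` helper is hypothetical. It does not apply to a stored voice agent, whose context is resolved from the stored agent.

```python
from typing import Any, Dict


def runs_in_restricted_mode(avatar_persona: Dict[str, Any]) -> bool:
    """Return True when an inline persona carries no context_id.

    A missing, None, or empty context_id is treated as "no context supplied".
    """
    return not avatar_persona.get("context_id")
```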

Interactivity Type (Conversational Layer)

Controls how user input is registered and when the avatar responds.

Conversational (default)

The system manages conversation flow automatically based on speech pauses and user interruptions.

Push-to-Talk

You control exactly when user input is registered by signaling start/stop events. Voice activity detection (VAD) and automatic speech recognition (ASR) are disabled until you signal. See Push-to-Talk for details.
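The gating behavior can be pictured with a small local model: audio is ignored until a start signal arrives, and the captured input is handed over on stop. The `PushToTalkGate` class and its method names are hypothetical and exist only to illustrate the idea. The actual event names and transport are covered in the Push-to-Talk page.

```python
from typing import List


class PushToTalkGate:
    """Toy model of push-to-talk: input is registered only between start() and stop()."""

    def __init__(self) -> None:
        self.listening = False
        self._chunks: List[bytes] = []

    def start(self) -> None:
        """Signal that the user has begun talking."""
        self.listening = True

    def feed(self, chunk: bytes) -> bool:
        """Offer an audio chunk; it is kept only while listening."""
        if self.listening:
            self._chunks.append(chunk)
        return self.listening

    def stop(self) -> List[bytes]:
        """Signal the end of input and return the chunks captured since start()."""
        self.listening = False
        captured, self._chunks = self._chunks, []
        return captured
```

In conversational mode, by contrast, the system decides where an utterance begins and ends, so no equivalent of `start()` or `stop()` is needed.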
