FULL Mode provides three configurable layers for your LiveAvatar session.

Avatar (Visual Layer)

Defines what the avatar looks like. Choose from a wide selection of avatars, each with unique styles, appearances, and expressions. Each avatar has a unique avatar_id. Browse available avatars through the List Public Avatars or List User Avatars endpoints.

Video Settings

Video settings tune the rendered output. They apply to both FULL and LITE modes.

Quality

The quality parameter controls output resolution:
Value            Resolution
very_high        1080p
high (default)   720p
medium           480p
low              360p
Higher resolution increases streaming latency. Use high or medium for most real-time applications.

Encoding

The encoding parameter controls the video codec: VP8 or H264.
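As a rough sketch, the two video settings could be checked client-side before a session request is sent. The quality and encoding parameter names and their values come from the text above. The helper name build_video_settings, the lookup table, the flat dict shape, and the exact string casing of the codec values are assumptions for illustration, not the documented request schema.

```python
# Resolution for each quality value, as listed in the table above.
QUALITY_RESOLUTION = {
    "very_high": "1080p",
    "high": "720p",
    "medium": "480p",
    "low": "360p",
}

# Assumption: the codec is passed as one of these exact strings.
ENCODINGS = {"VP8", "H264"}


def build_video_settings(encoding: str, quality: str = "high") -> dict:
    """Validate both values and return them as a dict (hypothetical helper)."""
    if quality not in QUALITY_RESOLUTION:
        raise ValueError(f"unknown quality: {quality!r}")
    if encoding not in ENCODINGS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    return {"quality": quality, "encoding": encoding}


# quality falls back to "high", the documented default.
settings = build_video_settings("VP8")
```

Validating locally only catches typos early; the service remains the authority on which values it accepts.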

Voice Agent (Persona Layer)

Defines how the avatar converses: both how it sounds and how it thinks. This is the voice agent layer, a voice paired with a context. You can configure it in one of two ways:
  • Reference a stored voice agent with voice_agent (by id). The voice and context are resolved from the stored agent. See Voice Agents.
  • Configure inline with avatar_persona, supplying voice_id, context_id, and language directly.
voice_agent and avatar_persona are mutually exclusive.
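The mutual-exclusivity rule can be enforced before a request is built. The sketch below is a minimal illustration: the field names voice_agent, avatar_persona, voice_id, context_id, and language come from the text above, while the helper name voice_agent_fields, the choice to pass voice_agent as a plain id string, and the placeholder ids are assumptions.

```python
from typing import Optional


def voice_agent_fields(
    voice_agent: Optional[str] = None,
    avatar_persona: Optional[dict] = None,
) -> dict:
    """Return the persona-layer fields, rejecting the case where both are set."""
    if voice_agent is not None and avatar_persona is not None:
        raise ValueError("voice_agent and avatar_persona are mutually exclusive")
    if voice_agent is not None:
        # Stored agent: voice and context are resolved from the stored agent.
        return {"voice_agent": voice_agent}
    if avatar_persona is not None:
        # Inline configuration: copy so later changes do not affect the caller's dict.
        return {"avatar_persona": dict(avatar_persona)}
    # Neither supplied: no persona-layer fields are sent.
    return {}


# Placeholder ids, for illustration only.
stored = voice_agent_fields(voice_agent="example-agent-id")
inline = voice_agent_fields(
    avatar_persona={
        "voice_id": "example-voice-id",
        "context_id": "example-context-id",
        "language": "en",
    }
)
```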

Voice

Defines what the avatar sounds like. Voices vary by gender, age, tone, and accent. The voice_settings parameter enables fine-grained audio control for speed, style, and stability. See Configuring Voice Settings for provider-specific options.

Context

Defines how the avatar thinks and responds. The context layer controls:
  • Available information and knowledge
  • Response constraints and guardrails
  • Personality traits
  • Opening text (spoken at session start)
  • Instructions for response generation
If no context is supplied, the avatar operates in restricted mode: it does not generate responses to user input. User transcripts are still emitted, but the avatar can only repeat pre-set phrases.

Interactivity Type (Conversational Layer)

Controls how user input is registered and when the avatar responds.

Conversational (default)

The system manages conversation flow automatically based on speech pauses and user interruptions.

Push-to-Talk

You control exactly when user input is registered by signaling start/stop events. VAD and ASR are disabled until you signal. See Push-to-Talk for details.
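The start/stop signaling can be pictured as a small client-side gate that keeps the two signals paired. This is only a sketch of the idea: the class name PushToTalkGate and the "start" and "stop" labels are placeholders, not the actual event names, which are covered in the Push-to-Talk documentation.

```python
class PushToTalkGate:
    """Tracks whether user input is currently being registered (hypothetical)."""

    def __init__(self) -> None:
        self.listening = False
        # Signals that would be sent to the session, in order.
        self.signals: list[str] = []

    def start(self) -> None:
        # Ignore repeated presses so each start is matched by one stop.
        if not self.listening:
            self.listening = True
            self.signals.append("start")

    def stop(self) -> None:
        # A stop without a preceding start is a no-op.
        if self.listening:
            self.listening = False
            self.signals.append("stop")


gate = PushToTalkGate()
gate.stop()   # ignored: nothing was started
gate.start()  # user presses the talk button
gate.start()  # duplicate press is ignored
gate.stop()   # user releases the button
```

Keeping the pairing logic on the client means a stray button event does not produce an unmatched signal.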