Avatar (Visual Layer)
Defines what the avatar looks like. Choose from a wide selection of avatars, each with its own style, appearance, and expressions. Each avatar has a unique avatar_id. Browse available avatars through the List Public Avatars or List User Avatars endpoints.
Video Settings
Video settings tune the rendered output. They apply to both FULL and LITE modes.
Quality
The quality parameter controls output resolution:
| Value | Resolution |
|---|---|
| very_high | 1080p |
| high (default) | 720p |
| medium | 480p |
| low | 360p |
Higher resolution increases streaming latency. Use high or medium for most real-time applications.
Encoding
The encoding parameter controls the video codec: VP8 or H264.
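As an illustration of the two settings above, here is a minimal Python sketch that validates a quality and encoding choice before it is sent anywhere. The parameter names `quality` and `encoding` and their values come from this page; the helper name `build_video_settings`, the returned dictionary layout, and the exact casing of the codec strings are assumptions made for the example, not the documented request schema.

```python
# Sketch only: the helper name and the returned dict layout are assumptions,
# not the documented request schema.

# Quality values and the resolution each one maps to, as listed in the table.
QUALITY_RESOLUTION = {
    "very_high": "1080p",
    "high": "720p",
    "medium": "480p",
    "low": "360p",
}

# Codec names as written on this page; the exact casing is an assumption.
ENCODINGS = {"VP8", "H264"}


def build_video_settings(encoding, quality="high"):
    """Validate both settings and return them as a plain dict.

    quality defaults to "high", the documented default. No default is
    documented for encoding, so the caller must pass it explicitly.
    """
    if quality not in QUALITY_RESOLUTION:
        raise ValueError(
            f"unknown quality {quality!r}; expected one of {sorted(QUALITY_RESOLUTION)}"
        )
    if encoding not in ENCODINGS:
        raise ValueError(
            f"unknown encoding {encoding!r}; expected one of {sorted(ENCODINGS)}"
        )
    return {"quality": quality, "encoding": encoding}


if __name__ == "__main__":
    settings = build_video_settings("H264", quality="medium")
    print(settings, "->", QUALITY_RESOLUTION[settings["quality"]])
```

Validating on the client side like this catches a misspelled value before a session is requested. Where the resulting fields belong in the actual request body should be checked against the API reference.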
Voice Agent (Persona Layer)
Defines how the avatar converses, covering both how it sounds and how it thinks. This is the voice agent layer: a voice paired with a context. You can configure it two ways:
- Reference a stored voice agent with voice_agent (by id). The voice and context are resolved from the stored agent. See Voice Agents.
- Configure inline with avatar_persona, supplying voice_id, context_id, and language directly.
voice_agent and avatar_persona are mutually exclusive.
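The mutual-exclusivity rule can be enforced before a request is built. The sketch below is one way to do that in Python. The field names `voice_agent`, `id`, `avatar_persona`, `voice_id`, `context_id`, and `language` come from this page; the helper name `build_persona_config`, the nesting of `id` under `voice_agent`, and the choice to reject a call that supplies neither form are assumptions made for the example.

```python
# Sketch only: the helper name and the exact payload nesting are assumptions,
# not the documented request schema.

def build_persona_config(voice_agent_id=None, voice_id=None,
                         context_id=None, language=None):
    """Return either a voice_agent reference or an inline avatar_persona.

    Raises ValueError if both forms are supplied (they are mutually
    exclusive) or if neither is supplied (an assumption of this sketch).
    """
    # Keep only the inline fields the caller actually provided.
    inline = {
        key: value
        for key, value in (
            ("voice_id", voice_id),
            ("context_id", context_id),
            ("language", language),
        )
        if value is not None
    }

    if voice_agent_id is not None and inline:
        raise ValueError("voice_agent and avatar_persona are mutually exclusive")
    if voice_agent_id is not None:
        # Stored agent: voice and context are resolved from the agent.
        return {"voice_agent": {"id": voice_agent_id}}
    if inline:
        # Inline configuration: voice, context, and language given directly.
        return {"avatar_persona": inline}
    raise ValueError("specify either voice_agent_id or inline persona fields")


if __name__ == "__main__":
    # The identifiers below are placeholders, not real IDs.
    print(build_persona_config(voice_agent_id="agent-123"))
    print(build_persona_config(voice_id="v1", context_id="c1", language="en"))
```

Returning a single-key dictionary makes it straightforward to merge the result into a larger request body without ever sending both forms at once.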
Voice
Defines what the avatar sounds like. Voices vary by gender, age, tone, and accent. The voice_settings parameter enables fine-grained audio control for speed, style, and stability. See Configuring Voice Settings for provider-specific options.
Context
Defines how the avatar thinks and responds. The context layer controls:
- Available information and knowledge
- Response constraints and guardrails
- Personality traits
- Opening text (spoken at session start)
- Instructions for response generation