Avatar (Visual Layer)
Defines what the avatar looks like. Choose from a wide selection of avatars, each with unique styles, appearances, and expressions. Each avatar has a unique `avatar_id`. Browse available avatars through the List Public Avatars or List User Avatars endpoints.
Voice (Audio Layer)
Defines what the avatar sounds like. Voices vary by gender, age, tone, and accent. The `voice_settings` parameter enables fine-grained audio control for speed, style, and stability. See Configuring Voice Settings for provider-specific options.
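As a rough illustration, the sketch below assembles a `voice_settings` value in Python. The three field names follow the controls listed above; the numeric types, the value ranges, and the `build_voice_settings` helper are assumptions made for this example, so consult Configuring Voice Settings for what your provider actually accepts.

```python
def build_voice_settings(speed=1.0, style=0.0, stability=0.5):
    """Return a voice_settings dict (hypothetical shape).

    Assumed ranges, not taken from the documentation:
    speed > 0, style and stability within [0.0, 1.0].
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    for name, value in (("style", style), ("stability", stability)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return {"speed": speed, "style": style, "stability": stability}


# Slightly faster speech with a more stable delivery.
settings = build_voice_settings(speed=1.1, stability=0.75)
print(settings)
```

Validating locally like this only catches obvious mistakes early; the provider remains the authority on valid values.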
Context (Cognitive Layer)
Defines how the avatar thinks and responds. The context layer controls:
- Available information and knowledge
- Response constraints and guardrails
- Personality traits
- Opening text (spoken at session start)
- Instructions for response generation
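One way to keep the five context controls above organized on the client side is a small data structure, sketched here in Python. The `AvatarContext` class and all of its field names are hypothetical and are not taken from the API; they only mirror the list above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class AvatarContext:
    """Hypothetical container mirroring the five context controls."""
    knowledge: str = ""                                   # available information
    guardrails: list[str] = field(default_factory=list)   # response constraints
    personality: str = ""                                 # personality traits
    opening_text: str = ""                                # spoken at session start
    instructions: str = ""                                # response generation rules


context = AvatarContext(
    knowledge="Store hours: 9am to 5pm, Monday through Friday.",
    guardrails=["Do not discuss pricing."],
    personality="Friendly and concise.",
    opening_text="Hi! How can I help you today?",
    instructions="Answer in two sentences or fewer.",
)

# Plain dict form, convenient for serializing to JSON.
payload = asdict(context)
print(payload["opening_text"])
```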
Interactivity Type (Conversational Layer)
Controls how user input is registered and when the avatar responds.
Conversational (default)
The system manages conversation flow automatically based on speech pauses and user interruptions.
Push-to-Talk
You control exactly when user input is registered by signaling start/stop events. VAD and ASR are disabled until you signal. See Push-to-Talk for details.
Video Settings
These settings apply to both FULL and LITE modes.
Quality
The `quality` parameter controls output resolution:
| Value | Resolution |
|---|---|
| `very_high` | 1080p |
| `high` (default) | 720p |
| `medium` | 480p |
| `low` | 360p |
Higher resolution increases streaming latency. Use `high` or `medium` for most real-time applications.
Encoding
The `encoding` parameter controls the video codec: VP8 or H264.
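The quality and encoding values described in this section can be checked before they are sent, as in the minimal Python sketch below. The quality-to-resolution mapping, the `high` default, and the two codec names come from this page; the helper names and the shape of the returned dict are assumptions for illustration. Because no default codec is stated here, the sketch requires `encoding` to be passed explicitly.

```python
# Values taken from the quality table and the encoding description.
QUALITY_TO_RESOLUTION = {
    "very_high": "1080p",
    "high": "720p",
    "medium": "480p",
    "low": "360p",
}
ENCODINGS = {"VP8", "H264"}


def resolution_for(quality):
    """Look up the output resolution for a quality value."""
    if quality not in QUALITY_TO_RESOLUTION:
        raise ValueError(f"unknown quality: {quality!r}")
    return QUALITY_TO_RESOLUTION[quality]


def build_video_settings(encoding, quality="high"):
    """Return validated video settings (hypothetical dict shape)."""
    resolution_for(quality)  # raises ValueError on an unknown quality
    if encoding not in ENCODINGS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    return {"quality": quality, "encoding": encoding}


video = build_video_settings("H264")
print(video, resolution_for(video["quality"]))
```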