Configuring FULL Mode
Configuration Options
We currently offer the following ways to personalize your LiveAvatar experience.
- Avatar (Visual Layer) — Defines what the avatar looks like. Choose from a wide selection of avatars, each with unique styles, appearances, and expressions.
- Voice (Audio Layer) — Defines what the avatar sounds like. Choose a voice that fits your needs — from calm and professional to energetic or youthful.
- Context (Cognitive Layer) — Defines how the avatar thinks and responds. Control the personality traits, background knowledge, and behavior, which guide how the LLM generates responses.
- Interactivity Type (Conversational Layer) — Controls how and when the avatar responds.
Each of these layers can be configured when creating a session, giving developers precise control over the avatar’s look, sound, and intelligence.
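As an illustration, the four layers might be assembled into a single session configuration like the sketch below. The field names (`avatar_id`, `voice_id`, `context`, `interactivity_type`) are assumptions for illustration only and may differ from the actual API.

```python
# Hypothetical sketch: combining the four configuration layers into one
# session payload. Field names are assumptions, not the actual LiveAvatar API.

def build_session_config(avatar_id, voice_id, context=None,
                         interactivity_type="conversational"):
    """Combine the four configuration layers into one session payload."""
    config = {
        "avatar_id": avatar_id,                    # visual layer
        "voice_id": voice_id,                      # audio layer
        "interactivity_type": interactivity_type,  # conversational layer
    }
    if context is not None:
        config["context"] = context                # cognitive layer
    return config

config = build_session_config("avatar_123", "voice_456")
print(config["interactivity_type"])  # conversational
```

Keeping the layers as independent parameters means you can swap, say, the voice without touching the avatar or its context.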
Avatars
An avatar determines the visual appearance of your AI agent. Developers can select from a catalog of avatars — ranging from realistic human presenters to stylized or expressive characters.
Each avatar can be uniquely identified by its avatar_id.
You can preview all available avatars here.
Avatar Personas
In addition to controlling the look of the Avatar, you can control additional features of the Avatar, including its voice and its context (think of it as the Avatar's personality).
Voices
A voice defines the sound and personality of your avatar's speech. Voices vary in gender, age, tone, and accent. Some voices also perform better in certain languages than others.
You can preview all available voices here.
Context
The context allows you to shape how your avatar will think and respond. This layer defines the information available to the avatar, the constraints applied to it, and any personality quirks you want your avatar to have.
Each context includes an opening text, which is spoken at the start of each LiveAvatar session, and instructions that direct the responses you want your avatar to generate. If no opening text is provided in the context, the avatar will not say anything to begin with.
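For illustration, a context could be represented as in this sketch; the field names `opening_text` and `instructions` are assumptions, not the actual LiveAvatar schema.

```python
# Hypothetical sketch of a context object. The field names "opening_text" and
# "instructions" are assumptions for illustration only.
context = {
    # Spoken once at the start of the session.
    "opening_text": "Hi! I'm Ava, your product guide. How can I help?",
    # Directs how the LLM generates responses.
    "instructions": (
        "You are a friendly product expert. Answer questions about the "
        "product concisely, and politely decline unrelated topics."
    ),
}
print(sorted(context))  # ['instructions', 'opening_text']
```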
What happens if no context is passed in
Contexts can be passed in when first creating a session token. If no context is supplied, the avatar will operate in a more restricted manner.
- The avatar will not generate any responses to user conversation input. We still emit user transcript events, which transcribe what the user says.
- Avatars can still repeat phrases that you want them to say.
You can manage all your contexts here.
Interactivity Type
We currently support two interactivity types:
- Conversational - we manage turn-taking in the conversation and control when the avatar starts and stops responding, based on pauses in the user's speech or attempts to interrupt the avatar. This is the default interactivity type.
- Push-to-Talk - you have full control over when the user's input is registered by signaling the exact moments the user starts and stops talking. We disable our VAD/ASR controls and give you precise control over when the avatar responds.
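The push-to-talk flow above can be sketched as a simple gate that only registers input between explicit start/stop signals. The class and method names here are illustrative, not the actual LiveAvatar SDK.

```python
# Hypothetical sketch of push-to-talk gating: input chunks are captured only
# while the application has signaled that the user is talking. Names are
# illustrative, not the actual LiveAvatar SDK.

class PushToTalkGate:
    def __init__(self):
        self.talking = False
        self.buffered = []

    def start_talking(self):
        """Signal the exact moment the user starts speaking."""
        self.talking = True

    def stop_talking(self):
        """Signal the end of user input; return what was captured."""
        self.talking = False
        captured, self.buffered = self.buffered, []
        return captured

    def feed(self, chunk):
        """Register an input chunk only while the gate is open."""
        if self.talking:
            self.buffered.append(chunk)

gate = PushToTalkGate()
gate.feed("ignored before start")   # dropped: gate is closed
gate.start_talking()
gate.feed("hello")
gate.feed("avatar")
print(gate.stop_talking())  # ['hello', 'avatar']
```

In the Conversational type, the equivalent of `start_talking`/`stop_talking` is decided for you by our VAD/ASR; in Push-to-Talk, your application makes those calls.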
Wrapping Up
Combining everything together, you can customize your LiveAvatar so it looks, sounds, and responds exactly how you want, consistently throughout the conversation.