
As part of LITE Mode, LiveAvatar provisions a LiveKit room for every session. You can send your own custom LiveKit Agent into that room to monitor user behavior and drive our avatar — handle STT, LLM, TTS, tool calls, and any orchestration logic you need. This guide walks through a Python starter that does exactly that.

liveavatar-starter-livekit-agent-python

Reference implementation for this guide: https://github.com/heygen-com/liveavatar-starter-livekit-agent-python. Demo entrypoint: src/liveavatar_hosted_demo.py.
This path is best for teams without an existing LiveKit setup. If you already run your own LiveKit project, we recommend checking out our LiveKit avatar plugin.

Prerequisites

Before you start, make sure you have:
  • A LiveAvatar account with an API key — sign up at app.liveavatar.com.
  • An AVATAR_ID from your account. The default in the starter repo is sandbox-compatible; swap it for one of your own avatars when going to production.
  • A LiveKit Cloud project with LIVEKIT_API_KEY and LIVEKIT_API_SECRET. These are used for the inference gateway (STT / LLM / TTS) only — the room itself is provisioned by LiveAvatar. Sign up at cloud.livekit.io if you do not have one.
  • Python 3.11+ and uv (or pip) to install the starter’s dependencies.
LIVEKIT_URL is not required for this flow — the room URL is returned by start_session.

Quickstart

Clone and run the reference implementation against the LiveAvatar API.
git clone https://github.com/heygen-com/liveavatar-starter-livekit-agent-python
cd liveavatar-starter-livekit-agent-python
cp .env.example .env.local
Set the following in .env.local:
LIVEAVATAR_API_KEY=...   # from app.liveavatar.com
AVATAR_ID=...            # any avatar in your account; sandbox-compatible by default
LIVEKIT_API_KEY=...      # your LK Cloud project — used for the inference gateway only
LIVEKIT_API_SECRET=...
IS_SANDBOX=true          # default; switch to false for production avatars / billed minutes
LIVEKIT_URL is intentionally absent; the demo uses the room URL returned by start_session. Install and run:
uv sync
python src/liveavatar_hosted_demo.py
A new browser tab should pop up with your avatar in the room. Try speaking to it. Behind the scenes, your agent runs STT → LLM → TTS and streams audio to LiveAvatar for lip-sync. The next section breaks down the full flow.

Understanding the flow

  1. LiveAvatar mints both tokens at session start. start_session returns a viewer token (so your end user can join) and an agent token (so your agent can join the same room).
  2. Your agent joins the LiveAvatar-owned room. Using the agent token, your agent connects to LiveAvatar’s room along with the user.
  3. Your agent processes user input and feeds audio into the LiveAvatar WebSocket. As the user speaks, the agent runs STT → LLM → TTS and forwards synthesized audio over the LiveAvatar WebSocket (ws_url). LiveAvatar handles lip-sync, publishing avatar video back into the room.
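
For orientation, here is a hypothetical sketch of steps 2 and 3: the agent joins the LiveAvatar-owned room with the agent token and streams synthesized audio over the ws_url. The session field names and the synthesized_audio() stream are illustrative assumptions; the starter's avatar_ws.py and worker.py are the authoritative implementations.

```python
# Hypothetical sketch of steps 2-3. Field names on `session` and the
# synthesized_audio() generator are illustrative assumptions; see the
# starter's avatar_ws.py / worker.py for the real implementations.
import websockets
from livekit import rtc

async def run_agent(session: dict) -> None:
    # Step 2: join the LiveAvatar-owned room with the agent token from start_session.
    room = rtc.Room()
    await room.connect(session["room_url"], session["agent_token"])

    # Step 3: open the LiveAvatar WebSocket (ws_url) and forward synthesized TTS
    # audio; LiveAvatar lip-syncs it and publishes avatar video back into the room.
    async with websockets.connect(session["ws_url"]) as avatar_ws:
        async for pcm_chunk in synthesized_audio():  # hypothetical TTS output stream
            await avatar_ws.send(pcm_chunk)
```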

Ownership at a glance

| Concern | Owned by |
| --- | --- |
| LiveKit room | LiveAvatar |
| Agent + viewer tokens | LiveAvatar |
| LiveKit room usage (participant-minutes) | LiveAvatar |
| Agent maintenance | You |
| Avatar minutes / credits | You (your LiveAvatar account) |
| Inference billing (STT / LLM / TTS) | You (your LiveKit Cloud project) |
We use LiveKit’s framework here for simplicity. You’re free to swap out individual STT, LLM, or TTS providers — or replace LiveKit entirely with a different agent stack.

Understanding the code

When we run python src/liveavatar_hosted_demo.py, we do the following:
  1. Start a LiveAvatar session. We call the LiveAvatar API to spin up a new session. In return we get the LiveKit room URL, an agent token, a viewer token, and a WebSocket URL for streaming audio to the avatar.
  2. Spin up the agent locally. We boot a local LiveKit Agents worker — no LiveKit Cloud registration needed since LiveAvatar owns the room.
  3. Send the agent into the room. Using the agent token, the worker joins the LiveAvatar room and opens the WebSocket so it can drive the avatar.
  4. Open the viewer. A local browser tab launches with the viewer token preloaded, so you can see and talk to the avatar.
  5. Run the conversation loop. The agent listens to user audio, runs STT → LLM → TTS, and forwards synthesized audio over the WebSocket. LiveAvatar lip-syncs and publishes video back into the room.
  6. Clean up on exit. When the process stops, the agent closes the LiveAvatar session so credits and room minutes stop billing.
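
Condensed, the demo entrypoint amounts to something like the sketch below. It mirrors the steps above rather than the starter's exact code; the run_worker signature and the viewer URL are assumptions.

```python
# Sketch of the demo's top-level flow. Helper names mirror the starter's
# modules, but exact signatures and the viewer URL are assumptions.
import asyncio
import webbrowser

from liveavatar_client import start_session, stop_session  # src/liveavatar_client.py
from worker import run_worker                               # src/worker.py (hypothetical signature)

async def main() -> None:
    # 1. Start a session: returns the room URL, agent + viewer tokens, and ws_url.
    session = start_session()
    try:
        # 4. Open the local viewer with the viewer token preloaded.
        webbrowser.open(f"http://localhost:8080/?token={session['viewer_token']}")
        # 2, 3, 5. Run the local worker: it joins the room with the agent token,
        # opens the WebSocket, and runs STT -> LLM -> TTS until the session ends.
        await run_worker(session)
    finally:
        # 6. Always stop the session so credits and room minutes stop billing.
        stop_session(session)

if __name__ == "__main__":
    asyncio.run(main())
```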

File structure

The starter is split into a backend (the agent itself) and a small frontend used to visualize the session locally. Backend (src/):
| File | Responsibility |
| --- | --- |
| liveavatar_client.py | Makes API calls to LiveAvatar (create session, start session, stop session). |
| avatar_ws.py | Maintains the WebSocket connection to LiveAvatar and forwards audio frames for lip-sync. |
| agent.py | Defines the agent that handles the conversation and tees synthesized audio to the LiveAvatar WebSocket. |
| pipeline.py | Builds the STT → LLM → TTS pipeline and wires up room + session observability. |
| worker.py | Defines the LiveKit worker entrypoint that joins the room and runs the agent. |
| liveavatar_hosted_demo.py | Demo entrypoint — orchestrates the session, the worker, and the local viewer. |
Frontend (viewer/): We ship a minimal frontend so you can see the avatar and talk to it locally without standing up your own client. Treat it as a reference, not as production frontend code.
| File | Responsibility |
| --- | --- |
| index.html | Vanilla LiveKit JS client. Connects to the LiveAvatar room with the viewer token, publishes the user’s microphone, and renders the avatar’s audio + video. |

Customizing the agent

The most common things you’ll want to tweak after the quickstart:
| What | Where |
| --- | --- |
| System prompt / agent persona | agent.py — the LiveAvatarAgent definition. |
| LLM model, STT provider, TTS voice | pipeline.py — build_session configures all three. |
| Audio input options (noise cancellation, etc.) | pipeline.py — build_room_options. |
| Logging / observability hooks | pipeline.py — wire_room_observability, wire_session_observability. |
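
To give a feel for what swapping providers looks like, here is a rough build_session-style sketch using LiveKit Agents plugins. The starter's actual build_session may differ, and the providers, models, and voice ID below are placeholders rather than recommendations.

```python
# Rough sketch of a build_session-style helper. The starter's real build_session
# may differ; providers, models, and the voice ID are placeholders.
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero

def build_session() -> AgentSession:
    return AgentSession(
        vad=silero.VAD.load(),                    # voice activity detection
        stt=deepgram.STT(model="nova-2"),         # swap the STT provider/model here
        llm=openai.LLM(model="gpt-4o-mini"),      # swap the LLM here
        tts=cartesia.TTS(voice="your-voice-id"),  # swap the TTS voice here
    )
```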

Productionizing

This demo is wired up to help you quickly iterate locally. When productionizing, here are some recommendations: remove the local scaffolding, run the agent as a long-lived process, and tighten up the session lifecycle.

Improvements beyond the demo

A few things in the starter exist purely for local convenience. We recommend the following swaps:
  • Build your own frontend. Replace the local viewer HTTP server and auto-launched browser tab with a real client that joins the LiveAvatar room using the viewer token.
  • Simplify the agent runtime. The demo uses local scaffolding to dispatch a single in-process job. In production, there are many ways to manage the agent — pick the runtime that fits your infrastructure.
  • Hand off shutdown to your runtime. The demo handles Ctrl-C itself. In production, lifecycle and shutdown signals come from your container, supervisor, or queue worker.
  • Guard against leaked sessions. Always call stop_session on session end, including error paths. As a safety net for runners that crash without it, set max_session_duration server-side when calling create_session_token so LiveAvatar will close abandoned sessions on its own; a sketch follows this list.
  • Evaluate the code structure. Our helpers wrap LiveKit’s AgentSession, worker, and room abstractions, encapsulating WebRTC complexity.
    • This structure may be over-engineered for your use case — we recommend cleaning up pieces that don’t make sense for your setup.
    • If you anticipate running your own LiveKit deployment later, keeping this structure simplifies the transition.
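
As a concrete illustration of that safety net, the sketch below shows where max_session_duration would go. The endpoint URL, header name, and payload shape are assumptions; check the create_session_token reference for the real ones.

```python
# Illustrative only: endpoint URL, header, and payload shape are assumptions.
# The point is the server-side cap: LiveAvatar ends the session even if the
# runner crashes before it can call stop_session.
import os
import httpx

def create_session_token(avatar_id: str) -> dict:
    resp = httpx.post(
        "https://api.liveavatar.com/v1/sessions/token",  # placeholder URL
        headers={"X-API-Key": os.environ["LIVEAVATAR_API_KEY"]},  # placeholder header
        json={
            "avatar_id": avatar_id,
            "max_session_duration": 15 * 60,  # seconds; abandoned sessions auto-close
        },
    )
    resp.raise_for_status()
    return resp.json()
```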

Deployment considerations

A LiveAvatar session is stateful and longer-lived than a typical request — your agent holds an open LiveKit room connection, a WebSocket to LiveAvatar, and live STT / LLM / TTS streams for the duration of the conversation. The right deployment shape depends heavily on your traffic, concurrency, and infrastructure — there’s no one-size-fits-all answer here. We have seen some of the following deployment patterns work well in practice.

Pattern 1 — On-demand spawn per session

A backend HTTP service receives a “start session” request, spawns a runner process for that session, and returns the viewer token + room URL to the client. The process lives until the session ends, then exits.
  • Pros. Dead simple. No queue, no pool, no orchestration. Each session is fully isolated.
  • Cons. Cold start hits every session — VAD, turn-detector, and plugin init can add several seconds before the agent joins the room.
  • Fits. Smaller deployments with modest concurrency, or sessions long enough that startup latency is not user-facing pain.
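
A minimal sketch of this pattern, assuming a hypothetical session_runner.py that joins the room with the agent token, runs the conversation, and calls stop_session on exit:

```python
# Hypothetical spawn-per-session backend. The endpoint name, response fields,
# and the runner script are assumptions, not part of the starter.
import subprocess
import sys

from fastapi import FastAPI
from liveavatar_client import start_session  # starter's client; exact signature may differ

app = FastAPI()

@app.post("/sessions")
def create_session() -> dict:
    session = start_session()  # room URL, viewer/agent tokens, ws_url
    # One runner process per session; it lives for the conversation, then exits.
    subprocess.Popen([
        sys.executable, "src/session_runner.py",
        "--session-id", session["session_id"],
    ])
    # The client only needs enough to join as a viewer.
    return {"room_url": session["room_url"], "viewer_token": session["viewer_token"]}
```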

Pattern 2 — Warm worker pool + queue

A pool of runner processes sits idle, blocked on a queue. The backend enqueues a session-start job; the first idle worker pops it, calls start_session, runs the conversation, then returns to the pool.
  • Pros. Cold start is paid once at process boot, not per session. Capacity is predictable (pool size = peak concurrency). Autoscaling works off queue depth or active-session count.
  • Cons. Idle workers cost compute. Pool sizing matters — undersized pools create user-facing queueing.
  • Fits. Mid-to-large deployments where cold start hurts UX or concurrency exceeds what spawn-per-request can handle.
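
Sketched with an in-process queue for brevity (production pools usually sit behind Redis, SQS, or similar) and with hypothetical load_pipeline / run_conversation helpers standing in for the starter's pipeline and worker code:

```python
# Hypothetical warm worker pool. The job payload and the load_pipeline /
# run_conversation helpers are assumptions standing in for the starter's code.
import multiprocessing as mp

from liveavatar_client import start_session, stop_session  # exact signatures may differ

POOL_SIZE = 8  # roughly the peak concurrency you want to serve without queueing

def worker(jobs: "mp.Queue") -> None:
    pipeline = load_pipeline()  # pay cold start (VAD, turn detector, plugins) once, at boot
    while True:
        job = jobs.get()               # block until a session-start job arrives
        session = start_session()      # then mint the room, tokens, and ws_url
        try:
            run_conversation(pipeline, session, job)  # join the room and drive the avatar
        finally:
            stop_session(session)      # release credits / room minutes, return to the pool

if __name__ == "__main__":
    jobs: "mp.Queue" = mp.Queue()
    for _ in range(POOL_SIZE):
        mp.Process(target=worker, args=(jobs,)).start()
    # The backend enqueues one job per "start session" request, e.g.:
    # jobs.put({"user_id": "..."})
```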