
As part of LITE Mode, LiveAvatar provisions a LiveKit room for every session. You can send your own custom LiveKit Agent into that room to monitor user behavior and drive our avatar — handle STT, LLM, TTS, tool calls, and any orchestration logic you need. This guide walks through a Python starter that does exactly that.

liveavatar-starter-livekit-agent-python

Reference implementation for this guide: https://github.com/heygen-com/liveavatar-starter-livekit-agent-python. Demo entrypoint: src/liveavatar_hosted_demo.py.
This path is best for teams without an existing LiveKit setup. If you already run your own LiveKit project, we recommend checking out our LiveKit avatar plugin.

Prerequisites

Before you start, make sure you have:
  • A LiveAvatar account with an API key — sign up at app.liveavatar.com.
  • An AVATAR_ID from your account. The default in the starter repo is sandbox-compatible; swap it for one of your own avatars when going to production.
  • A LiveKit Cloud project with LIVEKIT_API_KEY and LIVEKIT_API_SECRET. These are used for the inference gateway (STT / LLM / TTS) only — the room itself is provisioned by LiveAvatar. Sign up at cloud.livekit.io if you do not have one.
  • Python 3.11+ and uv (or pip) to install the starter’s dependencies.
LIVEKIT_URL is not required for this flow — the room URL is returned by start_session.

Quickstart

Clone and run the reference implementation against the LiveAvatar API.
git clone https://github.com/heygen-com/liveavatar-starter-livekit-agent-python
cd liveavatar-starter-livekit-agent-python
cp .env.example .env.local
Set the following in .env.local:
LIVEAVATAR_API_KEY=...   # from app.liveavatar.com
AVATAR_ID=...            # any avatar in your account; sandbox-compatible by default
LIVEKIT_API_KEY=...      # your LK Cloud project — used for the inference gateway only
LIVEKIT_API_SECRET=...
IS_SANDBOX=true          # default; switch to false for production avatars / billed minutes
LIVEKIT_URL is intentionally absent; the demo uses the room URL returned by start_session. Install and run:
uv sync
python src/liveavatar_hosted_demo.py
A new browser tab should pop up with your avatar in the room. Try speaking to it. Behind the scenes, your agent runs STT → LLM → TTS and streams audio to LiveAvatar for lip-sync. The next section breaks down the full flow.

Understanding the flow

  1. LiveAvatar mints both tokens at session start. start_session returns a viewer token (so your end user can join) and an agent token (so your agent can join the same room).
  2. Your agent joins the LiveAvatar-owned room. Using the agent token, your agent connects to LiveAvatar’s room along with the user.
  3. Your agent processes user input and feeds audio into the LiveAvatar WebSocket. As the user speaks, the agent runs STT → LLM → TTS and forwards synthesized audio over the LiveAvatar WebSocket (ws_url). LiveAvatar handles lip-sync, publishing avatar video back into the room.
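
For orientation, here is a hypothetical sketch of steps 2 and 3: the agent joins the LiveAvatar-owned room with the agent token and streams synthesized audio over the ws_url. The session field names and the synthesized_audio() stream are illustrative assumptions; the starter's avatar_ws.py and worker.py are the authoritative implementations.

```python
# Hypothetical sketch of steps 2-3. Field names on `session` and the
# synthesized_audio() generator are illustrative assumptions; see the
# starter's avatar_ws.py / worker.py for the real implementations.
import websockets
from livekit import rtc

async def run_agent(session: dict) -> None:
    # Step 2: join the LiveAvatar-owned room with the agent token from start_session.
    room = rtc.Room()
    await room.connect(session["room_url"], session["agent_token"])

    # Step 3: open the LiveAvatar WebSocket (ws_url) and forward synthesized TTS
    # audio; LiveAvatar lip-syncs it and publishes avatar video back into the room.
    async with websockets.connect(session["ws_url"]) as avatar_ws:
        async for pcm_chunk in synthesized_audio():  # hypothetical TTS output stream
            await avatar_ws.send(pcm_chunk)
```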

Ownership at a glance

| Concern | Owned by |
| --- | --- |
| LiveKit room | LiveAvatar |
| Agent + viewer tokens | LiveAvatar |
| LiveKit room usage (participant-minutes) | LiveAvatar |
| Agent maintenance | You |
| Avatar minutes / credits | You (your LiveAvatar account) |
| Inference billing (STT / LLM / TTS) | You (your LiveKit Cloud project) |
We use LiveKit’s framework here for simplicity. You’re free to swap out individual STT, LLM, or TTS providers — or replace LiveKit entirely with a different agent stack.

Understanding the code

When we run python src/liveavatar_hosted_demo.py, we do the following:
  1. Start a LiveAvatar session. We call the LiveAvatar API to spin up a new session. In return we get the LiveKit room URL, an agent token, a viewer token, and a WebSocket URL for streaming audio to the avatar.
  2. Spin up the agent locally. We boot a local LiveKit Agents worker — no LiveKit Cloud registration needed since LiveAvatar owns the room.
  3. Send the agent into the room. Using the agent token, the worker joins the LiveAvatar room and opens the WebSocket so it can drive the avatar.
  4. Open the viewer. A local browser tab launches with the viewer token preloaded, so you can see and talk to the avatar.
  5. Run the conversation loop. The agent listens to user audio, runs STT → LLM → TTS, and forwards synthesized audio over the WebSocket. LiveAvatar lip-syncs and publishes video back into the room.
  6. Clean up on exit. When the process stops, the agent closes the LiveAvatar session so credits and room minutes stop billing.
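
Condensed, the demo entrypoint amounts to something like the sketch below. It mirrors the steps above rather than the starter's exact code; the run_worker signature and the viewer URL are assumptions.

```python
# Sketch of the demo's top-level flow. Helper names mirror the starter's
# modules, but exact signatures and the viewer URL are assumptions.
import asyncio
import webbrowser

from liveavatar_client import start_session, stop_session  # src/liveavatar_client.py
from worker import run_worker                               # src/worker.py (hypothetical signature)

async def main() -> None:
    # 1. Start a session: returns the room URL, agent + viewer tokens, and ws_url.
    session = start_session()
    try:
        # 4. Open the local viewer with the viewer token preloaded.
        webbrowser.open(f"http://localhost:8080/?token={session['viewer_token']}")
        # 2, 3, 5. Run the local worker: it joins the room with the agent token,
        # opens the WebSocket, and runs STT -> LLM -> TTS until the session ends.
        await run_worker(session)
    finally:
        # 6. Always stop the session so credits and room minutes stop billing.
        stop_session(session)

if __name__ == "__main__":
    asyncio.run(main())
```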

File structure

The starter is split into a backend (the agent itself) and a small frontend used to visualize the session locally. Backend (src/):
| File | Responsibility |
| --- | --- |
| liveavatar_client.py | Makes API calls to LiveAvatar (create session, start session, stop session). |
| avatar_ws.py | Maintains the WebSocket connection to LiveAvatar and forwards audio frames for lip-sync. |
| agent.py | Defines the agent that handles the conversation and tees synthesized audio to the LiveAvatar WebSocket. |
| pipeline.py | Builds the STT → LLM → TTS pipeline and wires up room + session observability. |
| worker.py | Defines the LiveKit worker entrypoint that joins the room and runs the agent. |
| liveavatar_hosted_demo.py | Demo entrypoint — orchestrates the session, the worker, and the local viewer. |
Frontend (viewer/): We ship a minimal frontend so you can see the avatar and talk to it locally without standing up your own client. Treat it as a reference, not as production frontend code.
| File | Responsibility |
| --- | --- |
| index.html | Vanilla LiveKit JS client. Connects to the LiveAvatar room with the viewer token, publishes the user’s microphone, and renders the avatar’s audio + video. |

Customizing the agent

The most common things you’ll want to tweak after the quickstart:
| What | Where |
| --- | --- |
| System prompt / agent persona | agent.py — the LiveAvatarAgent definition. |
| LLM model, STT provider, TTS voice | pipeline.py — build_session configures all three. |
| Audio input options (noise cancellation, etc.) | pipeline.py — build_room_options. |
| Logging / observability hooks | pipeline.py — wire_room_observability, wire_session_observability. |
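
To give a feel for what swapping providers looks like, here is a rough build_session-style sketch using LiveKit Agents plugins. The starter's actual build_session may differ, and the providers, models, and voice ID below are placeholders rather than recommendations.

```python
# Rough sketch of a build_session-style helper. The starter's real build_session
# may differ; providers, models, and the voice ID are placeholders.
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero

def build_session() -> AgentSession:
    return AgentSession(
        vad=silero.VAD.load(),                    # voice activity detection
        stt=deepgram.STT(model="nova-2"),         # swap the STT provider/model here
        llm=openai.LLM(model="gpt-4o-mini"),      # swap the LLM here
        tts=cartesia.TTS(voice="your-voice-id"),  # swap the TTS voice here
    )
```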

Productionizing

This demo is wired up to help you quickly iterate locally. When productionizing, here are some recommendations: remove the local scaffolding, run the agent as a long-lived process, and tighten up the session lifecycle.

Improvements beyond the demo

A few things in the starter exist purely for local convenience. We recommend the following swaps:
  • Build your own frontend. Replace the local viewer HTTP server and auto-launched browser tab with a real client that joins the LiveAvatar room using the viewer token.
  • Simplify the agent runtime. The demo uses local scaffolding to dispatch a single in-process job. In production, there are many ways to manage the agent — pick the runtime that fits your infrastructure.
  • Hand off shutdown to your runtime. The demo handles Ctrl-C itself. In production, lifecycle and shutdown signals come from your container, supervisor, or queue worker.
  • Guard against leaked sessions. Always call stop_session on session end, including error paths. As a safety net for runners that crash without it, set max_session_duration server-side when calling create_session_token so LiveAvatar will close abandoned sessions on its own; a sketch follows this list.
  • Evaluate the code structure. Our helpers wrap LiveKit’s AgentSession, worker, and room abstractions, encapsulating WebRTC complexity.
    • This structure may be over-engineered for your use case — we recommend cleaning up pieces that don’t make sense for your setup.
    • If you anticipate running your own LiveKit deployment later, keeping this structure simplifies the transition.
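
As a concrete illustration of that safety net, the sketch below shows where max_session_duration would go. The endpoint URL, header name, and payload shape are assumptions; check the create_session_token reference for the real ones.

```python
# Illustrative only: endpoint URL, header, and payload shape are assumptions.
# The point is the server-side cap: LiveAvatar ends the session even if the
# runner crashes before it can call stop_session.
import os
import httpx

def create_session_token(avatar_id: str) -> dict:
    resp = httpx.post(
        "https://api.liveavatar.com/v1/sessions/token",  # placeholder URL
        headers={"X-API-Key": os.environ["LIVEAVATAR_API_KEY"]},  # placeholder header
        json={
            "avatar_id": avatar_id,
            "max_session_duration": 15 * 60,  # seconds; abandoned sessions auto-close
        },
    )
    resp.raise_for_status()
    return resp.json()
```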

Deployment considerations

A LiveAvatar session is stateful and longer-lived than a typical request — your agent holds an open LiveKit room connection, a WebSocket to LiveAvatar, and live STT / LLM / TTS streams for the duration of the conversation. The right deployment shape depends heavily on your traffic, concurrency, and infrastructure — there’s no one-size-fits-all answer here. We have seen some of the following deployment patterns work well in practice.

Pattern 1 — On-demand spawn per session

A backend HTTP service receives a “start session” request, spawns a runner process for that session, and returns the viewer token + room URL to the client. The process lives until the session ends, then exits.
  • Pros. Dead simple. No queue, no pool, no orchestration. Each session is fully isolated.
  • Cons. Cold start hits every session — VAD, turn-detector, and plugin init can add several seconds before the agent joins the room.
  • Fits. Smaller deployments with modest concurrency, or sessions long enough that startup latency is not user-facing pain.
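
A minimal sketch of this pattern, assuming a hypothetical session_runner.py that joins the room with the agent token, runs the conversation, and calls stop_session on exit:

```python
# Hypothetical spawn-per-session backend. The endpoint name, response fields,
# and the runner script are assumptions, not part of the starter.
import subprocess
import sys

from fastapi import FastAPI
from liveavatar_client import start_session  # starter's client; exact signature may differ

app = FastAPI()

@app.post("/sessions")
def create_session() -> dict:
    session = start_session()  # room URL, viewer/agent tokens, ws_url
    # One runner process per session; it lives for the conversation, then exits.
    subprocess.Popen([
        sys.executable, "src/session_runner.py",
        "--session-id", session["session_id"],
    ])
    # The client only needs enough to join as a viewer.
    return {"room_url": session["room_url"], "viewer_token": session["viewer_token"]}
```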

Pattern 2 — Warm worker pool + queue

A pool of runner processes sits idle, blocked on a queue. The backend enqueues a session-start job; the first idle worker pops it, calls start_session, runs the conversation, then returns to the pool.
  • Pros. Cold start is paid once at process boot, not per session. Capacity is predictable (pool size = peak concurrency). Autoscaling works off queue depth or active-session count.
  • Cons. Idle workers cost compute. Pool sizing matters — undersized pools create user-facing queueing.
  • Fits. Mid-to-large deployments where cold start hurts UX or concurrency exceeds what spawn-per-request can handle.
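
Sketched with an in-process queue for brevity (production pools usually sit behind Redis, SQS, or similar) and with hypothetical load_pipeline / run_conversation helpers standing in for the starter's pipeline and worker code:

```python
# Hypothetical warm worker pool. The job payload and the load_pipeline /
# run_conversation helpers are assumptions standing in for the starter's code.
import multiprocessing as mp

from liveavatar_client import start_session, stop_session  # exact signatures may differ

POOL_SIZE = 8  # roughly the peak concurrency you want to serve without queueing

def worker(jobs: "mp.Queue") -> None:
    pipeline = load_pipeline()  # pay cold start (VAD, turn detector, plugins) once, at boot
    while True:
        job = jobs.get()               # block until a session-start job arrives
        session = start_session()      # then mint the room, tokens, and ws_url
        try:
            run_conversation(pipeline, session, job)  # join the room and drive the avatar
        finally:
            stop_session(session)      # release credits / room minutes, return to the pool

if __name__ == "__main__":
    jobs: "mp.Queue" = mp.Queue()
    for _ in range(POOL_SIZE):
        mp.Process(target=worker, args=(jobs,)).start()
    # The backend enqueues one job per "start session" request, e.g.:
    # jobs.put({"user_id": "..."})
```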