As part of LITE Mode, LiveAvatar provisions a LiveKit room for every session. You can send your own custom LiveKit Agent into that room to monitor user behavior and drive our avatar — handle STT, LLM, TTS, tool calls, and any orchestration logic you need. This guide walks through a Python starter that does exactly that.
liveavatar-starter-livekit-agent-python
Reference implementation for this guide. Demo entrypoint: src/liveavatar_hosted_demo.py.
This path is best for teams without an existing LiveKit setup. If you already run your own LiveKit project, we recommend checking out our LiveKit avatar plugin.
An AVATAR_ID from your account. The default in the starter repo is sandbox-compatible; swap it for one of your own avatars when going to production.
A LiveKit Cloud project with LIVEKIT_API_KEY and LIVEKIT_API_SECRET. These are used for the inference gateway (STT / LLM / TTS) only — the room itself is provisioned by LiveAvatar. Sign up at cloud.livekit.io if you do not have one.
Python 3.11+ and uv (or pip) to install the starter’s dependencies.
LIVEKIT_URL is not required for this flow — the room URL is returned by start_session.
```
LIVEAVATAR_API_KEY=...   # from app.liveavatar.com
AVATAR_ID=...            # any avatar in your account; sandbox-compatible by default
LIVEKIT_API_KEY=...      # your LK Cloud project — used for the inference gateway only
LIVEKIT_API_SECRET=...
IS_SANDBOX=true          # default; switch to false for production avatars / billed minutes
```
Install and run:
```
uv sync
python src/liveavatar_hosted_demo.py
```
A new browser tab should pop up with your avatar in the room. Try speaking to it.

Behind the scenes, your agent runs STT → LLM → TTS and streams audio to LiveAvatar for lip-sync. The next section breaks down the full flow.
LiveAvatar mints both tokens at session start. start_session returns a viewer token (so your end user can join) and an agent token (so your agent can join the same room).
Your agent joins the LiveAvatar-owned room. Using the agent token, your agent connects to LiveAvatar’s room along with the user.
Your agent processes user input and feeds audio into the LiveAvatar WebSocket. As the user speaks, the agent runs STT → LLM → TTS and forwards synthesized audio over the LiveAvatar WebSocket (ws_url). LiveAvatar handles lip-sync, publishing avatar video back into the room.
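To make the forwarding step concrete, here is a minimal sketch of slicing synthesized PCM audio into fixed-size frames and pushing them over the session's WebSocket. The frame size, sample format, and raw-binary framing are assumptions for illustration — the starter's avatar_ws.py defines the actual protocol.

```python
FRAME_BYTES = 480 * 2  # 20 ms of 24 kHz mono 16-bit PCM (assumed format)

def chunk_pcm(pcm: bytes, frame_bytes: int = FRAME_BYTES):
    """Split a PCM buffer into fixed-size frames, padding the last with silence."""
    for i in range(0, len(pcm), frame_bytes):
        frame = pcm[i:i + frame_bytes]
        if len(frame) < frame_bytes:
            frame += b"\x00" * (frame_bytes - len(frame))
        yield frame

async def forward_audio(ws, pcm: bytes):
    # ws: an open connection to the session's ws_url (for example via the
    # `websockets` package). LiveAvatar lip-syncs whatever audio arrives here.
    for frame in chunk_pcm(pcm):
        await ws.send(frame)
```

In the starter, the agent tees each synthesized TTS chunk into a loop like this while also playing it into the room.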
We use LiveKit’s framework here for simplicity. You’re free to swap out individual STT, LLM, or TTS providers — or replace LiveKit entirely with a different agent stack.
When we run python src/liveavatar_hosted_demo.py, the demo does the following:
Start a LiveAvatar session. We call the LiveAvatar API to spin up a new session. In return we get the LiveKit room URL, an agent token, a viewer token, and a WebSocket URL for streaming audio to the avatar.
Spin up the agent locally. We boot a local LiveKit Agents worker — no LiveKit Cloud registration needed since LiveAvatar owns the room.
Send the agent into the room. Using the agent token, the worker joins the LiveAvatar room and opens the WebSocket so it can drive the avatar.
Open the viewer. A local browser tab launches with the viewer token preloaded, so you can see and talk to the avatar.
Run the conversation loop. The agent listens to user audio, runs STT → LLM → TTS, and forwards synthesized audio over the WebSocket. LiveAvatar lip-syncs and publishes video back into the room.
Clean up on exit. When the process stops, the agent closes the LiveAvatar session so credits and room minutes stop billing.
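The steps above can be sketched roughly as follows. start_session, run_agent, and stop_session are hypothetical helpers standing in for the starter's liveavatar_client.py and worker.py; viewer_url shows one assumed way the viewer token gets preloaded.

```python
import asyncio
import urllib.parse
import webbrowser

def viewer_url(base: str, viewer_token: str) -> str:
    """Build the local viewer URL with the viewer token preloaded (assumed scheme)."""
    return f"{base}?token={urllib.parse.quote(viewer_token)}"

async def main():
    # Hypothetical: wraps the LiveAvatar API; returns room URL, tokens, ws_url.
    session = await start_session()
    try:
        # Hypothetical: joins the LiveAvatar room with the agent token.
        worker = asyncio.create_task(run_agent(session))
        # Launch the local viewer so you can see and talk to the avatar.
        webbrowser.open(viewer_url("http://localhost:8080", session["viewer_token"]))
        await worker  # conversation loop runs until the process is stopped
    finally:
        # Always close the session so credits and room minutes stop billing.
        await stop_session(session["session_id"])
```

The try/finally mirrors the demo's cleanup-on-exit behavior: the session is stopped even if the worker task fails.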
The starter is split into a backend (the agent itself) and a small frontend used to visualize the session locally.

Backend (src/):
| File | Responsibility |
| --- | --- |
| liveavatar_client.py | Makes API calls to LiveAvatar (create session, start session, stop session). |
| avatar_ws.py | Maintains the WebSocket connection to LiveAvatar and forwards audio frames for lip-sync. |
| agent.py | Defines the agent that handles the conversation and tees synthesized audio to the LiveAvatar WebSocket. |
| pipeline.py | Builds the STT → LLM → TTS pipeline and wires up room + session observability. |
| worker.py | Defines the LiveKit worker entrypoint that joins the room and runs the agent. |
| liveavatar_hosted_demo.py | Demo entrypoint — orchestrates the session, the worker, and the local viewer. |
Frontend (viewer/):

We ship a minimal frontend so you can see the avatar and talk to it locally without standing up your own client. Treat it as a reference, not as production frontend code.
| File | Responsibility |
| --- | --- |
| index.html | Vanilla LiveKit JS client. Connects to the LiveAvatar room with the viewer token, publishes the user’s microphone, and renders the avatar’s audio + video. |
This demo is wired up to help you quickly iterate locally. When productionizing, here are some recommendations: remove the local scaffolding, run the agent as a long-lived process, and tighten up session lifecycle.
A few things in the starter exist purely for local convenience. We recommend the following swaps:
Build your own frontend. Replace the local viewer HTTP server and auto-launched browser tab with a real client that joins the LiveAvatar room using the viewer token.
Simplify the agent runtime. The demo uses local scaffolding to dispatch a single in-process job. In production, there are many ways to manage the agent — pick the runtime that fits your infrastructure.
Hand off shutdown to your runtime. The demo handles Ctrl-C itself. In production, lifecycle and shutdown signals come from your container, supervisor, or queue worker.
Guard against leaked sessions. Always call stop_session on session end, including error paths. As a safety net for runners that crash without it, set max_session_duration server-side when calling create_session_token so LiveAvatar will close abandoned sessions on its own.
Evaluate the code structure. Our helpers wrap LiveKit’s AgentSession, worker, and room abstractions, encapsulating WebRTC complexity.
This structure may be over-engineered for your use case — we recommend cleaning up pieces that don’t make sense for your setup.
If you anticipate running your own LiveKit deployment later, keeping this structure simplifies the transition.
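As a sketch of the leaked-session guard above, a request body for create_session_token might carry the server-side cap like this. The field name and unit (seconds) are assumptions to confirm against the LiveAvatar API reference.

```python
def session_token_payload(avatar_id: str, max_session_duration: int = 600) -> dict:
    """Build a create_session_token request body with a hard session cap.

    max_session_duration is assumed to be in seconds; confirm the field name
    and unit against the LiveAvatar API reference.
    """
    if max_session_duration <= 0:
        raise ValueError("max_session_duration must be positive")
    return {
        "avatar_id": avatar_id,
        "max_session_duration": max_session_duration,
    }
```

Pair this with an explicit stop_session call in a finally block, so the cap only has to catch runners that crash before cleanup.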
A LiveAvatar session is stateful and longer-lived than a typical request — your agent holds an open LiveKit room connection, a WebSocket to LiveAvatar, and live STT / LLM / TTS streams for the duration of the conversation.

The right deployment shape depends heavily on your traffic, concurrency, and infrastructure — there’s no one-size-fits-all answer. The following deployment patterns have worked well in practice.
A backend HTTP service receives a “start session” request, spawns a runner process for that session, and returns the viewer token + room URL to the client. The process lives until the session ends, then exits.
Pros. Dead simple. No queue, no pool, no orchestration. Each session is fully isolated.
Cons. Cold start hits every session — VAD, turn-detector, and plugin init can add several seconds before the agent joins the room.
Fits. Smaller deployments with modest concurrency, or sessions long enough that startup latency is not user-facing pain.
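A minimal sketch of the spawn step in this pattern, assuming a hypothetical runner.py script that owns one session end to end (calls start_session, runs the conversation, then exits):

```python
import subprocess
import sys

def runner_cmd(session_id: str) -> list[str]:
    """Command line for one session's runner (runner.py is a hypothetical script)."""
    return [sys.executable, "runner.py", "--session-id", session_id]

def spawn_runner(session_id: str) -> subprocess.Popen:
    """Spawn an isolated runner process for a single session.

    The backend returns the viewer token + room URL to the client while this
    process boots; the process exits on its own when the session ends.
    """
    return subprocess.Popen(runner_cmd(session_id))
```

Because each session gets its own process, a crash in one conversation cannot take down any other.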
A pool of runner processes sits idle, blocked on a queue. The backend enqueues a session-start job; the first idle worker pops it, calls start_session, runs the conversation, then returns to the pool.
Pros. Cold start is paid once at process boot, not per session. Capacity is predictable (pool size = peak concurrency). Autoscaling works off queue depth or active-session count.