VisionAgents is GetStream’s Python framework for composing real-time AI agents from pluggable STT, LLM, TTS, and avatar components. The liveavatar plugin adds a LiveAvatar-rendered avatar to any VisionAgents pipeline, with lip-synced video and audio driven by your agent’s TTS (or Realtime LLM) output.
This page assumes you already have a VisionAgents pipeline running. If not, start with the VisionAgents quickstart, or pick a different integration path.
Prerequisites
- A LiveAvatar account and API key. Sign up at app.liveavatar.com.
- A GetStream account. VisionAgents uses getstream.Edge() as its transport for the end-user-facing call.
- A working VisionAgents project with STT, LLM, and TTS providers configured (or a Realtime LLM).
Installation
The plugin ships as the vision-agents[liveavatar] extra. Install it with your package manager of choice, for example pip install "vision-agents[liveavatar]".
Quick Start
Realtime LLM variant
With a Realtime LLM the TTS step is skipped: the LLM’s audio output is forwarded straight to the avatar. Do not pass tts= alongside a Realtime() LLM; the realtime audio path bypasses it entirely.
Configuration
The plugin reads credentials from the environment: LIVEAVATAR_API_KEY for the API key and LIVEAVATAR_AVATAR_ID for the avatar. Alternatively, pass api_key="..." and avatar_id="..." directly to liveavatar.Avatar(...).
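The fallback order (explicit argument first, then environment variable) can be sketched as below. This is an illustration only, not the plugin's code: resolve_credentials is a hypothetical helper, and the variable names are the ones listed in the Avatar parameters table.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    api_key: Optional[str] = None,
    avatar_id: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (api_key, avatar_id), preferring explicit arguments.

    Hypothetical helper: it only mirrors the documented fallback to
    LIVEAVATAR_API_KEY and LIVEAVATAR_AVATAR_ID.
    """
    env = os.environ if env is None else env
    resolved_key = api_key or env.get("LIVEAVATAR_API_KEY")
    resolved_avatar = avatar_id or env.get("LIVEAVATAR_AVATAR_ID")
    return resolved_key, resolved_avatar


if __name__ == "__main__":
    key, avatar = resolve_credentials()
    # Failing fast on a missing key is a choice made in this sketch.
    if key is None:
        raise SystemExit("Set LIVEAVATAR_API_KEY or pass api_key=...")
    print("avatar:", avatar or "(not set)")
```

Running a check like this before starting the agent surfaces a missing key at startup rather than when the first session is created.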
Avatar parameters
| Parameter | Type | Default | Notes |
|---|---|---|---|
| avatar_id | string | None | LiveAvatar avatar UUID. Falls back to LIVEAVATAR_AVATAR_ID. |
| api_key | string | None | Falls back to LIVEAVATAR_API_KEY. |
| base_url | string | None | Override the default LiveAvatar API endpoint. |
| is_sandbox | boolean | True | Sandbox sessions are free but duration-capped. Set False in production. |
| max_session_duration | integer | None | Session timeout in seconds. None uses LiveAvatar’s default. |
| video_quality | string | "high" | One of low, medium, high, very_high. |
| video_encoding | string | "H264" | One of H264, VP8. |
| width | integer | 1280 | Video width in pixels. |
| height | integer | 720 | Video height in pixels. |
| fps | integer | 30 | Frame rate. |
| buffer_seconds | float | 1.0 | Video buffer depth ahead of audio. |
width, height, and fps configure the outbound track the bot republishes into the Stream Edge room, not the avatar render itself. The render is sized server-side by video_quality. Setting width=1920 with video_quality="low" upscales a low-resolution render; it does not produce a higher-resolution avatar.
buffer_seconds is not end-to-end latency. It caps the video queue depth (fps * buffer_seconds frames) used by the A/V synchronizer to absorb jitter. Larger values smooth more network variance at the cost of added smoothing delay; smaller values are tighter but more sensitive to hiccups.
Sandbox vs. production
is_sandbox=True (the default) lets you build without burning credits, but sessions are duration-capped. For production set is_sandbox=False and pass your own avatar_id — sessions then run against your account’s credits with no duration limit beyond max_session_duration.
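One way to keep sandbox and production settings side by side is a small settings object that carries the documented defaults and rejects values outside the documented choices. The sketch below is an assumption-laden illustration: AvatarSettings is a hypothetical class, not part of the plugin, and passing its fields as keyword arguments to liveavatar.Avatar(...) assumes the keyword names match the parameter table.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Allowed values as listed in the parameter table.
VIDEO_QUALITIES = ("low", "medium", "high", "very_high")
VIDEO_ENCODINGS = ("H264", "VP8")


@dataclass(frozen=True)
class AvatarSettings:
    """Hypothetical container mirroring the documented defaults."""

    avatar_id: Optional[str] = None
    is_sandbox: bool = True
    max_session_duration: Optional[int] = None  # seconds
    video_quality: str = "high"
    video_encoding: str = "H264"
    width: int = 1280
    height: int = 720
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        if self.video_quality not in VIDEO_QUALITIES:
            raise ValueError(f"video_quality must be one of {VIDEO_QUALITIES}")
        if self.video_encoding not in VIDEO_ENCODINGS:
            raise ValueError(f"video_encoding must be one of {VIDEO_ENCODINGS}")
        if self.max_session_duration is not None and self.max_session_duration <= 0:
            raise ValueError("max_session_duration must be positive or None")

    def as_kwargs(self) -> dict:
        """Return the settings as a plain dict of keyword arguments."""
        return asdict(self)


SANDBOX = AvatarSettings()
PRODUCTION = AvatarSettings(
    avatar_id="your-avatar-uuid",  # placeholder, not a real ID
    is_sandbox=False,
    max_session_duration=1800,
)

# Assumed usage (not executed here): liveavatar.Avatar(**PRODUCTION.as_kwargs())
```

Validating locally catches a mistyped video_quality before any session is requested.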
Architecture
VisionAgents uses GetStream’s Edge transport for the user-facing call. The liveavatar.Avatar() component slots into the agent’s avatar parameter: it opens a WebSocket to LiveAvatar, forwards TTS (or Realtime LLM) audio up, receives synchronized avatar audio + video, and republishes the avatar tracks into the Stream Edge room your end user is in.
Your pipeline still owns STT, LLM, and TTS. LiveAvatar only renders the avatar.
How it works
Under the hood, liveavatar.Avatar() manages three things: a control connection to LiveAvatar, an inbound media connection from the LiveKit room LiveAvatar provisions, and an A/V sync stage that feeds frames into the Stream Edge track your bot publishes.
Two channels to LiveAvatar
- WebSocket (control + audio upstream). TTS chunks coming out of your pipeline arrive as AudioOutputChunk items on the agent’s input_audio_stream. The plugin forwards each chunk via send_audio_frame, signals end_turn when a turn finalizes, and calls interrupt on a flush (barge-in).
- LiveKit RTC (avatar A/V downstream). Session start returns a livekit_url + livekit_agent_token. The plugin joins that LiveKit room as a subscriber and receives the rendered avatar’s video frames and PCM audio.
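The upstream half of this can be pictured as a small dispatcher that maps each item from the agent's audio stream to one control call. The sketch below uses stand-in types rather than the real VisionAgents classes: AudioChunk, AudioFlush, and RecordingConnection are hypothetical, and the final flag marking the end of a turn is an assumption about how turn completion is signalled.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AudioChunk:
    """Stand-in for AudioOutputChunk (hypothetical shape)."""

    pcm: bytes
    final: bool = False  # assumed marker for the last chunk of a turn


@dataclass
class AudioFlush:
    """Stand-in for AudioOutputFlush (barge-in)."""


class RecordingConnection:
    """Fake control connection that records which calls were made."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def send_audio_frame(self, pcm: bytes) -> None:
        self.calls.append("send_audio_frame")

    def end_turn(self) -> None:
        self.calls.append("end_turn")

    def interrupt(self) -> None:
        self.calls.append("interrupt")


def forward(item: Union[AudioChunk, AudioFlush], conn: RecordingConnection) -> None:
    """Map one stream item to the matching control call(s)."""
    if isinstance(item, AudioFlush):
        conn.interrupt()  # barge-in: stop the avatar mid-utterance
        return
    conn.send_audio_frame(item.pcm)
    if item.final:
        conn.end_turn()  # the turn is complete; nothing more to send


def run_demo() -> List[str]:
    conn = RecordingConnection()
    stream = [AudioChunk(b"\x00\x00"), AudioChunk(b"\x00\x00", final=True), AudioFlush()]
    for item in stream:
        forward(item, conn)
    return conn.calls
```

The downstream half (joining the LiveKit room and receiving frames) is omitted here because it depends on the LiveKit SDK.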
AVSynchronizer and buffer_seconds
Avatar video and audio arrive on separate tracks and can drift. The plugin runs them through an AVSynchronizer that queues frames and emits them in lock-step on the bot’s outbound Stream tracks.
The video queue is capped at fps * buffer_seconds frames — that is what buffer_seconds controls. Larger values absorb more jitter at the cost of latency; smaller values are tighter but more sensitive to network hiccups. width, height, and fps configure the outbound track the bot publishes to Stream; the avatar render itself is sized by video_quality on the LiveAvatar side.
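To make the fps * buffer_seconds cap concrete, here is a minimal bounded queue. It is a sketch, not the plugin's AVSynchronizer: BoundedFrameQueue and max_queued_frames are hypothetical names, truncating the product to an integer is an assumption, and dropping the oldest frame when full is one possible overflow policy among several.

```python
from collections import deque
from typing import Any, Deque, Optional


def max_queued_frames(fps: int, buffer_seconds: float) -> int:
    """Queue depth implied by fps * buffer_seconds (at least one frame)."""
    return max(1, int(fps * buffer_seconds))


class BoundedFrameQueue:
    """Video queue capped at fps * buffer_seconds frames."""

    def __init__(self, fps: int, buffer_seconds: float) -> None:
        self.fps = fps
        self._frames: Deque[Any] = deque(maxlen=max_queued_frames(fps, buffer_seconds))

    def push(self, frame: Any) -> None:
        # With maxlen set, appending to a full deque discards the oldest frame.
        self._frames.append(frame)

    def pop(self) -> Optional[Any]:
        return self._frames.popleft() if self._frames else None

    def flush(self) -> None:
        """Discard everything queued, for example after an interruption."""
        self._frames.clear()

    def worst_case_delay(self) -> float:
        """Seconds of video a full queue holds back."""
        return (self._frames.maxlen or 0) / self.fps

    def __len__(self) -> int:
        return len(self._frames)
```

With the defaults (fps=30, buffer_seconds=1.0) the queue holds at most 30 frames, so a full queue adds up to one second of smoothing delay; halving buffer_seconds halves both numbers.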
Session lifecycle
- agent.join(call) calls Avatar.start().
- The plugin calls create_session_token (passing avatar_id, is_sandbox, max_session_duration, video_quality, video_encoding), then start_session to get the LiveKit URL, agent token, and WebSocket URL.
- RTC + WebSocket connect; an audio-input task starts draining the agent’s audio stream into the WebSocket.
- On agent.finish() or context exit, close() cancels the audio task, closes the WebSocket and RTC connection, drains the sync stage, and calls stop_session.
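The ordering of these steps can be sketched with asyncio and a fake API client that only records calls. Everything here is illustrative: FakeLiveAvatarAPI and AvatarSession are hypothetical classes, the returned URLs are placeholders, and the real plugin's internals may differ.

```python
import asyncio
from typing import List, Optional


class FakeLiveAvatarAPI:
    """Records calls in order; stands in for the real API client."""

    def __init__(self) -> None:
        self.log: List[str] = []

    async def create_session_token(self, **params: object) -> str:
        self.log.append("create_session_token")
        return "session-token"

    async def start_session(self, token: str) -> dict:
        self.log.append("start_session")
        return {"livekit_url": "wss://example.invalid", "ws_url": "wss://example.invalid/ws"}

    async def stop_session(self) -> None:
        self.log.append("stop_session")


class AvatarSession:
    """Start/close ordering as described in the lifecycle list."""

    def __init__(self, api: FakeLiveAvatarAPI) -> None:
        self.api = api
        self._audio_task: Optional[asyncio.Task] = None

    async def _drain_audio(self) -> None:
        try:
            while True:
                await asyncio.sleep(3600)  # placeholder for reading agent audio
        except asyncio.CancelledError:
            self.api.log.append("audio_task_cancelled")
            raise

    async def start(self) -> None:
        token = await self.api.create_session_token(is_sandbox=True)
        await self.api.start_session(token)
        self.api.log.append("connect_rtc_and_ws")
        self._audio_task = asyncio.create_task(self._drain_audio())
        await asyncio.sleep(0)  # yield once so the task begins running

    async def close(self) -> None:
        if self._audio_task is not None:
            self._audio_task.cancel()
            await asyncio.gather(self._audio_task, return_exceptions=True)
        self.api.log.append("close_ws_and_rtc")
        self.api.log.append("drain_sync")
        await self.api.stop_session()

    async def __aenter__(self) -> "AvatarSession":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.close()


async def demo() -> List[str]:
    api = FakeLiveAvatarAPI()
    async with AvatarSession(api):
        pass  # the agent would run here
    return api.log
```

Using an async context manager guarantees that teardown runs even if the agent body raises, which matches the "or context exit" path described above.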
Interruptions
When the agent emits an AudioOutputFlush (e.g. the user barges in and your STT cuts off the current turn), the plugin sends interrupt to LiveAvatar over the WebSocket and flushes the local A/V queue so the avatar stops mid-utterance instead of finishing the cancelled response.
Resources
VisionAgents LiveAvatar integration
Upstream reference for the plugin.
Plugin source
Source, examples, and tests for vision-agents[liveavatar].
Runnable example
End-to-end Gemini + Deepgram + LiveAvatar agent you can run via Runner / AgentLauncher.