Custom Mode Latency: Can LiveAvatar Accept Audio Directly From Backend or Speak From Text?

We are using Custom Mode with LiveAvatar and generating speech using ElevenLabs.

Current flow:

  1. Text → audio generated via ElevenLabs (server)
  2. Audio sent from backend to frontend
  3. Frontend streams audio to LiveAvatar

Because the audio is generated first and then passes through backend → frontend → LiveAvatar, the extra hop through the frontend adds noticeable latency.
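As a rough illustration of why the extra hop matters, the sketch below adds up per-hop delays for the current flow and for the direct backend → LiveAvatar flow asked about in this post. Everything in it is an assumption for illustration: the `Hop` class and `time_to_first_audio` helper are hypothetical names, the timings are placeholders rather than measurements, and it does not call any LiveAvatar or ElevenLabs API. It also assumes the hops run strictly one after another, which may not hold if audio is streamed in chunks.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    seconds: float  # hypothetical placeholder, not a measurement


def time_to_first_audio(hops: list[Hop]) -> float:
    """Sequential hops: the avatar cannot start until every hop has finished."""
    return sum(h.seconds for h in hops)


# Placeholder values only; replace with timestamps measured in your own stack.
current_flow = [
    Hop("ElevenLabs synthesis (server)", 0.60),
    Hop("backend -> frontend transfer", 0.15),
    Hop("frontend -> LiveAvatar streaming", 0.15),
]

# The flow being asked about: the backend sends audio to LiveAvatar itself.
direct_flow = [
    Hop("ElevenLabs synthesis (server)", 0.60),
    Hop("backend -> LiveAvatar streaming", 0.15),
]

saved = time_to_first_audio(current_flow) - time_to_first_audio(direct_flow)
print(f"current: {time_to_first_audio(current_flow):.2f}s")
print(f"direct:  {time_to_first_audio(direct_flow):.2f}s")
print(f"saved:   {saved:.2f}s")
```

With real timestamps in place of the placeholders, the same sum shows how much of the end-to-end delay the frontend hop actually accounts for.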

Questions:

  • Is it possible to send the generated audio directly from the backend to LiveAvatar, skipping the frontend?
  • Or does Custom Mode support making the avatar speak directly from text (server-side), without us handling audio streaming manually?

We’re trying to reduce end-to-end latency as much as possible.

We don’t want to use Full Mode because we already have our own backend LLM stack, and paying nearly 2× more for features we don’t use doesn’t make sense for us.