Custom Mode Latency: Can LiveAvatar Accept Audio Directly From the Backend or Speak From Text?

We are using LiveAvatar in Custom Mode and generating speech with ElevenLabs.

Current flow:

1. The backend generates audio from text with ElevenLabs.
2. The backend sends the audio to the frontend.
3. The frontend streams the audio to LiveAvatar.

Because the audio must be generated first and is then relayed backend → frontend → LiveAvatar, this pipeline adds noticeable latency.
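To make the latency question concrete, here is a rough model of time-to-first-audio for the current flow and for the two alternatives being asked about. It is a sketch under stated assumptions: all millisecond values are hypothetical placeholders, not measurements of ElevenLabs or LiveAvatar, and the model ignores bandwidth and processing time, counting only when audio becomes available plus one-way delay per network hop. The function name time_to_first_audio_ms is made up for illustration.

```python
# Hypothetical latency model: every number below is a placeholder,
# not a measurement of ElevenLabs or LiveAvatar.


def time_to_first_audio_ms(tts_first_chunk_ms, tts_total_ms, hop_latencies_ms, buffer_full_clip):
    """Estimate ms from the TTS request until LiveAvatar receives its first audio.

    buffer_full_clip=True: the backend waits for the whole clip before relaying it.
    buffer_full_clip=False: each chunk is forwarded as soon as it arrives.
    hop_latencies_ms: one-way delay for each network hop the audio crosses.
    Bandwidth and processing time are ignored (simplifying assumption).
    """
    ready_ms = tts_total_ms if buffer_full_clip else tts_first_chunk_ms
    return ready_ms + sum(hop_latencies_ms)


# Placeholder values for illustration only.
TTS_FIRST_CHUNK_MS = 300
TTS_TOTAL_MS = 1200
BACKEND_TO_FRONTEND_MS = 40
FRONTEND_TO_AVATAR_MS = 40
BACKEND_TO_AVATAR_MS = 40  # assumes a direct backend -> LiveAvatar path existed

# Current flow: full clip generated, then relayed through the frontend.
current = time_to_first_audio_ms(
    TTS_FIRST_CHUNK_MS, TTS_TOTAL_MS,
    [BACKEND_TO_FRONTEND_MS, FRONTEND_TO_AVATAR_MS], buffer_full_clip=True)

# Same two hops, but chunks are forwarded as they arrive.
streamed_relay = time_to_first_audio_ms(
    TTS_FIRST_CHUNK_MS, TTS_TOTAL_MS,
    [BACKEND_TO_FRONTEND_MS, FRONTEND_TO_AVATAR_MS], buffer_full_clip=False)

# Hypothetical direct backend -> LiveAvatar path with chunk streaming.
direct = time_to_first_audio_ms(
    TTS_FIRST_CHUNK_MS, TTS_TOTAL_MS,
    [BACKEND_TO_AVATAR_MS], buffer_full_clip=False)

print(current, streamed_relay, direct)  # 1280 380 340
```

Whether skipping the frontend hop or streaming chunks instead of whole clips matters more depends on the real, measured values for each term.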

Questions:

1. Is it possible to send the generated audio directly from the backend to LiveAvatar, skipping the frontend?
2. Alternatively, does Custom Mode support making the avatar speak directly from text (server-side), so that we don't have to handle audio streaming manually?

We’re trying to reduce end-to-end latency as much as possible.

We don’t want to use Full Mode because we already have our own backend LLM stack, and paying nearly 2× more for features we don’t use doesn’t make sense for us.