Push-to-Talk - LiveAvatar Documentation

Push-to-Talk (PTT) gives you deterministic control over speech boundaries instead of relying on automatic voice activity detection.

When to use PTT

Users who pause to think, restart sentences, or speak in a non-linear way
Scenarios where short pauses shouldn’t signal speech completion
Applications requiring precise control over speech boundaries
Custom UI/UX (press-and-hold buttons, toggles, keyboard shortcuts)

How it works

Start PTT

Your application sends `user.start_push_to_talk`. The system begins buffering audio.

User speaks

The user talks freely: pauses, corrections, and hesitations are all captured. No response is generated yet.

End PTT

Your application sends `user.stop_push_to_talk`. Audio capture stops.

Processing & Response

The avatar processes the captured audio and delivers a response.

Audio outside PTT windows is ignored.
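
The start/stop sequence above can be sketched as a small client-side controller. This is a minimal illustration, not the SDK's API: `PushToTalkController` and the `SendCommand` transport are hypothetical names, and the sketch assumes your application already has some way to deliver a command event to the session. It only guards against duplicate start or stop calls and does not wait for the server's acknowledgment events.

```typescript
// Hypothetical transport: any function that delivers a command event name to the session.
type SendCommand = (eventType: string) => void;

type PttPhase = "idle" | "talking";

// Hypothetical helper that tracks whether a PTT window is currently open.
class PushToTalkController {
  private phase: PttPhase = "idle";
  private readonly send: SendCommand;

  constructor(send: SendCommand) {
    this.send = send;
  }

  // Open a PTT window. Returns false (and sends nothing) if one is already open.
  start(): boolean {
    if (this.phase !== "idle") return false;
    this.phase = "talking";
    this.send("user.start_push_to_talk");
    return true;
  }

  // Close the PTT window. Returns false (and sends nothing) if none is open.
  stop(): boolean {
    if (this.phase !== "talking") return false;
    this.phase = "idle";
    this.send("user.stop_push_to_talk");
    return true;
  }

  get isTalking(): boolean {
    return this.phase === "talking";
  }
}
```

Tracking the phase locally keeps a stuck key or a double click from sending two start commands in a row.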

Setup

Set `interactivity_type` to `PUSH_TO_TALK` when creating the session token:

{
  "mode": "FULL",
  "interactivity_type": "PUSH_TO_TALK",
  "avatar_id": "<avatar_id>",
  "avatar_persona": {
    "voice_id": "<voice_id>",
    "context_id": "<context_id>"
  }
}
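
If you build this request body in code, a typed helper can keep the field names consistent. The sketch below is an assumption-laden illustration: `SessionTokenRequest` and `buildPttTokenRequest` are hypothetical names, the field set mirrors only the JSON shown above, and the endpoint and authentication for creating the token are not covered on this page, so they are left out.

```typescript
// Hypothetical type mirroring only the fields shown in the JSON example above.
interface SessionTokenRequest {
  mode: string;
  interactivity_type: "PUSH_TO_TALK";
  avatar_id: string;
  avatar_persona: {
    voice_id: string;
    context_id: string;
  };
}

// Hypothetical helper: builds the request body for a PTT session token.
function buildPttTokenRequest(
  avatarId: string,
  voiceId: string,
  contextId: string
): SessionTokenRequest {
  return {
    mode: "FULL", // value taken from the example above
    interactivity_type: "PUSH_TO_TALK",
    avatar_id: avatarId,
    avatar_persona: {
      voice_id: voiceId,
      context_id: contextId,
    },
  };
}

// Serialize for whatever HTTP client your application uses.
const exampleBody: string = JSON.stringify(
  buildPttTokenRequest("<avatar_id>", "<voice_id>", "<context_id>")
);
```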

Events

Commands (you send)

Event	Description
`user.start_push_to_talk`	Signal that the user is beginning to speak.
`user.stop_push_to_talk`	Signal that the user has finished speaking.

Responses (you receive)

Event	Description
`user.push_to_talk_started`	PTT successfully started.
`user.push_to_talk_start_failed`	PTT failed to start.
`user.push_to_talk_stopped`	PTT successfully stopped.
`user.push_to_talk_stop_failed`	PTT failed to stop.
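
One way to consume the response events is to reduce them into a UI state. The sketch below is illustrative only: `PttUiState` and `nextUiState` are hypothetical names, the events are treated as plain strings, and how events are delivered to your application (and what payload they carry) is not specified here. Treating both failure events as an `"error"` state is a design assumption, not documented behavior.

```typescript
// Hypothetical UI states for a PTT control.
type PttUiState = "idle" | "recording" | "error";

// Map a received event name to the next UI state.
// Unrecognized events leave the state unchanged.
function nextUiState(current: PttUiState, eventType: string): PttUiState {
  switch (eventType) {
    case "user.push_to_talk_started":
      return "recording";
    case "user.push_to_talk_stopped":
      return "idle";
    case "user.push_to_talk_start_failed":
    case "user.push_to_talk_stop_failed":
      // Assumption: surface either failure to the user instead of retrying silently.
      return "error";
    default:
      return current;
  }
}
```

Updating the UI from the acknowledgment events, instead of from the button press alone, keeps the recording indicator in line with what the session actually accepted.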

UI recommendations

Map PTT controls to familiar patterns:

Press-and-hold button (walkie-talkie style)
Toggle switch (tap to start, tap to stop)
Keyboard shortcut (spacebar hold)
Touch or controller input for mobile/VR
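
The hold and toggle patterns above differ only in which input edge triggers start and stop, so they can share one mapping function. This is a minimal sketch with hypothetical names (`PttInput`, `mapInput`); it is independent of any UI framework. The `repeat` flag corresponds to the browser's `KeyboardEvent.repeat`, which is true for auto-repeated `keydown` events while a key is held.

```typescript
type PttAction = "start" | "stop" | "none";
type InputMode = "hold" | "toggle";

// Hypothetical normalized input: a press or release from a button, key, or touch.
interface PttInput {
  kind: "down" | "up";
  repeat?: boolean; // true for auto-repeated key presses
}

// Decide which PTT command, if any, an input should trigger.
// `active` is whether a PTT window is currently open.
function mapInput(mode: InputMode, active: boolean, input: PttInput): PttAction {
  // A held key fires repeated "down" events; never act on those.
  if (input.repeat) return "none";

  if (mode === "hold") {
    // Walkie-talkie style: press starts, release stops.
    if (input.kind === "down" && !active) return "start";
    if (input.kind === "up" && active) return "stop";
    return "none";
  }

  // Toggle style: each press flips the state; releases are ignored.
  if (input.kind === "down") return active ? "stop" : "start";
  return "none";
}
```

For a spacebar shortcut, for example, a `keydown` handler would pass `{ kind: "down", repeat: event.repeat }` and a `keyup` handler would pass `{ kind: "up" }`.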
