Push-to-Talk (PTT) gives you deterministic control over speech boundaries instead of relying on automatic voice activity detection.

When to use PTT

  • Users who pause, restart, or think aloud mid-sentence
  • Scenarios where short pauses shouldn’t signal speech completion
  • Applications requiring precise control over speech boundaries
  • Custom UI/UX (press-and-hold buttons, toggles, keyboard shortcuts)

How it works

1. Start PTT
   Your application sends user.start_push_to_talk. The system begins buffering audio.

2. User speaks
   The user talks freely: pauses, corrections, and hesitations are all captured. No response is generated yet.

3. End PTT
   Your application sends user.stop_push_to_talk. Audio capture stops.

4. Processing & Response
   The avatar processes the captured audio and delivers a response.

Audio outside PTT windows is ignored.
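
To make the windowing behavior concrete, here is a toy model of it in Python. It is only an illustration of the semantics described above, not the service's implementation: the class name PttBufferModel and its methods are hypothetical, and real audio would arrive as encoded frames rather than short byte strings.

```python
from typing import List, Optional


class PttBufferModel:
    """Toy model of PTT windows: audio is kept only between start and stop."""

    def __init__(self) -> None:
        # None means no PTT window is currently open.
        self._buffer: Optional[List[bytes]] = None

    def start(self) -> None:
        """Open a window (corresponds to user.start_push_to_talk)."""
        self._buffer = []

    def feed(self, chunk: bytes) -> None:
        """Buffer a chunk if a window is open; otherwise drop it."""
        if self._buffer is not None:
            self._buffer.append(chunk)

    def stop(self) -> bytes:
        """Close the window (user.stop_push_to_talk) and return the utterance."""
        if self._buffer is None:
            raise RuntimeError("stop called with no open PTT window")
        utterance = b"".join(self._buffer)
        self._buffer = None
        return utterance


model = PttBufferModel()
model.feed(b"ignored ")        # before the window: dropped
model.start()
model.feed(b"hello, ")         # inside the window: buffered
model.feed(b"world")
utterance = model.stop()       # the whole utterance is handed over at once
model.feed(b" also ignored")   # after the window: dropped
print(utterance)               # b'hello, world'
```

The point of the sketch is that pauses between chunks do not matter: nothing is processed until stop is called.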

Setup

Set interactivity_type to PUSH_TO_TALK when creating the session token:
{
  "mode": "FULL",
  "interactivity_type": "PUSH_TO_TALK",
  "avatar_id": "<avatar_id>",
  "avatar_persona": {
    "voice_id": "<voice_id>",
    "context_id": "<context_id>"
  }
}
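
If you assemble this body in code, a small helper keeps the PTT setting in one place. The sketch below is an assumption-laden example: build_ptt_session_request is a hypothetical helper name, and the endpoint and authentication used to create the session token are not shown because they are not specified here. Only the field names come from the JSON above.

```python
import json
from typing import Any, Dict


def build_ptt_session_request(
    avatar_id: str, voice_id: str, context_id: str
) -> Dict[str, Any]:
    """Return a session-token request body with Push-to-Talk enabled."""
    # Reject empty identifiers early instead of sending an invalid request.
    for name, value in (
        ("avatar_id", avatar_id),
        ("voice_id", voice_id),
        ("context_id", context_id),
    ):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return {
        "mode": "FULL",
        "interactivity_type": "PUSH_TO_TALK",
        "avatar_id": avatar_id,
        "avatar_persona": {
            "voice_id": voice_id,
            "context_id": context_id,
        },
    }


# Placeholder values; substitute your own identifiers.
body = build_ptt_session_request("<avatar_id>", "<voice_id>", "<context_id>")
print(json.dumps(body, indent=2))
```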

Events

Commands (you send)

Event                      Description
user.start_push_to_talk    Signal that the user is beginning to speak
user.stop_push_to_talk     Signal that the user has finished speaking

Responses (you receive)

Event                             Description
user.push_to_talk_started         PTT successfully started
user.push_to_talk_start_failed    PTT failed to start
user.push_to_talk_stopped         PTT successfully stopped
user.push_to_talk_stop_failed     PTT failed to stop
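
One way to use the response events is to drive UI state from them rather than from the button press itself, so the interface only shows "listening" once PTT has actually started. The following is a minimal sketch under stated assumptions: PttStatus and apply_response are hypothetical names, events are assumed to arrive as plain name strings, and leaving the listening flag unchanged on failure is a design choice of this example, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import List

STARTED = "user.push_to_talk_started"
START_FAILED = "user.push_to_talk_start_failed"
STOPPED = "user.push_to_talk_stopped"
STOP_FAILED = "user.push_to_talk_stop_failed"


@dataclass
class PttStatus:
    """UI-facing state derived from PTT response events."""
    listening: bool = False
    errors: List[str] = field(default_factory=list)


def apply_response(status: PttStatus, event: str) -> PttStatus:
    """Update the status in place for one received event name."""
    if event == STARTED:
        status.listening = True
    elif event == STOPPED:
        status.listening = False
    elif event in (START_FAILED, STOP_FAILED):
        # Record the failure; the listening flag is left as it was (assumption).
        status.errors.append(event)
    # Any other event name is ignored by this handler.
    return status


status = PttStatus()
for received in (START_FAILED, STARTED, STOPPED):
    apply_response(status, received)
print(status.listening, status.errors)
```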

UI recommendations

Map PTT controls to familiar patterns:
  • Press-and-hold button (walkie-talkie style)
  • Toggle switch (tap to start, tap to stop)
  • Keyboard shortcut (spacebar hold)
  • Touch or controller input for mobile/VR
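
Press-and-hold and toggle controls can share one piece of input-agnostic logic that turns press and release events into PTT commands. The sketch below assumes a hypothetical send callback that delivers a command name to the session; Mode and PttControl are illustrative names, not part of any SDK. It also guards against duplicate commands, which matters for keyboard shortcuts because a held key typically generates repeated key-down events.

```python
from enum import Enum
from typing import Callable, List

START = "user.start_push_to_talk"
STOP = "user.stop_push_to_talk"


class Mode(Enum):
    HOLD = "hold"      # talk while the control is held down
    TOGGLE = "toggle"  # first press starts, next press stops


class PttControl:
    """Translate press/release input into start/stop PTT commands."""

    def __init__(self, mode: Mode, send: Callable[[str], None]) -> None:
        self.mode = mode
        self._send = send      # hypothetical transport callback
        self.talking = False

    def press(self) -> None:
        if self.mode is Mode.HOLD:
            self._set(True)
        else:
            self._set(not self.talking)

    def release(self) -> None:
        # In toggle mode, releasing the control does nothing.
        if self.mode is Mode.HOLD:
            self._set(False)

    def _set(self, talking: bool) -> None:
        if talking == self.talking:
            return             # no state change, so no duplicate command
        self.talking = talking
        self._send(START if talking else STOP)


hold_sent: List[str] = []
hold = PttControl(Mode.HOLD, hold_sent.append)
hold.press()
hold.press()   # simulated key auto-repeat: ignored
hold.release()

toggle_sent: List[str] = []
toggle = PttControl(Mode.TOGGLE, toggle_sent.append)
toggle.press()
toggle.release()
toggle.press()
toggle.release()

print(hold_sent)
print(toggle_sent)
```

Both runs produce one start command followed by one stop command, so the rest of the application does not need to know which control style is in use.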