- Voice Activity Detection (VAD) — detects when the user is speaking
- Speech-to-Text (STT) — transcribes user audio
- Large Language Models (LLM) — generates responses
- Text-to-Speech (TTS) — converts responses to natural speech
## When to use FULL Mode
FULL Mode is ideal if you want to:

- Delegate WebRTC orchestration and infrastructure management
- Avoid building and maintaining a real-time AI pipeline across audio input, inference, and output
- Ship products faster without managing model coordination, streaming latency, or state management
## Getting started
### Create a session token
Configure your avatar, voice, context, and interactivity type in a session token. Set `mode` to `"FULL"`.

### Start the session
Call the start session endpoint to initialize the WebRTC room.
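The two steps above can be sketched as follows. Only `mode: "FULL"` comes from this page; the other field names and the endpoint path are hypothetical placeholders for whatever your API reference specifies:

```python
import json

def build_session_config(avatar_id: str, voice_id: str, context: str) -> dict:
    """Assemble a session-token payload for a FULL Mode session.

    Only the "mode" field is confirmed by the docs; the remaining
    field names are illustrative guesses, not a real API schema.
    """
    return {
        "mode": "FULL",          # required to enable FULL Mode
        "avatar_id": avatar_id,  # hypothetical field name
        "voice_id": voice_id,    # hypothetical field name
        "context": context,      # hypothetical field name
    }

config = build_session_config(
    avatar_id="avatar-123",
    voice_id="voice-456",
    context="You are a helpful concierge.",
)
print(json.dumps(config, indent=2))

# Step 2: with the session token created from this config, call the
# start session endpoint (see your API reference for the actual path)
# to initialize the WebRTC room.
```

Once the start call succeeds, the WebRTC room is live and the managed pipeline handles the conversation end to end.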
## Learn more
- **Lifecycle**: Understand the three phases of a FULL Mode session.
- **Configuration**: Customize avatar, voice, context, and interactivity.
- **Voice Settings**: Fine-tune TTS provider settings.
- **Events**: Command and response events reference.