# Feature Request: Pause Threshold for Natural Speech Processing
last month by Marvin Hackfort
## Problem
In LiveAvatar's FULL mode, speech-to-text (STT) processing starts as soon as the user pauses. As a result, the avatar responds to incomplete thoughts before the user has finished speaking.
Example:
User: "I want to..." [pauses 1s to think]
Avatar: "Yes, what would you like?" ❌
User: "...discuss the methodology" [interrupted, fragmented]
## Proposed Solution
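Add a configurable pause threshold (e.g., 3 seconds). The transcript is held until the user has been silent for the full threshold; any speech during that window resets the timer and is appended to the same utterance, so the avatar only receives the complete thought.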
```mermaid
sequenceDiagram
    participant User
    participant STT
    participant Timer
    participant Avatar
    User->>STT: "I want to..."
    STT->>Timer: Start 3s threshold
    Note over Timer: Waiting... (1s elapsed)
    User->>STT: "...discuss methodology"
    STT->>Timer: Reset timer
    Note over Timer: User still speaking
    User->>STT: [stops speaking]
    STT->>Timer: Start 3s threshold
    Note over Timer: Waiting... 3s
    Timer->>Avatar: ✅ Complete: "I want to discuss methodology"
    Avatar->>User: [Responds to complete sentence]
```
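
The timeline in the sequence diagram can be replayed offline to see what the threshold changes. The following is a minimal sketch, not LiveAvatar code: `Fragment`, `group_utterances`, and the millisecond timestamps are hypothetical names and values chosen for illustration. It merges consecutive fragments whenever the silence between them is shorter than the threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fragment:
    """One STT fragment with hypothetical start/end timestamps in milliseconds."""
    start_ms: int
    end_ms: int
    text: str


def group_utterances(fragments: list[Fragment], pause_threshold_ms: int) -> list[str]:
    """Merge consecutive fragments separated by less silence than the threshold."""
    utterances: list[str] = []
    current: list[str] = []
    last_end = None
    for frag in fragments:
        # A silence at least as long as the threshold closes the current utterance.
        if last_end is not None and frag.start_ms - last_end >= pause_threshold_ms:
            utterances.append(" ".join(current))
            current = []
        current.append(frag.text)
        last_end = frag.end_ms
    if current:
        utterances.append(" ".join(current))
    return utterances


fragments = [
    Fragment(0, 1000, "I want to"),
    Fragment(2000, 3500, "discuss the methodology"),  # resumes after a 1 s pause
]

print(group_utterances(fragments, pause_threshold_ms=3000))
print(group_utterances(fragments, pause_threshold_ms=500))
```

With a 3000 ms threshold the 1 s thinking pause is absorbed and a single utterance comes out; with a 500 ms threshold the same input splits into two fragments, which is the behavior described in the problem statement.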
## How It Works
```mermaid
flowchart TD
    A[User Starts Speaking] --> B[STT Processing Begins]
    B --> C[User Stops Speaking]
    C --> D[Start Pause Timer]
    D --> E{User Resumes?}
    E -->|Yes, before 3s| F[Append Text<br/>Reset Timer]
    F --> C
    E -->|No, 3s elapsed| G[Finalize Complete<br/>Transcription]
    G --> H[Send to Avatar]
```
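
The flowchart can also be read as a small streaming state machine. The sketch below is one possible shape for it, under stated assumptions: `PauseGate` and its methods are hypothetical and not part of any LiveAvatar SDK, and the caller is assumed to supply the current time in milliseconds, which keeps the logic testable without real timers.

```python
from typing import List, Optional


class PauseGate:
    """Buffers transcript fragments until the user has been silent for the threshold."""

    def __init__(self, pause_threshold_ms: int = 3000) -> None:
        self.pause_threshold_ms = pause_threshold_ms
        self._parts: List[str] = []
        self._silence_started_ms: Optional[int] = None

    def on_speech(self, text: str) -> None:
        # "Append Text / Reset Timer": new speech cancels any running pause timer.
        self._parts.append(text)
        self._silence_started_ms = None

    def on_silence(self, now_ms: int) -> None:
        # "Start Pause Timer": remember when the user stopped speaking.
        if self._parts and self._silence_started_ms is None:
            self._silence_started_ms = now_ms

    def poll(self, now_ms: int) -> Optional[str]:
        # "Finalize Complete Transcription" once the threshold has elapsed.
        if self._silence_started_ms is None:
            return None
        if now_ms - self._silence_started_ms < self.pause_threshold_ms:
            return None
        transcript = " ".join(self._parts)
        self._parts = []
        self._silence_started_ms = None
        return transcript


gate = PauseGate(pause_threshold_ms=3000)
gate.on_speech("I want to")
gate.on_silence(now_ms=1000)
early = gate.poll(now_ms=2000)   # only 1 s of silence: nothing is sent yet
gate.on_speech("discuss the methodology")
gate.on_silence(now_ms=3500)
final = gate.poll(now_ms=6500)   # 3 s of silence: the transcript is finalized
print(early, final)
```

Passing the clock in explicitly is a design choice for the sketch; a server-side implementation would more likely drive `poll` from its own timer or voice-activity events.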
## Benefits
- ✅ Natural conversation flow - Users can pause to think without being interrupted
- ✅ Complete sentences - Avatar receives full context, not fragments
- ✅ Better responses - Avatar understands complete thoughts
- ✅ Accessibility - Helps language learners and people with speech differences
## Proposed API
```json
{
  "avatar_id": "...",
  "mode": "FULL",
  "stt_config": {
    "pause_threshold_ms": 3000,
    "min_utterance_length": 5
  }
}
```
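
To illustrate how a server might read the proposed fields, here is a hedged sketch of parsing and validating `stt_config`. `SttConfig` and `parse_stt_config` are hypothetical names, the defaults of 0 are an assumption meant to preserve the current immediate behavior when the object is omitted, and the unit of `min_utterance_length` is not specified in this proposal, so the sketch only checks that it is a non-negative integer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SttConfig:
    pause_threshold_ms: int = 0    # assumed default: 0 keeps today's immediate processing
    min_utterance_length: int = 0  # unit is not specified in the proposal


def parse_stt_config(payload: str) -> SttConfig:
    """Read the optional "stt_config" object from a request body."""
    raw = json.loads(payload).get("stt_config", {})
    config = SttConfig(**raw)  # raises TypeError on unknown keys
    for name in ("pause_threshold_ms", "min_utterance_length"):
        value = getattr(config, name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    return config


request_body = """
{
  "avatar_id": "...",
  "mode": "FULL",
  "stt_config": {
    "pause_threshold_ms": 3000,
    "min_utterance_length": 5
  }
}
"""

config = parse_stt_config(request_body)
print(config)
```

Making `stt_config` optional with behavior-preserving defaults would keep existing FULL mode integrations unchanged; whether that matches the actual API design is an open question for the maintainers.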
## Use Cases
- Research interviews where participants need time to formulate answers
- Educational sessions with complex topics
- Therapy/counseling scenarios requiring thoughtful responses
- Any conversation requiring natural thinking pauses
Note: This requires server-side implementation in LiveAvatar's STT pipeline, as FULL mode processing is server-controlled.