# Feature Request: Pause Threshold for Natural Speech Processing

Problem

In LiveAvatar's FULL mode, speech-to-text (STT) processing is triggered as soon as the user pauses. As a result, the avatar responds to incomplete thoughts before the user has finished speaking.

Example:

User: "I want to..." [pauses 1s to think]
Avatar: "Yes, what would you like?" ❌
User: "...discuss the methodology" [interrupted, fragmented]

Proposed Solution

Add a configurable pause threshold (e.g., 3 seconds). The transcript is finalized and sent to the avatar only after the user has been silent for that long, so a short thinking pause does not end the user's turn.

sequenceDiagram
    participant User
    participant STT
    participant Timer
    participant Avatar

    User->>STT: "I want to..."
    STT->>Timer: Start 3s threshold
    Note over Timer: Waiting... (1s elapsed)
    User->>STT: "...discuss the methodology"
    STT->>Timer: Reset timer
    Note over Timer: User still speaking
    User->>STT: [stops speaking]
    STT->>Timer: Start 3s threshold
    Note over Timer: Waiting... 3s
    Timer->>Avatar: ✅ Complete: "I want to discuss the methodology"
    Avatar->>User: [Responds to complete sentence]

How It Works

flowchart TD
    A[User Starts Speaking] --> B[STT Processing Begins]
    B --> C[User Stops Speaking]
    C --> D[Start Pause Timer]
    D --> E{User Resumes?}
    E -->|Yes, before 3s| F[Append Text<br/>Reset Timer]
    F --> C
    E -->|No, 3s elapsed| G[Finalize Complete<br/>Transcription]
    G --> H[Send to Avatar]
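
The flow above can be sketched as a small state machine. This is only an illustrative model, not LiveAvatar code: the `PauseAggregator` class, its method names, and the idea of passing timestamps in by hand are all assumptions made for the example. The real feature would live in the server-side STT pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PauseAggregator:
    """Hypothetical model of the proposed pause threshold (not a LiveAvatar API)."""

    pause_threshold_ms: int = 3000
    _fragments: List[str] = field(default_factory=list)
    _last_speech_ms: Optional[int] = None

    def on_fragment(self, text: str, now_ms: int) -> None:
        """The user spoke: append the text and reset the pause timer.

        now_ms is the time at which this stretch of speech ended.
        """
        self._fragments.append(text.strip())
        self._last_speech_ms = now_ms

    def poll(self, now_ms: int) -> Optional[str]:
        """Return the full utterance once the threshold has elapsed, else None."""
        if self._last_speech_ms is None:
            return None  # nothing buffered
        if now_ms - self._last_speech_ms < self.pause_threshold_ms:
            return None  # user may still resume
        utterance = " ".join(self._fragments)
        self._fragments.clear()
        self._last_speech_ms = None
        return utterance


agg = PauseAggregator(pause_threshold_ms=3000)
agg.on_fragment("I want to", now_ms=0)
early = agg.poll(now_ms=1000)           # 1s pause: below the threshold
agg.on_fragment("discuss the methodology", now_ms=1000)
still_waiting = agg.poll(now_ms=3500)   # only 2.5s since the last speech
final = agg.poll(now_ms=4000)           # 3s of silence: finalize
```

Passing timestamps explicitly keeps the sketch deterministic. A real implementation would more likely reset a timer on each voice-activity event.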

Benefits

  • Natural conversation flow - Users can pause to think without being interrupted
  • Complete sentences - Avatar receives full context, not fragments
  • Better responses - Avatar understands complete thoughts
  • Accessibility - Helps language learners and people with speech differences

Proposed API

{
  "avatar_id": "...",
  "mode": "FULL",
  "stt_config": {
    "pause_threshold_ms": 3000,
    "min_utterance_length": 5
  }
}
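
As a rough illustration of how a client might build and sanity-check this payload, here is a minimal sketch. The `SttConfig` class and `build_session_request` helper are hypothetical names for this example. The fields come from the proposal above, and none of this is an existing LiveAvatar API. The proposal does not state the unit of `min_utterance_length`, so the sketch only checks that it is non-negative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SttConfig:
    """Hypothetical client-side mirror of the proposed stt_config object."""

    pause_threshold_ms: int = 3000
    min_utterance_length: int = 5  # unit is not specified in the proposal

    def __post_init__(self) -> None:
        if self.pause_threshold_ms < 0:
            raise ValueError("pause_threshold_ms must be non-negative")
        if self.min_utterance_length < 0:
            raise ValueError("min_utterance_length must be non-negative")


def build_session_request(avatar_id: str, stt: SttConfig) -> str:
    """Serialize the proposed request body to JSON."""
    body = {"avatar_id": avatar_id, "mode": "FULL", "stt_config": asdict(stt)}
    return json.dumps(body)


payload = build_session_request("example-avatar", SttConfig(pause_threshold_ms=3000))
```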

Use Cases

  • Research interviews where participants need time to formulate answers
  • Educational sessions with complex topics
  • Therapy/counseling scenarios requiring thoughtful responses
  • Any conversation requiring natural thinking pauses

Note: This requires server-side implementation in LiveAvatar's STT pipeline, as FULL mode processing is server-controlled.