From bcc608737b51dcffc7e00f57f431b57cfc35d754 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 17 Apr 2026 11:59:09 +0000
Subject: [PATCH] docs: update for pipecat PR #4252 (VIVA SDK TT v3 support)

- Add Krisp Interruption Prediction (IP) capability to Krisp VIVA feature guide
- Add KRISP_VIVA_IP_MODEL_PATH environment variable documentation
- Add new Interruption Prediction section with usage example
- Update KrispVivaTurn documentation to mention v3 API and VAD integration
- Add KrispVivaIPUserTurnStartStrategy to user turn strategies reference
---
 .../turn-detection/krisp-viva-turn.mdx        |  8 +--
 .../turn-management/user-turn-strategies.mdx  | 51 +++++++++++++++++++
 pipecat/features/krisp-viva.mdx               | 50 +++++++++++++++++-
 3 files changed, 103 insertions(+), 6 deletions(-)

diff --git a/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx b/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx
index 6b7922df..b8ed4acf 100644
--- a/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx
+++ b/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx
@@ -5,7 +5,7 @@ description: "Turn detection using Krisp VIVA SDK"

 ## Overview

-`KrispVivaTurn` is a turn analyzer that uses Krisp's VIVA SDK turn detection (Tt) API to determine when a user has finished speaking. Unlike the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview) which analyzes audio in batches when VAD detects a pause, `KrispVivaTurn` processes audio frame-by-frame in real time using Krisp's streaming model.
+`KrispVivaTurn` is a turn analyzer that uses Krisp's VIVA SDK turn detection v3 (Tt) API to determine when a user has finished speaking. The Tt API accepts an external VAD flag with each audio frame, allowing the model to leverage voice activity information for more accurate turn detection.
Unlike the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview) which analyzes audio in batches when VAD detects a pause, `KrispVivaTurn` processes audio frame-by-frame in real time using Krisp's streaming model.
+### KrispVivaIPUserTurnStartStrategy
+
+Uses Krisp's Interruption Prediction (IP) model to distinguish genuine user interruptions from backchannels (e.g., "uh-huh", "yeah"). When VAD detects user speech, this strategy feeds audio frames into the Krisp VIVA IP model, which outputs a probability indicating whether the speech is a genuine interruption. A user turn is triggered only when this probability exceeds the configured threshold.
+
+This strategy is designed to work alongside other start strategies (e.g., `TranscriptionUserTurnStartStrategy` as a fallback).
+
+
+  Path to the Krisp VIVA IP model file (.kef extension). If None, uses the
+  `KRISP_VIVA_IP_MODEL_PATH` environment variable.
+
+
+
+  IP probability threshold (0.0 to 1.0). When the model's output exceeds this
+  value, the speech is classified as a genuine interruption.
+
+
+
+  Frame duration in milliseconds for IP processing. Supported values: 10, 15,
+  20, 30, 32.
+
+
+
+  Krisp SDK API key. If empty, falls back to the `KRISP_VIVA_API_KEY`
+  environment variable.
+
+
+```python
+from pipecat.turns.user_start import (
+    KrispVivaIPUserTurnStartStrategy,
+    TranscriptionUserTurnStartStrategy,
+)
+from pipecat.turns.user_turn_strategies import UserTurnStrategies
+
+strategy = KrispVivaIPUserTurnStartStrategy(
+    model_path="/path/to/ip_model.kef",
+    threshold=0.5,
+)
+
+# Use with a fallback strategy
+strategies = UserTurnStrategies(
+    start=[
+        KrispVivaIPUserTurnStartStrategy(threshold=0.5),
+        TranscriptionUserTurnStartStrategy(),  # Fallback
+    ],
+)
+```
+
+
+  Requires the Krisp Python SDK. See the [Krisp VIVA
+  guide](/pipecat/features/krisp-viva) for installation instructions.
+
+
 ### ExternalUserTurnStartStrategy

 Delegates turn start detection to an external processor.
This strategy listens for `UserStartedSpeakingFrame` frames emitted by other components in the pipeline (such as speech-to-speech services).

diff --git a/pipecat/features/krisp-viva.mdx b/pipecat/features/krisp-viva.mdx
index 7fc5165b..1d24e2ca 100644
--- a/pipecat/features/krisp-viva.mdx
+++ b/pipecat/features/krisp-viva.mdx
@@ -6,10 +6,11 @@ description: "Learn how to integrate Krisp's VIVA voice isolation and turn detec

 ## Overview

-Krisp's VIVA SDK provides three capabilities for Pipecat applications:
+Krisp's VIVA SDK provides four capabilities for Pipecat applications:

 - **Voice Isolation** — Filter out background noise and voices from the user's audio input stream, yielding clearer audio for fewer false interruptions and better transcription.
 - **Turn Detection** — Determine when a user has finished speaking using Krisp's streaming turn detection model, as an alternative to the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview).
+- **Interruption Prediction** — Distinguish genuine user interruptions from backchannels (e.g. "uh-huh", "yeah"), preventing the bot from being interrupted by brief acknowledgements.
 - **Voice Activity Detection** — Detect speech in audio streams using Krisp's VAD model, supporting sample rates from 8kHz to 48kHz.

 You can use any combination of these features together.
@@ -29,6 +30,13 @@ You can use any combination of these features together.
   >
     API reference for turn detection

+
+    API reference for interruption prediction
+
   Each feature uses a **different model**. Set `KRISP_VIVA_FILTER_MODEL_PATH`
-  for voice isolation, `KRISP_VIVA_TURN_MODEL_PATH` for turn detection, and
+  for voice isolation, `KRISP_VIVA_TURN_MODEL_PATH` for turn detection,
+  `KRISP_VIVA_IP_MODEL_PATH` for interruption prediction, and
   `KRISP_VIVA_VAD_MODEL_PATH` for voice activity detection.
@@ -182,6 +194,40 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(

 See the [KrispVivaTurn reference](/api-reference/server/utilities/turn-detection/krisp-viva-turn) for configuration options.

+## Interruption Prediction
+
+`KrispVivaIPUserTurnStartStrategy` uses Krisp's Interruption Prediction (IP) model to distinguish genuine user interruptions from backchannels. When VAD detects user speech, the IP model analyzes the audio and outputs a probability indicating whether the speech is a real interruption or a brief acknowledgement (e.g., "uh-huh", "yeah").
+
+This prevents the bot from being interrupted unnecessarily by short utterances. Configure it as a user turn start strategy:
+
+```python
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.turns.user_start import (
+    KrispVivaIPUserTurnStartStrategy,
+    TranscriptionUserTurnStartStrategy,
+)
+from pipecat.turns.user_turn_strategies import UserTurnStrategies
+
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context,
+    user_params=LLMUserAggregatorParams(
+        user_turn_strategies=UserTurnStrategies(
+            start=[
+                KrispVivaIPUserTurnStartStrategy(threshold=0.5),
+                TranscriptionUserTurnStartStrategy(),  # Fallback
+            ],
+        ),
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+)
+```
+
+See the [KrispVivaIPUserTurnStartStrategy reference](/api-reference/server/utilities/turn-management/user-turn-strategies#krispvivaipuserturnstartstrategy) for configuration options.
+
 ## Voice Activity Detection

 `KrispVivaVadAnalyzer` detects speech in audio streams using Krisp's VAD model. It supports sample rates from 8kHz to 48kHz, making it suitable for a wide range of applications including telephony and high-quality audio.
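
The thresholding that this patch documents for `KrispVivaIPUserTurnStartStrategy` can be sketched in plain Python. This is only an illustration and does not call Pipecat or the Krisp SDK. `InterruptionGate` and its methods are hypothetical names, the default values are assumptions (the patch does not state defaults), and the sketch assumes that "exceeds" means strictly greater than the threshold and that a frame holds sample rate times frame duration samples.

```python
from dataclasses import dataclass

# Frame durations (ms) that the patch lists as supported for IP processing.
SUPPORTED_FRAME_DURATIONS_MS = (10, 15, 20, 30, 32)


@dataclass
class InterruptionGate:
    """Hypothetical helper for illustration; not part of Pipecat or the Krisp SDK."""

    threshold: float = 0.5       # assumed default, mirrors the examples in the patch
    frame_duration_ms: int = 20  # assumed default, one of the supported values

    def __post_init__(self) -> None:
        # Reject configurations outside the documented ranges.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        if self.frame_duration_ms not in SUPPORTED_FRAME_DURATIONS_MS:
            raise ValueError(f"unsupported frame duration: {self.frame_duration_ms} ms")

    def samples_per_frame(self, sample_rate: int) -> int:
        # Number of audio samples in one frame at the given sample rate (Hz).
        return sample_rate * self.frame_duration_ms // 1000

    def is_interruption(self, probability: float) -> bool:
        # Assumption: a turn starts only when the probability strictly exceeds
        # the threshold; anything at or below it is treated as a backchannel.
        return probability > self.threshold


gate = InterruptionGate(threshold=0.5, frame_duration_ms=20)
print(gate.samples_per_frame(16000))  # 320
print(gate.is_interruption(0.2))      # False: treated as a backchannel
print(gate.is_interruption(0.8))      # True: treated as a genuine interruption
```

Under these assumptions, raising the threshold makes the bot harder to interrupt, which is why the patch pairs the IP strategy with `TranscriptionUserTurnStartStrategy` as a fallback.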