From bcc608737b51dcffc7e00f57f431b57cfc35d754 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <github-actions[bot]@users.noreply.github.com>
Date: Fri, 17 Apr 2026 11:59:09 +0000
Subject: [PATCH] docs: update for pipecat PR #4252 (VIVA SDK TT v3 support)

- Add Krisp Interruption Prediction (IP) capability to Krisp VIVA feature guide
- Add KRISP_VIVA_IP_MODEL_PATH environment variable documentation
- Add new Interruption Prediction section with usage example
- Update KrispVivaTurn documentation to mention v3 API and VAD integration
- Add KrispVivaIPUserTurnStartStrategy to user turn strategies reference
---
 .../turn-detection/krisp-viva-turn.mdx        |  8 +--
 .../turn-management/user-turn-strategies.mdx  | 51 +++++++++++++++++++
 pipecat/features/krisp-viva.mdx               | 50 +++++++++++++++++-
 3 files changed, 103 insertions(+), 6 deletions(-)
diff --git a/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx b/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx
index 6b7922df..b8ed4acf 100644
--- a/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx
+++ b/api-reference/server/utilities/turn-detection/krisp-viva-turn.mdx
@@ -5,7 +5,7 @@ description: "Turn detection using Krisp VIVA SDK"
 
 ## Overview
 
-`KrispVivaTurn` is a turn analyzer that uses Krisp's VIVA SDK turn detection (Tt) API to determine when a user has finished speaking. Unlike the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview) which analyzes audio in batches when VAD detects a pause, `KrispVivaTurn` processes audio frame-by-frame in real time using Krisp's streaming model.
+`KrispVivaTurn` is a turn analyzer that uses Krisp's VIVA SDK turn detection v3 (Tt) API to determine when a user has finished speaking. The Tt API accepts an external VAD flag with each audio frame, so the model can use voice activity information alongside the audio for more accurate turn detection. Unlike the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview), which analyzes audio in batches when VAD detects a pause, `KrispVivaTurn` processes audio frame-by-frame in real time using Krisp's streaming model.
 
 <CardGroup cols={2}>
   <Card
@@ -101,10 +101,10 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 
 ## How It Works
 
-`KrispVivaTurn` processes audio as a streaming model, analyzing each audio frame in real time:
+`KrispVivaTurn` processes audio as a streaming model, analyzing each audio frame in real time with VAD integration:
 
-1. **Frame-by-frame processing**: Each incoming audio frame is processed by the Krisp turn detection model, which outputs a probability that the user's turn is complete.
-2. **Speech tracking**: VAD signals are used to track when speech starts and stops.
+1. **VAD-enhanced processing**: Each incoming audio frame is processed by the Krisp turn detection v3 model along with a VAD flag indicating whether speech is present. The model uses both the audio and VAD information to output a probability that the user's turn is complete.
+2. **Speech tracking**: VAD signals are used to track when speech starts and stops, providing context to the turn detection model.
 3. **Threshold crossing**: When the model's probability exceeds the configured `threshold` after speech has been detected, the turn is marked as complete.
 
 This differs from the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview) which buffers audio and runs batch inference when VAD detects a pause. `KrispVivaTurn` makes its decision continuously as audio flows through, which can result in faster turn detection.
diff --git a/api-reference/server/utilities/turn-management/user-turn-strategies.mdx b/api-reference/server/utilities/turn-management/user-turn-strategies.mdx
index 89ac1d30..96126a6f 100644
--- a/api-reference/server/utilities/turn-management/user-turn-strategies.mdx
+++ b/api-reference/server/utilities/turn-management/user-turn-strategies.mdx
@@ -204,6 +204,57 @@ async def on_wake_phrase_timeout(strategy):
   detection.
 </Note>
 
+### KrispVivaIPUserTurnStartStrategy
+
+Uses Krisp's Interruption Prediction (IP) model to distinguish genuine user interruptions from backchannels (e.g., "uh-huh", "yeah"). When VAD detects user speech, this strategy feeds audio frames into the Krisp VIVA IP model, which outputs a probability indicating whether the speech is a genuine interruption. A user turn is triggered only when this probability exceeds the configured threshold.
+
+This strategy is designed to work alongside other start strategies (e.g., `TranscriptionUserTurnStartStrategy` as a fallback).
+
+<ParamField path="model_path" type="Optional[str]" default="None">
+  Path to the Krisp VIVA IP model file (`.kef` extension). If `None`, falls back
+  to the `KRISP_VIVA_IP_MODEL_PATH` environment variable.
+</ParamField>
+
+<ParamField path="threshold" type="float" default="0.5">
+  IP probability threshold (0.0 to 1.0). When the model's output exceeds this
+  value, the speech is classified as a genuine interruption.
+</ParamField>
+
+<ParamField path="frame_duration_ms" type="int" default="20">
+  Frame duration in milliseconds for IP processing. Supported values: 10, 15,
+  20, 30, 32.
+</ParamField>
+
+<ParamField path="api_key" type="str" default='""'>
+  Krisp SDK API key. If empty, falls back to the `KRISP_VIVA_API_KEY`
+  environment variable.
+</ParamField>
+
+```python
+from pipecat.turns.user_start import (
+    KrispVivaIPUserTurnStartStrategy,
+    TranscriptionUserTurnStartStrategy,
+)
+from pipecat.turns.user_turn_strategies import UserTurnStrategies
+
+strategy = KrispVivaIPUserTurnStartStrategy(
+    model_path="/path/to/ip_model.kef", threshold=0.5
+)
+
+# Use with a fallback strategy
+strategies = UserTurnStrategies(
+    start=[
+        KrispVivaIPUserTurnStartStrategy(threshold=0.5),
+        TranscriptionUserTurnStartStrategy(),  # Fallback
+    ],
+)
+```
+
+<Note>
+  Requires the Krisp Python SDK. See the [Krisp VIVA
+  guide](/pipecat/features/krisp-viva) for installation instructions.
+</Note>
+
 ### ExternalUserTurnStartStrategy
 
 Delegates turn start detection to an external processor. This strategy listens for `UserStartedSpeakingFrame` frames emitted by other components in the pipeline (such as speech-to-speech services).
diff --git a/pipecat/features/krisp-viva.mdx b/pipecat/features/krisp-viva.mdx
index 7fc5165b..1d24e2ca 100644
--- a/pipecat/features/krisp-viva.mdx
+++ b/pipecat/features/krisp-viva.mdx
@@ -6,10 +6,11 @@ description: "Learn how to integrate Krisp's VIVA voice isolation and turn detec
 
 ## Overview
 
-Krisp's VIVA SDK provides three capabilities for Pipecat applications:
+Krisp's VIVA SDK provides four capabilities for Pipecat applications:
 
 - **Voice Isolation** — Filter out background noise and voices from the user's audio input stream, yielding clearer audio for fewer false interruptions and better transcription.
 - **Turn Detection** — Determine when a user has finished speaking using Krisp's streaming turn detection model, as an alternative to the [Smart Turn model](/api-reference/server/utilities/turn-detection/smart-turn-overview).
+- **Interruption Prediction** — Distinguish genuine user interruptions from backchannels (e.g. "uh-huh", "yeah"), preventing the bot from being interrupted by brief acknowledgements.
 - **Voice Activity Detection** — Detect speech in audio streams using Krisp's VAD model, supporting sample rates from 8kHz to 48kHz.
 
 You can use any combination of these features together.
@@ -29,6 +30,13 @@ You can use any combination of these features together.
   >
     API reference for turn detection
   </Card>
+  <Card
+    title="KrispVivaIPUserTurnStartStrategy"
+    icon="code"
+    href="/api-reference/server/utilities/turn-management/user-turn-strategies#krispvivaipuserturnstartstrategy"
+  >
+    API reference for interruption prediction
+  </Card>
   <Card
     title="KrispVivaVadAnalyzer Reference"
     icon="code"
@@ -111,13 +119,17 @@ KRISP_VIVA_FILTER_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-tel-v2.kef
 # Turn detection model path
 KRISP_VIVA_TURN_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-tt-v2.kef
 
+# Interruption prediction model path
+KRISP_VIVA_IP_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-ip-v3.kef
+
 # Voice activity detection model path (optional)
 KRISP_VIVA_VAD_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-vad-v2.kef
 ```
 
 <Note>
   Each feature uses a **different model**. Set `KRISP_VIVA_FILTER_MODEL_PATH`
-  for voice isolation, `KRISP_VIVA_TURN_MODEL_PATH` for turn detection, and
+  for voice isolation, `KRISP_VIVA_TURN_MODEL_PATH` for turn detection,
+  `KRISP_VIVA_IP_MODEL_PATH` for interruption prediction, and
   `KRISP_VIVA_VAD_MODEL_PATH` for voice activity detection.
 </Note>
 
@@ -182,6 +194,40 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 
 See the [KrispVivaTurn reference](/api-reference/server/utilities/turn-detection/krisp-viva-turn) for configuration options.
 
+## Interruption Prediction
+
+`KrispVivaIPUserTurnStartStrategy` uses Krisp's Interruption Prediction (IP) model to distinguish genuine user interruptions from backchannels. When VAD detects user speech, the IP model analyzes the audio and outputs a probability indicating whether the speech is a real interruption or a brief acknowledgement (e.g., "uh-huh", "yeah").
+
+Gating turn starts on this probability keeps short acknowledgements from interrupting the bot. Configure `KrispVivaIPUserTurnStartStrategy` as a user turn start strategy:
+
+```python
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.turns.user_start import (
+    KrispVivaIPUserTurnStartStrategy,
+    TranscriptionUserTurnStartStrategy,
+)
+from pipecat.turns.user_turn_strategies import UserTurnStrategies
+
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context,
+    user_params=LLMUserAggregatorParams(
+        user_turn_strategies=UserTurnStrategies(
+            start=[
+                KrispVivaIPUserTurnStartStrategy(threshold=0.5),
+                TranscriptionUserTurnStartStrategy(),  # Fallback
+            ],
+        ),
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+)
+```
+
+See the [KrispVivaIPUserTurnStartStrategy reference](/api-reference/server/utilities/turn-management/user-turn-strategies#krispvivaipuserturnstartstrategy) for configuration options.
+
 ## Voice Activity Detection
 
 `KrispVivaVadAnalyzer` detects speech in audio streams using Krisp's VAD model. It supports sample rates from 8kHz to 48kHz, making it suitable for a wide range of applications including telephony and high-quality audio.