pipecat-ai · jamsea · Apr 17, 2026
diff --git a/api-reference/pipecat-flows/overview.mdx b/api-reference/pipecat-flows/overview.mdx
@@ -107,6 +107,10 @@ Pipecat Flows works with any LLM service that supports function calling. Pipecat
 
 Any service that extends Pipecat's `LLMService` base class is supported. This includes OpenAI-compatible services like Groq, Together, Cerebras, DeepSeek, and others.
 
+### Realtime (S2S) models
+
+Realtime speech-to-speech services such as Gemini Live and OpenAI Realtime are not currently supported. See [Using Flows with Realtime Models](/pipecat-flows/guides/realtime-models) for the recommended cascade configuration.
+
 ## Additional Notes
 
 - **State Management**: Use `flow_manager.state` dictionary for persistent conversation data

diff --git a/api-reference/server/services/s2s/aws.mdx b/api-reference/server/services/s2s/aws.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using AWS Nova S
 
 `AWSNovaSonicLLMService` enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <CardGroup cols={2}>
   <Card
     title="AWS Nova Sonic API Reference"

diff --git a/api-reference/server/services/s2s/gemini-live-vertex.mdx b/api-reference/server/services/s2s/gemini-live-vertex.mdx
@@ -7,6 +7,12 @@ description: "A real-time, multimodal conversational AI service powered by Googl
 
 `GeminiLiveVertexLLMService` enables natural, real-time conversations with Google's Gemini model through Vertex AI. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <Tip>
   Want to start building? Check out our [Gemini Live
   Guide](/pipecat/features/gemini-live) for general concepts, then follow the

diff --git a/api-reference/server/services/s2s/gemini-live.mdx b/api-reference/server/services/s2s/gemini-live.mdx
@@ -7,6 +7,12 @@ description: "A real-time, multimodal conversational AI service powered by Googl
 
 `GeminiLiveLLMService` enables natural, real-time conversations with Google's Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <Tip>
   Want to start building? Check out our [Gemini Live
   Guide](/pipecat/features/gemini-live).

diff --git a/api-reference/server/services/s2s/grok.mdx b/api-reference/server/services/s2s/grok.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using xAI's Grok
 
 `GrokRealtimeLLMService` provides real-time, multimodal conversation capabilities using xAI's Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <CardGroup cols={2}>
   <Card
     title="Grok Realtime API Reference"

diff --git a/api-reference/server/services/s2s/inworld.mdx b/api-reference/server/services/s2s/inworld.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using Inworld's
 
 `InworldRealtimeLLMService` provides real-time, multimodal conversation capabilities using Inworld's Realtime API. It operates as a cascade STT/LLM/TTS pipeline under the hood with built-in semantic voice activity detection (VAD) for turn management, offering low-latency speech-to-speech interactions with integrated LLM processing and function calling.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade pipeline
+  with separate STT, LLM, and TTS services. See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <CardGroup cols={2}>
   <Card
     title="Inworld Realtime API Reference"

diff --git a/api-reference/server/services/s2s/openai.mdx b/api-reference/server/services/s2s/openai.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using OpenAI's R
 
 `OpenAIRealtimeLLMService` provides real-time, multimodal conversation capabilities using OpenAI's Realtime API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with minimal latency response times.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <CardGroup cols={2}>
   <Card
     title="OpenAI Realtime API Reference"

diff --git a/api-reference/server/services/s2s/ultravox.mdx b/api-reference/server/services/s2s/ultravox.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using Ultravox's
 
 `UltravoxRealtimeLLMService` provides real-time conversational AI capabilities using Ultravox's Realtime API. It supports both text and audio modalities with voice transcription, streaming responses, and tool usage for creating interactive AI experiences.
 
+<Info>
+  **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Info>
+
 <CardGroup cols={2}>
   <Card
     title="Ultravox Realtime API Reference"

diff --git a/docs.json b/docs.json
@@ -284,7 +284,8 @@
               "pipecat-flows/guides/functions",
               "pipecat-flows/guides/actions",
               "pipecat-flows/guides/context-strategies",
-              "pipecat-flows/guides/state-management"
+              "pipecat-flows/guides/state-management",
+              "pipecat-flows/guides/realtime-models"
             ]
           },
           {

diff --git a/pipecat-flows/guides/realtime-models.mdx b/pipecat-flows/guides/realtime-models.mdx
@@ -0,0 +1,128 @@
+---
+title: "Using Flows with Realtime Models"
+description: "Compatibility notes for Gemini Live, OpenAI Realtime, and other speech-to-speech services."
+---
+
+Pipecat Flows doesn't currently work with realtime speech-to-speech (S2S) services like Gemini Live or OpenAI Realtime. This page covers what works, what doesn't, and the recommended path forward.
+
+## Compatibility at a Glance
+
+| Service                                                                         | Works with Flows |
+| ------------------------------------------------------------------------------- | :--------------: |
+| Cascade LLMs (OpenAI, Anthropic, Gemini, AWS Bedrock, and OpenAI-compatible)    |        Yes       |
+| Gemini Live (`GeminiLiveLLMService`, `GeminiLiveVertexLLMService`)              |        No        |
+| OpenAI Realtime (`OpenAIRealtimeLLMService`)                                    |        No        |
+| AWS Nova Sonic (`AWSNovaSonicLLMService`)                                       |        No        |
+| Grok S2S, Inworld S2S, Ultravox                                                 |        No        |
+
+## Why
+
+Flows requires a cascade LLM service (STT → LLM → TTS). Native S2S support is in development.
+
+## Recommended Path: Use a Cascade Pipeline
+
+If you want structured conversation flows today, build a cascade pipeline with separate STT, LLM, and TTS services. Any cascade LLM that supports function calling works.
+
+Install Pipecat Flows along with Pipecat and the services you want to use. This example uses Deepgram (STT), Google Gemini (LLM), and Cartesia (TTS):
+
+```bash
+uv add pipecat-ai-flows
+uv add "pipecat-ai[daily,google,deepgram,cartesia,silero]"
+```
+
+Set the API keys:
+
+```bash
+export DEEPGRAM_API_KEY=...
+export GOOGLE_API_KEY=...
+export CARTESIA_API_KEY=...
+```
+
+Build the pipeline with the cascade services and attach a `FlowManager`:
+
+```python
+import os
+
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.google.llm import GoogleLLMService
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat_flows import FlowManager
+
+# `transport` is your configured Pipecat transport (Daily, LiveKit, etc.).
+# See the Flows Quickstart for the full setup, including transport
+# configuration and node definitions.
+
+stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash")
+tts = CartesiaTTSService(
+    api_key=os.getenv("CARTESIA_API_KEY"),
+    voice_id="32b3f3c5-7171-46aa-abe7-b598964aa793",
+)
+
+context = LLMContext()
+context_aggregator = LLMContextAggregatorPair(
+    context,
+    user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+)
+
+pipeline = Pipeline(
+    [
+        transport.input(),
+        stt,
+        context_aggregator.user(),
+        llm,
+        tts,
+        transport.output(),
+        context_aggregator.assistant(),
+    ]
+)
+
+task = PipelineTask(pipeline)
+
+flow_manager = FlowManager(
+    task=task,
+    llm=llm,
+    context_aggregator=context_aggregator,
+    transport=transport,
+)
+
+# Start the flow when a client connects. `create_initial_node()` is a function
+# you define that returns your first node; see the Flows Quickstart for an example.
+@transport.event_handler("on_client_connected")
+async def on_client_connected(transport, client):
+    await flow_manager.initialize(create_initial_node())
+```
+
+<Tip>
+  For a complete runnable walkthrough (nodes, functions, and a working end-to-end
+  example), see the [Flows Quickstart](/pipecat-flows/guides/quickstart).
+</Tip>
+
+## If You Specifically Need Realtime S2S
+
+If speech-to-speech is a hard requirement, build with plain Pipecat (without Flows) and manage conversation state in your own code. The S2S service pages have everything you need to get started:
+
+<CardGroup cols={2}>
+  <Card
+    title="Gemini Live"
+    icon="google"
+    href="/api-reference/server/services/s2s/gemini-live"
+  >
+    Realtime speech-to-speech with Google Gemini Live
+  </Card>
+  <Card
+    title="OpenAI Realtime"
+    icon="microphone"
+    href="/api-reference/server/services/s2s/openai"
+  >
+    Realtime speech-to-speech with OpenAI's Realtime API
+  </Card>
+</CardGroup>
diff --git a/pipecat-flows/introduction.mdx b/pipecat-flows/introduction.mdx
@@ -15,6 +15,12 @@ Pipecat Flows is best suited for use cases where:
 - **Your bot handles complex tasks** that can be broken down into smaller, manageable pieces
 - **You want to improve LLM accuracy** by focusing the model on one specific task at a time instead of managing multiple responsibilities simultaneously
 
+<Note>
+  Looking for Gemini Live, OpenAI Realtime, or another speech-to-speech model?
+  See [Using Flows with Realtime
+  Models](/pipecat-flows/guides/realtime-models).
+</Note>
+
 ## How Pipecat and Pipecat Flows Work Together
 
 **Pipecat** defines the core capabilities of your bot — the pipeline and processors that enable receiving audio, transcribing input, running LLM completions, converting responses to audio, and sending audio back to the user.