From 778a516c9fe40b2adc169ed22bb690a3335f1f18 Mon Sep 17 00:00:00 2001 From: James Hush Date: Fri, 17 Apr 2026 15:56:56 +0800 Subject: [PATCH 1/4] Add Flows + realtime (S2S) compatibility docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Documents that Pipecat Flows requires a cascade LLM service (STT → LLM → TTS) and is not compatible with realtime speech-to-speech services (Gemini Live, OpenAI Realtime, AWS Nova Sonic, Grok, Inworld, Ultravox). Native S2S support is currently being developed. Closes a Kapa coverage gap where users across English, Russian, and Dutch conversations asked the same question and got only a "not supported" answer with no workaround. - New guide: pipecat-flows/guides/realtime-models.mdx with a compatibility table and a concrete cascade example (Deepgram + Gemini Flash + Cartesia) - Cross-linked from the Flows introduction, the Flows API overview, and each of the 7 S2S service pages so users land on the guide regardless of entry point --- api-reference/pipecat-flows/overview.mdx | 4 + api-reference/server/services/s2s/aws.mdx | 6 + .../services/s2s/gemini-live-vertex.mdx | 6 + .../server/services/s2s/gemini-live.mdx | 6 + api-reference/server/services/s2s/grok.mdx | 6 + api-reference/server/services/s2s/inworld.mdx | 6 + api-reference/server/services/s2s/openai.mdx | 6 + .../server/services/s2s/ultravox.mdx | 6 + docs.json | 3 +- pipecat-flows/guides/realtime-models.mdx | 116 ++++++++++++++++++ pipecat-flows/introduction.mdx | 6 + 11 files changed, 170 insertions(+), 1 deletion(-) create mode 100644 pipecat-flows/guides/realtime-models.mdx diff --git a/api-reference/pipecat-flows/overview.mdx b/api-reference/pipecat-flows/overview.mdx index 991435dc..b34339c4 100644 --- a/api-reference/pipecat-flows/overview.mdx +++ b/api-reference/pipecat-flows/overview.mdx @@ -107,6 +107,10 @@ Pipecat Flows works with any LLM service that supports function calling. 
Pipecat Any service that extends Pipecat's `LLMService` base class is supported. This includes OpenAI-compatible services like Groq, Together, Cerebras, DeepSeek, and others. +### Realtime (S2S) models + +Realtime speech-to-speech services such as Gemini Live and OpenAI Realtime are not currently supported. See [Using Flows with Realtime Models](/pipecat-flows/guides/realtime-models) for the recommended cascade configuration. + ## Additional Notes - **State Management**: Use `flow_manager.state` dictionary for persistent conversation data diff --git a/api-reference/server/services/s2s/aws.mdx b/api-reference/server/services/s2s/aws.mdx index 0bb425aa..9092cc38 100644 --- a/api-reference/server/services/s2s/aws.mdx +++ b/api-reference/server/services/s2s/aws.mdx @@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using AWS Nova S `AWSNovaSonicLLMService` enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities. + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + Want to start building? 
Check out our [Gemini Live Guide](/pipecat/features/gemini-live) for general concepts, then follow the diff --git a/api-reference/server/services/s2s/gemini-live.mdx b/api-reference/server/services/s2s/gemini-live.mdx index f72d4b7e..d5078e22 100644 --- a/api-reference/server/services/s2s/gemini-live.mdx +++ b/api-reference/server/services/s2s/gemini-live.mdx @@ -7,6 +7,12 @@ description: "A real-time, multimodal conversational AI service powered by Googl `GeminiLiveLLMService` enables natural, real-time conversations with Google's Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing. + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + Want to start building? Check out our [Gemini Live Guide](/pipecat/features/gemini-live). diff --git a/api-reference/server/services/s2s/grok.mdx b/api-reference/server/services/s2s/grok.mdx index f4841e81..037eb3de 100644 --- a/api-reference/server/services/s2s/grok.mdx +++ b/api-reference/server/services/s2s/grok.mdx @@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using xAI's Grok `GrokRealtimeLLMService` provides real-time, multimodal conversation capabilities using xAI's Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times. + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). 
+ + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + + **Not compatible with Pipecat Flows.** Flows requires a cascade LLM service. + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + + For a complete runnable walkthrough (nodes, functions, and a working end-to-end + example), see the [Flows Quickstart](/pipecat-flows/guides/quickstart). + + +## If You Specifically Need Realtime S2S + +If low-latency speech-to-speech is a hard requirement, build with plain Pipecat (without Flows) and manage conversation state in your own code. The S2S service pages have everything you need to get started: + + + + Realtime speech-to-speech with Google Gemini Live + + + Realtime speech-to-speech with OpenAI's Realtime API + + diff --git a/pipecat-flows/introduction.mdx b/pipecat-flows/introduction.mdx index cbc68a1a..28198dda 100644 --- a/pipecat-flows/introduction.mdx +++ b/pipecat-flows/introduction.mdx @@ -15,6 +15,12 @@ Pipecat Flows is best suited for use cases where: - **Your bot handles complex tasks** that can be broken down into smaller, manageable pieces - **You want to improve LLM accuracy** by focusing the model on one specific task at a time instead of managing multiple responsibilities simultaneously + + Looking for Gemini Live, OpenAI Realtime, or another speech-to-speech model? + See [Using Flows with Realtime + Models](/pipecat-flows/guides/realtime-models). + + ## How Pipecat and Pipecat Flows Work Together **Pipecat** defines the core capabilities of your bot — the pipeline and processors that enable receiving audio, transcribing input, running LLM completions, converting responses to audio, and sending audio back to the user. 
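For readers who do take the plain-Pipecat route the guide recommends, "manage conversation state in your own code" can be as small as a node table plus a transition function. The sketch below is illustrative Python only, assuming nothing from Pipecat or Flows: the `Node` and `ConversationState` names, fields, and events are all hypothetical.

```python
# Illustrative only: a minimal hand-rolled conversation state machine of the
# kind you might pair with a realtime S2S service when Flows is unavailable.
# All names here are hypothetical, not Pipecat APIs.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    prompt: str  # system prompt to send to the S2S service on entry
    transitions: Dict[str, str] = field(default_factory=dict)  # event -> node


class ConversationState:
    def __init__(self, nodes: List[Node], start: str):
        self.nodes = {n.name: n for n in nodes}
        self.current = self.nodes[start]

    def advance(self, event: str) -> Node:
        # Stay on the current node if the event has no transition defined.
        target = self.current.transitions.get(event)
        if target:
            self.current = self.nodes[target]
        return self.current


nodes = [
    Node("greeting", "Greet the caller and ask how you can help.",
         {"intent_captured": "collect_details"}),
    Node("collect_details", "Collect the caller's account details.",
         {"details_done": "wrap_up"}),
    Node("wrap_up", "Summarize the conversation and say goodbye."),
]

state = ConversationState(nodes, start="greeting")
state.advance("intent_captured")
print(state.current.name)  # collect_details
```

On each transition you would update the S2S service's system instruction (or session config) from `current.prompt`, which approximates the per-node focus that Flows provides in cascade pipelines.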
From bee348d63d612f3b52749a545810addd7e39d676 Mon Sep 17 00:00:00 2001 From: James Hush Date: Fri, 17 Apr 2026 16:04:12 +0800 Subject: [PATCH 2/4] Drop 'low-latency' qualifier from S2S hard requirement note --- pipecat-flows/guides/realtime-models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipecat-flows/guides/realtime-models.mdx b/pipecat-flows/guides/realtime-models.mdx index 8ce5266f..d6f63641 100644 --- a/pipecat-flows/guides/realtime-models.mdx +++ b/pipecat-flows/guides/realtime-models.mdx @@ -96,7 +96,7 @@ flow_manager = FlowManager( ## If You Specifically Need Realtime S2S -If low-latency speech-to-speech is a hard requirement, build with plain Pipecat (without Flows) and manage conversation state in your own code. The S2S service pages have everything you need to get started: +If speech-to-speech is a hard requirement, build with plain Pipecat (without Flows) and manage conversation state in your own code. The S2S service pages have everything you need to get started: Date: Fri, 17 Apr 2026 16:06:06 +0800 Subject: [PATCH 3/4] Address PR #741 review: add import os and transport/init note --- pipecat-flows/guides/realtime-models.mdx | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/pipecat-flows/guides/realtime-models.mdx b/pipecat-flows/guides/realtime-models.mdx index d6f63641..143b3166 100644 --- a/pipecat-flows/guides/realtime-models.mdx +++ b/pipecat-flows/guides/realtime-models.mdx @@ -41,6 +41,8 @@ export CARTESIA_API_KEY=... 
Build the pipeline with the cascade services and attach a `FlowManager`: ```python +import os + from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.task import PipelineTask from pipecat.processors.aggregators.llm_context import LLMContext @@ -54,6 +56,10 @@ from pipecat.services.google.llm import GoogleLLMService from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat_flows import FlowManager +# `transport` is your configured Pipecat transport (Daily, LiveKit, etc.). +# See the Flows Quickstart for the full setup, including `on_client_connected` +# and `flow_manager.initialize(...)`. + stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash") tts = CartesiaTTSService( From 8dd97bd33f61c8fc7526777d50b07a73827e17b2 Mon Sep 17 00:00:00 2001 From: James Hush Date: Fri, 17 Apr 2026 16:07:43 +0800 Subject: [PATCH 4/4] Show flow_manager.initialize() in cascade example Addresses Copilot review feedback: readers seeing just the `FlowManager` construction may think setup is complete. Adds the `on_client_connected` event handler that calls `flow_manager.initialize(create_initial_node())`, matching the pattern in the Flows Quickstart. --- pipecat-flows/guides/realtime-models.mdx | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/pipecat-flows/guides/realtime-models.mdx b/pipecat-flows/guides/realtime-models.mdx index 143b3166..41a6f9b2 100644 --- a/pipecat-flows/guides/realtime-models.mdx +++ b/pipecat-flows/guides/realtime-models.mdx @@ -93,6 +93,12 @@ flow_manager = FlowManager( context_aggregator=context_aggregator, transport=transport, ) + +# Start the flow when a client connects. `create_initial_node()` is your +# first node definition; see the Flows Quickstart for an example. +@transport.event_handler("on_client_connected") +async def on_client_connected(transport, client): + await flow_manager.initialize(create_initial_node()) ```
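The `create_initial_node()` helper referenced in the handler above is left to the reader. A hypothetical sketch of what it might return, using the `task_messages`/`functions` shape shown in the Flows quickstart (field names should be verified against the current pipecat-flows docs), could look like:

```python
# Hypothetical sketch of the `create_initial_node()` helper referenced in the
# cascade example. The dict shape mirrors the NodeConfig pattern from the
# Flows quickstart; treat the exact field names as an assumption to verify.
def create_initial_node():
    return {
        "name": "initial",
        "task_messages": [
            {
                "role": "system",
                "content": "Greet the user and ask what they'd like help with.",
            }
        ],
        "functions": [],  # add function schemas here as the flow grows
    }


node = create_initial_node()
print(node["name"])  # initial
```

Keeping the first node this small matches the guide's framing: each node focuses the LLM on one task, and later nodes add functions and transitions incrementally.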