4 changes: 4 additions & 0 deletions api-reference/pipecat-flows/overview.mdx
@@ -107,6 +107,10 @@ Pipecat Flows works with any LLM service that supports function calling. Pipecat

Any service that extends Pipecat's `LLMService` base class is supported. This includes OpenAI-compatible services like Groq, Together, Cerebras, DeepSeek, and others.

### Realtime (S2S) models
> **Contributor:** I think we only need this small section in overview. Can you update this PR?
>
> **Contributor Author:** Sure!

Realtime speech-to-speech services such as Gemini Live and OpenAI Realtime are not currently supported. See [Using Flows with Realtime Models](/pipecat-flows/guides/realtime-models) for the recommended cascade configuration.

## Additional Notes

- **State Management**: Use `flow_manager.state` dictionary for persistent conversation data
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/aws.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using AWS Nova S

`AWSNovaSonicLLMService` enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<CardGroup cols={2}>
<Card
title="AWS Nova Sonic API Reference"
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/gemini-live-vertex.mdx
@@ -7,6 +7,12 @@ description: "A real-time, multimodal conversational AI service powered by Googl

`GeminiLiveVertexLLMService` enables natural, real-time conversations with Google's Gemini model through Vertex AI. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<Tip>
Want to start building? Check out our [Gemini Live
Guide](/pipecat/features/gemini-live) for general concepts, then follow the
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/gemini-live.mdx
@@ -7,6 +7,12 @@ description: "A real-time, multimodal conversational AI service powered by Googl

`GeminiLiveLLMService` enables natural, real-time conversations with Google's Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<Tip>
Want to start building? Check out our [Gemini Live
Guide](/pipecat/features/gemini-live).
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/grok.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using xAI's Grok

`GrokRealtimeLLMService` provides real-time, multimodal conversation capabilities using xAI's Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<CardGroup cols={2}>
<Card
title="Grok Realtime API Reference"
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/inworld.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using Inworld's

`InworldRealtimeLLMService` provides real-time, multimodal conversation capabilities using Inworld's Realtime API. It operates as a cascade STT/LLM/TTS pipeline under the hood with built-in semantic voice activity detection (VAD) for turn management, offering low-latency speech-to-speech interactions with integrated LLM processing and function calling.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<CardGroup cols={2}>
<Card
title="Inworld Realtime API Reference"
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/openai.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using OpenAI's R

`OpenAIRealtimeLLMService` provides real-time, multimodal conversation capabilities using OpenAI's Realtime API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with minimal latency response times.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<CardGroup cols={2}>
<Card
title="OpenAI Realtime API Reference"
6 changes: 6 additions & 0 deletions api-reference/server/services/s2s/ultravox.mdx
@@ -7,6 +7,12 @@ description: "Real-time speech-to-speech service implementation using Ultravox's

`UltravoxRealtimeLLMService` provides real-time conversational AI capabilities using Ultravox's Realtime API. It supports both text and audio modalities with voice transcription, streaming responses, and tool usage for creating interactive AI experiences.

<Info>
**Not compatible with Pipecat Flows.** Flows requires a cascade LLM service.
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Info>

<CardGroup cols={2}>
<Card
title="Ultravox Realtime API Reference"
3 changes: 2 additions & 1 deletion docs.json
@@ -284,7 +284,8 @@
"pipecat-flows/guides/functions",
"pipecat-flows/guides/actions",
"pipecat-flows/guides/context-strategies",
"pipecat-flows/guides/state-management"
"pipecat-flows/guides/state-management",
"pipecat-flows/guides/realtime-models"
]
},
{
116 changes: 116 additions & 0 deletions pipecat-flows/guides/realtime-models.mdx
@@ -0,0 +1,116 @@
---
title: "Using Flows with Realtime Models"
description: "Compatibility notes for Gemini Live, OpenAI Realtime, and other speech-to-speech services."
---

Pipecat Flows doesn't currently work with realtime speech-to-speech (S2S) services like Gemini Live or OpenAI Realtime. This page covers what works, what doesn't, and the recommended path forward.

## Compatibility at a Glance

| Service | Works with Flows |
| ------------------------------------------------------------------------------- | :--------------: |
| Cascade LLMs (OpenAI, Anthropic, Gemini, AWS Bedrock, and OpenAI-compatible) | Yes |
| Gemini Live (`GeminiLiveLLMService`, `GeminiLiveVertexLLMService`) | No |
| OpenAI Realtime (`OpenAIRealtimeLLMService`) | No |
| AWS Nova Sonic (`AWSNovaSonicLLMService`) | No |
| Grok S2S, Inworld S2S, Ultravox | No |

## Why

Flows requires a cascade LLM service (STT → LLM → TTS). Native S2S support is under development.
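As a purely illustrative sketch (none of these function names are Pipecat APIs), the cascade can be pictured as three composed stages, with the LLM step operating on text between them:

```python
# Purely illustrative stand-ins — not Pipecat APIs. They only show the
# data flow of a cascade pipeline: audio -> text -> text -> audio.

def fake_stt(audio: bytes) -> str:
    """Stand-in for an STT service: audio frames -> transcript text."""
    return audio.decode("utf-8")  # pretend the bytes are the transcript

def fake_llm(transcript: str) -> str:
    """Stand-in for a cascade LLM: works on text, where function calling lives."""
    return f"You said: {transcript}"

def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS service: response text -> audio frames."""
    return text.encode("utf-8")

def cascade(audio: bytes) -> bytes:
    """Compose the three stages, as a cascade pipeline does."""
    return fake_tts(fake_llm(fake_stt(audio)))

print(cascade(b"hello"))  # b'You said: hello'
```

An S2S service collapses these stages into a single model, which is why there is no separate text-level LLM step for Flows to attach to.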

## Recommended Path: Use a Cascade Pipeline

If you want structured conversation flows today, build a cascade pipeline with a separate STT, LLM, and TTS service. Any cascade LLM that supports function calling works.

Install Pipecat Flows along with Pipecat and the services you want to use. This example uses Deepgram (STT), Google Gemini (LLM), and Cartesia (TTS):

```bash
uv add pipecat-ai-flows
uv add "pipecat-ai[daily,google,deepgram,cartesia,silero]"
```

Set the API keys:

```bash
export DEEPGRAM_API_KEY=...
export GOOGLE_API_KEY=...
export CARTESIA_API_KEY=...
```

Build the pipeline with the cascade services and attach a `FlowManager`:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_context_aggregator import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat_flows import FlowManager

stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="32b3f3c5-7171-46aa-abe7-b598964aa793",
)

context = LLMContext()
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)

# `transport` is assumed to be created earlier (e.g., a Daily transport).
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)

task = PipelineTask(pipeline)

flow_manager = FlowManager(
task=task,
llm=llm,
context_aggregator=context_aggregator,
transport=transport,
)
```

<Tip>
For a complete runnable walkthrough (nodes, functions, and a working end-to-end
example), see the [Flows Quickstart](/pipecat-flows/guides/quickstart).
</Tip>

## If You Specifically Need Realtime S2S

If low-latency speech-to-speech is a hard requirement, build with plain Pipecat (without Flows) and manage conversation state in your own code. The S2S service pages have everything you need to get started:

<CardGroup cols={2}>
<Card
title="Gemini Live"
icon="google"
href="/api-reference/server/services/s2s/gemini-live"
>
Realtime speech-to-speech with Google Gemini Live
</Card>
<Card
title="OpenAI Realtime"
icon="microphone"
href="/api-reference/server/services/s2s/openai"
>
Realtime speech-to-speech with OpenAI's Realtime API
</Card>
</CardGroup>
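If you do manage conversation state yourself, a minimal hand-rolled node map might look like the following. This is illustrative only — none of these names are Pipecat or Flows APIs — but it captures the node/transition structure Flows would otherwise provide:

```python
# Illustrative only — not a Pipecat or Flows API. A hand-rolled node map
# for tracking conversation state alongside an S2S service.

NODES = {
    "greeting": {
        "prompt": "Greet the caller and ask how you can help.",
        "next": {"support": "support", "sales": "sales"},
    },
    "support": {
        "prompt": "Collect the caller's issue details.",
        "next": {"done": "wrap_up"},
    },
    "sales": {
        "prompt": "Ask which product the caller is interested in.",
        "next": {"done": "wrap_up"},
    },
    "wrap_up": {
        "prompt": "Thank the caller and end the conversation.",
        "next": {},
    },
}

def advance(current: str, event: str) -> str:
    """Return the next node for an event, staying put on unknown events."""
    return NODES[current]["next"].get(event, current)

state = "greeting"
state = advance(state, "support")  # -> "support"
state = advance(state, "done")     # -> "wrap_up"
print(NODES[state]["prompt"])
```

In practice you would drive `advance` from your S2S model's function-call results and feed each node's prompt back as instructions, which is roughly the bookkeeping Flows automates for cascade pipelines.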
6 changes: 6 additions & 0 deletions pipecat-flows/introduction.mdx
@@ -15,6 +15,12 @@ Pipecat Flows is best suited for use cases where:
- **Your bot handles complex tasks** that can be broken down into smaller, manageable pieces
- **You want to improve LLM accuracy** by focusing the model on one specific task at a time instead of managing multiple responsibilities simultaneously

<Note>
Looking for Gemini Live, OpenAI Realtime, or another speech-to-speech model?
See [Using Flows with Realtime
Models](/pipecat-flows/guides/realtime-models).
</Note>

## How Pipecat and Pipecat Flows Work Together

**Pipecat** defines the core capabilities of your bot — the pipeline and processors that enable receiving audio, transcribing input, running LLM completions, converting responses to audio, and sending audio back to the user.