-
Notifications
You must be signed in to change notification settings - Fork 65
Add Flows + realtime (S2S) compatibility docs #741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jamsea
wants to merge
4
commits into
main
Choose a base branch
from
jh/flows-realtime-models
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+182
−1
Open
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
778a516
Add Flows + realtime (S2S) compatibility docs
jamsea bee348d
Drop 'low-latency' qualifier from S2S hard requirement note
jamsea 90d94fc
Address PR #741 review: add import os and transport/init note
jamsea 8dd97bd
Show flow_manager.initialize() in cascade example
jamsea File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| --- | ||
| title: "Using Flows with Realtime Models" | ||
| description: "Compatibility notes for Gemini Live, OpenAI Realtime, and other speech-to-speech services." | ||
| --- | ||
|
|
||
| Pipecat Flows doesn't currently work with realtime speech-to-speech (S2S) services like Gemini Live or OpenAI Realtime. This page covers what works, what doesn't, and the recommended path forward. | ||
|
|
||
| ## Compatibility at a Glance | ||
|
|
||
| | Service | Works with Flows | | ||
| | ------------------------------------------------------------------------------- | :--------------: | | ||
| | Cascade LLMs (OpenAI, Anthropic, Gemini, AWS Bedrock, and OpenAI-compatible) | Yes | | ||
| | Gemini Live (`GeminiLiveLLMService`, `GeminiLiveVertexLLMService`) | No | | ||
| | OpenAI Realtime (`OpenAIRealtimeLLMService`) | No | | ||
| | AWS Nova Sonic (`AWSNovaSonicLLMService`) | No | | ||
| | Grok S2S, Inworld S2S, Ultravox | No | | ||
|
|
||
| ## Why | ||
|
|
||
| Flows currently requires a cascade LLM service (STT → LLM → TTS). Native S2S support is currently being developed. | ||
|
|
||
| ## Recommended Path: Use a Cascade Pipeline | ||
|
|
||
| If you want structured conversation flows today, build a cascade pipeline with a separate STT, LLM, and TTS service. Any cascade LLM that supports function calling works. | ||
|
|
||
| Install Pipecat Flows along with Pipecat and the services you want to use. This example uses Deepgram (STT), Google Gemini (LLM), and Cartesia (TTS): | ||
|
|
||
| ```bash | ||
| uv add pipecat-ai-flows | ||
| uv add "pipecat-ai[daily,google,deepgram,cartesia,silero]" | ||
| ``` | ||
|
|
||
| Set the API keys: | ||
|
|
||
| ```bash | ||
| export DEEPGRAM_API_KEY=... | ||
| export GOOGLE_API_KEY=... | ||
| export CARTESIA_API_KEY=... | ||
| ``` | ||
|
|
||
| Build the pipeline with the cascade services and attach a `FlowManager`: | ||
|
|
||
| ```python | ||
| import os | ||
|
|
||
| from pipecat.pipeline.pipeline import Pipeline | ||
| from pipecat.pipeline.task import PipelineTask | ||
| from pipecat.processors.aggregators.llm_context import LLMContext | ||
| from pipecat.processors.aggregators.llm_context_aggregator import ( | ||
| LLMContextAggregatorPair, | ||
| LLMUserAggregatorParams, | ||
| ) | ||
| from pipecat.audio.vad.silero import SileroVADAnalyzer | ||
| from pipecat.services.deepgram.stt import DeepgramSTTService | ||
| from pipecat.services.google.llm import GoogleLLMService | ||
| from pipecat.services.cartesia.tts import CartesiaTTSService | ||
| from pipecat_flows import FlowManager | ||
|
|
||
| # `transport` is your configured Pipecat transport (Daily, LiveKit, etc.). | ||
| # See the Flows Quickstart for the full setup, including `on_client_connected` | ||
| # and `flow_manager.initialize(...)`. | ||
|
|
||
| stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) | ||
| llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash") | ||
|
jamsea marked this conversation as resolved.
|
||
| tts = CartesiaTTSService( | ||
| api_key=os.getenv("CARTESIA_API_KEY"), | ||
| voice_id="32b3f3c5-7171-46aa-abe7-b598964aa793", | ||
| ) | ||
|
|
||
| context = LLMContext() | ||
| context_aggregator = LLMContextAggregatorPair( | ||
| context, | ||
| user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()), | ||
| ) | ||
|
|
||
| pipeline = Pipeline( | ||
| [ | ||
| transport.input(), | ||
| stt, | ||
| context_aggregator.user(), | ||
| llm, | ||
| tts, | ||
| transport.output(), | ||
| context_aggregator.assistant(), | ||
|
jamsea marked this conversation as resolved.
|
||
| ] | ||
| ) | ||
|
|
||
| task = PipelineTask(pipeline) | ||
|
|
||
| flow_manager = FlowManager( | ||
| task=task, | ||
| llm=llm, | ||
| context_aggregator=context_aggregator, | ||
| transport=transport, | ||
| ) | ||
|
jamsea marked this conversation as resolved.
|
||
|
|
||
| # Start the flow when a client connects. `create_initial_node()` is your | ||
| # first node definition; see the Flows Quickstart for an example. | ||
| @transport.event_handler("on_client_connected") | ||
| async def on_client_connected(transport, client): | ||
| await flow_manager.initialize(create_initial_node()) | ||
| ``` | ||
|
|
||
| <Tip> | ||
| For a complete runnable walkthrough (nodes, functions, and a working end-to-end | ||
| example), see the [Flows Quickstart](/pipecat-flows/guides/quickstart). | ||
| </Tip> | ||
|
|
||
| ## If You Specifically Need Realtime S2S | ||
|
|
||
| If speech-to-speech is a hard requirement, build with plain Pipecat (without Flows) and manage conversation state in your own code. The S2S service pages have everything you need to get started: | ||
|
|
||
| <CardGroup cols={2}> | ||
| <Card | ||
| title="Gemini Live" | ||
| icon="google" | ||
| href="/api-reference/server/services/s2s/gemini-live" | ||
| > | ||
| Realtime speech-to-speech with Google Gemini Live | ||
| </Card> | ||
| <Card | ||
| title="OpenAI Realtime" | ||
| icon="microphone" | ||
| href="/api-reference/server/services/s2s/openai" | ||
| > | ||
| Realtime speech-to-speech with OpenAI's Realtime API | ||
| </Card> | ||
| </CardGroup> | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need this small section in overview. Can you update this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!