Describe the bug
Description
When invoking a Bedrock agent using streaming, the API call blocks until the agent finishes generating the full response. Only after completion does the first chunk event appear, after which remaining chunks are delivered quickly.
This defeats the purpose of streaming because users do not receive any partial output during generation.
Environment
SDK: boto3
Service: Amazon Bedrock Agent Runtime
Python: 3.12
boto3: 1.42.59
Region: us-east-1
Minimal Reproducible Code
import boto3
import time

client = boto3.client(
    "bedrock-agent-runtime",
    region_name="us-east-1",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

start = time.time()
print("Invoking agent...")

response = client.invoke_agent(
    agentId="HBULA1EYN8",
    agentAliasId="92E64FKIFG",
    sessionId="test-session",
    enableTrace=True,
    sessionState={"files": []},
    inputText="Explain quantum computing in simple terms",
    streamingConfigurations={
        "streamFinalResponse": True
    },
)

print(f"Time taken to receive first response: {time.time() - start:.2f}s")

for event in response["completion"]:
    print(f"{time.time() - start:.2f}s -> {event.keys()}")
Note: I have added the bedrock:InvokeModelWithResponseStream permission, as described in the invoke_agent documentation (https://docs.aws.amazon.com/boto3/latest/reference/services/bedrock-agent-runtime/client/invoke_agent.html).
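For context on how the stream is consumed: each chunk event in the completion stream carries its text payload as bytes under chunk.bytes. The consumer below is a minimal sketch run against a hand-built simulated event list (a live invoke_agent call needs AWS credentials), so the timings it prints only illustrate the measurement pattern, not the bug itself:

```python
import time

def consume_completion(completion):
    """Iterate a Bedrock-agent-style completion stream, timestamping each
    event and printing partial text as each chunk arrives."""
    start = time.time()
    text_parts = []
    for event in completion:
        if "chunk" in event:
            # chunk payloads arrive as UTF-8 bytes under the "bytes" key
            part = event["chunk"]["bytes"].decode("utf-8")
            text_parts.append(part)
            print(f"{time.time() - start:.2f}s chunk: {part!r}", flush=True)
        elif "trace" in event:
            print(f"{time.time() - start:.2f}s trace event", flush=True)
    return "".join(text_parts)

# Simulated stand-in for response["completion"]:
simulated = [
    {"trace": {}},
    {"chunk": {"bytes": b"Quantum computing "}},
    {"chunk": {"bytes": b"uses qubits."}},
]
final_text = consume_completion(simulated)
print(final_text)
```

With true streaming, the per-chunk print lines should appear incrementally during generation rather than all at once after it completes.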
Observed Output
Invoking agent...
Time taken to receive first response: 1.13s
1.33s -> dict_keys(['trace'])
7.18s -> dict_keys(['chunk'])
7.19s -> dict_keys(['chunk'])
7.19s -> dict_keys(['chunk'])
...
Key observations:
trace events arrive early (~1.3s)
chunk events begin only after ~7s, then arrive in rapid succession
Expected Behavior
Streaming should begin as soon as the model starts generating tokens, for example:
0.8s -> dict_keys(['chunk'])
0.9s -> dict_keys(['chunk'])
1.0s -> dict_keys(['chunk'])
Actual Behavior
Agent executes for several seconds
↓
Full response generated internally
↓
Only then streaming of chunks begins
This results in a perceived delay and defeats the purpose of using a streaming API for interactive applications.
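The buffering can be made explicit by comparing time-to-first-chunk with total stream time: with true streaming the first chunk arrives well before the stream ends, while with buffering the two are nearly equal. The helper below is hypothetical (not part of boto3) and is demonstrated against a simulated generator that mimics the observed one-long-pause-then-everything behavior:

```python
import time

def stream_timings(events):
    """Return (time_to_first_chunk, total_time) for an iterable of
    Bedrock-style events. Near-equal values indicate the response was
    buffered and only flushed at the end."""
    start = time.time()
    first_chunk = None
    for event in events:
        if "chunk" in event and first_chunk is None:
            first_chunk = time.time() - start
    total = time.time() - start
    return first_chunk, total

def buffered_stream():
    # Simulates the observed behavior: one long pause, then all chunks at once.
    time.sleep(0.2)
    for _ in range(3):
        yield {"chunk": {"bytes": b"..."}}

ttfc, total = stream_timings(buffered_stream())
print(f"first chunk at {ttfc:.2f}s, stream done at {total:.2f}s")
```

In the observed output above, this ratio is the problem: the first chunk lands at ~7.18s and the stream completes at ~7.19s, i.e. essentially no text was delivered during generation.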
Regression Issue
Expected Behavior
See "Expected Behavior" above.
Current Behavior
See "Actual Behavior" above.
Reproduction Steps
See "Minimal Reproducible Code" above.
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.42.59
Environment details (OS name and version, etc.)
Mac