
fix(memory): wrap boto3 clients in context-isolating proxy to silence OTel detach errors #555

Open
citizen204 wants to merge 1 commit into aws:main from citizen204:fix-456-otel-context-detach


Problem

When MemoryClient is used inside an asyncio application with aws-opentelemetry-distro (or any OTel boto3 auto-instrumentation), every data-plane or control-plane call emits an ERROR-level log:

ERROR [opentelemetry.context] Failed to detach context
ValueError: Token was created in a different Context

This happens because the OTel boto3 instrumentation calls context.attach() before the API call and context.detach() after. Python's ContextVar.reset() raises ValueError when the token was created in a different execution context -- which occurs when the SDK is used from an asyncio task that inherited OTel context from a parent task, or when a ThreadPoolExecutor worker crosses await boundaries.
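
The failure mode can be reproduced with the standard library alone. The sketch below is a simplification, assuming the OTel runtime stores its current context in a ContextVar and implements attach/detach as set()/reset(); the names current_span, attach, and detach are hypothetical stand-ins, not the OTel API.

```python
import contextvars

# Hypothetical stand-in for the ContextVar an OTel runtime would manage.
current_span = contextvars.ContextVar("current_span", default=None)


def attach(value):
    # Analogous to context.attach(): returns a token tied to the current Context.
    return current_span.set(value)


def detach(token):
    # Analogous to context.detach(): reset() only accepts tokens created
    # in the Context that is current at the time of the call.
    current_span.reset(token)


# The token is created inside a copied context...
other_ctx = contextvars.copy_context()
token = other_ctx.run(attach, "span-a")

# ...but the reset happens in the outer context, so Python raises ValueError.
error = None
try:
    detach(token)
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```

In the real scenario the mismatch comes from task or thread boundaries rather than an explicit copy_context() call, but the exception is raised by the same check.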

At scale (for example, many concurrent agents, or AgentCoreMemorySessionManager calling asyncio.to_thread()), this fills CloudWatch with spurious ERROR entries that obscure real failures.

Fix

Added _ContextIsolatingProxy, a thin wrapper around a boto3 client that runs each method call inside contextvars.copy_context().run(). Each call therefore executes in its own copy of the caller's context, so the OTel attach/detach pair always runs within a single context and ContextVar.reset() succeeds.
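
A minimal illustration of the mechanism, using only contextvars: when the set/reset pair runs inside copy_context().run(), the token is created and consumed in the same context, and nothing leaks back to the caller. The variable slot and the function attach_call_detach are hypothetical names for this sketch.

```python
import contextvars

# Hypothetical stand-in for the instrumentation's context variable.
slot = contextvars.ContextVar("slot", default="outer")


def attach_call_detach():
    token = slot.set("inner")   # plays the role of attach()
    seen = slot.get()           # the "API call" sees the attached value
    slot.reset(token)           # plays the role of detach(); same context as set()
    return seen


# Run the whole pair inside a snapshot of the current context.
snapshot = contextvars.copy_context()
seen_inside = snapshot.run(attach_call_detach)

print(seen_inside, slot.get())  # inner outer
```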

Both gmdp_client (data plane) and gmcp_client (control plane) are wrapped at construction time in MemoryClient.__init__. This fixes every call site -- including direct self.gmdp_client.create_event(...) calls in session_manager.py -- without per-call changes.

The proxy exposes meta directly (for meta.region_name logging) and wraps all callable attributes; non-callable attributes are returned as-is.
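
The behaviour described above could look roughly like the following. This is a sketch under assumptions, not the code in this PR: the class name ContextIsolatingProxySketch and the FakeClient used to exercise it are hypothetical, and the actual _ContextIsolatingProxy may differ in detail.

```python
import contextvars
import functools


class ContextIsolatingProxySketch:
    """Illustrative stand-in for the proxy described above (not the PR's code)."""

    def __init__(self, client):
        self._client = client
        # Expose meta directly, if the wrapped object has one.
        self.meta = getattr(client, "meta", None)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the proxy itself.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr  # non-callable attributes pass through unchanged

        @functools.wraps(attr)
        def isolated(*args, **kwargs):
            # Each call runs in a fresh copy of the caller's context.
            return contextvars.copy_context().run(attr, *args, **kwargs)

        return isolated


class FakeClient:
    """Hypothetical client used only to exercise the sketch."""

    region = "us-west-2"

    def create_event(self, **kwargs):
        return {"echo": kwargs}


proxy = ContextIsolatingProxySketch(FakeClient())
response = proxy.create_event(memoryId="mem-123")
print(response)      # {'echo': {'memoryId': 'mem-123'}}
print(proxy.region)  # us-west-2
```

Wrapping in __getattr__ keeps the proxy generic: it does not need to know which operations the underlying client exposes.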

Changes

  • src/bedrock_agentcore/memory/client.py:
    • Added import contextvars
    • Added _ContextIsolatingProxy class (27 lines, before MemoryClient)
    • Wrapped gmdp_client and gmcp_client with the proxy in __init__

Fixes #456

… OTel detach errors

When MemoryClient is used from an asyncio task that already has an OTel context
attached (e.g. via aws-opentelemetry-distro), the boto3 auto-instrumentation calls
opentelemetry.context.attach() then detach() around every API call.  If the call
crosses an asyncio task or thread-pool boundary, Python raises:

  ValueError: <Token> was created in a different Context

because ContextVar.reset() only accepts tokens created in the current execution
context.  The error is logged at ERROR level by the OTel runtime on every boto3
call, polluting CloudWatch and obscuring real failures.

_ContextIsolatingProxy wraps each method call in contextvars.copy_context().run()
so the OTel attach/detach pair is fully contained within the copied context
snapshot -- reset() always matches the token created in the same context.

Applies to both gmdp_client and gmcp_client at construction time, covering every
call site in MemoryClient and AgentCoreMemorySessionManager without any per-call
changes.

Fixes aws#456

Development

Successfully merging this pull request may close these issues.

OTEL context token detached across asyncio/thread boundary in memory client + Strands session_manager
