feat(openai-agents): add @temporalio/openai-agents package#2024
Draft
Integration for running OpenAI Agents SDK workflows as durable Temporal workflows. Model calls become Temporal activities; the agent loop (tools, handoffs, guardrails) runs in the workflow.
Package layout:
- plugin: OpenAIAgentsPlugin wires model activity + MCP providers
- workflow-side: createTemporalRunner(), activityAsTool(), statelessMcpServer(), tracing utilities
- activity-side: invokeModelActivity, error classification, retry-after header support, auto-heartbeating
- testing namespace: FakeModel, FakeModelProvider, GeneratorFakeModel, ResponseBuilders (textResponse/toolCallResponse/handoffResponse/multiToolCallResponse)
Correctness-critical behavior:
- Handoff conversion: shallow-clones user's Handoff objects to avoid mutation; preserves onHandoff, inputType, isEnabled callbacks; recursive cycle detection for cyclic handoff graphs
- Error classification: inspects OpenAI SDK error shape (direct .status/.headers and legacy .response.*), honors x-should-retry + retry-after headers, derives ModelInvocationError subtypes from status code
- Tool validation: tags activityAsTool-wrapped tools with a symbol marker, rejects raw functions and FunctionTools built via the bare tool() factory, recurses into handoff agents
- AgentsWorkflowError wraps non-Temporal errors as cause of ApplicationFailure; TemporalFailures in the cause chain are unwrapped rather than re-wrapped
Known deferrals (documented in src/index.ts):
- StatefulMCPServerProvider, nexusOperationAsTool, OTel trace interceptor, workflowFailureExceptionTypes registration, testing.AgentEnvironment, testing.ResponseBuilders class
Tests: 61 integration tests covering the full feature surface plus bug-reproduction coverage for every fix landed across 4 audit rounds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
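The error-classification rules in this commit message could be sketched roughly as follows. This is an illustration, not the package's code: `classifyModelError`, the `ErrorLike` shape, the status-to-kind mapping, and the default retryability per kind are all assumptions. Headers are modeled as a plain lowercase-keyed record, although a real SDK error may expose a `Headers` object instead, and only the seconds form of `retry-after` is handled.

```typescript
// Hypothetical names and mappings: the real package may classify differently.
type ErrorKind =
  | 'RateLimit' | 'Authentication' | 'BadRequest'
  | 'Conflict' | 'Timeout' | 'ServerError' | 'Unknown';

type HeaderBag = Record<string, string | undefined>;

interface ErrorLike {
  status?: number;
  headers?: HeaderBag;
  // Legacy shape: status and headers nested under `response`.
  response?: { status?: number; headers?: HeaderBag };
}

interface Classified {
  kind: ErrorKind;
  retryable: boolean;
  retryAfterMs: number | undefined;
}

function kindFromStatus(status: number | undefined): ErrorKind {
  if (status === undefined) return 'Unknown';
  if (status === 429) return 'RateLimit';
  if (status === 401 || status === 403) return 'Authentication';
  if (status === 409) return 'Conflict';
  if (status === 408) return 'Timeout';
  if (status >= 500) return 'ServerError';
  if (status >= 400) return 'BadRequest';
  return 'Unknown';
}

function classifyModelError(err: unknown): Classified {
  const e = (typeof err === 'object' && err !== null ? err : {}) as ErrorLike;
  // Prefer the direct fields, fall back to the legacy nested ones.
  const status = e.status ?? e.response?.status;
  const headers: HeaderBag = e.headers ?? e.response?.headers ?? {};
  const kind = kindFromStatus(status);

  // Assumed defaults: client mistakes are not retried, everything else is.
  let retryable = kind !== 'Authentication' && kind !== 'BadRequest';
  // An explicit server hint overrides the default.
  const hint = headers['x-should-retry'];
  if (hint === 'true') retryable = true;
  if (hint === 'false') retryable = false;

  const raw = headers['retry-after'];
  const seconds = raw === undefined || raw.trim() === '' ? NaN : Number(raw);
  const retryAfterMs =
    Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;

  return { kind, retryable, retryAfterMs };
}
```

In an activity, a result like this would typically decide whether to throw a retryable or non-retryable failure and how long to delay the next attempt.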
…e TemporalModelStub to ActivityBackedModel, drop unused plugin options
Side-aware src layout — directory enforces what was previously a
convention. workflow/ holds the agent loop + activity proxies that run
in the workflow sandbox; worker/ holds the plugin and activity
implementations; common/ holds types referenced by both sides.
mcp.ts splits into workflow/mcp-client.ts (statelessMcpServer factory
and types) and worker/mcp-provider.ts (StatelessMCPServerProvider
class). ActivityModelInput moves to common/ since both sides type-
reference it.
Renames:
- TemporalModelStub -> ActivityBackedModel. "Stub" reads as
"test-double" to most modern callers; the class is production hot-
path code that proxies model calls to an activity. New name says
what it is.
- src/workflow.ts and src/index.ts now re-export from the
side-specific dirs; their public surface is unchanged.
Plugin cleanup:
- Drop unused modelParams field from OpenAIAgentsPluginOptions.
Constructor never read it; runtime config lives on
createTemporalRunner({modelParams}) workflow-side.
- Drop createOpenAIAgentsPlugin factory. Other SDK plugins
(AiSdkPlugin, OpenTelemetryPlugin) export class only; users do
`new OpenAIAgentsPlugin({...})` to match.
package.json exports paths updated for the new layout. Backward-
compat ./lib/* aliases remap to the new locations.
No behavior change. 60/61 tests pass; the one occasional flake is a
known dev-server resource-contention issue tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… serialized-model contract
Workflow→activity model-call boundary now goes through explicit serialized types and field-by-field projections in both directions. Replaces ad-hoc destructure-and-strip on activity-backed-model.ts:58 plus `as any` casts on activities.ts:125.
Contract (src/common/serialized-model.ts):
- SerializedModelRequest / SerializedModelResponse: JSON-safe projections of upstream ModelRequest / ModelResponse. JsonValue replaces unknown on the wire.
- WIRE_VERSION literal field on both, validated activity-side. Mismatch throws non-retryable WireVersionMismatch ApplicationFailure; protects rolling deploys with workflow code on a different package version than the worker.
- signal excluded from SerializedModelRequest by design (AbortSignal is not serializable; Temporal cancellation provides the equivalent).
Projections live with their consumers:
- toSerializedModelRequest + fromSerializedModelResponse inline in workflow/activity-backed-model.ts (workflow side).
- toSerializedModelResponse + fromSerializedModelRequest inline in worker/activities.ts (worker side).
fromSerializedModelResponse reconstitutes Usage via new Usage(...) so .add() keeps working across multi-turn runs. fromSerializedModelRequest strips __wireVersion before passing to the upstream model so the internal protocol field doesn't leak through getResponse().
Public exports:
- SerializedModelRequest, SerializedModelResponse, InvokeModelActivityInput, JsonValue, WIRE_VERSION exported from both index.ts (worker side) and workflow.ts (workflow side).
- toSerializedModelRequest exported via workflow.ts (test workflow imports it).
- toSerializedModelResponse exported via index.ts.
Tests added in packages/test/src/test-openai-agents.ts:
- Round-trip: prompt + tracing survive workflow→activity→workflow, Usage round-trips with working .add(), __wireVersion stripped from both directions.
- Stripping: signal absent activity-side after additive projection.
- Version mismatch: stale __wireVersion throws WireVersionMismatch.
- Snapshot: shape of toSerializedModelRequest / toSerializedModelResponse output matches expected key list; fails loudly with "bump WIRE_VERSION" message if upstream adds a field that gets silently copied through.
Replaces src/common/activity-model-input.ts with src/common/serialized-model.ts.
No behavior change to user-facing API. 66/66 tests pass; one occasional flake under load passes in isolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
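A minimal sketch of the versioned wire contract described in this commit, under stated assumptions: the field set (`input`, `systemInstructions`) is a reduced stand-in for the real request type, the version value `1` is made up, and `WireVersionMismatch` is shown as a plain `Error` subclass with a `nonRetryable` flag where the package is said to throw a non-retryable ApplicationFailure.

```typescript
// Hypothetical stand-ins: the package's real types carry more fields.
const WIRE_VERSION = 1;

type JsonValue =
  | string | number | boolean | null
  | JsonValue[]
  | { [key: string]: JsonValue };

interface ModelRequestLike {
  input: JsonValue;
  systemInstructions?: string;
  signal?: unknown; // stands in for AbortSignal, which cannot cross the wire
}

interface SerializedModelRequest {
  __wireVersion: number;
  input: JsonValue;
  systemInstructions?: string;
}

// Stand-in for a non-retryable failure type.
class WireVersionMismatch extends Error {
  readonly nonRetryable = true;
  constructor(expected: number, actual: number) {
    super(`wire version mismatch: expected ${expected}, got ${actual}`);
    this.name = 'WireVersionMismatch';
  }
}

// Workflow side: additive, field-by-field projection. Anything not copied
// explicitly (such as `signal`) never reaches the activity.
function toSerializedModelRequest(req: ModelRequestLike): SerializedModelRequest {
  const wire: SerializedModelRequest = {
    __wireVersion: WIRE_VERSION,
    input: req.input,
  };
  if (req.systemInstructions !== undefined) {
    wire.systemInstructions = req.systemInstructions;
  }
  return wire;
}

// Worker side: validate the version, then rebuild the request without the
// internal protocol field so it does not leak into the upstream model call.
function fromSerializedModelRequest(wire: SerializedModelRequest): ModelRequestLike {
  if (wire.__wireVersion !== WIRE_VERSION) {
    throw new WireVersionMismatch(WIRE_VERSION, wire.__wireVersion);
  }
  const req: ModelRequestLike = { input: wire.input };
  if (wire.systemInstructions !== undefined) {
    req.systemInstructions = wire.systemInstructions;
  }
  return req;
}
```

Building the output object explicitly, rather than spreading the input and deleting fields, is what makes a newly added upstream field show up as a snapshot-test failure instead of being copied through silently.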
Implements TemporalTracingProcessor, a TracingProcessor that maps OpenAI Agents trace/span events to OTel spans. Removes the tracingDisabled flag on the internal Runner so trace events fire.
Architecture:
- Workflow-side TracingProcessor (src/workflow/tracing.ts) listens for agent loop events (agent, generation, function, handoff, guardrail, custom, response, transcription, speech, speech_group, mcp_tools).
- Each event creates a child OTel span via @opentelemetry/api, parented under the active OTel context (typically the workflow execution span registered by interceptors-opentelemetry).
- Replay-safe: skips span creation during workflow replay via isReplaying() guard. End events implicitly skip too: the entry map is empty during replay, so onSpanEnd / onTraceEnd no-op.
- Idempotent registration: Symbol.for() flag on globalThis ensures the processor registers once per workflow isolate.
- Uses addTraceProcessor() (not setTraceProcessors): preserves user processors registered before runner construction, no risk of wiping.
Wiring:
- TemporalOpenAIRunner constructor calls ensureTracingProcessorRegistered(), which registers the processor and calls setTracingDisabled(false) to override the upstream NODE_ENV=test default that gates trace creation.
- ActivityBackedModel.getResponse() wraps both normal and summaryOverride paths in withGenerationSpan() so generation spans fire. Upstream's built-in adapters do this themselves; ours has to mirror the pattern.
- Removed tracingDisabled: true from internal Runner config.
Span attributes split into static (set at start: type, name, handoffs, output_type, from_agent, to_agent, server) and dynamic (set at end: tools, model, triggered, result); avoids redundant double-set.
Known limitation (documented in code): activity spans from interceptors-opentelemetry appear as siblings of generation spans, not children. Proper nesting would require pushing OTel context before the activity call. Deferred to a follow-up; the current hierarchy still gives users visibility into the agent loop.
Public exports:
- TemporalTracingProcessor (workflow-side)
- ensureTracingProcessorRegistered
Adds @opentelemetry/api ^1.9.0 as regular dep (matches interceptors-opentelemetry's pattern).
Tests: T1 verifies the tracing path is active and produces trace + agent + generation/response span events. Full OTel span emission verification deferred; it requires in-memory exporter + interceptors-opentelemetry workflow bundle integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
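The idempotent-registration bullet can be illustrated with a small sketch. It assumes a hypothetical `ensureRegisteredOnce` helper and a made-up symbol key; the `register` callback stands in for whatever call actually adds the processor upstream.

```typescript
// Hypothetical key: Symbol.for() returns the same symbol for the same string
// everywhere in one JS realm, so the flag is shared even if this module is
// evaluated more than once inside the same isolate.
const REGISTERED_FLAG = Symbol.for('example.openai-agents.tracingRegistered');

// Returns true when this call performed the registration, false when an
// earlier call had already done so.
function ensureRegisteredOnce(register: () => void): boolean {
  const globals = globalThis as unknown as Record<symbol, unknown>;
  if (globals[REGISTERED_FLAG] === true) {
    return false;
  }
  // Set the flag before invoking the callback so a re-entrant call
  // cannot register a second time.
  globals[REGISTERED_FLAG] = true;
  register();
  return true;
}
```

Because the flag lives on `globalThis`, it is scoped to the JavaScript global of the current isolate, which matches the "once per workflow isolate" behavior the commit describes.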
…tyOptions, drop createTemporalRunner factory + createModelActivity export, harden wire layer
Two batches landed together since they share the activity-backed-model.ts file footprint.
Batch A: public API cleanup:
- ModelActivityParameters → ModelActivityOptions. Matches the *Options convention used elsewhere in the SDK (ActivityOptions, LocalActivityOptions, WorkerOptions). DEFAULT_MODEL_ACTIVITY_PARAMETERS → DEFAULT_MODEL_ACTIVITY_OPTIONS. File model-parameters.ts → model-activity-options.ts.
- Drop createTemporalRunner factory. Pure `new` shortcut, no value-add. Users do `new TemporalOpenAIRunner(opts)` directly.
- Demote createModelActivity from public exports. The plugin auto-registers the activity, so there's no reason to advertise the manual-registration bypass route. Function stays in worker/activities.ts for the plugin to use; just no longer exported from index.ts.
Batch B: wire / serialization layer hardening:
- Asymmetric exports made consistent. to* projections public (part of the wire contract); from* projections private (implementation detail).
- Cast comments at every type-assertion boundary now document why each is safe: input/prompt/tracing as JsonValue, Usage data as JsonValue, AgentOutputItem[] as JsonValue[]. No `as any` regressions.
- Usage class reconstruction comment clarified: Usage is the only class instance needing reconstitution post-wire, since AgentOutputItem variants are all Zod-inferred plain objects.
- providerData JSDoc expanded with explicit coercion warning (Date → ISO string, Map/Set/class instances flattened by Temporal's JSON codec).
- Inline rationale at activities.ts version-check site (no longer references CLEANUP.md, which is a local-only artifact).
- Removed duplicate `signal` exclusion comment from activity-backed-model.ts. Canonical comment lives in serialized-model.ts.
- tracing field commentary updated to accurately describe ModelTracing as an enablement flag (not a context carrier); span-context propagation belongs in TemporalTracingProcessor.
- Two new drift-detection tests in test-openai-agents.ts. JSON round-trip on populated SerializedModelRequest / SerializedModelResponse with deepEqual; fails loudly if upstream adds a non-serializable field.
No behavior change. 69/69 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
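The drift-detection idea (JSON round-trip compared with a deep-equality check) can be sketched without any test framework. The helper names here are hypothetical, and the hand-rolled `deepEqual` is only a stand-in for whatever assertion library the real tests use.

```typescript
// True only for object literals and Object.create(Object.prototype)-style
// values; class instances such as Date or Map fail this check.
function isPlainObject(v: unknown): v is Record<string, unknown> {
  return (
    typeof v === 'object' &&
    v !== null &&
    Object.getPrototypeOf(v) === Object.prototype
  );
}

// Minimal structural equality for JSON-shaped data.
function deepEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => deepEqual(item, b[i]));
  }
  if (isPlainObject(a) && isPlainObject(b)) {
    const keysA = Object.keys(a);
    const keysB = Object.keys(b);
    return (
      keysA.length === keysB.length &&
      keysA.every((k) => k in b && deepEqual(a[k], b[k]))
    );
  }
  return false;
}

// A value is wire-safe in this sketch if JSON encoding and decoding gives
// back something structurally identical to the input.
function survivesJsonRoundTrip(value: object): boolean {
  return deepEqual(value, JSON.parse(JSON.stringify(value)));
}
```

A populated request containing only strings, numbers, booleans, null, arrays, and plain objects passes. A `Date` comes back as an ISO string and a `Map` comes back as an empty object, so either one makes the check fail, which is the coercion the providerData JSDoc warns about.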
…ety fixes
Two batches landed together since they share test-file footprint.
Batch C — convert-agent.ts hardening (7 items):
- Move unwrapTemporalFailure from convert-agent.ts to common/errors.ts.
It's an error utility, not agent conversion logic; co-locating with
AgentsWorkflowError gives one place for error helpers.
- Replace `'default'` model-name fallback with explicit
AgentsWorkflowError thrown at convert time. Users now get a clear
"no model declared" error at workflow start instead of an opaque
activity failure on first model call.
- Introduce getAgentInternals() helper in src/workflow/agent-internals.ts
centralizing the unsafe access to upstream Agent's model/handoffs/tools
fields. Future upstream type changes touch one file.
- Drop `as Model` cast on agent.clone({ model: activityBackedModel }).
ActivityBackedModel implements Model and is structurally compatible —
TS accepts it without the cast.
- Expand setAgent comment to explain why the original (pre-clone) agent
is bound to the summary provider, not the cloned wrapper.
- Add CLEANUP-6 test asserting the Object.create-based Handoff clone
preserves all upstream-documented fields and prototype identity.
Future upstream Handoff additions trip this test.
- Top-of-file contract block in convert-agent.ts pinning the implicit
upstream contracts (Agent.clone, Handoff.onInvokeHandoff,
Agent.handoffs) to @openai/agents-core ~0.3.0 — checklist for dep
upgrades.
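The Object.create-based clone that the CLEANUP-6 test guards might look roughly like this. `HandoffLike` is a hypothetical stand-in class, not the upstream `Handoff` type, and the helper name is made up for illustration.

```typescript
// Hypothetical stand-in for a class instance that carries callbacks.
class HandoffLike {
  toolName: string;
  onInvokeHandoff: (input: string) => string;

  constructor(toolName: string, onInvokeHandoff: (input: string) => string) {
    this.toolName = toolName;
    this.onInvokeHandoff = onInvokeHandoff;
  }

  describe(): string {
    return `handoff:${this.toolName}`;
  }
}

// Shallow clone that keeps the prototype, so `instanceof` checks and
// prototype methods still work on the copy. Own enumerable properties
// (including callbacks) are copied by reference.
function shallowCloneKeepingPrototype<T extends object>(original: T): T {
  const copy = Object.create(Object.getPrototypeOf(original)) as T;
  return Object.assign(copy, original);
}

const original = new HandoffLike('to_billing', (input) => input.toUpperCase());
const clone = shallowCloneKeepingPrototype(original);
// Rewiring the clone leaves the user's object untouched.
clone.toolName = 'to_billing_wrapped';
```

A plain object spread (`{ ...original }`) would copy the same fields but drop the prototype, which is why a test asserting prototype identity is a useful guard here.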
Batch D — tracing replay-safety fixes:
- Symmetric replay gating: onTraceEnd and onSpanEnd now have the same
isReplaying() guard as their start counterparts. Matches Python's
uniform gating; no longer relies on Map-empty-after-replay as the
implicit gate.
- Workflow-scoped spans Map: Map<workflowId, Map<spanId, SpanEntry>>
instead of a flat Map shared across all workflows in the V8 isolate.
Cross-workflow leaks impossible. Outer entry is auto-cleaned when
the inner Map empties.
- Document that deterministic trace/span IDs come from the
crypto.randomUUID polyfill in load-polyfills.ts (delegates to
workflow.uuid4(), which is per-workflow seeded). No code change for
this — verified via the new replay-safety test.
- Add T2 replay-safety test using maxCachedWorkflows: 0 to force
workflow eviction after every task. Asserts replay actually occurred
AND the workflow completes without NondeterminismError. Proves the
trace processor's replay gating + the polyfill-backed deterministic
IDs hold up under forced replay.
- Delete getWorkflowTracingConfig() — was a dead function that always
returned 'enabled'. Removed from public exports.
- Expand JSDoc on ensureTracingProcessorRegistered documenting the
global side-effect (mutates upstream's processor list) and the
per-isolate singleton behavior so users aren't surprised.
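The workflow-scoped span map and the symmetric replay gating from Batch D could be sketched as below. `ScopedSpanStore` and `SpanEntry` are hypothetical names, and the replay flag is passed in as a parameter instead of being read from the workflow runtime.

```typescript
interface SpanEntry {
  name: string;
}

// Spans are keyed first by workflow ID, then by span ID, so two workflows
// sharing one isolate cannot see or overwrite each other's entries.
class ScopedSpanStore {
  private readonly byWorkflow = new Map<string, Map<string, SpanEntry>>();

  start(workflowId: string, spanId: string, entry: SpanEntry, replaying: boolean): void {
    if (replaying) return; // gate on start
    let inner = this.byWorkflow.get(workflowId);
    if (inner === undefined) {
      inner = new Map<string, SpanEntry>();
      this.byWorkflow.set(workflowId, inner);
    }
    inner.set(spanId, entry);
  }

  end(workflowId: string, spanId: string, replaying: boolean): SpanEntry | undefined {
    if (replaying) return undefined; // same explicit gate on end
    const inner = this.byWorkflow.get(workflowId);
    if (inner === undefined) return undefined;
    const entry = inner.get(spanId);
    inner.delete(spanId);
    // Drop the outer entry once the workflow has no open spans left.
    if (inner.size === 0) this.byWorkflow.delete(workflowId);
    return entry;
  }

  trackedWorkflows(): number {
    return this.byWorkflow.size;
  }
}
```

With the explicit guard in `end`, correctness no longer depends on the map happening to be empty during replay.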
71/71 tests pass. ESLint + Prettier clean. Both batches reviewed by
code-auditor and comment-auditor and all findings applied.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(Python parity)
Single combined batch — runner.ts cleanups and the validateTools rewrite
share the same code path. 8 runner.ts items + tool validation relax.
Runner cleanups:
- Remove runStreamed method entirely. Calling it now produces a clean
TypeError ("not a function") instead of a custom throw. We don't
extend Runner so we have no obligation to expose it.
- Drop AgentsWorkflowError class. The string 'AgentsWorkflowError' on
ApplicationFailure.type already serves as the marker; the wrapper
class added duplication and an extra hop on the cause chain. Now
errors flow directly: ApplicationFailure(type='AgentsWorkflowError',
cause: originalError). BREAKING: the class is no longer exported.
Users doing `instanceof AgentsWorkflowError` should switch to
ApplicationFailure type-tag checks.
- Tighten TemporalRunOptions.runConfig.model to `string` only.
Workflow can't serialize Model objects across the activity boundary,
so accepting `string | Model` was a typed lie. Runtime guard
deleted; TypeScript catches misuse at the call site.
- Forward all upstream RunConfig fields explicitly to internalRunner.
Previously dropped silently: handoffInputFilter, inputGuardrails,
outputGuardrails, modelSettings, tracingDisabled,
traceIncludeSensitiveData, workflowName, traceId, groupId,
traceMetadata, conversationId, session, sessionInputCallback,
callModelInputFilter, tracing. Each new field has a JSDoc comment
noting Temporal-specific caveats (e.g. guardrails must be
deterministic, signal omitted in favor of CancellationScope).
Convert-agent + tool validation:
- Fold validateTools into convertAgent. Single graph traversal now
handles validation, model conversion, and Handoff cloning — was
three full walks. ~40 lines of duplication removed.
- Drop TEMPORAL_ACTIVITY_TOOL_MARKER gate. Upstream tool() factory
products are now accepted inline in workflow context — Python
parity. Raw functions still rejected. The marker constant stays
for debugging / future introspection.
- Tool-type allowlist comment in convertAgent listing every accepted
upstream tool variant alphabetically. Note that ApplyPatch /
Computer / Shell tools pass validation but will fail at runtime in
the sandbox (require local I/O).
- getAgentInternals helper now used for tools as well as
model/handoffs — single source of truth for upstream Agent
property access.
- Tighten convert-agent.ts error message for non-string Model. Points
users at runConfig.model: string as the override path.
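A single-pass traversal that validates tools, converts the model, and handles cyclic handoff graphs might be structured as in this sketch. `AgentNode`, `convertGraph`, and the `activity:` prefix are hypothetical simplifications; the real conversion clones upstream `Agent` and `Handoff` objects.

```typescript
// Hypothetical, reduced agent shape for illustration only.
interface AgentNode {
  name: string;
  model: string | undefined;
  tools: unknown[];
  handoffs: AgentNode[];
}

function convertGraph(
  agent: AgentNode,
  seen: Map<AgentNode, AgentNode> = new Map<AgentNode, AgentNode>(),
): AgentNode {
  // Already converted (or currently being converted): reuse the result.
  const cached = seen.get(agent);
  if (cached !== undefined) return cached;

  // Validation happens in the same visit as conversion.
  for (const tool of agent.tools) {
    if (typeof tool === 'function') {
      throw new Error(`agent "${agent.name}": raw function tools are not supported`);
    }
  }
  if (agent.model === undefined) {
    throw new Error(`agent "${agent.name}": no model declared`);
  }

  const converted: AgentNode = {
    name: agent.name,
    model: `activity:${agent.model}`, // stands in for the activity-backed model
    tools: agent.tools.slice(),
    handoffs: [],
  };
  // Record the node before recursing so a cycle resolves to this object
  // instead of recursing forever.
  seen.set(agent, converted);
  converted.handoffs = agent.handoffs.map((next) => convertGraph(next, seen));
  return converted;
}
```

The `seen` map doubles as cycle detection and as a memo, so an agent reachable through several handoff paths is validated and converted once.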
Tests:
- E3/F20 inverted from rejection-test to inline-success-test. A
deterministic tool() product runs in the workflow without
activityAsTool; verifies the output round-trips.
- C3/F27 simplified to verify TypeError surfaces as
WorkflowFailedError. No specific message check since the method
doesn't exist at all now.
- C1/F7 updated: assertion on causeName is now 'Error' (the original
error directly on cause), not 'AgentsWorkflowError' (which would
imply a wrapper).
- H2 raw-function-rejection tests still pass; message strings
updated to match the tightened error wording.
Migration note: AgentsWorkflowError class removed from public API.
Users identifying these errors should switch from
`e instanceof AgentsWorkflowError` to checking
`(e as ApplicationFailure).type === 'AgentsWorkflowError'` on the
serialized failure.
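For callers migrating away from `instanceof AgentsWorkflowError`, a type-tag check along the cause chain could look like the sketch below. `findAgentsWorkflowFailure` is a hypothetical helper, and the check is purely structural so the example stays self-contained; real code would likely also confirm that the matching object is an `ApplicationFailure` from `@temporalio/common`.

```typescript
interface TaggedFailure {
  type: string;
  message: string;
}

// Walks `cause` links looking for a failure tagged 'AgentsWorkflowError'.
// The visited set guards against accidental cycles in the chain.
function findAgentsWorkflowFailure(err: unknown): TaggedFailure | undefined {
  const visited = new Set<object>();
  let current: unknown = err;
  while (typeof current === 'object' && current !== null && !visited.has(current)) {
    visited.add(current);
    const candidate = current as { type?: unknown; message?: unknown; cause?: unknown };
    if (candidate.type === 'AgentsWorkflowError') {
      const message = typeof candidate.message === 'string' ? candidate.message : '';
      return { type: 'AgentsWorkflowError', message };
    }
    current = candidate.cause;
  }
  return undefined;
}
```

On the client side, the error thrown when a workflow fails usually wraps the failure as its `cause`, so walking the chain is more robust than inspecting only the top-level error.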
71/71 tests pass. ESLint + Prettier clean. Both audits PASS — only
note was the breaking class removal, intentional per CLEANUP.md spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugin/options polish
Two batches landed in parallel; they touch disjoint primary files but share test-file footprint, so committing as one.
Batch F: tracing remaining gaps + plugin modelParams + concurrent test:
- New TemporalOpenAIRunnerOptions extends ModelActivityOptions with startSpansInReplay?: boolean. Plumbed through to TemporalTracingProcessor via ensureTracingProcessorRegistered, so callers can opt into emitting spans during replay for debugging replay-divergence issues. Default false.
- TemporalTracingProcessor's four event methods now gate via a single shouldSkip() helper that respects startSpansInReplay. Previously used four inline isReplaying() checks.
- Activity-span nesting comment in tracing.ts updated. Previously said "deferred to a follow-up"; now correctly explains that activity spans nest under generation spans when @temporalio/interceptors-opentelemetry is configured. Investigated worker-side TracingProcessor and concluded it isn't needed: OpenAI Agents SDK trace events fire workflow-side only; the activity span just needs OTel context propagation, which interceptors-opentelemetry already provides.
- OpenAIAgentsPluginOptions accepts modelParams?: ModelActivityOptions as a config-surface field. The plugin runs worker-side and can't inject config into the V8 workflow sandbox, so users must still pass modelParams to new TemporalOpenAIRunner(options) in workflow code. JSDoc explains this honestly; future versions may auto-propagate via workflow interceptors.
- T3 test verifies two concurrent workflows on one worker have isolated trace IDs (no cross-pollination), exercising the workflow-scoped Map added in 2d7949a.
Batch H: model-activity-options enhancements + testing.ts consolidation:
- model-activity-options.ts: cancellationType defaults to ActivityCancellationType.TRY_CANCEL so cancellations reach the activity cooperatively. JSDoc on every public field. versioningIntent was investigated and dropped: upstream Worker Versioning API is deprecated; using the Worker Deployment API is the path forward whenever that becomes a need. Adding a deprecated option to a new surface would create immediate tech debt.
- testing.ts: FakeModel and the former GeneratorFakeModel collapse into a single FakeModel that takes ModelResponse[] | Generator<ModelResponse>. FakeModelProvider takes the same union with a factory variant for the generator side. The deprecated aliases GeneratorFakeModel and GeneratorFakeModelProvider were removed entirely (BREAKING for tests, low surface: only used inside our own test suite).
- ResponseBuilders const object exposes text / toolCall / handoff / multiToolCall: namespace-style access in addition to the existing flat exports. Mirrors Python's TestModel grouping.
- src/testing.ts barrel re-exports from worker/testing.ts so consumers can import from @temporalio/openai-agents/lib/testing without diving into worker/. The lib/testing path is already used by stubs in packages/test.
- Activity-backed model wires cancellationType through to both proxyActivities and proxyLocalActivities options. User overrides win since DEFAULT_MODEL_ACTIVITY_OPTIONS is spread first.
Public API removals:
- GeneratorFakeModel / GeneratorFakeModelProvider (test utility aliases, not used by production code).
72/72 tests pass. ESLint + Prettier clean. Both audits PASS. Code auditor noted T3 proves trace-ID disjointness via UUID uniqueness rather than directly verifying Map scoping; comment auditor flagged 3 stale references in DEFERRED.md / index.ts module-level JSDoc, all fixed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
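The consolidated fake model that accepts either an array or a generator of responses could be sketched as follows. `ScriptedModel` and `FakeResponse` are hypothetical stand-ins, not the package's `FakeModel` or the upstream `ModelResponse` type, and the synchronous `nextResponse` method is added here only to keep the example easy to exercise.

```typescript
// Hypothetical, reduced response shape.
interface FakeResponse {
  text: string;
}

class ScriptedModel {
  private readonly source: Iterator<FakeResponse>;

  // Arrays and generators are both normalized to one iterator, so the rest
  // of the class does not care which form the caller supplied.
  constructor(responses: FakeResponse[] | Iterator<FakeResponse>) {
    this.source = Array.isArray(responses)
      ? responses[Symbol.iterator]()
      : responses;
  }

  // Hands out scripted responses in order and fails loudly when the script
  // runs dry, which usually means the agent made an unexpected extra call.
  nextResponse(): FakeResponse {
    const step = this.source.next();
    if (step.done) {
      throw new Error('ScriptedModel: no more scripted responses');
    }
    return step.value;
  }

  // Async wrapper matching the usual shape of a model call.
  async getResponse(): Promise<FakeResponse> {
    return this.nextResponse();
  }
}

// Generator form: responses can be computed lazily, turn by turn.
function* countingResponses(): Generator<FakeResponse> {
  let turn = 0;
  while (turn < 2) {
    turn += 1;
    yield { text: `turn ${turn}` };
  }
}
```

Usage is the same for both inputs: `new ScriptedModel([{ text: 'hello' }])` for a fixed script, or `new ScriptedModel(countingResponses())` for a generated one.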
Summary
Adds @temporalio/openai-agents, a Temporal plugin that runs OpenAI Agents SDK workflows as durable Temporal workflows. Model invocations become Temporal activities; the agent loop (tools, handoffs, guardrails) runs deterministically in the workflow sandbox.
User-facing API:
Plus activityAsTool, statelessMcpServer, StatelessMCPServerProvider, tracing utilities, and a public testing namespace.
Design
Design proposal lives at openai-agents-proposal-v2.md in the repo root of the branch's working tree (not committed). High-level: the runner recursively converts the agent graph, swapping each agent's model for a TemporalModelStub. The stub dispatches getResponse calls to invokeModelActivity, which runs on the activity worker where the real ModelProvider lives. The agent loop stays in the workflow, so tool calls and handoffs are durable.
Correctness-critical behavior
- Handoff conversion: shallow-clones the user's Handoff objects (does not mutate); preserves onHandoff, inputType, isEnabled; recursive cycle detection for cyclic handoff graphs.
- Error classification: inspects the OpenAI SDK error shape (direct error.status / error.headers and legacy error.response.*); honors x-should-retry and retry-after headers; derives ModelInvocationError subtypes (.RateLimit, .Authentication, .BadRequest, .ServerError, .Timeout, .Conflict).
- Tool validation: activityAsTool-produced tools carry a symbol marker; the runner rejects raw functions and FunctionTools built via the bare tool() factory, and recurses into handoff agents' tools.
- Error wrapping: AgentsWorkflowError wraps non-Temporal errors as the cause of an ApplicationFailure; TemporalFailures buried in the cause chain are unwrapped rather than re-wrapped.
- Workflow sandbox polyfills: Headers, ReadableStream, structuredClone, crypto.randomUUID (deterministic via uuid4()), EventTarget / Event / CustomEvent (workflow-safe, isolated listener errors).
What's deferred
See packages/openai-agents/DEFERRED.md:
- StatefulMCPServerProvider
- nexusOperationAsTool (TS SDK doesn't expose executeNexusOperation yet)
- workflowFailureExceptionTypes registration (TS SDK doesn't support this concept)
- testing.AgentEnvironment, testing.ResponseBuilders class
Tests
61 integration tests in packages/test/src/test-openai-agents.ts covering the full feature surface plus bug-reproduction coverage for every audit finding fixed during development (4 audit rounds, ~65 findings addressed). Tests use FakeModelProvider and GeneratorFakeModelProvider for determinism; optional remote tests against the real OpenAI API can be gated behind an env var in a follow-up.
Known test flakiness: full-suite runs occasionally show 1-5 test failures under sequential load with the dev server ("service rate limit exceeded"); the same tests pass reliably when run in isolation via --match. This is infrastructure flake, not a correctness issue. Tracking for follow-up (potential mitigations: ava retry config, WorkflowEnvironment.createTimeSkipping, splitting the test file).
Test plan
- pnpm --filter @temporalio/openai-agents exec tsc --build
- pnpm --filter @temporalio/test run build:ts
- pnpm --filter @temporalio/test exec ava ./lib/test-openai-agents.js
- pnpm --filter @temporalio/openai-agents exec eslint src/
- pnpm --filter @temporalio/openai-agents exec prettier --check src/
🤖 Generated with Claude Code