feat(openai-agents): add @temporalio/openai-agents package by xumaple · Pull Request #2024 · temporalio/sdk-typescript

xumaple · 2026-04-24T15:52:18Z

Summary

Adds @temporalio/openai-agents, a Temporal plugin that runs OpenAI Agents SDK workflows as durable Temporal workflows. Model invocations become Temporal activities; the agent loop (tools, handoffs, guardrails) runs deterministically in the workflow sandbox.

User-facing API:

// Worker side
new OpenAIAgentsPlugin({ modelProvider: new OpenAIProvider() });

// Workflow side
const runner = createTemporalRunner({ startToCloseTimeout: '2m' });
const result = await runner.run(agent, input);

Plus activityAsTool, statelessMcpServer, StatelessMCPServerProvider, tracing utilities, and a public testing namespace.

Design

Design proposal lives at openai-agents-proposal-v2.md in the repo root of the branch's working tree (not committed). High-level: the runner recursively converts the agent graph, swapping each agent's model for a TemporalModelStub. The stub dispatches getResponse calls to invokeModelActivity, which runs on the activity worker where the real ModelProvider lives. The agent loop stays in the workflow, so tool calls and handoffs are durable.
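The recursive conversion described above can be sketched roughly as follows. This is a minimal illustration, not the package's code: `AgentNode`, `ConvertedAgent`, `makeStub`, and the `invoke` callback are hypothetical stand-ins for the real `Agent`, the model stub, and the activity proxy. The memo map is one common way to make a recursive walk terminate on cyclic handoff graphs while leaving the user's objects untouched.

```typescript
// Hypothetical, simplified shapes; the real Agent and Model types come from @openai/agents-core.
interface Model {
  getResponse(request: unknown): Promise<unknown>;
}

interface AgentNode {
  name: string;
  model: string;
  handoffs: AgentNode[];
}

interface ConvertedAgent {
  name: string;
  model: Model;
  handoffs: ConvertedAgent[];
}

type Invoke = (modelName: string, request: unknown) => Promise<unknown>;

// Stand-in for the activity-dispatching stub: it captures the model name and
// forwards each call to whatever function would schedule the activity.
function makeStub(modelName: string, invoke: Invoke): Model {
  return { getResponse: (request) => invoke(modelName, request) };
}

function convertAgent(
  agent: AgentNode,
  invoke: Invoke,
  seen: Map<AgentNode, ConvertedAgent> = new Map()
): ConvertedAgent {
  const existing = seen.get(agent);
  if (existing) return existing; // already converted: reuse it, which also terminates cycles

  // Register the copy before recursing so a cycle back to this agent finds it.
  const converted: ConvertedAgent = {
    name: agent.name,
    model: makeStub(agent.model, invoke),
    handoffs: [],
  };
  seen.set(agent, converted);
  converted.handoffs = agent.handoffs.map((h) => convertAgent(h, invoke, seen));
  return converted;
}

// Two agents that hand off to each other (model names are placeholders).
const triage: AgentNode = { name: 'triage', model: 'model-a', handoffs: [] };
const billing: AgentNode = { name: 'billing', model: 'model-b', handoffs: [triage] };
triage.handoffs.push(billing);

const root = convertAgent(triage, async (modelName) => `called ${modelName}`);
```

The converted graph keeps the same cyclic shape as the input, and the original `triage` and `billing` objects still carry their string model names.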

Correctness-critical behavior

Handoff conversion — shallow-clones user's Handoff objects (does not mutate); preserves onHandoff, inputType, isEnabled; recursive cycle detection for cyclic handoff graphs.
Error classification — inspects OpenAI SDK error shape (both modern error.status / error.headers and legacy error.response.*); honors x-should-retry and retry-after headers; derives ModelInvocationError subtypes (.RateLimit, .Authentication, .BadRequest, .ServerError, .Timeout, .Conflict).
Tool validation — activityAsTool-produced tools carry a symbol marker; the runner rejects raw functions and FunctionTools built via the bare tool() factory, recurses into handoff agents' tools.
AgentsWorkflowError — wraps non-Temporal errors; TemporalFailures buried in the cause chain are unwrapped rather than re-wrapped.
Sandbox polyfills — Headers, ReadableStream, structuredClone, crypto.randomUUID (deterministic via uuid4()), EventTarget/Event/CustomEvent (workflow-safe, isolated listener errors).
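As a rough illustration of the error-classification item above, the sketch below reads the status and headers from either the modern or the legacy error shape and lets `x-should-retry` override a status-based default. Everything here is an assumption made for illustration: the `ErrorLike` shape, the plain lowercase header record, the status-to-kind mapping, and the default retry policy are not taken from the package.

```typescript
// Hypothetical names throughout; this is not the package's actual classifier.
type ErrorKind =
  | 'RateLimit'
  | 'Authentication'
  | 'BadRequest'
  | 'Conflict'
  | 'Timeout'
  | 'ServerError'
  | 'Unknown';

interface Classification {
  kind: ErrorKind;
  retryable: boolean;
  retryAfterMs: number | undefined;
}

// Assumption: headers are a plain record with lowercase keys.
type HeaderBag = Record<string, string | undefined>;

interface ErrorLike {
  status?: number;
  headers?: HeaderBag;
  response?: { status?: number; headers?: HeaderBag };
}

// Assumed mapping from HTTP status to error kind.
function kindFromStatus(status: number | undefined): ErrorKind {
  if (status === undefined) return 'Unknown';
  if (status === 429) return 'RateLimit';
  if (status === 401 || status === 403) return 'Authentication';
  if (status === 408) return 'Timeout';
  if (status === 409) return 'Conflict';
  if (status >= 500) return 'ServerError';
  if (status >= 400) return 'BadRequest';
  return 'Unknown';
}

function classify(err: ErrorLike): Classification {
  // Prefer the modern shape, fall back to the legacy error.response.* shape.
  const status = err.status ?? err.response?.status;
  const headers = err.headers ?? err.response?.headers ?? {};
  const kind = kindFromStatus(status);

  // Assumed default policy: retry rate limits, timeouts, conflicts and 5xx.
  let retryable =
    kind === 'RateLimit' || kind === 'Timeout' || kind === 'Conflict' || kind === 'ServerError';

  // An explicit x-should-retry header overrides the status-based default.
  const shouldRetry = headers['x-should-retry'];
  if (shouldRetry === 'true') retryable = true;
  if (shouldRetry === 'false') retryable = false;

  // retry-after is read as a number of seconds; HTTP-date values are ignored in this sketch.
  const raw = headers['retry-after'];
  const seconds = raw === undefined ? NaN : Number(raw);
  const retryAfterMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;

  return { kind, retryable, retryAfterMs };
}
```

For example, `classify({ status: 429, headers: { 'retry-after': '2' } })` yields a retryable `RateLimit` with a 2000 ms delay, while a legacy-shaped 500 carrying `x-should-retry: false` is classified as a non-retryable `ServerError`.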

What's deferred

See packages/openai-agents/DEFERRED.md:

StatefulMCPServerProvider
nexusOperationAsTool (TS SDK doesn't expose executeNexusOperation yet)
OpenTelemetry trace interceptor
workflowFailureExceptionTypes registration (TS SDK doesn't support this concept)
testing.AgentEnvironment, testing.ResponseBuilders class

Tests

61 integration tests in packages/test/src/test-openai-agents.ts covering the full feature surface plus bug-reproduction coverage for every audit finding fixed during development (4 audit rounds, ~65 findings addressed). Tests use FakeModelProvider and GeneratorFakeModelProvider for determinism; optional remote tests against real OpenAI API can be gated behind an env var in a follow-up.

Known test flakiness: Full-suite runs occasionally show 1-5 test failures under sequential load with the dev server ("service rate limit exceeded"); the same tests pass reliably when run in isolation via --match. This is infrastructure flake, not correctness. Tracking for follow-up (potential mitigations: ava retry config, WorkflowEnvironment.createTimeSkipping, splitting the test file).

Test plan

pnpm --filter @temporalio/openai-agents exec tsc --build
pnpm --filter @temporalio/test run build:ts
pnpm --filter @temporalio/test exec ava ./lib/test-openai-agents.js
pnpm --filter @temporalio/openai-agents exec eslint src/
pnpm --filter @temporalio/openai-agents exec prettier --check src/
Manually wire up a simple agent workflow with a real OpenAI key and verify end-to-end

🤖 Generated with Claude Code

Integration for running OpenAI Agents SDK workflows as durable Temporal workflows. Model calls become Temporal activities; the agent loop (tools, handoffs, guardrails) runs in the workflow.

Package layout:
- plugin: OpenAIAgentsPlugin wires model activity + MCP providers
- workflow-side: createTemporalRunner(), activityAsTool(), statelessMcpServer(), tracing utilities
- activity-side: invokeModelActivity, error classification, retry-after header support, auto-heartbeating
- testing namespace: FakeModel, FakeModelProvider, GeneratorFakeModel, ResponseBuilders (textResponse/toolCallResponse/handoffResponse/multiToolCallResponse)

Correctness-critical behavior:
- Handoff conversion: shallow-clones user's Handoff objects to avoid mutation; preserves onHandoff, inputType, isEnabled callbacks; recursive cycle detection for cyclic handoff graphs
- Error classification: inspects OpenAI SDK error shape (direct .status/.headers and legacy .response.*), honors x-should-retry + retry-after headers, derives ModelInvocationError subtypes from status code
- Tool validation: tags activityAsTool-wrapped tools with a symbol marker, rejects raw functions and FunctionTools built via the bare tool() factory, recurses into handoff agents
- AgentsWorkflowError wraps non-Temporal errors as cause of ApplicationFailure; TemporalFailures in the cause chain are unwrapped rather than re-wrapped

Known deferrals (documented in src/index.ts):
- StatefulMCPServerProvider, nexusOperationAsTool, OTel trace interceptor, workflowFailureExceptionTypes registration, testing.AgentEnvironment, testing.ResponseBuilders class

Tests: 61 integration tests covering the full feature surface plus bug-reproduction coverage for every fix landed across 4 audit rounds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e TemporalModelStub to ActivityBackedModel, drop unused plugin options

Side-aware src layout: directory enforces what was previously a convention. workflow/ holds the agent loop + activity proxies that run in the workflow sandbox; worker/ holds the plugin and activity implementations; common/ holds types referenced by both sides. mcp.ts splits into workflow/mcp-client.ts (statelessMcpServer factory and types) and worker/mcp-provider.ts (StatelessMCPServerProvider class). ActivityModelInput moves to common/ since both sides type-reference it.

Renames:
- TemporalModelStub -> ActivityBackedModel. "Stub" reads as "test-double" to most modern callers; the class is production hot-path code that proxies model calls to an activity. New name says what it is.
- src/workflow.ts and src/index.ts now re-export from the side-specific dirs; their public surface is unchanged.

Plugin cleanup:
- Drop unused modelParams field from OpenAIAgentsPluginOptions. Constructor never read it; runtime config lives on createTemporalRunner({modelParams}) workflow-side.
- Drop createOpenAIAgentsPlugin factory. Other SDK plugins (AiSdkPlugin, OpenTelemetryPlugin) export class only; users do `new OpenAIAgentsPlugin({...})` to match.

package.json exports paths updated for the new layout. Backward-compat ./lib/* aliases remap to the new locations.

No behavior change. 60/61 tests pass; the one occasional flake is a known dev-server resource-contention issue tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… serialized-model contract

Workflow→activity model-call boundary now goes through explicit serialized types and field-by-field projections in both directions. Replaces ad-hoc destructure-and-strip on activity-backed-model.ts:58 plus `as any` casts on activities.ts:125.

Contract (src/common/serialized-model.ts):
- SerializedModelRequest / SerializedModelResponse: JSON-safe projections of upstream ModelRequest / ModelResponse. JsonValue replaces unknown on the wire.
- WIRE_VERSION literal field on both, validated activity-side. Mismatch throws non-retryable WireVersionMismatch ApplicationFailure, which protects rolling deploys with workflow code on a different package version than the worker.
- signal excluded from SerializedModelRequest by design (AbortSignal is not serializable; Temporal cancellation provides the equivalent).

Projections live with their consumers:
- toSerializedModelRequest + fromSerializedModelResponse inline in workflow/activity-backed-model.ts (workflow side).
- toSerializedModelResponse + fromSerializedModelRequest inline in worker/activities.ts (worker side).

fromSerializedModelResponse reconstitutes Usage via new Usage(...) so .add() keeps working across multi-turn runs. fromSerializedModelRequest strips __wireVersion before passing to the upstream model so the internal protocol field doesn't leak through getResponse().

Public exports:
- SerializedModelRequest, SerializedModelResponse, InvokeModelActivityInput, JsonValue, WIRE_VERSION exported from both index.ts (worker side) and workflow.ts (workflow side).
- toSerializedModelRequest exported via workflow.ts (test workflow imports it).
- toSerializedModelResponse exported via index.ts.

Tests added in packages/test/src/test-openai-agents.ts:
- Round-trip: prompt + tracing survive workflow→activity→workflow, Usage round-trips with working .add(), __wireVersion stripped from both directions.
- Stripping: signal absent activity-side after additive projection.
- Version mismatch: stale __wireVersion throws WireVersionMismatch.
- Snapshot: shape of toSerializedModelRequest / toSerializedModelResponse output matches expected key list; fails loudly with "bump WIRE_VERSION" message if upstream adds a field that gets silently copied through.

Replaces src/common/activity-model-input.ts with src/common/serialized-model.ts.

No behavior change to user-facing API. 66/66 tests pass; one occasional flake under load passes in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
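The wire contract in this commit message (a versioned, additive projection on the workflow side; validate-then-strip on the worker side) might look roughly like the sketch below. The function and constant names are borrowed from the commit message, but the bodies, the reduced field set, and the plain `Error` subclass are assumptions; the commit says the real mismatch is raised as a non-retryable ApplicationFailure.

```typescript
// Hypothetical, reduced versions of the types named in the commit message.
const WIRE_VERSION = 1;

type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

interface UpstreamRequest {
  input: JsonValue;
  systemInstructions?: string;
  signal?: unknown; // stands in for AbortSignal, which is not serializable
}

interface SerializedModelRequest {
  __wireVersion: number;
  input: JsonValue;
  systemInstructions?: string;
}

// Stand-in for the non-retryable failure described in the commit message.
class WireVersionMismatch extends Error {
  readonly nonRetryable = true;
  constructor(expected: number, actual: number) {
    super(`wire version mismatch: worker expects ${expected}, workflow sent ${actual}`);
    this.name = 'WireVersionMismatch';
  }
}

// Workflow side: additive projection. Only known JSON-safe fields are copied,
// so `signal` (and any field added upstream later) never reaches the wire.
function toSerializedModelRequest(req: UpstreamRequest): SerializedModelRequest {
  const out: SerializedModelRequest = { __wireVersion: WIRE_VERSION, input: req.input };
  if (req.systemInstructions !== undefined) out.systemInstructions = req.systemInstructions;
  return out;
}

// Worker side: validate the version, then rebuild the request without the protocol field.
function fromSerializedModelRequest(wire: SerializedModelRequest): Omit<UpstreamRequest, 'signal'> {
  if (wire.__wireVersion !== WIRE_VERSION) {
    throw new WireVersionMismatch(WIRE_VERSION, wire.__wireVersion);
  }
  const out: Omit<UpstreamRequest, 'signal'> = { input: wire.input };
  if (wire.systemInstructions !== undefined) out.systemInstructions = wire.systemInstructions;
  return out;
}
```

Copying fields one by one, rather than spreading the request and deleting known-bad keys, is what lets a key-list snapshot test catch upstream additions before they are silently sent across the boundary.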

socket-security · 2026-04-27T22:01:26Z

Review the following changes in direct dependencies:

- npm/@openai/agents-openai@0.3.9
- npm/@openai/agents-core@0.3.9

Implements TemporalTracingProcessor, a TracingProcessor that maps OpenAI Agents trace/span events to OTel spans. Removes the tracingDisabled flag on the internal Runner so trace events fire.

Architecture:
- Workflow-side TracingProcessor (src/workflow/tracing.ts) listens for agent loop events (agent, generation, function, handoff, guardrail, custom, response, transcription, speech, speech_group, mcp_tools).
- Each event creates a child OTel span via @opentelemetry/api, parented under the active OTel context (typically the workflow execution span registered by interceptors-opentelemetry).
- Replay-safe: skips span creation during workflow replay via isReplaying() guard. End events implicitly skip too: the entry map is empty during replay so onSpanEnd / onTraceEnd no-op.
- Idempotent registration: Symbol.for() flag on globalThis ensures the processor registers once per workflow isolate.
- Uses addTraceProcessor() (not setTraceProcessors): preserves user processors registered before runner construction, no risk of wiping.

Wiring:
- TemporalOpenAIRunner constructor calls ensureTracingProcessorRegistered(), which registers the processor and calls setTracingDisabled(false) to override the upstream NODE_ENV=test default that gates trace creation.
- ActivityBackedModel.getResponse() wraps both normal and summaryOverride paths in withGenerationSpan() so generation spans fire; upstream's built-in adapters do this themselves, so ours has to mirror the pattern.
- Removed tracingDisabled: true from internal Runner config.

Span attributes split into static (set at start: type, name, handoffs, output_type, from_agent, to_agent, server) and dynamic (set at end: tools, model, triggered, result), which avoids redundant double-set.

Known limitation (documented in code): activity spans from interceptors-opentelemetry appear as siblings of generation spans, not children. Proper nesting would require pushing OTel context before the activity call. Deferred to a follow-up; current hierarchy still gives users visibility into the agent loop.

Public exports:
- TemporalTracingProcessor (workflow-side)
- ensureTracingProcessorRegistered

Adds @opentelemetry/api ^1.9.0 as regular dep (matches interceptors-opentelemetry's pattern).

Tests: T1 verifies the tracing path is active and produces trace + agent + generation/response span events. Full OTel span emission verification deferred; it requires in-memory exporter + interceptors-opentelemetry workflow bundle integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
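The idempotent registration mentioned in this commit message (a `Symbol.for()` flag on `globalThis`, combined with adding to the processor list rather than replacing it) can be sketched as follows. The key string, the `processors` array, and both function names are hypothetical; the real code registers with the OpenAI Agents SDK rather than a local array.

```typescript
// Hypothetical key and registry, for illustration only.
const REGISTERED_KEY = Symbol.for('example.temporal-tracing-processor-registered');

interface Processor {
  name: string;
}

// Stand-in for the upstream list of trace processors.
const processors: Processor[] = [];

// Appends rather than replaces, so processors the user registered earlier survive.
function addProcessor(processor: Processor): void {
  processors.push(processor);
}

// Returns true only on the call that actually performed the registration.
function ensureRegistered(): boolean {
  // Symbol.for() returns the same symbol for the same key string anywhere in the
  // realm, so the flag is shared even if this module is loaded more than once.
  const globals = globalThis as unknown as Record<symbol, unknown>;
  if (globals[REGISTERED_KEY] === true) return false;
  addProcessor({ name: 'temporal' });
  globals[REGISTERED_KEY] = true;
  return true;
}
```

A user processor added before the first `ensureRegistered()` call stays in the list, and any number of later calls leave the list unchanged.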

…tyOptions, drop createTemporalRunner factory + createModelActivity export, harden wire layer

Two batches landed together since they share the activity-backed-model.ts file footprint.

Batch A, public API cleanup:
- ModelActivityParameters → ModelActivityOptions. Matches the *Options convention used elsewhere in the SDK (ActivityOptions, LocalActivityOptions, WorkerOptions). DEFAULT_MODEL_ACTIVITY_PARAMETERS → DEFAULT_MODEL_ACTIVITY_OPTIONS. File model-parameters.ts → model-activity-options.ts.
- Drop createTemporalRunner factory. Pure `new` shortcut, no value-add. Users do `new TemporalOpenAIRunner(opts)` directly.
- Demote createModelActivity from public exports. The plugin auto-registers the activity, so there's no reason to advertise the manual-registration bypass route. Function stays in worker/activities.ts for the plugin to use; just no longer exported from index.ts.

Batch B, wire / serialization layer hardening:
- Asymmetric exports made consistent. to* projections public (part of the wire contract); from* projections private (implementation detail).
- Cast comments at every type-assertion boundary now document why each is safe: input/prompt/tracing as JsonValue, Usage data as JsonValue, AgentOutputItem[] as JsonValue[]. No `as any` regressions.
- Usage class reconstruction comment clarified: Usage is the only class instance needing reconstitution post-wire, since AgentOutputItem variants are all Zod-inferred plain objects.
- providerData JSDoc expanded with explicit coercion warning (Date → ISO string, Map/Set/class instances flattened by Temporal's JSON codec).
- Inline rationale at activities.ts version-check site (no longer references CLEANUP.md, which is a local-only artifact).
- Removed duplicate `signal` exclusion comment from activity-backed-model.ts. Canonical comment lives in serialized-model.ts.
- tracing field commentary updated to accurately describe ModelTracing as an enablement flag (not a context carrier); span-context propagation belongs in TemporalTracingProcessor.
- Two new drift-detection tests in test-openai-agents.ts. JSON round-trip on populated SerializedModelRequest / SerializedModelResponse with deepEqual; fails loudly if upstream adds a non-serializable field.

No behavior change. 69/69 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
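The drift-detection idea in this commit message (round-trip a populated value through JSON and compare with a deep-equality check) can be illustrated with a small self-contained sketch. The `deepEqual` helper and the sample payloads are hypothetical; the actual tests presumably use the test framework's own `deepEqual`. The second payload shows the Date coercion that the providerData warning refers to.

```typescript
// Round-trip a value the way a JSON-based payload codec would.
function jsonRoundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

// Small structural equality check, sufficient for plain JSON-like data.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  if (Object.getPrototypeOf(a) !== Object.getPrototypeOf(b)) return false;
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  const aRec = a as Record<string, unknown>;
  const bRec = b as Record<string, unknown>;
  return aKeys.every((key) => key in bRec && deepEqual(aRec[key], bRec[key]));
}

// True when the value comes back from JSON unchanged.
function survivesJson(value: unknown): boolean {
  return deepEqual(value, jsonRoundTrip(value));
}

// Plain data survives the round trip.
const safePayload = {
  usage: { inputTokens: 3, outputTokens: 5 },
  output: [{ type: 'message', text: 'hi' }],
};

// A Date does not: it comes back as an ISO string, so the comparison fails.
const unsafePayload = { providerData: { createdAt: new Date(0) } };
```

Here `survivesJson(safePayload)` is true and `survivesJson(unsafePayload)` is false, because `createdAt` comes back as the string `'1970-01-01T00:00:00.000Z'`.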

…ety fixes

Two batches landed together since they share test-file footprint.

Batch C, convert-agent.ts hardening (7 items):
- Move unwrapTemporalFailure from convert-agent.ts to common/errors.ts. It's an error utility, not agent conversion logic; co-locating with AgentsWorkflowError gives one place for error helpers.
- Replace `'default'` model-name fallback with explicit AgentsWorkflowError thrown at convert time. Users now get a clear "no model declared" error at workflow start instead of an opaque activity failure on first model call.
- Introduce getAgentInternals() helper in src/workflow/agent-internals.ts centralizing the unsafe access to upstream Agent's model/handoffs/tools fields. Future upstream type changes touch one file.
- Drop `as Model` cast on agent.clone({ model: activityBackedModel }). ActivityBackedModel implements Model and is structurally compatible, so TS accepts it without the cast.
- Expand setAgent comment to explain why the original (pre-clone) agent is bound to the summary provider, not the cloned wrapper.
- Add CLEANUP-6 test asserting the Object.create-based Handoff clone preserves all upstream-documented fields and prototype identity. Future upstream Handoff additions trip this test.
- Top-of-file contract block in convert-agent.ts pinning the implicit upstream contracts (Agent.clone, Handoff.onInvokeHandoff, Agent.handoffs) to @openai/agents-core ~0.3.0 as a checklist for dep upgrades.

Batch D, tracing replay-safety fixes:
- Symmetric replay gating: onTraceEnd and onSpanEnd now have the same isReplaying() guard as their start counterparts. Matches Python's uniform gating; no longer relies on Map-empty-after-replay as the implicit gate.
- Workflow-scoped spans Map: Map<workflowId, Map<spanId, SpanEntry>> instead of a flat Map shared across all workflows in the V8 isolate. Cross-workflow leaks impossible. Outer entry is auto-cleaned when the inner Map empties.
- Document that deterministic trace/span IDs come from the crypto.randomUUID polyfill in load-polyfills.ts (delegates to workflow.uuid4(), which is per-workflow seeded). No code change for this; verified via the new replay-safety test.
- Add T2 replay-safety test using maxCachedWorkflows: 0 to force workflow eviction after every task. Asserts replay actually occurred AND the workflow completes without NondeterminismError. Proves the trace processor's replay gating + the polyfill-backed deterministic IDs hold up under forced replay.
- Delete getWorkflowTracingConfig(), a dead function that always returned 'enabled'. Removed from public exports.
- Expand JSDoc on ensureTracingProcessorRegistered documenting the global side-effect (mutates upstream's processor list) and the per-isolate singleton behavior so users aren't surprised.

71/71 tests pass. ESLint + Prettier clean. Both batches reviewed by code-auditor and comment-auditor and all findings applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
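The workflow-scoped span map from Batch D (an outer map keyed by workflow ID, an inner map keyed by span ID, with the outer entry removed once the inner map empties) could be sketched like this. `ScopedSpanStore` and the `SpanEntry` fields are hypothetical names for illustration; the real entries presumably hold OTel span handles.

```typescript
// Hypothetical entry shape.
interface SpanEntry {
  name: string;
}

class ScopedSpanStore {
  // Outer key: workflow ID. Inner key: span ID.
  private readonly byWorkflow = new Map<string, Map<string, SpanEntry>>();

  start(workflowId: string, spanId: string, entry: SpanEntry): void {
    let spans = this.byWorkflow.get(workflowId);
    if (!spans) {
      spans = new Map<string, SpanEntry>();
      this.byWorkflow.set(workflowId, spans);
    }
    spans.set(spanId, entry);
  }

  // Removes and returns the entry; drops the workflow's inner map once it is empty.
  end(workflowId: string, spanId: string): SpanEntry | undefined {
    const spans = this.byWorkflow.get(workflowId);
    if (!spans) return undefined;
    const entry = spans.get(spanId);
    spans.delete(spanId);
    if (spans.size === 0) this.byWorkflow.delete(workflowId);
    return entry;
  }

  workflowCount(): number {
    return this.byWorkflow.size;
  }
}
```

Because lookups always go through the workflow ID first, two workflows that happen to produce the same span ID cannot read or end each other's entries.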

…(Python parity)

Single combined batch: runner.ts cleanups and the validateTools rewrite share the same code path. 8 runner.ts items + tool validation relax.

Runner cleanups:
- Remove runStreamed method entirely. Calling it now produces a clean TypeError ("not a function") instead of a custom throw. We don't extend Runner so we have no obligation to expose it.
- Drop AgentsWorkflowError class. The string 'AgentsWorkflowError' on ApplicationFailure.type already serves as the marker; the wrapper class added duplication and an extra hop on the cause chain. Now errors flow directly: ApplicationFailure(type='AgentsWorkflowError', cause: originalError). BREAKING: the class is no longer exported. Users doing `instanceof AgentsWorkflowError` should switch to ApplicationFailure type-tag checks.
- Tighten TemporalRunOptions.runConfig.model to `string` only. Workflow can't serialize Model objects across the activity boundary, so accepting `string | Model` was a typed lie. Runtime guard deleted; TypeScript catches misuse at the call site.
- Forward all upstream RunConfig fields explicitly to internalRunner. Previously dropped silently: handoffInputFilter, inputGuardrails, outputGuardrails, modelSettings, tracingDisabled, traceIncludeSensitiveData, workflowName, traceId, groupId, traceMetadata, conversationId, session, sessionInputCallback, callModelInputFilter, tracing. Each new field has a JSDoc comment noting Temporal-specific caveats (e.g. guardrails must be deterministic, signal omitted in favor of CancellationScope).

Convert-agent + tool validation:
- Fold validateTools into convertAgent. Single graph traversal now handles validation, model conversion, and Handoff cloning (was three full walks). ~40 lines of duplication removed.
- Drop TEMPORAL_ACTIVITY_TOOL_MARKER gate. Upstream tool() factory products are now accepted inline in workflow context (Python parity). Raw functions still rejected. The marker constant stays for debugging / future introspection.
- Tool-type allowlist comment in convertAgent listing every accepted upstream tool variant alphabetically. Note that ApplyPatch / Computer / Shell tools pass validation but will fail at runtime in the sandbox (require local I/O).
- getAgentInternals helper now used for tools as well as model/handoffs, making it the single source of truth for upstream Agent property access.
- Tighten convert-agent.ts error message for non-string Model. Points users at runConfig.model: string as the override path.

Tests:
- E3/F20 inverted from rejection-test to inline-success-test. A deterministic tool() product runs in the workflow without activityAsTool; verifies the output round-trips.
- C3/F27 simplified to verify TypeError surfaces as WorkflowFailedError. No specific message check since the method doesn't exist at all now.
- C1/F7 updated: assertion on causeName is now 'Error' (the original error directly on cause), not 'AgentsWorkflowError' (which would imply a wrapper).
- H2 raw-function-rejection tests still pass; message strings updated to match the tightened error wording.

Migration note: AgentsWorkflowError class removed from public API. Users identifying these errors should switch from `e instanceof AgentsWorkflowError` to checking `(e as ApplicationFailure).type === 'AgentsWorkflowError'` on the serialized failure.

71/71 tests pass. ESLint + Prettier clean. Both audits PASS; the only note was the breaking class removal, intentional per CLEANUP.md spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
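For the migration note in this commit message, a type-tag check might be written along these lines. This is a sketch under assumptions: `findFailureByType` is a hypothetical helper, and it checks structurally (a `type` string and a `cause` chain) so that it runs without Temporal installed. With the SDK available, an `instanceof ApplicationFailure` check would usually precede the `type` comparison.

```typescript
// Hypothetical helper: walks the cause chain and returns the first failure whose
// `type` tag matches, or undefined if none does.
function findFailureByType(
  err: unknown,
  type: string
): { type: string; cause?: unknown } | undefined {
  let current: unknown = err;
  const visited = new Set<unknown>(); // guards against accidental cause cycles
  while (typeof current === 'object' && current !== null && !visited.has(current)) {
    visited.add(current);
    const candidate = current as { type?: unknown; cause?: unknown };
    if (candidate.type === type) {
      return candidate as { type: string; cause?: unknown };
    }
    current = candidate.cause;
  }
  return undefined;
}

// Shape loosely modelled on what a client would see: an outer workflow failure
// whose cause carries the type tag, with the original error directly beneath it.
const originalError = new Error('tool blew up');
const clientSideFailure = {
  name: 'WorkflowFailedError',
  cause: { type: 'AgentsWorkflowError', cause: originalError },
};

const tagged = findFailureByType(clientSideFailure, 'AgentsWorkflowError');
```

After the change described above, `tagged.cause` is the original error itself, with no wrapper class in between.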

…ugin/options polish

Two batches landed in parallel; they touch disjoint primary files but share test-file footprint, so committing as one.

Batch F, tracing remaining gaps + plugin modelParams + concurrent test:
- New TemporalOpenAIRunnerOptions extends ModelActivityOptions with startSpansInReplay?: boolean. Plumbed through to TemporalTracingProcessor via ensureTracingProcessorRegistered, so callers can opt into emitting spans during replay for debugging replay-divergence issues. Default false.
- TemporalTracingProcessor's four event methods now gate via a single shouldSkip() helper that respects startSpansInReplay. Previously used four inline isReplaying() checks.
- Activity-span nesting comment in tracing.ts updated. Previously said "deferred to a follow-up"; now correctly explains that activity spans nest under generation spans when @temporalio/interceptors-opentelemetry is configured. Investigated worker-side TracingProcessor and concluded it isn't needed: OpenAI Agents SDK trace events fire workflow-side only; the activity span just needs OTel context propagation, which interceptors-opentelemetry already provides.
- OpenAIAgentsPluginOptions accepts modelParams?: ModelActivityOptions as a config-surface field. The plugin runs worker-side and can't inject config into the V8 workflow sandbox, so users must still pass modelParams to new TemporalOpenAIRunner(options) in workflow code. JSDoc explains this honestly; future versions may auto-propagate via workflow interceptors.
- T3 test verifies two concurrent workflows on one worker have isolated trace IDs (no cross-pollination), exercising the workflow-scoped Map added in 2d7949a.

Batch H, model-activity-options enhancements + testing.ts consolidation:
- model-activity-options.ts: cancellationType defaults to ActivityCancellationType.TRY_CANCEL so cancellations reach the activity cooperatively. JSDoc on every public field. versioningIntent was investigated and dropped: upstream Worker Versioning API is deprecated; using the Worker Deployment API is the path forward whenever that becomes a need. Adding a deprecated option to a new surface would create immediate tech debt.
- testing.ts: FakeModel and the former GeneratorFakeModel collapse into a single FakeModel that takes ModelResponse[] | Generator<ModelResponse>. FakeModelProvider takes the same union with a factory variant for the generator side. The deprecated aliases GeneratorFakeModel and GeneratorFakeModelProvider were removed entirely (BREAKING for tests, low surface, only used inside our own test suite).
- ResponseBuilders const object exposes text / toolCall / handoff / multiToolCall, giving namespace-style access in addition to the existing flat exports. Mirrors Python's TestModel grouping.
- src/testing.ts barrel re-exports from worker/testing.ts so consumers can import from @temporalio/openai-agents/lib/testing without diving into worker/. The lib/testing path is already used by stubs in packages/test.
- Activity-backed model wires cancellationType through to both proxyActivities and proxyLocalActivities options. User overrides win since DEFAULT_MODEL_ACTIVITY_OPTIONS is spread first.

Public API removals:
- GeneratorFakeModel / GeneratorFakeModelProvider (test utility aliases, not used by production code).

72/72 tests pass. ESLint + Prettier clean. Both audits PASS: code auditor noted T3 proves trace-ID disjointness via UUID uniqueness rather than directly verifying Map scoping; comment auditor flagged 3 stale references in DEFERRED.md / index.ts module-level JSDoc, all fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
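The consolidated fake model from Batch H (one class accepting either an array of canned responses or a generator) can be approximated as below. `FakeModelSketch`, `FakeResponse`, and `nextResponse` are hypothetical names; the real FakeModel works with upstream ModelResponse values and implements the upstream Model interface.

```typescript
// Hypothetical, reduced response shape.
interface FakeResponse {
  text: string;
}

class FakeModelSketch {
  private readonly iterator: Iterator<FakeResponse>;

  // Arrays and generators are both consumed through the iterator protocol,
  // so the rest of the class does not care which one was supplied.
  constructor(source: FakeResponse[] | Generator<FakeResponse>) {
    this.iterator = Array.isArray(source) ? source[Symbol.iterator]() : source;
  }

  // Synchronous core: returns the next canned response or fails loudly when exhausted.
  nextResponse(): FakeResponse {
    const next = this.iterator.next();
    if (next.done) throw new Error('FakeModelSketch: no more canned responses');
    return next.value;
  }

  // Async wrapper mirroring the shape of a model's getResponse().
  async getResponse(): Promise<FakeResponse> {
    return this.nextResponse();
  }
}

// A generator source can compute responses lazily, for example by turn number.
function* countingResponses(): Generator<FakeResponse> {
  for (let turn = 1; turn <= 2; turn++) {
    yield { text: `turn ${turn}` };
  }
}
```

Both `new FakeModelSketch([{ text: 'hello' }])` and `new FakeModelSketch(countingResponses())` behave the same way from the caller's side, which is what allows the two former classes to collapse into one.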

xumaple and others added 3 commits April 24, 2026 11:51

xumaple and others added 5 commits April 27, 2026 21:40

xumaple wants to merge 8 commits into main from maplexu/openai-agents-plugin
