feat(openai-agents): add @temporalio/openai-agents package#2024

Draft
xumaple wants to merge 8 commits into main from maplexu/openai-agents-plugin


@xumaple xumaple commented Apr 24, 2026

Summary

Adds @temporalio/openai-agents, a Temporal plugin that runs OpenAI Agents SDK workflows as durable Temporal workflows. Model invocations become Temporal activities; the agent loop (tools, handoffs, guardrails) runs deterministically in the workflow sandbox.

User-facing API:

// Worker side
new OpenAIAgentsPlugin({ modelProvider: new OpenAIProvider() });

// Workflow side
const runner = createTemporalRunner({ startToCloseTimeout: '2m' });
const result = await runner.run(agent, input);

Plus activityAsTool, statelessMcpServer, StatelessMCPServerProvider, tracing utilities, and a public testing namespace.

Design

Design proposal lives at openai-agents-proposal-v2.md in the repo root of the branch's working tree (not committed). High-level: the runner recursively converts the agent graph, swapping each agent's model for a TemporalModelStub. The stub dispatches getResponse calls to invokeModelActivity, which runs on the activity worker where the real ModelProvider lives. The agent loop stays in the workflow, so tool calls and handoffs are durable.
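The recursive conversion described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's code: `SketchAgent`, `SketchModel`, `ConvertedAgent`, and `InvokeModel` are hypothetical simplified types, and the `invoke` callback stands in for dispatching to the model activity. A `Map` from original to converted agent lets cyclic handoff graphs terminate and leaves the user's objects untouched.

```typescript
// Hypothetical, simplified shapes; the real Agent and Model types are richer.
interface SketchModel {
  getResponse(prompt: string): Promise<string>;
}

interface SketchAgent {
  name: string;
  model: string; // model name declared on the agent
  handoffs: SketchAgent[]; // may contain cycles
}

interface ConvertedAgent {
  name: string;
  model: SketchModel;
  handoffs: ConvertedAgent[];
}

// Stand-in for dispatching a model call to an activity.
type InvokeModel = (modelName: string, prompt: string) => Promise<string>;

function convertAgent(
  agent: SketchAgent,
  invoke: InvokeModel,
  seen: Map<SketchAgent, ConvertedAgent> = new Map(),
): ConvertedAgent {
  const cached = seen.get(agent);
  if (cached !== undefined) return cached; // already converted: reuse, which breaks cycles

  const converted: ConvertedAgent = {
    name: agent.name,
    // The stub only forwards the call; the real provider lives elsewhere.
    model: { getResponse: (prompt) => invoke(agent.model, prompt) },
    handoffs: [],
  };
  seen.set(agent, converted); // register before recursing so a cycle finds this entry
  converted.handoffs = agent.handoffs.map((next) => convertAgent(next, invoke, seen));
  return converted;
}
```

Registering the converted agent before recursing is what makes a two-agent cycle (A hands off to B, B hands back to A) resolve to two converted objects that point at each other.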

Correctness-critical behavior

  • Handoff conversion: shallow-clones the user's Handoff objects (does not mutate them); preserves onHandoff, inputType, and isEnabled; recursive cycle detection for cyclic handoff graphs.
  • Error classification: inspects the OpenAI SDK error shape (both modern error.status / error.headers and legacy error.response.*); honors x-should-retry and retry-after headers; derives ModelInvocationError subtypes (.RateLimit, .Authentication, .BadRequest, .ServerError, .Timeout, .Conflict).
  • Tool validation: activityAsTool-produced tools carry a symbol marker; the runner rejects raw functions and FunctionTools built via the bare tool() factory, and recurses into handoff agents' tools.
  • AgentsWorkflowError: wraps non-Temporal errors; TemporalFailures buried in the cause chain are unwrapped rather than re-wrapped.
  • Sandbox polyfills: Headers, ReadableStream, structuredClone, crypto.randomUUID (deterministic via uuid4()), EventTarget/Event/CustomEvent (workflow-safe, isolated listener errors).
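To make the error-classification bullet concrete, here is a hedged sketch of one way such a classifier could look. The status-to-subtype mapping and the default retryability (429, 408, 409, and 5xx retry; everything else does not) are assumptions for illustration; the PR lists the subtype names but not the exact mapping. `classifyModelError`, `readHeader`, and `subtypeFor` are hypothetical names, and only the numeric-seconds form of retry-after is handled.

```typescript
type Subtype =
  | 'RateLimit' | 'Authentication' | 'BadRequest'
  | 'ServerError' | 'Timeout' | 'Conflict' | 'Unknown';

interface Classification {
  subtype: Subtype;
  retryable: boolean;
  retryAfterMs?: number;
}

// Reads a header from either a plain record or an object exposing get(name).
function readHeader(headers: unknown, name: string): string | undefined {
  if (headers === null || typeof headers !== 'object') return undefined;
  if (typeof (headers as { get?: unknown }).get === 'function') {
    const viaGet: unknown = (headers as { get(n: string): unknown }).get(name);
    return typeof viaGet === 'string' ? viaGet : undefined;
  }
  const direct = (headers as Record<string, unknown>)[name];
  return typeof direct === 'string' ? direct : undefined;
}

// Assumed mapping from HTTP status to subtype.
function subtypeFor(status: number | undefined): Subtype {
  if (status === undefined) return 'Unknown';
  if (status === 429) return 'RateLimit';
  if (status === 401 || status === 403) return 'Authentication';
  if (status === 408) return 'Timeout';
  if (status === 409) return 'Conflict';
  if (status >= 500) return 'ServerError';
  if (status >= 400) return 'BadRequest';
  return 'Unknown';
}

function classifyModelError(err: unknown): Classification {
  const e = (typeof err === 'object' && err !== null ? err : {}) as {
    status?: unknown;
    headers?: unknown;
    response?: { status?: unknown; headers?: unknown } | null;
  };
  // Prefer the modern shape, fall back to the legacy response.* shape.
  const rawStatus = e.status ?? e.response?.status;
  const status = typeof rawStatus === 'number' ? rawStatus : undefined;
  const headers = e.headers ?? e.response?.headers;

  const subtype = subtypeFor(status);
  let retryable =
    subtype === 'RateLimit' || subtype === 'Timeout' ||
    subtype === 'Conflict' || subtype === 'ServerError';

  // An explicit x-should-retry header overrides the status-based default.
  const shouldRetry = readHeader(headers, 'x-should-retry');
  if (shouldRetry === 'true') retryable = true;
  if (shouldRetry === 'false') retryable = false;

  const result: Classification = { subtype, retryable };
  const retryAfter = readHeader(headers, 'retry-after');
  if (retryAfter !== undefined && retryAfter.trim() !== '') {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) result.retryAfterMs = seconds * 1000;
  }
  return result;
}
```

In a real activity the resulting classification would presumably be translated into a retryable or non-retryable failure with a next-retry delay; that translation is left out here because it depends on the Temporal API surface.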

What's deferred

See packages/openai-agents/DEFERRED.md:

  • StatefulMCPServerProvider
  • nexusOperationAsTool (TS SDK doesn't expose executeNexusOperation yet)
  • OpenTelemetry trace interceptor
  • workflowFailureExceptionTypes registration (TS SDK doesn't support this concept)
  • testing.AgentEnvironment, testing.ResponseBuilders class

Tests

61 integration tests in packages/test/src/test-openai-agents.ts covering the full feature surface plus bug-reproduction coverage for every audit finding fixed during development (4 audit rounds, ~65 findings addressed). Tests use FakeModelProvider and GeneratorFakeModelProvider for determinism; optional remote tests against real OpenAI API can be gated behind an env var in a follow-up.

Known test flakiness: Full-suite runs occasionally show 1-5 test failures under sequential load with the dev server ("service rate limit exceeded"); the same tests pass reliably when run in isolation via --match. This is an infrastructure flake, not a correctness issue. Tracked for follow-up (potential mitigations: ava retry config, WorkflowEnvironment.createTimeSkipping, splitting the test file).

Test plan

  • pnpm --filter @temporalio/openai-agents exec tsc --build
  • pnpm --filter @temporalio/test run build:ts
  • pnpm --filter @temporalio/test exec ava ./lib/test-openai-agents.js
  • pnpm --filter @temporalio/openai-agents exec eslint src/
  • pnpm --filter @temporalio/openai-agents exec prettier --check src/
  • Manually wire up a simple agent workflow with a real OpenAI key and verify end-to-end

🤖 Generated with Claude Code

xumaple and others added 3 commits April 24, 2026 11:51
Integration for running OpenAI Agents SDK workflows as durable Temporal
workflows. Model calls become Temporal activities; the agent loop
(tools, handoffs, guardrails) runs in the workflow.

Package layout:
- plugin: OpenAIAgentsPlugin wires model activity + MCP providers
- workflow-side: createTemporalRunner(), activityAsTool(),
  statelessMcpServer(), tracing utilities
- activity-side: invokeModelActivity, error classification,
  retry-after header support, auto-heartbeating
- testing namespace: FakeModel, FakeModelProvider, GeneratorFakeModel,
  ResponseBuilders (textResponse/toolCallResponse/handoffResponse/
  multiToolCallResponse)

Correctness-critical behavior:
- Handoff conversion: shallow-clones user's Handoff objects to avoid
  mutation; preserves onHandoff, inputType, isEnabled callbacks;
  recursive cycle detection for cyclic handoff graphs
- Error classification: inspects OpenAI SDK error shape (direct
  .status/.headers and legacy .response.*), honors x-should-retry +
  retry-after headers, derives ModelInvocationError subtypes from
  status code
- Tool validation: tags activityAsTool-wrapped tools with a symbol
  marker, rejects raw functions and FunctionTools built via the bare
  tool() factory, recurses into handoff agents
- AgentsWorkflowError wraps non-Temporal errors as cause of
  ApplicationFailure; TemporalFailures in the cause chain are
  unwrapped rather than re-wrapped

Known deferrals (documented in src/index.ts):
- StatefulMCPServerProvider, nexusOperationAsTool, OTel trace
  interceptor, workflowFailureExceptionTypes registration,
  testing.AgentEnvironment, testing.ResponseBuilders class

Tests: 61 integration tests covering the full feature surface plus
bug-reproduction coverage for every fix landed across 4 audit rounds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e TemporalModelStub to ActivityBackedModel, drop unused plugin options

Side-aware src layout — directory enforces what was previously a
convention. workflow/ holds the agent loop + activity proxies that run
in the workflow sandbox; worker/ holds the plugin and activity
implementations; common/ holds types referenced by both sides.

mcp.ts splits into workflow/mcp-client.ts (statelessMcpServer factory
and types) and worker/mcp-provider.ts (StatelessMCPServerProvider
class). ActivityModelInput moves to common/ since both sides type-
reference it.

Renames:
- TemporalModelStub -> ActivityBackedModel. "Stub" reads as
  "test-double" to most modern callers; the class is production hot-
  path code that proxies model calls to an activity. New name says
  what it is.
- src/workflow.ts and src/index.ts now re-export from the
  side-specific dirs; their public surface is unchanged.

Plugin cleanup:
- Drop unused modelParams field from OpenAIAgentsPluginOptions.
  Constructor never read it; runtime config lives on
  createTemporalRunner({modelParams}) workflow-side.
- Drop createOpenAIAgentsPlugin factory. Other SDK plugins
  (AiSdkPlugin, OpenTelemetryPlugin) export class only; users do
  `new OpenAIAgentsPlugin({...})` to match.

package.json exports paths updated for the new layout. Backward-
compat ./lib/* aliases remap to the new locations.

No behavior change. 60/61 tests pass; the one occasional flake is a
known dev-server resource-contention issue tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… serialized-model contract

Workflow→activity model-call boundary now goes through explicit
serialized types and field-by-field projections in both directions.
Replaces ad-hoc destructure-and-strip on activity-backed-model.ts:58
plus `as any` casts on activities.ts:125.

Contract (src/common/serialized-model.ts):
- SerializedModelRequest / SerializedModelResponse — JSON-safe
  projections of upstream ModelRequest / ModelResponse. JsonValue
  replaces unknown on the wire.
- WIRE_VERSION literal field on both, validated activity-side.
  Mismatch throws non-retryable WireVersionMismatch ApplicationFailure
  — protects rolling deploys with workflow code on a different package
  version than the worker.
- signal excluded from SerializedModelRequest by design (AbortSignal
  is not serializable; Temporal cancellation provides the equivalent).

Projections live with their consumers:
- toSerializedModelRequest + fromSerializedModelResponse inline in
  workflow/activity-backed-model.ts (workflow side).
- toSerializedModelResponse + fromSerializedModelRequest inline in
  worker/activities.ts (worker side).

fromSerializedModelResponse reconstitutes Usage via new Usage(...) so
.add() keeps working across multi-turn runs. fromSerializedModelRequest
strips __wireVersion before passing to the upstream model so the
internal protocol field doesn't leak through getResponse().
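A rough sketch of the versioned projection pair described above, assuming simplified request fields. `SerializedRequestSketch`, `WireVersionMismatchSketch`, the field names `input` and `modelName`, and the version value `1` are all hypothetical; the real contract lives in src/common/serialized-model.ts and throws an ApplicationFailure rather than a plain Error subclass.

```typescript
const WIRE_VERSION = 1; // hypothetical value, for illustration only

type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

interface SerializedRequestSketch {
  __wireVersion: number;
  input: JsonValue;
  modelName: string;
}

interface RequestSketch {
  input: JsonValue;
  modelName: string;
  signal?: unknown; // stands in for AbortSignal, which must not cross the wire
}

class WireVersionMismatchSketch extends Error {
  readonly nonRetryable = true;
  constructor(expected: number, actual: number) {
    super(`wire version mismatch: expected ${expected}, got ${actual}`);
    this.name = 'WireVersionMismatch';
  }
}

// Workflow side: additive projection, so only the named fields are copied.
function toSerializedRequest(req: RequestSketch): SerializedRequestSketch {
  return { __wireVersion: WIRE_VERSION, input: req.input, modelName: req.modelName };
}

// Worker side: validate the version, then drop the protocol field.
function fromSerializedRequest(wire: SerializedRequestSketch): RequestSketch {
  if (wire.__wireVersion !== WIRE_VERSION) {
    throw new WireVersionMismatchSketch(WIRE_VERSION, wire.__wireVersion);
  }
  return { input: wire.input, modelName: wire.modelName };
}
```

Because both directions copy fields by name, a new upstream field is never forwarded silently; it has to be added to the projection, which is the point at which a version bump would be considered.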

Public exports:
- SerializedModelRequest, SerializedModelResponse,
  InvokeModelActivityInput, JsonValue, WIRE_VERSION exported from both
  index.ts (worker side) and workflow.ts (workflow side).
- toSerializedModelRequest exported via workflow.ts (test workflow
  imports it).
- toSerializedModelResponse exported via index.ts.

Tests added in packages/test/src/test-openai-agents.ts:
- Round-trip: prompt + tracing survive workflow→activity→workflow,
  Usage round-trips with working .add(), __wireVersion stripped from
  both directions.
- Stripping: signal absent activity-side after additive projection.
- Version mismatch: stale __wireVersion throws WireVersionMismatch.
- Snapshot: shape of toSerializedModelRequest / toSerializedModelResponse
  output matches expected key list — fails loudly with "bump
  WIRE_VERSION" message if upstream adds a field that gets silently
  copied through.

Replaces src/common/activity-model-input.ts with
src/common/serialized-model.ts.

No behavior change to user-facing API. 66/66 tests pass; one occasional
flake under load passes in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

socket-security Bot commented Apr 27, 2026

Review the following changes in direct dependencies.

Diff   Package                          Supply Chain Security  Vulnerability  Quality  Maintenance  License
Added  npm/@openai/agents-openai@0.3.9  99                     100            78       99           100
Added  npm/@openai/agents-core@0.3.9    99                     100            80       99           100


xumaple and others added 5 commits April 27, 2026 21:40
Implements TemporalTracingProcessor — a TracingProcessor that maps
OpenAI Agents trace/span events to OTel spans. Removes the
tracingDisabled flag on the internal Runner so trace events fire.

Architecture:
- Workflow-side TracingProcessor (src/workflow/tracing.ts) listens for
  agent loop events (agent, generation, function, handoff, guardrail,
  custom, response, transcription, speech, speech_group, mcp_tools).
- Each event creates a child OTel span via @opentelemetry/api, parented
  under the active OTel context (typically the workflow execution span
  registered by interceptors-opentelemetry).
- Replay-safe: skips span creation during workflow replay via
  isReplaying() guard. End events implicitly skip too — the entry map
  is empty during replay so onSpanEnd / onTraceEnd no-op.
- Idempotent registration: Symbol.for() flag on globalThis ensures
  the processor registers once per workflow isolate.
- Uses addTraceProcessor() (not setTraceProcessors): preserves user
  processors registered before runner construction, no risk of wiping.
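The idempotent-registration idea in the architecture list can be illustrated with a small sketch. The symbol key, the `processors` array, and `ensureRegistered` are hypothetical stand-ins; the real code registers a TracingProcessor with the upstream SDK rather than pushing strings into an array.

```typescript
// Symbol.for() returns the same symbol for the same key everywhere in the
// isolate, so the flag is visible even if this module is evaluated twice.
const REGISTERED_FLAG = Symbol.for('sketch.openai-agents.tracing-registered'); // hypothetical key

// Stand-in for a processor list that already holds a user-registered entry.
const processors: string[] = ['user-processor'];

// Appends rather than replaces, so existing entries are preserved.
function addProcessor(name: string): void {
  processors.push(name);
}

// Returns true only on the call that actually performed the registration.
function ensureRegistered(): boolean {
  const globals = globalThis as unknown as Record<symbol, unknown>;
  if (globals[REGISTERED_FLAG] === true) return false;
  addProcessor('temporal-processor');
  globals[REGISTERED_FLAG] = true;
  return true;
}
```

Calling `ensureRegistered()` from every runner constructor is then safe: only the first call appends, and the user's earlier processor stays in place.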

Wiring:
- TemporalOpenAIRunner constructor calls
  ensureTracingProcessorRegistered() — registers the processor and
  calls setTracingDisabled(false) to override the upstream NODE_ENV=test
  default that gates trace creation.
- ActivityBackedModel.getResponse() wraps both normal and
  summaryOverride paths in withGenerationSpan() so generation spans
  fire — upstream's built-in adapters do this themselves; ours has
  to mirror the pattern.
- Removed tracingDisabled: true from internal Runner config.

Span attributes split into static (set at start: type, name,
handoffs, output_type, from_agent, to_agent, server) and dynamic
(set at end: tools, model, triggered, result) — avoids redundant
double-set.

Known limitation (documented in code): activity spans from
interceptors-opentelemetry appear as siblings of generation spans,
not children. Proper nesting would require pushing OTel context
before the activity call. Deferred to a follow-up — current
hierarchy still gives users visibility into the agent loop.

Public exports:
- TemporalTracingProcessor (workflow-side)
- ensureTracingProcessorRegistered

Adds @opentelemetry/api ^1.9.0 as regular dep (matches
interceptors-opentelemetry's pattern).

Tests: T1 verifies the tracing path is active and produces trace +
agent + generation/response span events. Full OTel span emission
verification deferred — requires in-memory exporter + interceptors-
opentelemetry workflow bundle integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tyOptions, drop createTemporalRunner factory + createModelActivity export, harden wire layer

Two batches landed together since they share the activity-backed-model.ts
file footprint.

Batch A — public API cleanup:
- ModelActivityParameters → ModelActivityOptions. Matches the *Options
  convention used elsewhere in the SDK (ActivityOptions,
  LocalActivityOptions, WorkerOptions). DEFAULT_MODEL_ACTIVITY_PARAMETERS
  → DEFAULT_MODEL_ACTIVITY_OPTIONS. File model-parameters.ts →
  model-activity-options.ts.
- Drop createTemporalRunner factory. Pure `new` shortcut, no value-add.
  Users do `new TemporalOpenAIRunner(opts)` directly.
- Demote createModelActivity from public exports. The plugin
  auto-registers the activity, so there's no reason to advertise the
  manual-registration bypass route. Function stays in worker/activities.ts
  for the plugin to use; just no longer exported from index.ts.

Batch B — wire / serialization layer hardening:
- Asymmetric exports made consistent. to* projections public (part of
  the wire contract); from* projections private (implementation detail).
- Cast comments at every type-assertion boundary now document why each
  is safe — input/prompt/tracing as JsonValue, Usage data as JsonValue,
  AgentOutputItem[] as JsonValue[]. No `as any` regressions.
- Usage class reconstruction comment clarified: Usage is the only class
  instance needing reconstitution post-wire, since AgentOutputItem
  variants are all Zod-inferred plain objects.
- providerData JSDoc expanded with explicit coercion warning (Date → ISO
  string, Map/Set/class instances flattened by Temporal's JSON codec).
- Inline rationale at activities.ts version-check site (no longer
  references CLEANUP.md, which is a local-only artifact).
- Removed duplicate `signal` exclusion comment from
  activity-backed-model.ts. Canonical comment lives in
  serialized-model.ts.
- tracing field commentary updated to accurately describe ModelTracing
  as an enablement flag (not a context carrier); span-context
  propagation belongs in TemporalTracingProcessor.
- Two new drift-detection tests in test-openai-agents.ts. JSON
  round-trip on populated SerializedModelRequest /
  SerializedModelResponse with deepEqual — fails loudly if upstream
  adds a non-serializable field.
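The providerData coercion warning in Batch B is easy to reproduce with a plain JSON round trip. This sketch assumes the payload codec behaves like `JSON.stringify` followed by `JSON.parse`; `jsonRoundTrip` and the sample `providerData` object are made up for illustration.

```typescript
// Simulates what a JSON-based codec does to a value crossing the
// workflow/activity boundary.
function jsonRoundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

const providerData = {
  createdAt: new Date('2026-04-24T00:00:00.000Z'), // becomes an ISO string
  tags: new Set(['a', 'b']), // becomes an empty object
  lookup: new Map([['k', 1]]), // becomes an empty object
  nested: { count: 1 }, // plain data survives unchanged
};

const received = jsonRoundTrip(providerData) as Record<string, unknown>;

// A drift check in the same spirit as the tests described above: the value
// is wire-safe only if a round trip leaves its JSON form unchanged.
function isJsonStable(value: unknown): boolean {
  return JSON.stringify(jsonRoundTrip(value)) === JSON.stringify(value);
}
```

Note that `isJsonStable` compares JSON text, so it only shows that the encoded form is stable; it cannot tell that a `Set` lost its members. A deep-equality check against the original object, as the drift-detection tests use, is what catches that.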

No behavior change. 69/69 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ety fixes

Two batches landed together since they share test-file footprint.

Batch C — convert-agent.ts hardening (7 items):
- Move unwrapTemporalFailure from convert-agent.ts to common/errors.ts.
  It's an error utility, not agent conversion logic; co-locating with
  AgentsWorkflowError gives one place for error helpers.
- Replace `'default'` model-name fallback with explicit
  AgentsWorkflowError thrown at convert time. Users now get a clear
  "no model declared" error at workflow start instead of an opaque
  activity failure on first model call.
- Introduce getAgentInternals() helper in src/workflow/agent-internals.ts
  centralizing the unsafe access to upstream Agent's model/handoffs/tools
  fields. Future upstream type changes touch one file.
- Drop `as Model` cast on agent.clone({ model: activityBackedModel }).
  ActivityBackedModel implements Model and is structurally compatible —
  TS accepts it without the cast.
- Expand setAgent comment to explain why the original (pre-clone) agent
  is bound to the summary provider, not the cloned wrapper.
- Add CLEANUP-6 test asserting the Object.create-based Handoff clone
  preserves all upstream-documented fields and prototype identity.
  Future upstream Handoff additions trip this test.
- Top-of-file contract block in convert-agent.ts pinning the implicit
  upstream contracts (Agent.clone, Handoff.onInvokeHandoff,
  Agent.handoffs) to @openai/agents-core ~0.3.0 — checklist for dep
  upgrades.
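For readers unfamiliar with the Object.create-based clone that the CLEANUP-6 test guards, here is a minimal sketch of the technique. `SketchHandoff` and `cloneWithOverrides` are hypothetical; the real Handoff class has more fields, and the actual clone code may differ in detail.

```typescript
class SketchHandoff {
  constructor(
    public toolName: string,
    public onInvokeHandoff: (input: string) => string,
  ) {}

  describe(): string {
    return `handoff:${this.toolName}`;
  }
}

// Creates a new object with the same prototype, copies own enumerable
// fields, then applies overrides. The original is never written to.
function cloneWithOverrides<T extends object>(original: T, overrides: Partial<T>): T {
  const clone = Object.create(Object.getPrototypeOf(original)) as T;
  return Object.assign(clone, original, overrides);
}

const userHandoff = new SketchHandoff('to_billing', (input) => `orig:${input}`);
const wrapped = cloneWithOverrides(userHandoff, {
  onInvokeHandoff: (input) => `wrapped:${input}`,
});
```

Because the clone shares the prototype, `instanceof` checks and prototype methods keep working, while a field added upstream is copied along as long as it is an own enumerable property.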

Batch D — tracing replay-safety fixes:
- Symmetric replay gating: onTraceEnd and onSpanEnd now have the same
  isReplaying() guard as their start counterparts. Matches Python's
  uniform gating; no longer relies on Map-empty-after-replay as the
  implicit gate.
- Workflow-scoped spans Map: Map<workflowId, Map<spanId, SpanEntry>>
  instead of a flat Map shared across all workflows in the V8 isolate.
  Cross-workflow leaks impossible. Outer entry is auto-cleaned when
  the inner Map empties.
- Document that deterministic trace/span IDs come from the
  crypto.randomUUID polyfill in load-polyfills.ts (delegates to
  workflow.uuid4(), which is per-workflow seeded). No code change for
  this — verified via the new replay-safety test.
- Add T2 replay-safety test using maxCachedWorkflows: 0 to force
  workflow eviction after every task. Asserts replay actually occurred
  AND the workflow completes without NondeterminismError. Proves the
  trace processor's replay gating + the polyfill-backed deterministic
  IDs hold up under forced replay.
- Delete getWorkflowTracingConfig() — was a dead function that always
  returned 'enabled'. Removed from public exports.
- Expand JSDoc on ensureTracingProcessorRegistered documenting the
  global side-effect (mutates upstream's processor list) and the
  per-isolate singleton behavior so users aren't surprised.
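The workflow-scoped spans Map from Batch D could be sketched along these lines. `ScopedSpanStore` and `SpanEntry` are hypothetical simplifications; the real entries hold OTel span handles, and the method names here are invented for the example.

```typescript
interface SpanEntry {
  name: string;
}

class ScopedSpanStore {
  // Outer key: workflow ID. Inner key: span ID.
  private readonly byWorkflow = new Map<string, Map<string, SpanEntry>>();

  start(workflowId: string, spanId: string, entry: SpanEntry): void {
    let spans = this.byWorkflow.get(workflowId);
    if (spans === undefined) {
      spans = new Map();
      this.byWorkflow.set(workflowId, spans);
    }
    spans.set(spanId, entry);
  }

  // Removes and returns the entry; drops the outer entry once it is empty.
  end(workflowId: string, spanId: string): SpanEntry | undefined {
    const spans = this.byWorkflow.get(workflowId);
    if (spans === undefined) return undefined;
    const entry = spans.get(spanId);
    spans.delete(spanId);
    if (spans.size === 0) this.byWorkflow.delete(workflowId);
    return entry;
  }

  get trackedWorkflows(): number {
    return this.byWorkflow.size;
  }
}
```

Two workflows can then use the same span ID without seeing each other's entries, and a workflow that has ended all of its spans leaves nothing behind in the outer Map.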

71/71 tests pass. ESLint + Prettier clean. Both batches reviewed by
code-auditor and comment-auditor and all findings applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(Python parity)

Single combined batch — runner.ts cleanups and the validateTools rewrite
share the same code path. 8 runner.ts items + tool validation relaxation.

Runner cleanups:
- Remove runStreamed method entirely. Calling it now produces a clean
  TypeError ("not a function") instead of a custom throw. We don't
  extend Runner so we have no obligation to expose it.
- Drop AgentsWorkflowError class. The string 'AgentsWorkflowError' on
  ApplicationFailure.type already serves as the marker; the wrapper
  class added duplication and an extra hop on the cause chain. Now
  errors flow directly: ApplicationFailure(type='AgentsWorkflowError',
  cause: originalError). BREAKING: the class is no longer exported.
  Users doing `instanceof AgentsWorkflowError` should switch to
  ApplicationFailure type-tag checks.
- Tighten TemporalRunOptions.runConfig.model to `string` only.
  Workflow can't serialize Model objects across the activity boundary,
  so accepting `string | Model` was a typed lie. Runtime guard
  deleted; TypeScript catches misuse at the call site.
- Forward all upstream RunConfig fields explicitly to internalRunner.
  Previously dropped silently: handoffInputFilter, inputGuardrails,
  outputGuardrails, modelSettings, tracingDisabled,
  traceIncludeSensitiveData, workflowName, traceId, groupId,
  traceMetadata, conversationId, session, sessionInputCallback,
  callModelInputFilter, tracing. Each new field has a JSDoc comment
  noting Temporal-specific caveats (e.g. guardrails must be
  deterministic, signal omitted in favor of CancellationScope).

Convert-agent + tool validation:
- Fold validateTools into convertAgent. Single graph traversal now
  handles validation, model conversion, and Handoff cloning — was
  three full walks. ~40 lines of duplication removed.
- Drop TEMPORAL_ACTIVITY_TOOL_MARKER gate. Upstream tool() factory
  products are now accepted inline in workflow context — Python
  parity. Raw functions still rejected. The marker constant stays
  for debugging / future introspection.
- Tool-type allowlist comment in convertAgent listing every accepted
  upstream tool variant alphabetically. Note that ApplyPatch /
  Computer / Shell tools pass validation but will fail at runtime in
  the sandbox (require local I/O).
- getAgentInternals helper now used for tools as well as
  model/handoffs — single source of truth for upstream Agent
  property access.
- Tighten convert-agent.ts error message for non-string Model. Points
  users at runConfig.model: string as the override path.

Tests:
- E3/F20 inverted from rejection-test to inline-success-test. A
  deterministic tool() product runs in the workflow without
  activityAsTool; verifies the output round-trips.
- C3/F27 simplified to verify TypeError surfaces as
  WorkflowFailedError. No specific message check since the method
  doesn't exist at all now.
- C1/F7 updated: assertion on causeName is now 'Error' (the original
  error directly on cause), not 'AgentsWorkflowError' (which would
  imply a wrapper).
- H2 raw-function-rejection tests still pass; message strings
  updated to match the tightened error wording.

Migration note: AgentsWorkflowError class removed from public API.
Users identifying these errors should switch from
`e instanceof AgentsWorkflowError` to checking
`(e as ApplicationFailure).type === 'AgentsWorkflowError'` on the
serialized failure.
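As a hedged illustration of the migration note, a type-tag check might look like the sketch below. It deliberately uses a structural `{ type, cause }` shape instead of importing ApplicationFailure, so it stays self-contained; `hasFailureType` is a hypothetical helper, and walking the cause chain is an assumption about where the tagged failure sits (for example under a WorkflowFailedError on the client).

```typescript
// Walks the cause chain looking for a failure carrying the given type tag.
// maxDepth guards against accidental cycles in the chain.
function hasFailureType(err: unknown, type: string, maxDepth = 10): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth; depth++) {
    if (current === null || typeof current !== 'object') return false;
    const failure = current as { type?: unknown; cause?: unknown };
    if (failure.type === type) return true;
    current = failure.cause;
  }
  return false;
}

// Example shape: an outer error whose cause is the tagged failure, whose
// own cause is the original error.
const exampleFailure = {
  name: 'WorkflowFailedError',
  cause: { type: 'AgentsWorkflowError', cause: new Error('model exploded') },
};
const isAgentsFailure = hasFailureType(exampleFailure, 'AgentsWorkflowError');
```

Code that previously branched on `instanceof AgentsWorkflowError` would branch on a check like this instead, with the original error available one level down on `cause`.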

71/71 tests pass. ESLint + Prettier clean. Both audits PASS — only
note was the breaking class removal, intentional per CLEANUP.md spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugin/options polish

Two batches landed in parallel — they touch disjoint primary files but
share test-file footprint, so committing as one.

Batch F — tracing remaining gaps + plugin modelParams + concurrent test:

- New TemporalOpenAIRunnerOptions extends ModelActivityOptions with
  startSpansInReplay?: boolean. Plumbed through to
  TemporalTracingProcessor via ensureTracingProcessorRegistered, so
  callers can opt into emitting spans during replay for debugging
  replay-divergence issues. Default false.

- TemporalTracingProcessor's four event methods now gate via a single
  shouldSkip() helper that respects startSpansInReplay. Previously
  used four inline isReplaying() checks.

- Activity-span nesting comment in tracing.ts updated. Previously said
  "deferred to a follow-up"; now correctly explains that activity
  spans nest under generation spans when @temporalio/interceptors-
  opentelemetry is configured. Investigated worker-side TracingProcessor
  and concluded it isn't needed — OpenAI Agents SDK trace events fire
  workflow-side only; the activity span just needs OTel context
  propagation, which interceptors-opentelemetry already provides.

- OpenAIAgentsPluginOptions accepts modelParams?: ModelActivityOptions
  as a config-surface field. The plugin runs worker-side and can't
  inject config into the V8 workflow sandbox, so users must still pass
  modelParams to new TemporalOpenAIRunner(options) in workflow code.
  JSDoc explains this honestly; future versions may auto-propagate
  via workflow interceptors.

- T3 test verifies two concurrent workflows on one worker have
  isolated trace IDs (no cross-pollination), exercising the
  workflow-scoped Map added in 2d7949a.
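The single `shouldSkip()` gate described in Batch F can be sketched as below. `GatedProcessorSketch` is hypothetical: it records event names instead of creating OTel spans, and it takes `isReplaying` as an injected function so the sketch runs outside a workflow sandbox.

```typescript
type TraceEvent = 'traceStart' | 'traceEnd' | 'spanStart' | 'spanEnd';

class GatedProcessorSketch {
  readonly emitted: TraceEvent[] = [];

  constructor(
    private readonly isReplaying: () => boolean,
    private readonly startSpansInReplay: boolean = false,
  ) {}

  // One place decides whether an event is dropped: skip only while
  // replaying, and only if the caller has not opted in.
  private shouldSkip(): boolean {
    return this.isReplaying() && !this.startSpansInReplay;
  }

  onTraceStart(): void {
    if (!this.shouldSkip()) this.emitted.push('traceStart');
  }
  onSpanStart(): void {
    if (!this.shouldSkip()) this.emitted.push('spanStart');
  }
  onSpanEnd(): void {
    if (!this.shouldSkip()) this.emitted.push('spanEnd');
  }
  onTraceEnd(): void {
    if (!this.shouldSkip()) this.emitted.push('traceEnd');
  }
}
```

Routing all four methods through one helper keeps start and end events gated symmetrically, so a span can never be ended without having been started under the same policy.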

Batch H — model-activity-options enhancements + testing.ts
consolidation:

- model-activity-options.ts: cancellationType defaults to
  ActivityCancellationType.TRY_CANCEL so cancellations reach the
  activity cooperatively. JSDoc on every public field. versioningIntent
  was investigated and dropped — upstream Worker Versioning API is
  deprecated; using the Worker Deployment API is the path forward
  whenever that becomes a need. Adding a deprecated option to a new
  surface would create immediate tech debt.

- testing.ts: FakeModel and the former GeneratorFakeModel collapse
  into a single FakeModel that takes ModelResponse[] | Generator
  <ModelResponse>. FakeModelProvider takes the same union with a
  factory variant for the generator side. The deprecated aliases
  GeneratorFakeModel and GeneratorFakeModelProvider were removed
  entirely (BREAKING for tests, low surface — only used inside our
  own test suite).

- ResponseBuilders const object exposes text / toolCall / handoff /
  multiToolCall — namespace-style access in addition to the existing
  flat exports. Mirrors Python's TestModel grouping.

- src/testing.ts barrel re-exports from worker/testing.ts so
  consumers can import from @temporalio/openai-agents/lib/testing
  without diving into worker/. The lib/testing path is already used
  by stubs in packages/test.

- Activity-backed model wires cancellationType through to both
  proxyActivities and proxyLocalActivities options. User overrides
  win since DEFAULT_MODEL_ACTIVITY_OPTIONS is spread first.
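The "defaults spread first" rule in the last item can be shown with a short sketch. `ModelActivityOptionsSketch`, `DEFAULTS_SKETCH`, `resolveOptions`, and the `'1m'` timeout are hypothetical; only the TRY_CANCEL default comes from the commit message.

```typescript
type CancellationTypeSketch = 'TRY_CANCEL' | 'WAIT_CANCELLATION_COMPLETED' | 'ABANDON';

interface ModelActivityOptionsSketch {
  startToCloseTimeout: string;
  cancellationType: CancellationTypeSketch;
}

const DEFAULTS_SKETCH: ModelActivityOptionsSketch = {
  startToCloseTimeout: '1m', // hypothetical default, not taken from the PR
  cancellationType: 'TRY_CANCEL',
};

// Later spreads win, so any key the user supplies replaces the default.
function resolveOptions(
  user: Partial<ModelActivityOptionsSketch> = {},
): ModelActivityOptionsSketch {
  return { ...DEFAULTS_SKETCH, ...user };
}

const untouched = resolveOptions();
const overridden = resolveOptions({ cancellationType: 'ABANDON' });
```

One caveat of this pattern in general: a key explicitly set to `undefined` in the user object also overrides the default, so callers should omit keys they do not want to change rather than pass `undefined`.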

Public API removals:
- GeneratorFakeModel / GeneratorFakeModelProvider (test utility
  aliases, not used by production code).
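The consolidated FakeModel from Batch H, which accepts either an array or a generator of canned responses, might be approximated as follows. `FakeModelSketch` and `FakeResponse` are hypothetical simplifications; the real class implements the upstream Model interface and returns ModelResponse objects.

```typescript
interface FakeResponse {
  text: string;
}

class FakeModelSketch {
  private readonly responses: Iterator<FakeResponse>;

  constructor(source: FakeResponse[] | Generator<FakeResponse>) {
    // Arrays are turned into iterators so both inputs share one code path.
    this.responses = Array.isArray(source) ? source[Symbol.iterator]() : source;
  }

  // Returns the next canned response, failing loudly when the script runs out.
  nextResponse(): FakeResponse {
    const step = this.responses.next();
    if (step.done) throw new Error('FakeModelSketch: no more canned responses');
    return step.value;
  }
}

function* scripted(): Generator<FakeResponse> {
  yield { text: 'first' };
  yield { text: 'second' };
}

const fromArray = new FakeModelSketch([{ text: 'only' }]);
const fromGenerator = new FakeModelSketch(scripted());
```

A generator object can be consumed only once, which is presumably why the commit gives FakeModelProvider a factory variant for the generator side: each model instance needs a fresh generator.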

72/72 tests pass. ESLint + Prettier clean. Both audits PASS — code
auditor noted T3 proves trace-ID disjointness via UUID uniqueness
rather than directly verifying Map scoping; comment auditor flagged
3 stale references in DEFERRED.md / index.ts module-level JSDoc, all
fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>