Skip to content

feat(obs): add wrapper telemetry foundations (FER-10253)#14

Merged
FedeZara merged 2 commits intoniels/conver-to-js-actionsfrom
FedeZara/fer-10253-obs-foundations
May 8, 2026
Merged

feat(obs): add wrapper telemetry foundations (FER-10253)#14
FedeZara merged 2 commits intoniels/conver-to-js-actionsfrom
FedeZara/fer-10253-obs-foundations

Conversation

@FedeZara
Copy link
Copy Markdown

@FedeZara FedeZara commented May 6, 2026

Summary

Wraps every JS action through a single telemetry pipeline that emits structured automation_run_started / automation_run_completed / wrapper_failed events to four sinks: a ::fern-telemetry::<json> log line (always), PostHog (always), Sentry (failures only), and the Lightweight API at /v1/automation/events (failures only). Single touchpoint for actions: instrumentAction(name, fn) plus injectFernToken(token) to authenticate the Lightweight API POST.

Linear: FER-10253

Highlights

  • New packages/shared/src/obs/ module — telemetry-client, posthog, sentry, lightweight-api, automation-context, errors, types. All free functions, no class instances exposed.
  • WrapperError(errorCode, message, originalError?) is the single way for wrapper code to attach a stable SCREAMING_SNAKE error code to a thrown exception. Anything that isn't a WrapperError gets classified as UNKNOWN_ERROR.
  • injectFernToken(token) is called from inside instrumentAction's body after parsing — so input-parsing failures still get classified as wrapper_failed via the catch path before the token is available.
  • flushTelemetry() runs from runAction's exit path: awaits in-flight Lightweight API POSTs (Promise.allSettled), then shuts down the PostHog and Sentry SDK queues.
  • All 8 actions wired (preview, generate, upgrade, verify, sync-openapi, setup-cli, resolve-cli, verify-token). CLI invocations in generate / sync-openapi catch + re-throw as WrapperError with action-specific codes (CLI_AUTOMATIONS_GENERATE_FAILED, CLI_GHA_PULL_SPEC_FAILED, CLI_GHA_SYNC_SPECS_FAILED). setup-cli wraps installFernCli and the source-build path in WrapperError("CLI_INSTALL_*", ...).
  • Cancellation path: SIGINT/SIGTERM handler marks outcome=cancelled in saved state; the post phase emits automation_run_completed with status: cancelled. No wrapper_failed from cancellation — it's not a wrapper-side fault.

Follow-ups before this is "live"

Two things need to happen on top of this PR before any telemetry actually fires:

  1. Wire the actual build constants. POSTHOG_API_KEY, SENTRY_DSN_AUTOMATIONS, and LIGHTWEIGHT_API_URL in packages/shared/src/obs/build-constants.ts are empty strings. The PostHog and Sentry SDKs initialize as no-ops when their constant is empty; the Lightweight API short-circuits when its URL is empty. Once the automations Sentry project and the /v1/automation/events endpoint exist, hardcode the values here. (PostHog API keys and Sentry DSNs are designed to be embedded in client code — they're write-only at the project level. No CI secret needed.)

  2. Sentry release tagging + source-maps upload. Sentry.init({ release: ... }) is currently unset in packages/shared/src/obs/sentry.ts, and source maps aren't uploaded — so any captured exception today would point at the bundled dist/index.js line numbers, not the original TypeScript. Both depend on a CI-driven release pipeline that bakes a release tag at the moment dist is built. Open question: do we move release builds to CI (the release.yml header already has a TODO about this) so we can run sentry-cli releases new <tag> + sentry-cli releases files upload-sourcemaps keyed to the same tag we pass to Sentry.init? Or do we keep dist commits as-is and run a separate sourcemaps-upload workflow on tag publish?

Either way the value passed to Sentry.init({ release }) and the Sentry CLI's release identifier need to agree, otherwise deobfuscation won't resolve. Worth a separate Linear ticket.

Test plan

  • pnpm typecheck && pnpm check && pnpm test && pnpm build — local clean (typecheck across 10 packages, lint clean across 82 files, 51/51 shared tests pass, all 8 dists rebuilt at ~3.7 MB)
  • Cut a pre-release tag of one action (e.g. setup-cli@v0.0.0-obs1) and trigger a workflow that uses it. Verify ::fern-telemetry:: log lines appear with the right event names and that PostHog / Sentry stay silent (constants still empty). Once constants are wired (follow-up 1), repeat and confirm events show up in PostHog / Sentry.
  • Force a non-zero CLI exit inside a test workflow that uses generate or sync-openapi, verify wrapper_failed log line carries the action-specific error_code (e.g. CLI_AUTOMATIONS_GENERATE_FAILED).

@FedeZara FedeZara requested a review from Swimburger as a code owner May 6, 2026 22:07
@FedeZara FedeZara force-pushed the FedeZara/fer-10253-obs-foundations branch 2 times, most recently from a3a904e to d8dc2d6 Compare May 7, 2026 11:04
@Swimburger
Copy link
Copy Markdown
Member

From Claude:

Review

Overall this is a well-structured PR — clean module boundaries, good no-op-by-default story, solid test coverage on the lifecycle paths. A few items to address before merge, plus some non-blocking follow-ups.

Should fix before merge

1. captureFernAutomationsEvent doesn't match its docstring

automation-event-api.ts:95-100 — the comment says "No-op for non-wrapper_failed events (run-level events are PostHog-only)", but the code unconditionally pushes every event into inflight. postAutomationEvent only guards on isGithubActionsRunner() and URL emptiness, not event type. Once AUTOMATION_EVENT_API_URL is wired, automation_run_started and automation_run_completed will start POSTing too.

Either add the event-type guard:

export function captureFernAutomationsEvent(event, context) {
  if (event.event !== EventName.WrapperFailed) return;
  inflight.push(postAutomationEvent(event, context));
}

…or update the docstring + PR description to reflect that all three event types flow through this sink.

2. Sentry environment fragments per branch

sentry.ts:17environment: process.env.GITHUB_REF_NAME ?? "unknown" makes every PR branch its own Sentry environment, which fragments the dashboard and breaks alert rules that key off environment:production. Conventional usage is a small fixed set (production / staging). Suggest hardcoding "production" (since this only fires on a runner) and putting the branch in a tag — scope.setTag("config_branch", …) already does this.

3. Naming inconsistency: "Lightweight API" vs "Automation Event API"

The PR description says "Lightweight API at /v1/automation/events" but the code uses AUTOMATION_EVENT_API_URL / automation-event-api.ts / captureFernAutomationsEvent. Pick one name and apply consistently — the PR description is the most-visible spot to update. Also: stale references to enqueueWrapperFailed (the old name) remain in automation-event-api.ts:103 docstring and as local aliases in sinks-noop.test.ts.

Non-blocking — worth a follow-up

SIGINT/SIGTERM handler doesn't flush before exit

telemetry.ts:36-42process.exit(code) drops any in-flight HTTP from PostHog/Sentry in the main process. Lower impact than it sounds: the post phase still emits automation_run_completed from a separate Node process, so this only loses the automation_run_started network flush. Suggested fix when you get to it:

let shuttingDown = false;
const onSignal = (signal: "SIGINT" | "SIGTERM", code: number) => async () => {
  if (shuttingDown) return;
  shuttingDown = true;
  core.saveState(STATE_OUTCOME, RunStatus.Cancelled);
  core.info(`::fern-telemetry::received ${signal}, marking run as cancelled`);
  await Promise.race([
    flushTelemetry().catch(() => {}),
    new Promise((resolve) => setTimeout(resolve, 2000)),
  ]);
  process.exit(code);
};

GitHub Actions waits ~7.5s before SIGKILL, so the 2s cap is safe; the re-entrancy guard prevents a second signal from kicking off a second flush.

Token leakage risk in error_message

PostHog and the Automation Event API receive attributes.error_message derived from err.message. core.setSecret masks tokens in logs but doesn't filter the SDK payload. If a CLI subprocess error message ever interpolates a token (e.g. a logged URL with ?token=…), it ships off-runner. Worth a regex scrubber on error_message before emit, or a documented constraint that wrappers must not let untrusted strings into thrown messages.

cli_version: null is hardcoded

automation-context.ts:28setup-cli and installFernCli already capture the CLI version in stdout and discard it. File a follow-up to thread it into the automation context so events carry it.

Race in inflight array

automation-event-api.ts:106-112 — if captureFernAutomationsEvent runs after shutdownFernAutomations has snapshotted the array, the new promise lands in the freshly-emptied list and is never awaited. Unlikely in practice but worth either a done flag or a comment that this is intentional.

Nits

  • telemetry.ts:97attributes: attributes, → just attributes.
  • index.ts:35-41getRequiredFernToken could use core.getInput("fern-token", { required: true }) for consistency with getRequiredInput.
  • No test for the SIGINT/SIGTERM handler. If you take the async-flush fix above, add coverage.
  • No test verifying the Automation Event API guard fires correctly when GITHUB_ACTIONS=true but URL is empty.

What's solid

  • Free-function module API + module-level singletons is the right shape for one-process-per-action.
  • Empty-constant short-circuit is consistent across all three remote sinks; the "ships safely before backends exist" claim holds.
  • WrapperError taxonomy is clean; the UNKNOWN_ERROR fallback in recordError correctly handles non-WrapperError throws.
  • Lifecycle test coverage on instrumentAction and runPostCleanup is thorough (success/failure/cancelled/missing-state/unparseable-time all covered).
  • core.setSecret is applied to every token input.

github_run_url: getGithubRunUrl(),
org: extractOrg(repository),
config_repo: repository,
config_commit_sha: env.FERN_CONFIG_COMMIT_SHA ?? env.GITHUB_SHA ?? "",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fallback to undefined instead?

*/
export function getAutomationContext(): AutomationContext {
const env = process.env;
const repository = env.FERN_CONFIG_REPO ?? env.GITHUB_REPOSITORY ?? "";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fallback to undefined instead?

org: extractOrg(repository),
config_repo: repository,
config_commit_sha: env.FERN_CONFIG_COMMIT_SHA ?? env.GITHUB_SHA ?? "",
config_branch: env.FERN_CONFIG_BRANCH ?? env.GITHUB_HEAD_REF ?? env.GITHUB_REF_NAME ?? "",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fallback to undefined instead?

config_commit_sha: env.FERN_CONFIG_COMMIT_SHA ?? env.GITHUB_SHA ?? "",
config_branch: env.FERN_CONFIG_BRANCH ?? env.GITHUB_HEAD_REF ?? env.GITHUB_REF_NAME ?? "",
config_pr_number:
env.FERN_CONFIG_PR_NUMBER ?? extractPrNumberFromGithubRef(env.GITHUB_REF) ?? null,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fallback to undefined instead?

config_branch: env.FERN_CONFIG_BRANCH ?? env.GITHUB_HEAD_REF ?? env.GITHUB_REF_NAME ?? "",
config_pr_number:
env.FERN_CONFIG_PR_NUMBER ?? extractPrNumberFromGithubRef(env.GITHUB_REF) ?? null,
trigger: env.GITHUB_EVENT_NAME ?? "",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fallback to undefined instead?

Comment thread packages/shared/src/obs/posthog.ts Outdated
if (client !== null) {
return client;
}
if (!isGithubActionsRunner() || POSTHOG_API_KEY.length === 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POSTHOG_API_KEY.length === 0 can be shortened to POSTHOG_API_KEY. It'll be falsy if null, undefined, or an empty string.

*/
export function capturePostHogEvent(event: TelemetryEvent, context: AutomationContext): void {
const c = getClient();
if (c === null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be shortened to if (c) { return; }

Comment thread packages/shared/src/obs/types.ts Outdated
config_repo: string;
config_commit_sha: string;
config_branch: string;
config_pr_number: string | null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would make this config_pr_number?: string or config_pr_number: string | undefined

Comment thread packages/shared/src/obs/types.ts Outdated
config_branch: string;
config_pr_number: string | null;
trigger: string;
cli_version: string | null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cli_version?: or cli_version: string | undefined

Comment on lines +22 to +24
export function injectFernToken(token: string): void {
fernToken = token.length > 0 ? token : null;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feel hacky, is there a way around this?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really if we want to have also the parsing of the inputs instrumented

@FedeZara FedeZara force-pushed the FedeZara/fer-10253-obs-foundations branch 3 times, most recently from 85cf925 to 1cfd61e Compare May 8, 2026 11:24
Wires every JS action through `instrumentAction` + `runPostCleanup` so each
run emits structured `automation_run_started` / `automation_run_completed` /
`wrapper_failed` events to four sinks: a `::fern-telemetry::<json>` log line,
PostHog (always), Sentry (`wrapper_failed` only), and the Lightweight API
(`/v1/automation/events`, `wrapper_failed` only).

- New `packages/shared/src/obs/` module — telemetry-client, posthog, sentry,
  lightweight-api, automation-context, errors, types. All free functions; no
  class instances exposed.
- `WrapperError(errorCode, message, originalError?)` is the single way for
  wrapper code to attach a stable SCREAMING_SNAKE error code to a thrown
  exception. Errors that aren't `WrapperError` get the generic `UNKNOWN_ERROR`
  code at classification time.
- `injectFernToken(token)` (called from inside `instrumentAction`'s body
  after parsing) configures the Lightweight API auth so input-parsing
  failures still get classified before the token is available.
- `flushTelemetry()` (called from `runAction` before `process.exit`) awaits
  every in-flight Lightweight API request, then shuts down the PostHog and
  Sentry SDK queues.
- All 8 actions wired (preview, generate, upgrade, verify, sync-openapi,
  setup-cli, resolve-cli, verify-token). CLI invocations in `generate` and
  `sync-openapi` catch + re-throw as `WrapperError` with action-specific
  codes (`CLI_AUTOMATIONS_GENERATE_FAILED`, `CLI_GHA_PULL_SPEC_FAILED`,
  `CLI_GHA_SYNC_SPECS_FAILED`). `setup-cli` wraps `installFernCli` and
  `buildCliFromSource` likewise (`CLI_INSTALL_*`).

Hardcoded build constants (`POSTHOG_API_KEY`, `SENTRY_DSN_AUTOMATIONS`,
`LIGHTWEIGHT_API_URL`) are empty strings today; telemetry is a runtime no-op
until they're populated. Sentry release tagging and source-maps upload also
pending — see PR description for the follow-ups.

Refs FER-10253.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FedeZara FedeZara force-pushed the FedeZara/fer-10253-obs-foundations branch from 1cfd61e to 0d89477 Compare May 8, 2026 11:50
@FedeZara FedeZara merged commit 43508ef into niels/conver-to-js-actions May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants