Skip to content
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
ba8f268
test(e2e): repro OAuth dead-ends when PAR dies before user completes …
aspiers May 4, 2026
64c50f7
docs(design): PAR expiry, heartbeats, and clean-exit strategy
aspiers May 4, 2026
b1fc940
feat(auth-service): keep PAR alive while user sits on OTP / recovery …
aspiers May 4, 2026
339aba3
test(e2e): cover @par-heartbeat liveness end-to-end
aspiers May 4, 2026
2e4d327
feat(auth-service,pds-core): redirect to OAuth client on sign-in fail…
aspiers May 4, 2026
5183b7b
docs(design): record heartbeat + clean-exit landed in the design doc
aspiers May 4, 2026
8bcb344
fix(demo): mark /client-metadata.json route dynamic so EPDS_CLIENT_TH…
aspiers May 4, 2026
9fb0e69
fix: address PR #154 review feedback (CodeRabbit + Copilot + SonarQub…
aspiers May 4, 2026
3b0ec50
fix(demo): default EPDS_CLIENT_THEME to amber so client-branding e2e …
aspiers May 4, 2026
b6f3ba9
test(e2e): rename "the user can try again" to "the OTP entry boxes ar…
aspiers May 4, 2026
fcab1c7
fix(auth-service): forward no_heartbeat through recovery Verify form too
aspiers May 4, 2026
f3766a9
test(auth-service,shared): bring new heartbeat / clean-exit code to 1…
aspiers May 4, 2026
241807a
fix: address SonarCloud quality-gate findings on PR #154
aspiers May 4, 2026
25f37aa
test(auth-service): unit-test buildEpdsCallbackUrl + ratchet down fun…
aspiers May 4, 2026
7650f14
fix: address third-round PR #154 review feedback
aspiers May 4, 2026
4bc113d
test(auth-service): replace `q.get(name)!` with a typed helper to cle…
aspiers May 4, 2026
26151cd
refactor(shared): extract resolveStartOverHref to fix Sonar dup-block…
aspiers May 4, 2026
3d31876
fix(auth-service): inline "Send a new code" action on expired-OTP error
aspiers May 4, 2026
03958c5
test(e2e): allow inline action button alongside the OTP error message
aspiers May 4, 2026
86cd60f
fix(auth-service): don't offer Resend when it cannot complete the sig…
aspiers May 4, 2026
160f20d
test(auth-service): extract heartbeat-router test harness + replace R…
aspiers May 4, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions .changeset/clean-exit-on-expired-signin.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
---
'ePDS': minor
---

Sign-in pages no longer strand users on a "session expired" dead end.

**Affects:** End users, Client app developers

**End users:** if your sign-in does time out (e.g. you closed the tab and came back later, or you took longer than the heartbeat could cover), you no longer land on a static "Session expired, please start over" page with no way forward. Instead you are redirected back to the app you were signing in to, which can show its own retry button. The app you are signing in to controls the retry copy from there.

**Client app developers:**

- When a sign-in fails because the user took too long, the user is now redirected back to your registered `redirect_uri` with the standard OAuth error parameters (`error=access_denied`, `error_description=…`, and `iss`) instead of being stranded on an ePDS-hosted error page. This is the behaviour RFC 6749 §4.1.2.1 prescribes; ePDS used to short-circuit it with a static "session expired" page on a few internal paths (`/auth/complete`, `/auth/choose-handle`, the recovery flow). All of those paths now bounce back to your client.
- Internal-server-error cases (e.g. better-auth outage) redirect the same way but with `error=server_error`.
- The original `state` parameter is preserved when ePDS still has it (most paths). It is dropped when the upstream PAR row was already gone before ePDS could read it; treat the missing `state` as an anonymous error and start a fresh authorization request.
- For the rare worst case where ePDS cannot identify which OAuth client the user came from (no cookie, no flow row, no signed `client_id` on the callback), the page is now a styled error with a "Return to sign in" button targeting the client's `client_uri` from its OAuth metadata, instead of the previous text-only page.
- No changes required on your side. If you already handle the OAuth error response per spec, the only observable difference is that some flows that previously stranded the user inside ePDS now arrive at your callback with `error=access_denied`.
11 changes: 11 additions & 0 deletions .changeset/demo-client-metadata-runtime-theme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
'ePDS': patch
---

Demo client's published OAuth metadata now actually picks up the operator-set theme.

**Affects:** Operators

**Operators:** the demo's `/client-metadata.json` was previously prerendered at `next build` time, so its `brand_color`, `background_color`, and `branding.css` fields stayed frozen at the defaults regardless of what `EPDS_CLIENT_THEME` was set to in the running container's environment. The route is now flagged `dynamic = 'force-dynamic'`, mirroring the sibling `page.tsx`, so each request re-reads the env. The two-demo split in `docker-compose.yml` (a trusted `demo` with `EPDS_CLIENT_THEME=clay` and an untrusted `demo-untrusted` with the theme blank) now produces visually distinct metadata as intended.

No action required if you already had `EPDS_CLIENT_THEME` set on the demo service — it just begins to take effect.
9 changes: 9 additions & 0 deletions .changeset/par-heartbeat-keeps-slow-signins-alive.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
'ePDS': minor
---

Slow sign-ins are less likely to time out before you finish entering your code.

**Affects:** End users

**End users:** if you take a few minutes to find your sign-in code in your inbox before entering it, you will no longer be bounced to a "session expired" page when you submit it. While the sign-in code page (and the recovery-by-backup-email page) is open, the page now quietly tells the server "I'm still here" every few minutes so the OAuth flow it is part of stays alive. Closing the tab or walking away for a long stretch can still expire the flow, in which case the existing error pages still apply — but reading email at human speed should not.
354 changes: 354 additions & 0 deletions docs/design/par-expiry-and-clean-exits.md

Large diffs are not rendered by default.

100 changes: 87 additions & 13 deletions e2e/step-definitions/auth.steps.ts
Original file line number Diff line number Diff line change
Expand Up @@ -566,25 +566,45 @@ async function callPdsExpiryHook(
}

/**
* Read the PAR request_uri from the current auth-service login page
* URL and stash it on the world for subsequent expiry hooks. The URL
* is `https://<auth>/oauth/authorize?request_uri=urn:...&...` while the
* user is on the login/OTP form. Throws if the URL doesn't carry it,
* which means the surrounding scenario is mis-ordered (the page must
* be on the auth-service side before this step runs).
* Read the PAR request_uri from the current auth-service page URL and
* stash it on the world for subsequent expiry hooks. While the user is
* on the login/OTP form the URL is
* `https://<auth>/oauth/authorize?request_uri=urn:...&...`, but on
* downstream pages (e.g. /auth/recover) the parameter has been dropped.
* Falls back to a previously-stashed `world.lastRequestUri` so a
* scenario can capture the URI early (via the dedicated capture step
* below) and consult it after navigation. Throws when neither source
* has a value, which means the scenario is mis-ordered.
*/
function captureRequestUriFromPage(world: EpdsWorld): string {
const page = getPage(world)
const requestUri = new URL(page.url()).searchParams.get('request_uri')
if (!requestUri) {
throw new Error(
`Expected request_uri in page URL but found none: ${page.url()}`,
)
const fromUrl = new URL(page.url()).searchParams.get('request_uri')
if (fromUrl) {
world.lastRequestUri = fromUrl
return fromUrl
}
if (world.lastRequestUri) {
return world.lastRequestUri
}
world.lastRequestUri = requestUri
return requestUri
throw new Error(
`Expected request_uri in page URL or previously captured but found none: ${page.url()}`,
)
}

/**
* Capture and stash the PAR request_uri from the current auth-service
* page URL so a later step can refer to it after navigating away. Used
* by scenarios where the OTP form is left for /auth/recover (recovery)
* or any other downstream page where the request_uri has dropped off
* the URL.
*/
When(
'the PAR request_uri is captured for later expiry',
function (this: EpdsWorld) {
captureRequestUriFromPage(this)
},
)

When(
'the PAR request_uri has expired before the bridge fires',
async function (this: EpdsWorld) {
Expand Down Expand Up @@ -636,3 +656,57 @@ Then(
}
},
)

// ---------------------------------------------------------------------------
// Clean-exit assertions for @otp-and-par-expiry scenarios
// ---------------------------------------------------------------------------
//
// When the upstream PAR is hard-dead (the test hook deletes the row),
// no amount of heartbeat can revive it — but we still owe the user a
// clean exit per RFC 6749 §4.1.2.1: redirect them back to the OAuth
// client's redirect_uri with `error=access_denied` so the client's
// own UI can handle retry. The demo client translates that to
// `?error=auth_failed` on its landing page.

Then(
'the browser lands back at the demo client with an auth error',
async function (this: EpdsWorld) {
const origin = new URL(testEnv.demoUrl).origin
const page = getPage(this)
await page.waitForURL(`${origin}/?error=auth_failed*`, {
timeout: 30_000,
})
},
)

// ---------------------------------------------------------------------------
// PAR heartbeat liveness (@par-heartbeat)
// ---------------------------------------------------------------------------
//
// The OTP form auto-fires a fetch to /auth/ping every 3 minutes. Waiting
// 3 minutes wall-clock is unacceptable for an e2e scenario, so this step
// invokes the same fetch synchronously from the page's own JS context
// — same origin, same cookies, same browser security boundary — and
// asserts the response. That proves the wiring (page can reach
// /auth/ping → auth-service forwards to pds-core's
// /_internal/ping-request → returns 200) without waiting for the
// interval to tick.

Then(
'a heartbeat fetched from the OTP form returns ok:true',
async function (this: EpdsWorld) {
const page = getPage(this)
const body = await page.evaluate(async () => {
const r = await fetch('/auth/ping', {
credentials: 'include',
cache: 'no-store',
})
return (await r.json()) as { ok: boolean; reason?: string }
})
if (!body.ok) {
throw new Error(
`Expected /auth/ping to return ok:true but got: ${JSON.stringify(body)}`,
)
}
},
)
110 changes: 110 additions & 0 deletions features/passwordless-authentication.feature
Original file line number Diff line number Diff line change
Expand Up @@ -272,6 +272,116 @@ Feature: Passwordless authentication via email OTP
Then the browser is redirected back to the demo client
And the demo client has a valid OAuth access token

# --- Hard-dead PAR clean-exit scenarios (@otp-and-par-expiry) ---
#
# These four scenarios reproduce the dead-end pages a real user used
# to trip when the upstream PAR (request_uri) row died before the
# OAuth flow could complete. Upstream's PAR has a 5-min sliding
# inactivity timeout that we cannot bump (hardcoded in
# @atproto/oauth-provider) and that, once tripped, deletes the row
# in the same call that throws — so no in-flight recovery is
# possible.
#
# The fix is twofold (see docs/design/par-expiry-and-clean-exits.md):
#
# 1. While the user is on the OTP / recovery form, a 3-min
# heartbeat to /auth/ping slides the PAR's inactivity timer so
# the row stays alive. This is exercised by @par-heartbeat.
#
# 2. When the PAR is *genuinely* dead (closed tab, walked away,
# explicit upstream cleanup), every dead-end auth-service page
# must redirect the user back to the OAuth client with the
# RFC 6749 §4.1.2.1 error parameters so the client's own UI
# handles retry — never strand them on a static page.
#
# These scenarios use /_internal/test/delete-par to simulate (2) —
# the row is hard-deleted and cannot be revived. The contract they
# assert is therefore "land back at the demo client with an auth
# error" (the demo translates `error=access_denied` to
# `?error=auth_failed` on its landing page), NOT "the OAuth flow
# successfully completes". Layer 1 (heartbeat) is what prevents
# the dead PAR; layer 2 (clean-exit) is what makes recovery
# one-click when prevention fails.
@email @otp-and-par-expiry
Scenario: Expired OTP + expired PAR — clean exit back to the OAuth client
When the demo client initiates an OAuth login
Then the browser is redirected to the auth service login page
And the login page displays an email input form
When the user enters a unique test email and submits
Then an OTP email arrives in the mail trap for the test email
And the login page shows an OTP verification form
When more than 10 minutes pass before the user enters the OTP
And the PAR request_uri has expired before the bridge fires
And the user enters the OTP code
Then the verification form shows an "OTP expired" error
And the user can try again
When the user requests a new OTP via the resend button
Then a fresh OTP email arrives in the mail trap for the test email
When the user enters the OTP code
Then the browser lands back at the demo client with an auth error

@email @otp-and-par-expiry @multiple-resend
Scenario: Two Resend cycles after silent PAR death — clean exit back to the OAuth client
When the demo client initiates an OAuth login
Then the browser is redirected to the auth service login page
And the login page displays an email input form
When the user enters a unique test email and submits
Then an OTP email arrives in the mail trap for the test email
And the login page shows an OTP verification form
When the user requests a new OTP via the resend button
Then a fresh OTP email arrives in the mail trap for the test email
When the PAR request_uri has expired before the bridge fires
And the user requests a new OTP via the resend button
Then a fresh OTP email arrives in the mail trap for the test email
When the user enters the OTP code
Then the browser lands back at the demo client with an auth error

@email @otp-and-par-expiry @prompt-login
Scenario: prompt=login + expired PAR — clean exit back to the OAuth client
Given a returning user has a PDS account
When the demo client starts a new OAuth flow with prompt=login
And the user enters the test email on the login page
Then an OTP email arrives in the mail trap
And the login page shows an OTP verification form
When the PAR request_uri has expired before the bridge fires
And the user enters the OTP code
Then the browser lands back at the demo client with an auth error

@email @otp-and-par-expiry @recovery
Scenario: Recovery via backup email + expired PAR — clean exit back to the OAuth client
Given a returning user has a PDS account
And the test user has a verified backup email
And the demo client initiates OAuth with the test email as login_hint
# /auth/recover is a plain URL with no request_uri query parameter,
# so we must snapshot the request_uri while we're still on the
# original /oauth/authorize page (the login_hint step lands here).
And the PAR request_uri is captured for later expiry
When the user navigates to the recovery page
And the user enters the backup email on the recovery page
Then an OTP email arrives in the mail trap for the backup email
When the PAR request_uri has expired before the bridge fires
And the user enters the recovery OTP
Then the browser lands back at the demo client with an auth error

# --- PAR heartbeat (live PAR, no delete) ---
#
# Proves the in-page heartbeat reaches /auth/ping and that
# /auth/ping forwards to pds-core's /_internal/ping-request,
# i.e. the wiring works end-to-end through Caddy. This is
# heartbeat liveness, not heartbeat efficacy — the unit tests in
# heartbeat.test.ts cover routing logic. We don't wall-clock-wait
# 5 min for the timer to lapse; instead the browser fires the
# heartbeat fetch synchronously from the OTP form's page context
# and we assert the response shape.
@email @par-heartbeat
Scenario: OTP form's heartbeat reaches /auth/ping with ok:true
When the demo client initiates an OAuth login
Then the browser is redirected to the auth service login page
When the user enters a unique test email and submits
Then an OTP email arrives in the mail trap for the test email
And the login page shows an OTP verification form
Then a heartbeat fetched from the OTP form returns ok:true

# --- PAR (request_uri) expiry ---
#
# The PAR ("Pushed Authorization Request") record lives in pds-core's
Expand Down
119 changes: 119 additions & 0 deletions packages/auth-service/src/__tests__/heartbeat-toggle.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
/**
* Tests for the server-side and rendered-page sides of the PAR-heartbeat
* test toggle.
*
* Two surfaces:
* 1. heartbeatEnabledFor(req) — the function login-page.ts and
* recovery.ts call to decide whether to inline the heartbeat
* fetch into the page.
* 2. The rendered HTML — a smoke test that proves the OTP form and
* recovery OTP form actually contain (or omit) the /auth/ping
* fetch wired up by the toggle.
*/
import { describe, it, expect, afterEach } from 'vitest'
import type { Request } from 'express'
import { heartbeatEnabledFor, renderLoginPage } from '../routes/login-page.js'
import { renderRecoveryOtpForm } from '../routes/recovery.js'

const ORIGINAL_TEST_HOOKS = process.env.EPDS_TEST_HOOKS

afterEach(() => {
if (ORIGINAL_TEST_HOOKS === undefined) delete process.env.EPDS_TEST_HOOKS
else process.env.EPDS_TEST_HOOKS = ORIGINAL_TEST_HOOKS
})

function reqWith(query: Record<string, string>): Request {
return { query } as unknown as Request
}

describe('heartbeatEnabledFor', () => {
it('is enabled by default in production (no test hooks)', () => {
delete process.env.EPDS_TEST_HOOKS
expect(heartbeatEnabledFor(reqWith({}))).toBe(true)
})

it('ignores no_heartbeat=1 when EPDS_TEST_HOOKS is unset', () => {
delete process.env.EPDS_TEST_HOOKS
expect(heartbeatEnabledFor(reqWith({ no_heartbeat: '1' }))).toBe(true)
})

it('is enabled when EPDS_TEST_HOOKS=1 but no_heartbeat is unset', () => {
process.env.EPDS_TEST_HOOKS = '1'
expect(heartbeatEnabledFor(reqWith({}))).toBe(true)
})

it('is disabled when both EPDS_TEST_HOOKS=1 AND no_heartbeat=1', () => {
process.env.EPDS_TEST_HOOKS = '1'
expect(heartbeatEnabledFor(reqWith({ no_heartbeat: '1' }))).toBe(false)
})

it('does not match arbitrary truthy no_heartbeat values', () => {
// Tighten the toggle: only '1' disables. Anything else (incl. 'true')
// is treated as a no-op rather than a footgun.
process.env.EPDS_TEST_HOOKS = '1'
expect(heartbeatEnabledFor(reqWith({ no_heartbeat: 'true' }))).toBe(true)
})
})

describe('renderLoginPage heartbeat wiring', () => {
function render(heartbeatEnabled: boolean): string {

Check warning on line 59 in packages/auth-service/src/__tests__/heartbeat-toggle.test.ts

View check run for this annotation

SonarQubeCloud / SonarCloud Code Analysis

Move function 'render' to the outer scope.

See more on https://sonarcloud.io/project/issues?id=hypercerts-org_ePDS&issues=AZ3zSR2coa4SO2rW05OQ&open=AZ3zSR2coa4SO2rW05OQ&pullRequest=154
return renderLoginPage({
flowId: 'flow-1',
clientId: 'https://example.com/client-metadata.json',
clientName: 'Example',
branding: {},
customCss: null,
customFaviconUrl: null,
customFaviconUrlDark: null,
loginHint: '',
initialStep: 'email',
otpAlreadySent: false,
csrfToken: 'csrf',
authBasePath: '/api/auth',
pdsPublicUrl: 'https://pds.example.com',
otpLength: 6,
otpCharset: 'numeric',
heartbeatEnabled,
})
}

it('inlines /auth/ping when heartbeat is enabled', () => {
const html = render(true)
expect(html).toContain("'/auth/ping'")
expect(html).toContain('var heartbeatEnabled = true;')
})

it('emits a disabled flag when heartbeat is off', () => {
const html = render(false)
expect(html).toContain('var heartbeatEnabled = false;')
// The fetch literal is still in the bundle, gated at runtime by
// the flag — that's fine and matches how the IIFE composes.
// What matters: the flag is false.
})
})

describe('renderRecoveryOtpForm heartbeat wiring', () => {
function render(heartbeatEnabled: boolean): string {

Check warning on line 96 in packages/auth-service/src/__tests__/heartbeat-toggle.test.ts

View check run for this annotation

SonarQubeCloud / SonarCloud Code Analysis

Move function 'render' to the outer scope.

See more on https://sonarcloud.io/project/issues?id=hypercerts-org_ePDS&issues=AZ3zSR2coa4SO2rW05OR&open=AZ3zSR2coa4SO2rW05OR&pullRequest=154
return renderRecoveryOtpForm({
email: 'user@example.com',
csrfToken: 'csrf',
requestUri: 'urn:ietf:params:oauth:request_uri:req-abc',
otpLength: 6,
otpCharset: 'numeric',
heartbeatEnabled,
})
}

it('inlines /auth/ping when heartbeat is enabled', () => {
const html = render(true)
expect(html).toContain("'/auth/ping'")
// The if-guard at the top of the recovery script reads the flag
// and bails when false, so check the flag's compile-time value.
expect(html).toContain('if (!true) return;')
})

it('emits an early-return guard when heartbeat is off', () => {
const html = render(false)
expect(html).toContain('if (!false) return;')
})
})
Loading
Loading