Skip to content
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
ba8f268
test(e2e): repro OAuth dead-ends when PAR dies before user completes …
aspiers May 4, 2026
64c50f7
docs(design): PAR expiry, heartbeats, and clean-exit strategy
aspiers May 4, 2026
b1fc940
feat(auth-service): keep PAR alive while user sits on OTP / recovery …
aspiers May 4, 2026
339aba3
test(e2e): cover @par-heartbeat liveness end-to-end
aspiers May 4, 2026
2e4d327
feat(auth-service,pds-core): redirect to OAuth client on sign-in fail…
aspiers May 4, 2026
5183b7b
docs(design): record heartbeat + clean-exit landed in the design doc
aspiers May 4, 2026
8bcb344
fix(demo): mark /client-metadata.json route dynamic so EPDS_CLIENT_TH…
aspiers May 4, 2026
9fb0e69
fix: address PR #154 review feedback (CodeRabbit + Copilot + SonarQub…
aspiers May 4, 2026
3b0ec50
fix(demo): default EPDS_CLIENT_THEME to amber so client-branding e2e …
aspiers May 4, 2026
b6f3ba9
test(e2e): rename "the user can try again" to "the OTP entry boxes ar…
aspiers May 4, 2026
fcab1c7
fix(auth-service): forward no_heartbeat through recovery Verify form too
aspiers May 4, 2026
f3766a9
test(auth-service,shared): bring new heartbeat / clean-exit code to 1…
aspiers May 4, 2026
241807a
fix: address SonarCloud quality-gate findings on PR #154
aspiers May 4, 2026
25f37aa
test(auth-service): unit-test buildEpdsCallbackUrl + ratchet down fun…
aspiers May 4, 2026
7650f14
fix: address third-round PR #154 review feedback
aspiers May 4, 2026
4bc113d
test(auth-service): replace `q.get(name)!` with a typed helper to cle…
aspiers May 4, 2026
26151cd
refactor(shared): extract resolveStartOverHref to fix Sonar dup-block…
aspiers May 4, 2026
3d31876
fix(auth-service): inline "Send a new code" action on expired-OTP error
aspiers May 4, 2026
03958c5
test(e2e): allow inline action button alongside the OTP error message
aspiers May 4, 2026
86cd60f
fix(auth-service): don't offer Resend when it cannot complete the sig…
aspiers May 4, 2026
160f20d
test(auth-service): extract heartbeat-router test harness + replace R…
aspiers May 4, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions .changeset/clean-exit-on-expired-signin.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
'ePDS': minor
---

Sign-in pages no longer strand users on a "session expired" dead end.

**Affects:** End users

**End users:** if your sign-in does time out (e.g. you closed the tab and came back later, or your wait was longer than the page-level keepalive could cover), you no longer land on a static "Session expired, please start over" page with no way forward. Instead you are redirected back to the app you were signing in to, which can show its own retry button. If something prevents that bounce-back (the app's metadata is unreachable, the originating client is unknown), the error page now offers a "Return to sign in" button instead of being text-only. Closes #151; substantially addresses #150 by replacing the dead-end at `/auth/complete` with a clean redirect.
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
9 changes: 9 additions & 0 deletions .changeset/par-heartbeat-keeps-slow-signins-alive.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
'ePDS': minor
---

Slow sign-ins are less likely to time out before you finish entering your code.

**Affects:** End users

**End users:** if you take a few minutes to find your sign-in code in your inbox before entering it, you will no longer be bounced to a "session expired" page when you submit it. Closing the tab or walking away for a long stretch can still expire the flow, in which case the existing error pages still apply — but reading email at human speed should not.
373 changes: 373 additions & 0 deletions docs/design/par-expiry-and-clean-exits.md

Large diffs are not rendered by default.

122 changes: 77 additions & 45 deletions e2e/step-definitions/auth.steps.ts
Original file line number Diff line number Diff line change
Expand Up @@ -566,25 +566,45 @@ async function callPdsExpiryHook(
}

/**
* Read the PAR request_uri from the current auth-service login page
* URL and stash it on the world for subsequent expiry hooks. The URL
* is `https://<auth>/oauth/authorize?request_uri=urn:...&...` while the
* user is on the login/OTP form. Throws if the URL doesn't carry it,
* which means the surrounding scenario is mis-ordered (the page must
* be on the auth-service side before this step runs).
* Read the PAR request_uri from the current auth-service page URL and
* stash it on the world for subsequent expiry hooks. While the user is
* on the login/OTP form the URL is
* `https://<auth>/oauth/authorize?request_uri=urn:...&...`, but on
* downstream pages (e.g. /auth/recover) the parameter has been dropped.
* Falls back to a previously-stashed `world.lastRequestUri` so a
* scenario can capture the URI early (via the dedicated capture step
* below) and consult it after navigation. Throws when neither source
* has a value, which means the scenario is mis-ordered.
*/
function captureRequestUriFromPage(world: EpdsWorld): string {
const page = getPage(world)
const requestUri = new URL(page.url()).searchParams.get('request_uri')
if (!requestUri) {
throw new Error(
`Expected request_uri in page URL but found none: ${page.url()}`,
)
const fromUrl = new URL(page.url()).searchParams.get('request_uri')
if (fromUrl) {
world.lastRequestUri = fromUrl
return fromUrl
}
world.lastRequestUri = requestUri
return requestUri
if (world.lastRequestUri) {
return world.lastRequestUri
}
throw new Error(
`Expected request_uri in page URL or previously captured but found none: ${page.url()}`,
)
}

/**
* Capture and stash the PAR request_uri from the current auth-service
* page URL so a later step can refer to it after navigating away. Used
* by scenarios where the OTP form is left for /auth/recover (recovery)
* or any other downstream page where the request_uri has dropped off
* the URL.
*/
When(
'the PAR request_uri is captured for later expiry',
function (this: EpdsWorld) {
captureRequestUriFromPage(this)
},
)

When(
'the PAR request_uri has expired before the bridge fires',
async function (this: EpdsWorld) {
Expand All @@ -595,43 +615,55 @@ When(
},
)

Then('the response body is not raw JSON', async function (this: EpdsWorld) {
const page = getPage(this)
// The OTP form's submit is JS-driven and async, and Playwright's
// fill() returns before the bridge redirects. Wait for the
// browser to actually leave the auth-service host and arrive at
// pds-core's epds-callback (where, in this scenario, the
// catch-block renders the error page). Without this wait we'd
// read the still-rendering OTP form's body.
const pdsHost = new URL(testEnv.pdsUrl).host
await page.waitForURL(
(url) => url.host === pdsHost && url.pathname.includes('epds-callback'),
{ timeout: 30_000 },
)
const body = await page.locator('body').innerText()
// The pre-fix behaviour returned a body that started with
// {"error": "Authentication failed"}. A graceful HTML page won't —
// its <h1>/<p> text isn't valid JSON. The regex catches any
// {"error": ...} shape so a future leak of a different JSON
// payload is still caught.
if (/^\s*\{\s*"error"/.test(body)) {
throw new Error(
`pds-core leaked raw JSON to the browser: ${body.slice(0, 200)}`,
)
}
})
// ---------------------------------------------------------------------------
// Clean-exit assertions for @otp-and-par-expiry scenarios
// ---------------------------------------------------------------------------
//
// When the upstream PAR is hard-dead (the test hook deletes the row),
// no amount of heartbeat can revive it — but we still owe the user a
// clean exit per RFC 6749 §4.1.2.1: redirect them back to the OAuth
// client's redirect_uri with `error=access_denied` so the client's
// own UI can handle retry. The demo client translates that to
// `?error=auth_failed` on its landing page.

Then(
'the browser lands back at the demo client with an auth error',
async function (this: EpdsWorld) {
const origin = new URL(testEnv.demoUrl).origin
const page = getPage(this)
await page.waitForURL(`${origin}/?error=auth_failed*`, {
timeout: 30_000,
})
},
)

// ---------------------------------------------------------------------------
// PAR heartbeat liveness (@par-heartbeat)
// ---------------------------------------------------------------------------
//
// The OTP form auto-fires a fetch to /auth/ping every 3 minutes. Waiting
// 3 minutes wall-clock is unacceptable for an e2e scenario, so this step
// invokes the same fetch synchronously from the page's own JS context
// — same origin, same cookies, same browser security boundary — and
// asserts the response. That proves the wiring (page can reach
// /auth/ping → auth-service forwards to pds-core's
// /_internal/ping-request → returns 200) without waiting for the
// interval to tick.

Then(
'the response body explains that sign-in timed out',
'a heartbeat fetched from the OTP form returns ok:true',
async function (this: EpdsWorld) {
const page = getPage(this)
const body = await page.locator('body').innerText()
// Don't pin exact wording — just require something that mentions
// the timeout / expiry so a human reading it understands why
// their sign-in failed.
if (!/expir|timed? ?out|too long/i.test(body)) {
const body = await page.evaluate(async () => {
const r = await fetch('/auth/ping', {
credentials: 'include',
cache: 'no-store',
})
return (await r.json()) as { ok: boolean; reason?: string }
})
if (!body.ok) {
throw new Error(
`Error page should mention the timeout but said: "${body.slice(0, 500)}"`,
`Expected /auth/ping to return ok:true but got: ${JSON.stringify(body)}`,
)
}
},
Expand Down
135 changes: 123 additions & 12 deletions features/passwordless-authentication.feature
Original file line number Diff line number Diff line change
Expand Up @@ -272,6 +272,116 @@ Feature: Passwordless authentication via email OTP
Then the browser is redirected back to the demo client
And the demo client has a valid OAuth access token

# --- Hard-dead PAR clean-exit scenarios (@otp-and-par-expiry) ---
#
# These four scenarios reproduce the dead-end pages a real user used
# to trip when the upstream PAR (request_uri) row died before the
# OAuth flow could complete. Upstream's PAR has a 5-min sliding
# inactivity timeout that we cannot bump (hardcoded in
# @atproto/oauth-provider) and that, once tripped, deletes the row
# in the same call that throws — so no in-flight recovery is
# possible.
#
# The fix is twofold (see docs/design/par-expiry-and-clean-exits.md):
#
# 1. While the user is on the OTP / recovery form, a 3-min
# heartbeat to /auth/ping slides the PAR's inactivity timer so
# the row stays alive. This is exercised by @par-heartbeat.
#
# 2. When the PAR is *genuinely* dead (closed tab, walked away,
# explicit upstream cleanup), every dead-end auth-service page
# must redirect the user back to the OAuth client with the
# RFC 6749 §4.1.2.1 error parameters so the client's own UI
# handles retry — never strand them on a static page.
#
# These scenarios use /_internal/test/delete-par to simulate (2) —
# the row is hard-deleted and cannot be revived. The contract they
# assert is therefore "land back at the demo client with an auth
# error" (the demo translates `error=access_denied` to
# `?error=auth_failed` on its landing page), NOT "the OAuth flow
# successfully completes". Layer 1 (heartbeat) is what prevents
# the dead PAR; layer 2 (clean-exit) is what makes recovery
# one-click when prevention fails.
@email @otp-and-par-expiry
Scenario: Expired OTP + expired PAR — clean exit back to the OAuth client
When the demo client initiates an OAuth login
Then the browser is redirected to the auth service login page
And the login page displays an email input form
When the user enters a unique test email and submits
Then an OTP email arrives in the mail trap for the test email
And the login page shows an OTP verification form
When more than 10 minutes pass before the user enters the OTP
And the PAR request_uri has expired before the bridge fires
And the user enters the OTP code
Then the verification form shows an "OTP expired" error
And the user can try again
When the user requests a new OTP via the resend button
Then a fresh OTP email arrives in the mail trap for the test email
When the user enters the OTP code
Then the browser lands back at the demo client with an auth error

@email @otp-and-par-expiry @multiple-resend
Scenario: Two Resend cycles after silent PAR death — clean exit back to the OAuth client
When the demo client initiates an OAuth login
Then the browser is redirected to the auth service login page
And the login page displays an email input form
When the user enters a unique test email and submits
Then an OTP email arrives in the mail trap for the test email
And the login page shows an OTP verification form
When the user requests a new OTP via the resend button
Then a fresh OTP email arrives in the mail trap for the test email
When the PAR request_uri has expired before the bridge fires
And the user requests a new OTP via the resend button
Then a fresh OTP email arrives in the mail trap for the test email
When the user enters the OTP code
Then the browser lands back at the demo client with an auth error

@email @otp-and-par-expiry @prompt-login
Scenario: prompt=login + expired PAR — clean exit back to the OAuth client
Given a returning user has a PDS account
When the demo client starts a new OAuth flow with prompt=login
And the user enters the test email on the login page
Then an OTP email arrives in the mail trap
And the login page shows an OTP verification form
When the PAR request_uri has expired before the bridge fires
And the user enters the OTP code
Then the browser lands back at the demo client with an auth error

@email @otp-and-par-expiry @recovery
Scenario: Recovery via backup email + expired PAR — clean exit back to the OAuth client
Given a returning user has a PDS account
And the test user has a verified backup email
And the demo client initiates OAuth with the test email as login_hint
# /auth/recover is a plain URL with no request_uri query parameter,
# so we must snapshot the request_uri while we're still on the
# original /oauth/authorize page (the login_hint step lands here).
And the PAR request_uri is captured for later expiry
When the user navigates to the recovery page
And the user enters the backup email on the recovery page
Then an OTP email arrives in the mail trap for the backup email
When the PAR request_uri has expired before the bridge fires
And the user enters the recovery OTP
Then the browser lands back at the demo client with an auth error

# --- PAR heartbeat (live PAR, no delete) ---
#
# Proves the in-page heartbeat reaches /auth/ping and that
# /auth/ping forwards to pds-core's /_internal/ping-request,
# i.e. the wiring works end-to-end through Caddy. This is
# heartbeat liveness, not heartbeat efficacy — the unit tests in
# heartbeat.test.ts cover routing logic. We don't wall-clock-wait
# 5 min for the timer to lapse; instead the browser fires the
# heartbeat fetch synchronously from the OTP form's page context
# and we assert the response shape.
@email @par-heartbeat
Scenario: OTP form's heartbeat reaches /auth/ping with ok:true
When the demo client initiates an OAuth login
Then the browser is redirected to the auth service login page
When the user enters a unique test email and submits
Then an OTP email arrives in the mail trap for the test email
And the login page shows an OTP verification form
Then a heartbeat fetched from the OTP form returns ok:true

# --- PAR (request_uri) expiry ---
#
# The PAR ("Pushed Authorization Request") record lives in pds-core's
Expand Down Expand Up @@ -316,27 +426,28 @@ Feature: Passwordless authentication via email OTP
# The test deletes the PAR row before the callback fires, mirroring
# the production failure where /oauth/epds-callback's first
# requestManager.get() (Step 2 in the handler) throws
# InvalidRequestError("Unknown request_uri"). With no live PAR row
# there is no redirect_uri to recover, so the user must land on a
# styled HTML page on the PDS host — not the previous raw JSON
# body — explaining that the sign-in timed out.
# InvalidRequestError("Unknown request_uri"). The signed callback
# URL now carries `client_id` (per the clean-exit work in this PR),
# so even though the PAR is dead the catch block can resolve the
# client's metadata and redirect the user back to the OAuth client
# with the standard RFC 6749 §4.1.2.1 error parameters. The demo
# client translates `error=access_denied` into `?error=auth_failed`
# on its landing page; that's the contract this scenario asserts.
#
# The redirect-back-to-client path (driven by the redirect_uri /
# state captured during a successful Step 2) is exercised by the
# other paths inside the same handler that throw later in the
# flow; we don't have an easy e2e harness to inject a mid-flow
# failure today.
# The static HTML fallback inside pds-core's catch block only fires
# in the worst case where neither the captured PAR data nor the
# signed `client_id` resolves to a usable redirect URI — much
# harder to reach now and not exercised here.
@email @par-callback-error
Scenario: Expired PAR shows a styled HTML page instead of leaking JSON
Scenario: Expired PAR redirects back to the OAuth client instead of stranding the user
Given a returning user has a PDS account
When the demo client initiates an OAuth login
And the user enters the test email on the login page
Then an OTP email arrives in the mail trap
And the login page shows an OTP verification form
When the PAR request_uri has expired before the bridge fires
And the user enters the OTP code
Then the response body is not raw JSON
And the response body explains that sign-in timed out
Then the browser lands back at the demo client with an auth error

# --- Brute force protection ---

Expand Down
Loading
Loading