From e0fe57bafe34e8e1612137eaf7d0f3c7e1b5b10e Mon Sep 17 00:00:00 2001 From: Tung Wu Date: Fri, 8 May 2026 17:35:41 +0800 Subject: [PATCH 1/2] Specify pii masking --- docs/specs/admin-api-access-token.md | 413 +++++++++++++++++++ docs/specs/glossary.md | 14 + docs/specs/pii-masking.md | 596 +++++++++++++++++++++++++++ 3 files changed, 1023 insertions(+) create mode 100644 docs/specs/admin-api-access-token.md create mode 100644 docs/specs/pii-masking.md diff --git a/docs/specs/admin-api-access-token.md b/docs/specs/admin-api-access-token.md new file mode 100644 index 00000000000..9340c25fb00 --- /dev/null +++ b/docs/specs/admin-api-access-token.md @@ -0,0 +1,413 @@ +# Admin API Access Token + +- [Overview](#overview) +- [Use Cases](#use-cases) + - [UC1: Customer support tool integration](#uc1-customer-support-tool-integration) + - [UC2: HR system user provisioning](#uc2-hr-system-user-provisioning) + - [UC3: Automated fraud response](#uc3-automated-fraud-response) + - [UC4: Compliance audit log archival](#uc4-compliance-audit-log-archival) + - [UC5: Role sync from enterprise directory](#uc5-role-sync-from-enterprise-directory) +- [Built-in Resource](#built-in-resource) +- [Scopes](#scopes) +- [Scope Permissions](#scope-permissions) +- [Obtaining a Token](#obtaining-a-token) +- [Token Validation](#token-validation) +- [Backward Compatibility](#backward-compatibility) +- [Examples](#examples) + +## Overview + +The Admin API supports two authentication methods: + +| Method | Behaviour | +| ------------------------------------------------------ | ----------------------------------------------------------------- | +| Keypair JWT (`typ: JWT`) signed by `admin-api.auth` | Full Admin API access. No scope enforcement. Legacy method. | +| OAuth access token issued via Client Credentials Grant | Scope-controlled. Access is limited to the scopes granted to the client. | + +This document describes the OAuth-based scoped access token method.
The legacy keypair method is unchanged and remains supported for backward compatibility. + +## Use Cases + +### UC1: Customer support tool integration + +A company integrates their support tool (e.g. Zendesk, Intercom) with Authgear so that support agents can look up accounts and perform account operations directly from the support interface — disabling compromised accounts, terminating active sessions, and removing authenticators when a user loses access to their device. + +**Setup:** + +1. In the portal, create an M2M client named `support-tool`. +2. Associate the client with the Admin API resource (`https://auth.myapp.com/_api/admin`). +3. Grant the following scopes to the client: `user:read user:write`. +4. Copy the `client_id` and `client_secret` into the support tool's backend configuration. + +**Token request (called by the support tool backend at startup or on expiry):** + +``` +POST /oauth2/token +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials +client_id=support-tool +client_secret=THE_CLIENT_SECRET +resource=https://auth.myapp.com/_api/admin +scope=user:read user:write +``` + +The returned access token is then attached as a `Bearer` token on all subsequent Admin API calls made by the support tool backend. + +--- + +### UC2: HR system user provisioning + +A company uses an HR system (e.g. Workday, BambooHR) as the source of truth for employee accounts. A backend sync service creates Authgear accounts when employees join, updates profiles when details change, disables accounts when employees leave, and manages group membership according to department structure. + +**Setup:** + +1. In the portal, create an M2M client named `hr-sync`. +2. Associate the client with the Admin API resource (`https://auth.myapp.com/_api/admin`). +3. Grant the following scopes to the client: `user:read user:write group:read group:write`. +4. Copy the `client_id` and `client_secret` into the HR sync service configuration. 
+ +**Token request:** + +``` +POST /oauth2/token +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials +client_id=hr-sync +client_secret=THE_CLIENT_SECRET +resource=https://auth.myapp.com/_api/admin +scope=user:read user:write group:read group:write +``` + +--- + +### UC3: Automated fraud response + +A risk engine monitors user behaviour and flags suspicious accounts. When the engine records a fraud decision, an automation service reads the decision and immediately disables the flagged account. The service only acts on fraud signals and has no reason to access user profiles or audit logs. + +**Setup:** + +1. In the portal, create an M2M client named `fraud-responder`. +2. Associate the client with the Admin API resource (`https://auth.myapp.com/_api/admin`). +3. Grant the following scopes to the client: `fraud-protection:read user:write`. +4. Copy the `client_id` and `client_secret` into the fraud response service configuration. + +**Token request:** + +``` +POST /oauth2/token +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials +client_id=fraud-responder +client_secret=THE_CLIENT_SECRET +resource=https://auth.myapp.com/_api/admin +scope=fraud-protection:read user:write +``` + +--- + +### UC4: Compliance audit log archival + +A company is required to retain audit logs for a fixed period under SOC2 or GDPR obligations. An automated pipeline reads audit log entries and archives them to cold storage (e.g. S3, GCS). The pipeline has no reason to access user data. + +**Setup:** + +1. In the portal, create an M2M client named `audit-archiver`. +2. Associate the client with the Admin API resource (`https://auth.myapp.com/_api/admin`). +3. Grant the following scope to the client: `audit-log:read`. +4. Copy the `client_id` and `client_secret` into the archival pipeline configuration. 
+ +**Token request:** + +``` +POST /oauth2/token +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials +client_id=audit-archiver +client_secret=THE_CLIENT_SECRET +resource=https://auth.myapp.com/_api/admin +scope=audit-log:read +``` + +--- + +### UC5: Role sync from enterprise directory + +An enterprise uses Active Directory or Okta as the source of truth for employee permissions. A sync service maps directory group membership to Authgear roles — when an employee is promoted or changes teams, their Authgear role assignments are updated automatically. The service needs to read users to resolve identities, and read and write roles to manage assignments. + +**Setup:** + +1. In the portal, create an M2M client named `role-sync`. +2. Associate the client with the Admin API resource (`https://auth.myapp.com/_api/admin`). +3. Grant the following scopes to the client: `user:read role:read role:write`. +4. Copy the `client_id` and `client_secret` into the role sync service configuration. + +**Token request:** + +``` +POST /oauth2/token +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials +client_id=role-sync +client_secret=THE_CLIENT_SECRET +resource=https://auth.myapp.com/_api/admin +scope=user:read role:read role:write +``` + +--- + +## Built-in Resource + +The Admin API is represented as a built-in resource with the following URI: + +``` +https://{authgear_endpoint}/_api/admin +``` + +For example, if the Authgear endpoint is `https://auth.myapp.com`, the resource URI is `https://auth.myapp.com/_api/admin`. + +This resource is built-in and not stored in the `_auth_resource` table. Its URI is derived from the Authgear public endpoint at runtime. + +**Shadowing:** If a user-defined resource is created with a URI that matches the built-in resource URI, the built-in resource takes precedence. The user-defined resource is effectively ignored for that URI. 
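The runtime derivation and the shadowing rule can be illustrated with a minimal Go sketch. It assumes a hypothetical `Resource` type and an in-memory map of user-defined resources keyed by URI. The real storage layer and function names are not specified by this document.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical, simplified resource record.
type Resource struct {
	URI     string
	BuiltIn bool
}

// builtInAdminAPIURI derives the built-in resource URI from the public
// endpoint at runtime instead of reading it from storage.
func builtInAdminAPIURI(publicEndpoint string) string {
	return strings.TrimSuffix(publicEndpoint, "/") + "/_api/admin"
}

// lookupResource checks the built-in URI first, so a user-defined resource
// registered under the same URI is shadowed and never returned.
func lookupResource(publicEndpoint, uri string, userDefined map[string]Resource) (Resource, bool) {
	if uri == builtInAdminAPIURI(publicEndpoint) {
		return Resource{URI: uri, BuiltIn: true}, true
	}
	r, ok := userDefined[uri]
	return r, ok
}

func main() {
	userDefined := map[string]Resource{
		// Shadowed: same URI as the built-in Admin API resource.
		"https://auth.myapp.com/_api/admin": {URI: "https://auth.myapp.com/_api/admin"},
		"https://api.myapp.com":             {URI: "https://api.myapp.com"},
	}
	r, _ := lookupResource("https://auth.myapp.com", "https://auth.myapp.com/_api/admin", userDefined)
	fmt.Println(r.BuiltIn) // true
}
```

Because the built-in URI is checked before the map lookup, no extra bookkeeping is needed to make a conflicting user-defined resource ineffective.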
+ +**Client eligibility:** Only M2M (confidential) clients may be associated with the built-in Admin API resource and granted admin scopes. This is enforced when assigning scopes to a client in the portal or Admin API. + +**Visibility:** The built-in Admin API resource is returned by the `resources` query and its scopes are returned by scope queries. It can be assigned to or removed from M2M clients using `resource:write` mutations (`addResourceToClientID`, `removeResourceFromClientID`, `addScopesToClientID`, `removeScopesFromClientID`, `replaceScopesOfClientID`). However, the built-in resource and its scopes cannot be created, updated, or deleted — `createResource`, `updateResource`, `deleteResource`, `createScope`, `updateScope`, and `deleteScope` do not apply to them. + +## Scopes + +The following scopes are defined on the built-in Admin API resource. + +| Scope | Grants access to | +| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| `user:read` | Read users, identities, authenticators, sessions, authorizations | +| `user:write` | Mutate users, identities, authenticators, sessions, authorizations; image upload | +| `role:read` | Read roles | +| `role:write` | Mutate roles; add/remove role–user and role–group assignments | +| `group:read` | Read groups | +| `group:write` | Mutate groups; add/remove group–user and group–role assignments | +| `resource:read` | Read resources and scopes | +| `resource:write` | Mutate resources, scopes, and client assignments | +| `fraud-protection:read` | Read fraud protection overview and decision records | +| `audit-log:read` | Read audit logs | +| `user:import` | Bulk user import | +| `user:export` | Bulk user export | +| `pii:read` | Bypass `pii.masking` — all PII fields returned in cleartext; also grants PII-based search (see [pii-masking.md](./pii-masking.md)) | +| `pii:search` | Use PII fields as search/filter 
criteria without bypassing response masking (see [pii-masking.md](./pii-masking.md)) | + +## Scope Permissions + +This section lists the exact queries, mutations, and HTTP endpoints permitted by each scope. + +> **Note on mutation responses:** `user:write` without `user:read` permits calling write mutations but not Query operations. Mutation responses that include a `user` object (e.g. `revokeSession`, `deleteIdentity`) are still returned in full — this is a natural side effect of the mutation, not a query. + +### `user:read` + +**Queries:** + +- `users` +- `getUsersByStandardAttribute` +- `getUserByLoginID` +- `getUserByOAuth` +- `node` / `nodes` (for `User`, `Identity`, `Authenticator`, `Session`, `Authorization` node types) + +> **Note on PII-based lookups:** When `pii.masking` is enabled, `getUsersByStandardAttribute` +> (attributeName = `email` / `phone_number` / `preferred_username`), `getUserByLoginID`, and +> `users` with a PII-containing `searchKeyword` additionally require `pii:search` or `pii:read`. +> See [pii-masking.md](./pii-masking.md). 
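The scope check for `getUsersByStandardAttribute` is sketched below for illustration only. `canGetUsersByStandardAttribute` and `maskingEnabled` are hypothetical names. The `scope` claim is assumed to be space-delimited, as in the token example under Obtaining a Token. Masking is treated as a simple on/off switch here, while pii-masking.md refines this per `pii_type`.

```go
package main

import (
	"fmt"
	"strings"
)

// hasScope reports whether the space-delimited OAuth "scope" claim contains s.
func hasScope(scopeClaim, s string) bool {
	for _, granted := range strings.Fields(scopeClaim) {
		if granted == s {
			return true
		}
	}
	return false
}

// piiLookupAttributes are the getUsersByStandardAttribute attribute names
// that the note treats as PII-based lookups.
var piiLookupAttributes = map[string]bool{
	"email": true, "phone_number": true, "preferred_username": true,
}

// canGetUsersByStandardAttribute is a hypothetical authorization check.
// maskingEnabled stands in for "pii.masking is enabled".
func canGetUsersByStandardAttribute(scopeClaim, attributeName string, maskingEnabled bool) bool {
	if !hasScope(scopeClaim, "user:read") {
		return false
	}
	if maskingEnabled && piiLookupAttributes[attributeName] {
		return hasScope(scopeClaim, "pii:search") || hasScope(scopeClaim, "pii:read")
	}
	return true
}

func main() {
	fmt.Println(canGetUsersByStandardAttribute("user:read", "email", true))            // false
	fmt.Println(canGetUsersByStandardAttribute("user:read pii:search", "email", true)) // true
	fmt.Println(canGetUsersByStandardAttribute("user:write", "email", false))          // false
}
```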
+ +### `user:write` + +**HTTP endpoints:** + +- `POST /_api/admin/images/upload` + +**Mutations:** + +- `createUser` +- `updateUser` +- `deleteUser` +- `resetPassword` +- `setPasswordExpired` +- `setMFAGracePeriod` +- `removeMFAGracePeriod` +- `sendResetPasswordMessage` +- `generateOOBOTPCode` +- `setVerifiedStatus` +- `setDisabledStatus` +- `setAccountValidFrom` +- `setAccountValidUntil` +- `setAccountValidPeriod` +- `scheduleAccountDeletion` +- `unscheduleAccountDeletion` +- `scheduleAccountAnonymization` +- `unscheduleAccountAnonymization` +- `anonymizeUser` +- `createIdentity` +- `updateIdentity` +- `deleteIdentity` +- `createAuthenticator` +- `deleteAuthenticator` +- `createSession` +- `revokeSession` +- `revokeAllSessions` +- `deleteAuthorization` + +### `role:read` + +**Queries:** + +- `roles` +- `node` / `nodes` (for `Role` node type) + +### `role:write` + +**Mutations:** + +- `createRole` +- `updateRole` +- `deleteRole` +- `addRoleToUsers` +- `removeRoleFromUsers` +- `addUserToRoles` +- `removeUserFromRoles` +- `addRoleToGroups` +- `removeRoleFromGroups` + +### `group:read` + +**Queries:** + +- `groups` +- `node` / `nodes` (for `Group` node type) + +### `group:write` + +**Mutations:** + +- `createGroup` +- `updateGroup` +- `deleteGroup` +- `addGroupToUsers` +- `removeGroupFromUsers` +- `addUserToGroups` +- `removeUserFromGroups` +- `addGroupToRoles` +- `removeGroupFromRoles` + +### `resource:read` + +**Queries:** + +- `resources` +- `node` / `nodes` (for `Resource`, `Scope` node types) + +> **Note:** The built-in Admin API resource and its scopes are included in these results. Its scopes can be assigned to clients via `resource:write` mutations, but the resource and its scopes cannot be created, updated, or deleted. See [Built-in Resource](#built-in-resource). 
+ +### `resource:write` + +**Mutations:** + +- `createResource` +- `updateResource` +- `deleteResource` +- `createScope` +- `updateScope` +- `deleteScope` +- `addResourceToClientID` +- `removeResourceFromClientID` +- `addScopesToClientID` +- `removeScopesFromClientID` +- `replaceScopesOfClientID` + +### `fraud-protection:read` + +**Queries:** + +- `fraudProtectionOverview` +- `fraudProtectionLogs` +- `node` / `nodes` (for `FraudProtectionDecisionRecord` node type) + +### `audit-log:read` + +**Queries:** + +- `auditLogs` +- `node` / `nodes` (for `AuditLog` node type) + +> **Note on PII filters:** When `pii.masking` is enabled, using the `emailAddresses` or +> `phoneNumbers` filter arguments on `auditLogs` additionally requires `pii:search` or `pii:read`. +> See [pii-masking.md](./pii-masking.md). + +### `user:import` + +**HTTP endpoints:** + +- `POST /_api/admin/users/import` +- `GET /_api/admin/users/import/{id}` + +### `user:export` + +**HTTP endpoints:** + +- `POST /_api/admin/users/export` +- `GET /_api/admin/users/export/{id}` + +## Obtaining a Token + +Use the OAuth 2.0 Client Credentials Grant with the built-in Admin API resource URI. See [M2M — The request](./m2m.md#the-request) for the full flow description. + +``` +POST /oauth2/token HTTP/1.1 +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials +client_id=my-backend +client_secret=THE_CLIENT_SECRET +resource=https://auth.myapp.com/_api/admin +scope=user:read pii:read +``` + +The returned access token is a JWT conforming to [RFC9068](https://datatracker.ietf.org/doc/html/rfc9068): + +```json +{ + "iss": "https://auth.myapp.com", + "sub": "client_id_my-backend", + "aud": ["https://auth.myapp.com/_api/admin"], + "client_id": "my-backend", + "scope": "user:read pii:read", + "exp": 1234567890 +} +``` + +## Token Validation + +The Admin API accepts a Bearer token in the `Authorization` header and determines the access method by inspecting the token: + +1. 
**Keypair JWT** — If the token is signed by the `admin-api.auth` keypair, full access is granted (legacy path, no scope enforcement). +2. **OAuth access token** — If the token is a standard JWT whose `aud` includes the built-in resource URI, scope-controlled access is granted based on the `scope` claim. + +## Backward Compatibility + +Keypair JWTs signed by `admin-api.auth` retain full Admin API access unchanged. Existing tokens and integrations are unaffected. + +## Examples + +Read-only integration (no PII): + +``` +POST /oauth2/token +grant_type=client_credentials&client_id=my-backend&client_secret=...&resource=https://auth.myapp.com/_api/admin&scope=user%3Aread +``` + +Data migration script (needs raw email addresses): + +``` +POST /oauth2/token +grant_type=client_credentials&client_id=my-backend&client_secret=...&resource=https://auth.myapp.com/_api/admin&scope=user%3Aread+pii%3Aread +``` diff --git a/docs/specs/glossary.md b/docs/specs/glossary.md index 72610e2e82a..501ce694412 100644 --- a/docs/specs/glossary.md +++ b/docs/specs/glossary.md @@ -10,6 +10,8 @@ * [Authentication Flow](#authentication-flow) + [Identification](#identification) + [Authentication](#authentication) + * [PII](#pii) + + [pii_type](#pii_type) # Glossary @@ -60,3 +62,15 @@ Read the [authentication flow API reference](./authentication-flow-api-reference Authentication is the means the user uses to prove they are the user identified with the identification method. For example, by using a password, OTP, or biometrics. Read the [authentication flow API reference](./authentication-flow-api-reference.md) for details. + +## PII + +Personally Identifiable Information. Any data that can be used to identify a specific individual. + +See [PII Masking](./pii-masking.md) for how Authgear handles PII in API responses. + +### pii_type + +A classification tag applied to a user profile attribute or audit log field that determines which masking format is applied when `pii.masking` is enabled. 
+ +See [PII Types](./pii-masking.md#pii-types) for the full list of valid values and their masking formats. diff --git a/docs/specs/pii-masking.md b/docs/specs/pii-masking.md new file mode 100644 index 00000000000..f31440a011f --- /dev/null +++ b/docs/specs/pii-masking.md @@ -0,0 +1,596 @@ +# PII Masking + +- [Overview](#overview) +- [Use Cases](#use-cases) + - [UC1: Customer support staff should not see raw contact details](#uc1-customer-support-staff-should-not-see-raw-contact-details) + - [UC2: GDPR data minimisation — mask contact details not needed for support operations](#uc2-gdpr-data-minimisation--mask-contact-details-not-needed-for-support-operations) + - [UC3: National ID numbers collected for KYC should not be visible to support staff](#uc3-national-id-numbers-collected-for-kyc-should-not-be-visible-to-support-staff) + - [UC4: Audit log export pipeline must not contain raw PII](#uc4-audit-log-export-pipeline-must-not-contain-raw-pii) + - [UC5: A trusted integration needs raw data while other consumers receive masked data](#uc5-a-trusted-integration-needs-raw-data-while-other-consumers-receive-masked-data) + - [UC6: Support staff can look up a specific account by email but cannot browse accounts by email](#uc6-support-staff-can-look-up-a-specific-account-by-email-but-cannot-browse-accounts-by-email) +- [PII Types](#pii-types) + - [Masking Logic](#masking-logic) +- [Configuration](#configuration) + - [pii.masking](#piimasking) + - [pii_type on user profile attributes](#pii_type-on-user-profile-attributes) + - [Admin API Access Token Scopes](#admin-api-access-token-scopes) +- [Masking Behaviour](#masking-behaviour) + - [User Profile Data](#user-profile-data) + - [Search and Filter Requests](#search-and-filter-requests) + - [Audit Log Data](#audit-log-data) + - [Audit Log PII Fields](#audit-log-pii-fields) +- [Access Control Interaction](#access-control-interaction) +- [Portal Collaborator Roles](#portal-collaborator-roles) +- [Future Work](#future-work) + - 
[Configurable default scopes for the Support role](#configurable-default-scopes-for-the-support-role) + - [Per-field search permission (not supported)](#per-field-search-permission-not-supported) + - [PII Retention config](#pii-retention-config) + +## Overview + +PII masking allows project owners to enforce server-side masking of personally identifiable information (PII) in API responses. When masking is enabled for an API, affected fields are replaced with a redacted representation instead of their raw value. + +This is distinct from [access control](../specs/user-profile/design.md), which governs **visibility** (whether a field is returned at all). Masking operates on fields that are already visible to the caller — it changes **how** the value is presented, not **whether** it is returned. + +## Use Cases + +### UC1: Customer support staff should not see raw contact details + +A company operates a customer support team that uses the Admin Portal to look up users and manage their accounts. Support staff do not need to see raw email addresses or phone numbers — they communicate with customers through a separate ticketing system. Support staff access the Admin API using scoped access tokens without the `pii:read` scope, so the server masks PII using the built-in defaults. Support staff see `joh****@example.com` instead of the real email, reducing unnecessary PII exposure and limiting liability in case of a support account breach. + +No config change is needed — the default `pii_types` list covers all PII types. + +--- + +### UC2: GDPR data minimisation — mask contact details not needed for support operations + +Under GDPR Article 5(1)(c), personal data must be limited to what is necessary for the purpose it is processed. A company's support team communicates with users exclusively through a ticketing system (e.g. Zendesk), which already owns the email and phone channel. 
Support staff therefore have no legitimate purpose for seeing raw contact details in the Admin Portal — they only need the user's name to identify the case. The project owner masks email and phone while leaving names visible, satisfying the data minimisation principle without impeding support operations. + +**Config:** + +```yaml +pii: + masking: + admin_api: + pii_types: + - email + - phone_number +``` + +Scoped access tokens issued to support staff do not include `pii:read`. + +--- + +### UC3: National ID numbers collected for KYC should not be visible to support staff + +A fintech company collects national ID numbers during KYC verification and stores them as a custom attribute. Support staff can look up accounts but should not have access to raw ID numbers — only compliance officers should. The project owner sets the attribute's `pii_type` to `identifier`, so it is masked whenever `identifier` is in `pii.masking.admin_api.pii_types`, as it is by default. + +**Config:** + +```yaml +user_profile: + custom_attributes: + attributes: + - id: "0001" + pointer: /x_national_id + type: string + pii_type: identifier +``` + +Scoped access tokens issued to support staff do not include `pii:read`. + +--- + +### UC4: Audit log export pipeline must not contain raw PII + +A company exports audit logs to a third-party SIEM (e.g. Datadog, Splunk) for security monitoring. Their data processing agreement with the SIEM vendor prohibits sending raw personal data. The SIEM integration uses a scoped access token without `pii:read`, so all audit log entries — including email recipients in `email.sent` events, phone numbers in `sms.sent` events, and IP addresses in event context — are masked before being exported. No config change is required. + +--- + +### UC5: A trusted integration needs raw data while other consumers receive masked data + +A company has both a customer support team (who should see masked data) and a backend data migration script (which needs raw email addresses to send migration notifications).
Scoped access tokens issued to support staff do not include `pii:read` and receive masked PII. The migration script obtains a scoped access token with `pii:read` via Client Credentials Grant, which causes the server to bypass masking for that token only. + +No config change is required. + +**Migration script token request:** + +``` +POST /oauth2/token +grant_type=client_credentials&client_id=migration-script&client_secret=...&resource=https://auth.myapp.com/_api/admin&scope=user%3Aread+pii%3Aread +``` + +--- + +### UC6: Support staff can look up a specific account by email but cannot browse accounts by email + +A company's support team handles inbound requests from customers who identify themselves by email. When a customer says "my email is `johndoe@example.com`", the support agent needs to look up that specific account. However, the company does not want support staff to be able to filter the full user list by email or name — doing so would let them enumerate accounts and build a list of customer contacts. + +Support staff are issued scoped access tokens with `pii:search` but without `pii:read`. Support staff can submit a known email as a search criterion and find the matching account, but the email is still shown masked (`joh****@example.com`) in the response. They cannot browse accounts by typing a partial name or email to discover who is registered. + +**Support staff token request:** + +``` +POST /oauth2/token +grant_type=client_credentials&client_id=support-portal&client_secret=...&resource=https://auth.myapp.com/_api/admin&scope=user%3Aread+user%3Awrite+pii%3Asearch +``` + +No config change is needed — the default `pii_types` list applies. Search by email is permitted (via `pii:search`), but the returned email is still masked. + +--- + +## PII Types + +Every PII field is classified by a **pii_type**.
The masking format for each type is fixed by the server and is not configurable. + +| pii_type | Example raw value | Masked representation | +| --------------- | ---------------------------------------- | ------------------------------ | +| `email` | `johndoe@example.com` | `joh****@example.com` | +| `phone_number` | `+85223456789` | `+8522345****` | +| `name` | `John Doe` | `Jo** D**` | +| `username` | `johndoe` | `joh****` | +| `identifier` | `A1234567` | `A123****` | +| `ip_address` | `192.168.1.100` | `192.168.*.*` | +| `date_of_birth` | `1990-01-15` | `****-01-15` | +| `address` | `{"street_address": "123 Main St", ...}` | `{"street_address": "*", ...}` | + +The `pii_type` classifies the semantic meaning of a field. It is used to look up the masking format and to determine whether the field should be masked for a given API. + +### Masking Logic + +**`email`** + +Split on `@`. For the local part, preserve the first half of the characters (floor division) and replace the rest with `*`. The domain is kept as-is. + +- `user@example.com` → `us**@example.com` (local: 4 chars → preserve 2) +- `johndoe@example.com` → `joh****@example.com` (local: 7 chars → preserve 3) + +**`phone_number`** + +Parse using the phonenumbers library. Preserve the full country calling code. For the national significant number, preserve the first half of the digits (floor division) and replace the rest with `*`. + +- `+85223456789` → `+8522345****` (national: 8 digits → preserve 4) + +**`name`** + +Split on whitespace. For each word, preserve the first half of the characters (floor division) and replace the rest with `*`. Rejoin with the original whitespace. + +- `John Doe` → `Jo** D**` (John: 4 → preserve 2; Doe: 3 → preserve 1) +- `Mary Jane Watson` → `Ma** Ja** Wat***` (Mary: 4 → preserve 2; Jane: 4 → preserve 2; Watson: 6 → preserve 3) + +**`username`** + +Preserve the first half of the characters (floor division) and replace the rest with `*`. If the value is 1 character, replace entirely with `*`.
+ +- `johndoe` → `joh****` (7 chars → preserve 3) +- `user` → `us**` (4 chars → preserve 2) +- `a` → `*` + +**`identifier`** + +Preserve the first half of the characters (floor division) and replace the rest with `*`. If the value is 1 character, replace entirely with `*`. + +- `A1234567` → `A123****` (8 chars → preserve 4) +- `X1` → `X*` (2 chars → preserve 1) + +**`ip_address`** + +Parse the address to determine its version. + +For **IPv4**, preserve the first two octets and replace the remaining two with `*`: + +- `192.168.1.100` → `192.168.*.*` + +For **IPv6**, preserve the first two groups and replace the remaining six with `*`: + +- `2001:db8:85a3:0:0:8a2e:370:7334` → `2001:db8:*:*:*:*:*:*` + +If the value cannot be parsed as a valid IP address, replace entirely with `*`. + +**`date_of_birth`** + +Expected format is `YYYY-MM-DD`. Replace the year component with `****`, preserving the month and day. + +- `1990-01-15` → `****-01-15` + +If the value does not match the expected format, replace entirely with `*`. + +**`address`** + +For a plain string value, replace with `*`. + +For a structured object (e.g. the OIDC `address` claim), preserve the object structure and replace each string field value with `*`: + +- Input: `{"street_address": "123 Main St", "locality": "New York", "region": "NY", "postal_code": "10001", "country": "US"}` +- Output: `{"street_address": "*", "locality": "*", "region": "*", "postal_code": "*", "country": "*"}` + +## Configuration + +### pii.masking + +A new top-level `pii` section is added to `authgear.yaml`. + +`pii.masking.admin_api.pii_types` is a list of `pii_type` values to mask for Admin API requests. Masking applies to scoped access tokens that do not carry the `pii:read` scope. Legacy keypair tokens (`typ: JWT`) always receive cleartext and are not affected by this config. 
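For illustration, the string-based formats from Masking Logic can be written in Go using only the standard library. The function names are hypothetical. `phone_number` is omitted because it depends on a phone-number parsing library. Replacing an email that has no `@` with `*` is an assumption the spec does not state.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
	"strings"
)

// maskHalf keeps the first floor(n/2) characters and replaces the rest with
// '*'. It counts runes, not bytes, so multi-byte characters are not split.
// It is used directly for the username and identifier types.
func maskHalf(s string) string {
	r := []rune(s)
	keep := len(r) / 2
	return string(r[:keep]) + strings.Repeat("*", len(r)-keep)
}

// maskEmail masks the local part and keeps the domain as-is.
func maskEmail(s string) string {
	at := strings.LastIndex(s, "@")
	if at < 0 {
		return "*" // assumption: a value without "@" is fully replaced
	}
	return maskHalf(s[:at]) + s[at:]
}

var wordRE = regexp.MustCompile(`\S+`)

// maskName masks each whitespace-separated word, keeping the original whitespace.
func maskName(s string) string {
	return wordRE.ReplaceAllStringFunc(s, maskHalf)
}

var dateOfBirthRE = regexp.MustCompile(`^\d{4}-(\d{2}-\d{2})$`)

// maskDateOfBirth replaces the year of a YYYY-MM-DD value with "****".
func maskDateOfBirth(s string) string {
	m := dateOfBirthRE.FindStringSubmatch(s)
	if m == nil {
		return "*"
	}
	return "****-" + m[1]
}

// maskIPAddress keeps the first two IPv4 octets or the first two IPv6 groups.
func maskIPAddress(s string) string {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "*"
	}
	if addr.Is4() {
		b := addr.As4()
		return fmt.Sprintf("%d.%d.*.*", b[0], b[1])
	}
	b := addr.As16()
	return fmt.Sprintf("%x:%x:*:*:*:*:*:*", uint16(b[0])<<8|uint16(b[1]), uint16(b[2])<<8|uint16(b[3]))
}

// maskAddress replaces a plain string with "*" and, for a structured object,
// replaces every string field value with "*" while keeping the keys.
func maskAddress(v any) any {
	switch a := v.(type) {
	case string:
		return "*"
	case map[string]any:
		out := make(map[string]any, len(a))
		for k, fv := range a {
			if _, isString := fv.(string); isString {
				out[k] = "*"
			} else {
				out[k] = fv
			}
		}
		return out
	default:
		return v
	}
}

func main() {
	fmt.Println(maskEmail("johndoe@example.com"))                 // joh****@example.com
	fmt.Println(maskName("Mary Jane Watson"))                     // Ma** Ja** Wat***
	fmt.Println(maskHalf("A1234567"))                             // A123****
	fmt.Println(maskDateOfBirth("1990-01-15"))                    // ****-01-15
	fmt.Println(maskIPAddress("2001:db8:85a3:0:0:8a2e:370:7334")) // 2001:db8:*:*:*:*:*:*
}
```

`maskHalf` already yields `*` for a 1-character value, because floor(1/2) is 0. The explicit 1-character rule for `username` and `identifier` therefore needs no separate branch.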
+ +The default `pii_types` list covers all PII types: + +```yaml +pii: + masking: + admin_api: + pii_types: + - email + - phone_number + - name + - username + - identifier + - date_of_birth + - address + - ip_address +``` + +The `pii_types` list can be customised to mask only a subset: + +```yaml +pii: + masking: + admin_api: + pii_types: + - email + - phone_number +``` + +### pii_type on user profile attributes + +A new optional `pii_type` field is added to both standard attribute and custom attribute config entries. + +**Default `pii_type` for standard attributes** + +Standard attributes have built-in default `pii_type` values. Project owners do not need to configure them unless they want to override or clear the default. + +| Pointer | Default `pii_type` | +| --------------------- | ------------------ | +| `/email` | `email` | +| `/phone_number` | `phone_number` | +| `/preferred_username` | `username` | +| `/name` | `name` | +| `/given_name` | `name` | +| `/family_name` | `name` | +| `/middle_name` | `name` | +| `/nickname` | `name` | +| `/birthdate` | `date_of_birth` | +| `/address` | `address` | +| `/picture` | (none) | +| `/website` | (none) | +| `/profile` | (none) | +| `/gender` | (none) | +| `/zoneinfo` | (none) | +| `/locale` | (none) | + +**Standard attributes example:** + +```yaml +user_profile: + standard_attributes: + access_control: + - pointer: /email + pii_type: email + access_control: + end_user: readwrite + bearer: readonly + portal_ui: readwrite + - pointer: /phone_number + pii_type: phone_number + access_control: + end_user: readonly + bearer: hidden + portal_ui: readwrite + - pointer: /given_name + pii_type: name + access_control: + end_user: readwrite + bearer: readonly + portal_ui: readwrite + - pointer: /family_name + pii_type: name + access_control: + end_user: readwrite + bearer: readonly + portal_ui: readwrite +``` + +**Custom attributes example:** + +```yaml +user_profile: + custom_attributes: + attributes: + - id: "0001" + pointer: 
/x_national_id + type: string + pii_type: identifier + access_control: + end_user: readwrite + portal_ui: readwrite +``` + +`pii_type` is optional on standard attributes (the built-in default applies when absent) and on custom attributes (treated as non-PII when absent). + +Valid values for `pii_type` on user profile attributes are: `email`, `phone_number`, `name`, `username`, `identifier`, `ip_address`, `date_of_birth`, `address`. + +### Admin API Access Token Scopes + +Admin API access tokens support scope-based permissions (see [admin-api-access-token.md](./admin-api-access-token.md)). The following scopes are relevant to PII masking: + +| Scope | Effect | +| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `pii:read` | Bypass `pii.masking` for this token — all PII fields returned in cleartext; also grants the ability to use PII fields as search/filter criteria | +| `pii:search` | Allow using PII fields as search/filter criteria without bypassing response masking — responses still show masked values for tokens without `pii:read` | + +`pii:read` is a superset of `pii:search`: a token with `pii:read` implicitly has search permission. + +Scoped access tokens with neither scope are subject to both response masking and search restrictions described in [Masking Behaviour](#masking-behaviour). Legacy keypair tokens (`typ: JWT`) always receive cleartext and are not subject to `pii.masking`. + +## Masking Behaviour + +### User Profile Data + +When the server serializes user profile data for a given API, it applies masking as follows: + +1. Determine the calling API (e.g. `admin_api`). +2. If the request token is a legacy keypair token (`typ: JWT`) → return all values as-is. +3. If the request token is a scoped access token and `scope` contains `pii:read` → return all values as-is (masking bypassed for this token). +4. 
Look up `pii.masking.admin_api.pii_types` to get the list of masked pii_types.
5. For each attribute in the response:
   a. If the attribute has a `pii_type` and that type is in the masked list → replace the value with its masked representation.
   b. Otherwise → return the raw value.

Masking is applied **after** access control. If a field is `hidden` for the calling API, it is not returned at all; masking does not apply to hidden fields.

### Search and Filter Requests

When the server processes a search or filter request (e.g. listing users by email, looking up a user by login ID, filtering audit logs by recipient), it enforces search restrictions as follows:

1. If the request token is a legacy keypair token (`typ: JWT`) → all search criteria are allowed.
2. If the request token is a scoped access token and has `pii:read` or `pii:search` in its `scope` → all search criteria are allowed.
3. Otherwise, for each search criterion:
   a. Determine the pii_type of the field being searched.
   b. If the field's pii_type is in `pii.masking.admin_api.pii_types` → reject the request with an error.
   c. Otherwise → allow the criterion.

The `pii.masking.admin_api.pii_types` list therefore controls both what is masked in responses **and** what can be used as search input for tokens with neither `pii:read` nor `pii:search`. Fields excluded from `pii_types` are always searchable regardless of scope.

**Example:** A project owner who wants Support collaborators (see [Portal Collaborator Roles](#portal-collaborator-roles)) to search by username, but not by email, phone number, or name, configures:

```yaml
pii:
  masking:
    admin_api:
      pii_types:
        - email
        - phone_number
        - name
        # username excluded: Support sees it in cleartext and can search by it
```

With this configuration, scoped access tokens without `pii:read` or `pii:search` can look up users by username but receive an error if they attempt to filter by email, phone number, or name.
+

A project owner who wants Support to be able to look up a specific user by an email the customer provides (but still see masked values in responses) would grant `pii:search` without `pii:read`.

### Audit Log Data

Audit log data is accessed through the Admin API, so the same masking rules as user profile data apply.

Audit log entries include two categories of PII:

**Category A — User profile attributes:** Fields drawn from `standard_attributes` or `custom_attributes` (e.g. `user.standard_attributes.email` in a `user.created` event). These are masked using the same rules as user profile data: the `pii_type` declared on the attribute config is matched against `pii.masking.admin_api.pii_types`.

**Category B — Non-profile PII:** Fields that are not user profile attributes but contain PII by nature (e.g. the recipient of a sent email). These fields have no corresponding attribute entry under `user_profile`, so no `pii_type` can be configured for them. Instead, the server determines their `pii_type` (see [Audit Log PII Fields](#audit-log-pii-fields)). The same masking rules apply: if the field's `pii_type` is in `pii.masking.admin_api.pii_types`, the field is masked.

### Audit Log PII Fields

The following sections list all audit log events that contain PII fields, based on the events documented in [event.md](./event.md).

#### Event context fields

Every audit log event includes a `context` object. The following field in `context` contains PII:

| Field                | pii_type     |
| -------------------- | ------------ |
| `context.ip_address` | `ip_address` |

This applies to all audit log events.

#### Events containing `payload.user`

The `user` object includes `standard_attributes`. Fields in `standard_attributes` are masked based on the `pii_type` declared in `user_profile` config.
+ +| Event | +| ---------------------------------------------------------------------------- | +| `user.created` | +| `user.profile.updated` | +| `user.authenticated` | +| `user.reauthenticated` | +| `user.signed_out` | +| `user.session.terminated` | +| `user.anonymous.promoted` (both `payload.user` and `payload.anonymous_user`) | +| `user.disabled` | +| `user.reenabled` | +| `user.deletion_scheduled` | +| `user.deletion_unscheduled` | +| `user.deleted` | +| `user.anonymization_scheduled` | +| `user.anonymization_unscheduled` | +| `user.anonymized` | +| `authentication.identity.anonymous.failed` | +| `authentication.identity.biometric.failed` | +| `authentication.primary.password.failed` | +| `authentication.primary.oob_otp_email.failed` | +| `authentication.primary.oob_otp_sms.failed` | +| `authentication.secondary.password.failed` | +| `authentication.secondary.totp.failed` | +| `authentication.secondary.oob_otp_email.failed` | +| `authentication.secondary.oob_otp_sms.failed` | +| `authentication.secondary.recovery_code.failed` | +| `authentication.blocked` | +| `identity.email.added` | +| `identity.email.removed` | +| `identity.email.updated` | +| `identity.phone.added` | +| `identity.phone.removed` | +| `identity.phone.updated` | +| `identity.username.added` | +| `identity.username.removed` | +| `identity.username.updated` | +| `identity.oauth.connected` | +| `identity.oauth.disconnected` | +| `identity.biometric.enabled` | +| `identity.biometric.disabled` | +| `identity.email.verified` | +| `identity.phone.verified` | +| `identity.email.unverified` | +| `identity.phone.unverified` | + +#### Events containing `payload.identity` + +The following events include an `identity` object (or `old_identity` / `new_identity`). 
The `claims` field within the identity object contains PII with a server-determined `pii_type`: + +| Event | Field | pii_type | +| ----------------------------- | -------------------------------------------------------------------------------------------------- | -------------- | +| `identity.email.added` | `payload.identity.claims.email` | `email` | +| `identity.email.removed` | `payload.identity.claims.email` | `email` | +| `identity.email.updated` | `payload.old_identity.claims.email`, `payload.new_identity.claims.email` | `email` | +| `identity.phone.added` | `payload.identity.claims.phone_number` | `phone_number` | +| `identity.phone.removed` | `payload.identity.claims.phone_number` | `phone_number` | +| `identity.phone.updated` | `payload.old_identity.claims.phone_number`, `payload.new_identity.claims.phone_number` | `phone_number` | +| `identity.username.added` | `payload.identity.claims.preferred_username` | `username` | +| `identity.username.removed` | `payload.identity.claims.preferred_username` | `username` | +| `identity.username.updated` | `payload.old_identity.claims.preferred_username`, `payload.new_identity.claims.preferred_username` | `username` | +| `identity.oauth.connected` | `payload.identity.claims.email` | `email` | +| `identity.oauth.disconnected` | `payload.identity.claims.email` | `email` | +| `identity.oauth.connected` | `payload.identity.claims.name` | `name` | +| `identity.oauth.disconnected` | `payload.identity.claims.name` | `name` | +| `identity.oauth.connected` | `payload.identity.claims.given_name` | `name` | +| `identity.oauth.disconnected` | `payload.identity.claims.given_name` | `name` | +| `identity.oauth.connected` | `payload.identity.claims.family_name` | `name` | +| `identity.oauth.disconnected` | `payload.identity.claims.family_name` | `name` | +| `identity.email.verified` | `payload.identity.claims.email` | `email` | +| `identity.email.unverified` | `payload.identity.claims.email` | `email` | +| `identity.phone.verified` | 
`payload.identity.claims.phone_number` | `phone_number` | +| `identity.phone.unverified` | `payload.identity.claims.phone_number` | `phone_number` | + +Note: OAuth identity claims depend on the provider. Only the commonly returned claims listed above are masked. Additional PII claims from specific providers are not masked unless added here in a future update. + +#### Events with scalar PII fields + +| Event | Field | pii_type | +| ----------------------------------------- | ---------------------------------------- | --------------------------------------------------------------------------------------------------------- | +| `email.sent` | `payload.recipient` | `email` | +| `email.error` | `payload.recipient` | `email` | +| `email.suppressed` | `payload.recipient` | `email` | +| `sms.sent` | `payload.recipient` | `phone_number` | +| `sms.error` | `payload.recipient` | `phone_number` | +| `sms.suppressed` | `payload.recipient` | `phone_number` | +| `whatsapp.sent` | `payload.recipient` | `phone_number` | +| `whatsapp.error` | `payload.recipient` | `phone_number` | +| `whatsapp.suppressed` | `payload.recipient` | `phone_number` | +| `authentication.identity.login_id.failed` | `payload.login_id` | determined at mask time: contains `@` → `email`; starts with `+` → `phone_number`; otherwise → `username` | +| `project.collaborator.invitation.created` | `payload.invitee_email` | `email` | +| `project.collaborator.invitation.deleted` | `payload.invitee_email` | `email` | +| `fraud_protection.decision_recorded` | `payload.record.action_detail.recipient` | `phone_number` (only present when `payload.record.action` is `send_sms`) | + +## Access Control Interaction + +Masking and access control are independent: + +- **Access control** (the `access_control` field on each attribute) determines whether a field is visible at all (`hidden`, `readonly`, `readwrite`). +- **PII masking** determines whether a visible field is returned in cleartext or as a redacted value. 
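The ordering described by the two bullets above can be sketched per field. Access control decides whether a field is returned at all, and masking decides how a returned field is represented. This Go sketch rests on several assumptions:

- `renderField` and `AccessLevel` are hypothetical names.
- `cleartext` is true for legacy keypair tokens or scoped tokens holding `pii:read`.
- A fixed placeholder string stands in for the masked representation.

```go
package main

import "fmt"

// AccessLevel mirrors the access_control values; the type name is hypothetical.
type AccessLevel string

const (
	Hidden    AccessLevel = "hidden"
	ReadOnly  AccessLevel = "readonly"
	ReadWrite AccessLevel = "readwrite"
)

// maskedPlaceholder is a stand-in for the masked representation.
const maskedPlaceholder = "********"

// renderField applies access control first, then PII masking.
// ok is false when the field must be omitted from the response entirely.
func renderField(level AccessLevel, piiType, value string, maskedTypes map[string]bool, cleartext bool) (out string, ok bool) {
	if level == Hidden {
		// Hidden fields are never returned, so masking never applies to them.
		return "", false
	}
	if !cleartext && piiType != "" && maskedTypes[piiType] {
		return maskedPlaceholder, true
	}
	return value, true
}

func main() {
	masked := map[string]bool{"email": true}
	fmt.Println(renderField(Hidden, "email", "jo@example.com", masked, false))
	fmt.Println(renderField(ReadOnly, "email", "jo@example.com", masked, false))
	fmt.Println(renderField(ReadOnly, "email", "jo@example.com", masked, true))
	fmt.Println(renderField(ReadWrite, "", "https://example.com", masked, false))
}
```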
+

A field that is `hidden` for the calling API is never returned, regardless of `pii.masking`. A field that is `readonly` or `readwrite` may be masked if its `pii_type` matches the API's masking list.

The Admin API bypasses `access_control` (it uses `RoleGreatest` internally and always returns all attributes). PII masking is therefore the primary mechanism for restricting what raw PII the Admin API exposes.

## Portal Collaborator Roles

A new **Support** collaborator role is introduced alongside the existing **Owner** and **Editor** roles.

| Role    | PII visible                                       | PII search | User export | View Admin API key secret |
| ------- | ------------------------------------------------- | ---------- | ----------- | ------------------------- |
| Owner   | Yes                                               | Yes        | Yes         | Yes                       |
| Editor  | Yes                                               | Yes        | Yes         | Yes                       |
| Support | No — masked per `pii.masking.admin_api.pii_types` | No         | No          | No                        |

The portal issues scoped access tokens for Support collaborators under the client ID `portal` (`client_id=portal`). This client ID is reserved for portal use and is neither visible to nor configurable by project owners or collaborators.

Support collaborators are issued scoped access tokens with the following scopes:

- `user:read`
- `user:write`
- `role:read`
- `role:write`
- `group:read`
- `group:write`
- `resource:read`
- `resource:write`
- `fraud-protection:read`
- `audit-log:read`
- `user:import`

`pii:read`, `pii:search`, and `user:export` are intentionally excluded. Without `pii:search`, Support cannot use any field whose `pii_type` is in `pii.masking.admin_api.pii_types` (by default email, phone number, name, and so on) as a search or filter criterion. Support can still search by any field whose `pii_type` is not in that list. For example, Support can search by `username` if `username` has been removed from `pii.masking.admin_api.pii_types`.

**Admin API key secret:** The portal API must not allow Support collaborators to view or retrieve the `admin-api.auth` secret. A Support collaborator who can obtain it could mint a full-access legacy token, bypassing all PII restrictions of their role.
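As a consistency check on the role table, the fixed Support scope list can be mapped back to the table's scope-driven columns. In this minimal Go sketch, `supportScopes`, `Capabilities`, and `capabilitiesOf` are hypothetical names. The mapping of `user:export` to the "User export" column is an assumption. The "View Admin API key secret" column is enforced by the portal API rather than by scopes, so it is not modeled.

```go
package main

import (
	"fmt"
	"slices"
)

// supportScopes mirrors the fixed scope list issued to Support collaborators.
var supportScopes = []string{
	"user:read", "user:write",
	"role:read", "role:write",
	"group:read", "group:write",
	"resource:read", "resource:write",
	"fraud-protection:read",
	"audit-log:read",
	"user:import",
}

// Capabilities models the scope-driven columns of the role table.
// PIIVisible=false means values whose pii_type is in
// pii.masking.admin_api.pii_types are masked; other fields stay visible.
type Capabilities struct {
	PIIVisible bool
	PIISearch  bool
	UserExport bool
}

func capabilitiesOf(scopes []string) Capabilities {
	read := slices.Contains(scopes, "pii:read")
	return Capabilities{
		PIIVisible: read,
		PIISearch:  read || slices.Contains(scopes, "pii:search"), // pii:read implies search
		UserExport: slices.Contains(scopes, "user:export"),
	}
}

func main() {
	fmt.Printf("%+v\n", capabilitiesOf(supportScopes))
}
```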
+ +## Future Work + +### Configurable default scopes for the Support role + +Currently the scopes granted to Support collaborators are fixed. In future, project owners may need to tailor these defaults — for example, granting `pii:search` to Support by default, or restricting certain write scopes for more sensitive projects. A per-project configuration for the default scope set issued to Support collaborators could be introduced without breaking existing behaviour (tokens already issued to Support collaborators would continue to use the fixed defaults until the project owner opts in). + +```yaml +# illustrative only — not yet implemented +portal: + collaborators: + roles: + - type: support + scopes: + - user:read + - user:write + - role:read + - role:write + - group:read + - group:write + - resource:read + - resource:write + - fraud-protection:read + - audit-log:read + - user:import + - pii:search # opted in — support can look up users by email +``` + +### Per-field search permission (not supported) + +Currently `pii:search` is an all-or-nothing token-level scope: a token with `pii:search` may use **any** PII field as a search or filter criterion. There is no way to grant search permission for one PII field while blocking it for another. + +This means the following use case is **not supported**: a project owner whose support team handles inbound tickets identified by email (and therefore needs to search by email) but does not want support staff to search by phone number — while keeping both fields masked in responses. With `pii:search`, both email and phone become searchable simultaneously. 
+ +A future `pii:search` extension could allow per-field search grants, for example: + +```yaml +# illustrative only — not yet implemented +scope: "user:read pii:search:email" +``` + +Until then, project owners with this requirement can work around it by removing the desired search field from `pii.masking.admin_api.pii_types` (which permits searching but also unmasks the field in responses), or by accepting that `pii:search` grants broader search access than strictly needed. + +### PII Retention config + +The `pii` root key is designed to be extensible. Future concerns such as data retention can be added as sibling keys alongside `pii.masking` without restructuring the config: + +```yaml +pii: + masking: + admin_api: + pii_types: + - email + - phone_number + retention: # future, illustrative only + # ... +``` From e743e7119fd1856af0c09e3913e72dad5a71c5a5 Mon Sep 17 00:00:00 2001 From: Tung Wu Date: Fri, 8 May 2026 18:02:53 +0800 Subject: [PATCH 2/2] Refine api-design skill --- .claude/skills/api-design/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/skills/api-design/SKILL.md b/.claude/skills/api-design/SKILL.md index 164fd81353c..37341c567c4 100644 --- a/.claude/skills/api-design/SKILL.md +++ b/.claude/skills/api-design/SKILL.md @@ -78,7 +78,7 @@ Then self-review the proposed design against the full checklist before presentin ### B. Config Conventions (`authgear.yaml`) -5. Uses list over map when ordering doesn't matter and the key is an attribute of the item (e.g., `[{type: "phone"}]` not `{phone: {}}`) +5. Uses list over map when ordering doesn't matter and the key is a plain identifier with no semantic meaning beyond naming the item (e.g., `[{type: "phone"}]` not `{phone: {}}`). 
A map (object keys) is appropriate when: (a) each key represents a distinct named entity whose config shape may differ from other keys (e.g., per-API config where `admin_api` and a future `bearer_api` could have different fields), or (b) uniqueness of each key must be enforced by schema. Do not flag map-style config as a violation when the keys are a bounded, well-known set and different keys legitimately need different shapes. 6. Flags are minimal — avoid boolean flags that will always be true in practice; prefer a feature being enabled by its presence 7. New config structs follow the Go struct + JSON Schema pattern (see `pkg/lib/config/bot_protection.go` or `pkg/lib/config/fraud_protection.go`) 8. Backward compatible: new fields have `omitempty` and sensible zero-value defaults; no existing fields removed or renamed