feat: add api keys support for ingestion/query authentication#1620
feat: add api keys support for ingestion/query authentication#1620nikhilsinhaparseable merged 7 commits intoparseablehq:mainfrom
Conversation
WalkthroughAdds a tenant-partitioned in-memory API key store with metastore persistence and object-store layout, integrates X-API-KEY handling into HTTP middleware (including Kinesis header extraction and short-lived query sessions), extends Metastore trait/impl for apikey CRUD, exposes Changes
Sequence DiagramsequenceDiagram
participant Client
participant Middleware as AuthMiddleware
participant ApiKeyStore as API_KEYS
participant Metastore
participant Storage
Client->>Middleware: HTTP request (may include X-API-KEY / Kinesis header)
Middleware->>Middleware: extract_kinesis_headers (if ingest)
Middleware->>Middleware: read tenant header & x-api-key
alt x-api-key present and action maps to KeyType
Middleware->>ApiKeyStore: validate_key(api_key, tenant, required_type)
ApiKeyStore->>ApiKeyStore: lookup in tenant-partitioned RwLock
alt Key valid
ApiKeyStore-->>Middleware: valid
Middleware->>Middleware: create short-lived session if KeyType::Query
Middleware-->>Client: forward request to service
else Key invalid
ApiKeyStore-->>Middleware: invalid
Middleware-->>Client: 401/403 response
end
else no x-api-key
Middleware->>Middleware: normal session extraction & RBAC
end
Note over ApiKeyStore,Metastore: bootstrap/sync flow
ApiKeyStore->>Metastore: get_api_keys()
Metastore->>Storage: list/read SETTINGS_ROOT/apikeys/*.json per-tenant
Storage-->>Metastore: ApiKey JSON blobs
Metastore-->>ApiKeyStore: HashMap<tenant, Vec<ApiKey>>
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/apikeys.rs`:
- Around line 42-53: The ApiKey struct currently persists plaintext secrets
(field api_key) and validate_key() compares plaintext; change storage to persist
only a keyed hash/digest (e.g., api_key_hash) and remove plaintext from
serialization (make plaintext returned only at creation), update ApiKey struct
(replace or annotate api_key -> api_key_hash and adjust serde to not expose
secret), update the code paths that create and store keys to compute and save
the derived hash while returning the plaintext once, and modify validate_key()
to compute the same digest for the presented key and compare against
api_key_hash (constant-time compare); apply the same changes to the other
locations that persist or compare keys (the creation/storage logic and any
validation/auth functions referenced in the diff).
- Around line 121-129: When loading keys in get_api_keys(), don't overwrite
default-tenant keys with Some(DEFAULT_TENANT); preserve tenant == None so
apikey_json_path() continues to generate the legacy path
(".settings/apikeys/..."). Locate the loop that iterates (tenant_id, keys) and,
instead of unconditionally setting k.tenant = Some(tenant_id.clone()), set
k.tenant = None when tenant_id == DEFAULT_TENANT (otherwise set
Some(tenant_id.clone())). This keeps delete/update paths consistent with
apikey_json_path() for single-tenant/default entries.
- Around line 33-35: API_KEYS is never hydrated at startup; call
API_KEYS.load().await as part of the concurrent initialization sequence that
currently runs load_on_init() for other global stores so persisted keys are
available after restart. Edit the async startup loader where the other globals
are awaited (the block that triggers load_on_init() for CORRELATIONS, FILTERS,
DASHBOARDS, ALERTS, TARGETS) and add API_KEYS.load().await to that set of
concurrent tasks (referencing the static API_KEYS and its method load() on
ApiKeyStore) so the cache is populated from persistent storage during server
startup.
In `@src/handlers/http/middleware.rs`:
- Around line 194-217: The API-key branch currently validates the key with
API_KEYS.validate_key(...) but then returns the service future without running
the normal authorization flow, bypassing self.auth_method (which ultimately
calls Users.authorize(...)); after a successful validate_key(), invoke
self.auth_method (or the same authorization helper used for other branches) for
the request/resource (Action::Ingest) and only proceed to await fut if that
authorization succeeds, otherwise return the appropriate error (e.g.,
unauthorized/forbidden); keep the existing header_error and suspension checks
before validation and ensure any auth call uses the same request/tenant context
as the other paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e82a3074-a0eb-4c16-8ce5-111e492a771d
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.tomlsrc/apikeys.rssrc/handlers/http/middleware.rssrc/lib.rssrc/metastore/metastore_traits.rssrc/metastore/metastores/object_store_metastore.rssrc/storage/mod.rssrc/storage/object_storage.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/apikeys.rs (1)
234-249: Consider constant-time comparison for API key validation.The current comparison uses
==(lines 239, 245), which is susceptible to timing attacks. While network jitter typically masks timing differences and the high entropy of UUIDv4 keys makes exploitation difficult in practice, using constant-time comparison is the security best practice for credential validation.♻️ Proposed fix using constant-time comparison
Add
subtleto dependencies:subtle = "2"Then update the comparison:
+use subtle::ConstantTimeEq; + pub async fn validate_key(&self, api_key_value: &str, tenant_id: &Option<String>) -> bool { let map = self.keys.read().await; if let Some(tenant_id) = tenant_id { // Multi-tenant: check keys for the specific tenant if let Some(tenant_keys) = map.get(tenant_id) { - return tenant_keys.values().any(|k| k.api_key == api_key_value); + return tenant_keys.values().any(|k| { + k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).into() + }); } false } else { // Single-tenant: check keys under DEFAULT_TENANT if let Some(tenant_keys) = map.get(DEFAULT_TENANT) { - return tenant_keys.values().any(|k| k.api_key == api_key_value); + return tenant_keys.values().any(|k| { + k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).into() + }); } false } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/apikeys.rs` around lines 234 - 249, In validate_key, replace the plain equality checks of k.api_key == api_key_value with a constant-time comparison using the subtle crate: add subtle = "2" to Cargo.toml, import subtle::ConstantTimeEq, and use k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).unwrap_u8() == 1 (or equivalent API) when iterating tenant_keys.values().any(...); update both places in the validate_key method to use this constant-time byte comparison instead of == to mitigate timing attacks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/apikeys.rs`:
- Around line 234-249: In validate_key, replace the plain equality checks of
k.api_key == api_key_value with a constant-time comparison using the subtle
crate: add subtle = "2" to Cargo.toml, import subtle::ConstantTimeEq, and use
k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).unwrap_u8() == 1 (or
equivalent API) when iterating tenant_keys.values().any(...); update both places
in the validate_key method to use this constant-time byte comparison instead of
== to mitigate timing attacks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ed5dd247-c054-4a2d-ac96-87f7fb53d2dc
📒 Files selected for processing (2)
src/apikeys.rssrc/handlers/http/middleware.rs
use api keys as an alternative auth mechanism for ingestion use header `X-API-KEY` in place of basic auth middleware validates the header on ingest action key store in object store at .settings/apikeys/<key-id>.json
fc7e4a8 to
63b62c8
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/apikeys.rs`:
- Around line 168-180: The code currently holds the global write lock
(self.keys.write().await) across the awaited metastore call
(PARSEABLE.metastore.put_api_key), causing long blocking; change this to do the
duplicate-name check under the global lock, make a short-lived reservation
(e.g., insert a placeholder entry or per-tenant reservation key in tenant_keys)
or acquire a per-tenant lock, then drop the global write guard before awaiting
put_api_key(&api_key, &api_key.tenant). After put_api_key completes, reacquire
the global write lock and replace the placeholder with the real api_key (or
remove the reservation on error), returning
ApiKeyError::DuplicateKeyName(api_key.key_name) as before when duplicate
detected. Ensure any cleanup on metastore failure removes the reservation to
avoid dead reservations.
In `@src/handlers/http/middleware.rs`:
- Around line 157-178: The tenant header is being overwritten by
get_user_and_tenant, so in the API-key short-circuit branch capture the incoming
tenant header before calling get_user_and_tenant and use that captured value for
tenant_scoped checks (e.g., for suspension via check_suspension and for
validating API key scope) instead of reading req.headers() after
get_user_and_tenant; specifically, read TENANT_ID into a local tenant_id_before
(Option<String>) before invoking get_user_and_tenant, then use tenant_id_before
in the api-key branch where api_key_type_for_action, extract_api_key, and
check_suspension are used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 16ad85d2-9099-4462-8af9-9c071809b95e
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.tomlsrc/apikeys.rssrc/handlers/http/middleware.rssrc/lib.rssrc/metastore/metastore_traits.rssrc/metastore/metastores/object_store_metastore.rssrc/storage/mod.rssrc/storage/object_storage.rs
✅ Files skipped from review due to trivial changes (3)
- src/storage/mod.rs
- Cargo.toml
- src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/metastore/metastore_traits.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/apikeys.rs (1)
274-286: Index API keys by secret on the auth path.
validate_key()does a full scan of every key in the tenant on each request. Since this now sits on ingest/query authentication, the cost grows linearly with key count and can become a tenant-local bottleneck. A second in-memory index keyed by the API-key value (or a derived hash) would keep validation O(1) while preserving the currentkey_idmap for CRUD/listing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/apikeys.rs` around lines 274 - 286, validate_key currently scans all tenant keys (self.keys) which is O(n); add a secondary in-memory index (e.g., self.keys_by_secret or self.key_secret_index) keyed by the API secret string or its derived hash to allow O(1) lookup per tenant. Change validate_key to first compute tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) and then check the new index for an entry matching api_key_value and required_type instead of iterating tenant_keys.values().any(...). Ensure all CRUD paths that modify self.keys (create/update/delete functions that reference key_id/key maps) also update the new keys_by_secret index under the same lock semantics (write lock on mutation, read lock on validation), and consider storing a hash instead of raw secret if you want to avoid duplicating secret strings in memory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/middleware.rs`:
- Around line 185-223: The code currently creates a temporary query session
(mut_sessions().track_new(...), SessionKey::SessionId, and
req.extensions_mut().insert(session_key)) before performing header_error,
suspension checks and API_KEYS.validate_key(...).await, causing sessions to be
created for failed requests; move the session creation and insertion (the
track_new call and req.extensions_mut().insert) to after the checks (after
header_error/suspension handling and after API_KEYS.validate_key(...) returns
true) so the session is only created when the API key is validated, keeping
SessionCleanupGuard(query_session_id) logic intact to still clean up the session
if created.
---
Nitpick comments:
In `@src/apikeys.rs`:
- Around line 274-286: validate_key currently scans all tenant keys (self.keys)
which is O(n); add a secondary in-memory index (e.g., self.keys_by_secret or
self.key_secret_index) keyed by the API secret string or its derived hash to
allow O(1) lookup per tenant. Change validate_key to first compute tenant =
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) and then check the new index for
an entry matching api_key_value and required_type instead of iterating
tenant_keys.values().any(...). Ensure all CRUD paths that modify self.keys
(create/update/delete functions that reference key_id/key maps) also update the
new keys_by_secret index under the same lock semantics (write lock on mutation,
read lock on validation), and consider storing a hash instead of raw secret if
you want to avoid duplicating secret strings in memory.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c3abb88d-0abf-43b5-8bb8-347ab05cf8c5
📒 Files selected for processing (2)
src/apikeys.rssrc/handlers/http/middleware.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/apikeys.rs (1)
274-288: Consider constant-time comparison for API key validation.The string comparison
k.api_key == api_key_valueis not constant-time, which could theoretically leak timing information about the API key. While practical exploitation is difficult (network jitter dominates, and theany()iterator already introduces variable timing), using constant-time comparison is a defense-in-depth best practice for secret validation.🔒 Proposed fix using subtle crate
Add
subtleto dependencies, then:+use subtle::ConstantTimeEq; + pub async fn validate_key( &self, api_key_value: &str, tenant_id: &Option<String>, required_type: KeyType, ) -> bool { let map = self.keys.read().await; let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT); if let Some(tenant_keys) = map.get(tenant) { - return tenant_keys - .values() - .any(|k| k.api_key == api_key_value && k.key_type == required_type); + for k in tenant_keys.values() { + if k.key_type == required_type + && k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).into() + { + return true; + } + } } false }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/apikeys.rs` around lines 274 - 288, The validate_key function uses a standard equality check (k.api_key == api_key_value) which leaks timing info; replace it with a constant-time comparison: add the subtle crate to Cargo.toml and use k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).unwrap_u8() == 1 (or equivalent ConstantTimeEq API) inside the any() closure in validate_key so you still check k.key_type == required_type but perform the API key comparison in constant time; reference the validate_key method and the api_key field on the key struct when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/apikeys.rs`:
- Around line 274-288: The validate_key function uses a standard equality check
(k.api_key == api_key_value) which leaks timing info; replace it with a
constant-time comparison: add the subtle crate to Cargo.toml and use
k.api_key.as_bytes().ct_eq(api_key_value.as_bytes()).unwrap_u8() == 1 (or
equivalent ConstantTimeEq API) inside the any() closure in validate_key so you
still check k.key_type == required_type but perform the API key comparison in
constant time; reference the validate_key method and the api_key field on the
key struct when making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 661190ac-4fb6-4c3b-909e-128ead64876c
📒 Files selected for processing (2)
src/apikeys.rssrc/handlers/http/middleware.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/handlers/http/middleware.rs
use api keys as an alternative auth mechanism for ingestion
use header
X-API-KEYin place of basic authmiddleware validates the header on ingest action
key store in object store at .settings/apikeys/.json
Summary by CodeRabbit