Skip to content

fix(pegboard): validate drain_grace_period < request_lifespan#4998

Open
abcxff wants to merge 1 commit into
05-06-fix_log_error_on_failed_inspector_requestsfrom
05-07-fix_pegboard_validate_drain_grace_period_request_lifespan
Open

fix(pegboard): validate drain_grace_period < request_lifespan#4998
abcxff wants to merge 1 commit into
05-06-fix_log_error_on_failed_inspector_requestsfrom
05-07-fix_pegboard_validate_drain_grace_period_request_lifespan

Conversation

@abcxff
Copy link
Copy Markdown
Contributor

@abcxff abcxff commented May 7, 2026

when drain_grace_period >= request_life_span, we get undefined behaviour. This enforces this on the server side befoe pool config is upserted

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 7, 2026

🚅 Deployed to the rivet-pr-4998 environment in rivet-frontend

Service Status Web Updated (UTC)
kitchen-sink ❌ Build Failed (View Logs) Web May 12, 2026 at 1:21 pm
website 😴 Sleeping (View Logs) Web May 9, 2026 at 2:23 am
mcp-hub ✅ Success (View Logs) Web May 7, 2026 at 8:18 pm
frontend-cloud ❌ Build Failed (View Logs) Web May 7, 2026 at 8:18 pm
frontend-inspector ❌ Build Failed (View Logs) Web May 7, 2026 at 8:18 pm
ladle ❌ Build Failed (View Logs) Web May 7, 2026 at 8:15 pm

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

PR Review: fix(pegboard): validate drain_grace_period < request_lifespan

Overview

This PR adds a server-side validation in pegboard_runner_config_upsert that enforces drain_grace_period < request_lifespan for serverless runner configs. The root cause of the undefined behavior is in pegboard-outbound/src/lib.rs:450 where saturating_sub silently returns 0 when drain_grace_period >= request_lifespan, causing the drain to start immediately. The fix is correct and follows existing error-handling patterns.


Issues

Missing test coverage

A test for the new validation is absent. The existing upsert_runner_config_serverless_drain_grace_period_exceeds_actor_stop_threshold test in engine/packages/engine/tests/runner/api_runner_configs_upsert.rs shows exactly what a new test should look like. A symmetric test should be added covering the equal case (drain_grace_period == request_lifespan) and the strictly-greater case to verify the boundary.

Default value trap — error message should mention the 1800s default

drain_grace_period is optional in the API and defaults to 1800 s (30 min) when omitted (api-types/src/namespaces/runner_configs.rs:71). A user who only sets request_lifespan: 60 without providing drain_grace_period will see:

`drain_grace_period` must be less than `request_lifespan` (1800s >= 60s)

The 1800 s value will look surprising since the user never set it. The existing actor_stop_threshold error already handles this by appending "If drain_grace_period was omitted, the default is 1800s." The new error message should do the same.


Observations (not blocking)

Two code paths for drain timing

pegboard-outbound/src/lib.rs:450 uses the per-runner drain_grace_period field (validated here). pegboard/src/workflows/serverless/conn.rs:443-444 uses the system-level ctx.config().pegboard().serverless_drain_grace_period() and is unaffected by this validation. Both share the same saturating_sub structure, so if the system-level value can exceed request_lifespan the same silent early-drain behavior is possible there. Pre-existing, out of scope, but worth a follow-up.

Validation ordering is correct

Placing the new check before the actor_stop_threshold check is the right order.


Summary

The core logic is correct and the change is small and well-targeted. Two actionable items before merging: (1) add a test mirroring the existing drain_grace_period_exceeds_actor_stop_threshold pattern, and (2) append the default-value hint to the new error message to avoid surprising users who omit drain_grace_period.

@abcxff abcxff changed the base branch from 05-06-_slop_fix_rivetkit_remove_async_drizzle_db_proxy to graphite-base/4998 May 7, 2026 20:16
@abcxff abcxff force-pushed the 05-07-fix_pegboard_validate_drain_grace_period_request_lifespan branch from 9b5f655 to 56af1d1 Compare May 7, 2026 20:16
@abcxff abcxff changed the base branch from graphite-base/4998 to 05-06-fix_rivetkit_bind_methods_through_createwritethroughproxy May 7, 2026 20:16
@abcxff abcxff marked this pull request as ready for review May 7, 2026 20:18
@abcxff abcxff requested a review from MasterPtato May 7, 2026 20:23
@abcxff abcxff force-pushed the 05-07-fix_pegboard_validate_drain_grace_period_request_lifespan branch from 56af1d1 to aac9634 Compare May 11, 2026 03:41
@abcxff abcxff force-pushed the 05-06-fix_rivetkit_bind_methods_through_createwritethroughproxy branch from c39972c to add44af Compare May 11, 2026 03:42
@abcxff abcxff changed the base branch from 05-06-fix_rivetkit_bind_methods_through_createwritethroughproxy to graphite-base/4998 May 12, 2026 13:21
@abcxff abcxff force-pushed the 05-07-fix_pegboard_validate_drain_grace_period_request_lifespan branch from aac9634 to f8fe5aa Compare May 12, 2026 13:21
@abcxff abcxff changed the base branch from graphite-base/4998 to 05-06-fix_log_error_on_failed_inspector_requests May 12, 2026 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant