fix(discovery): harden Discovery Plugin registration against failed plugins, retry floods #1482
Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>
/build_test
Workflow started at 4/22/2026, 4:25:40 PM. View Actions Run.
No WebSocket notifications schema changes detected.
No OpenAPI schema changes detected.
No GraphQL schema changes detected.
CI build: https://github.com/cryostatio/cryostat/actions/runs/24800846425
634b445 to 2f11eea
only start plugin refresh job for new registrations
2f11eea to 6c39e95
/build_test
Workflow started at 4/24/2026, 4:17:57 PM. View Actions Run.
No WebSocket notifications schema changes detected.
No OpenAPI schema changes detected.
No GraphQL schema changes detected.
CI build: https://github.com/cryostatio/cryostat/actions/runs/24909934429
Related to #189
Related to #406
Fixes #1483
See cryostatio/cryostat-agent#851
Description of the change:
- Adds `consecutiveFailures`, `lastSuccessfulPing`, `lastFailedPing`, `backoffMultiplier`, and `nextPingAt` columns to the DiscoveryPlugin table. These support improved logic for detecting when Discovery Plugins (Agents) become unreachable. Previously Cryostat considered a plugin failed as soon as it failed a single ping check, but in practice pings can fail because of network interruptions, target application overload, and similar transient conditions, so there should be some leeway. Cryostat now considers a plugin failed, and prunes it, only once it has failed enough consecutive checks, with exponential backoff between them.
- […] `ActiveRecordingUpdateJob` to ensure the system is more resilient to things like race conditions where a Target may have been lost while a task was in the middle of executing.
- […] `id` and `token`, but using an existing `callback` and `realm`, and the new plugin is able to pass the callback ping check, then we know that an Agent instance has (somehow) lost its internal record of its own registration information but still appears to be functionally the same as a previously registered Agent instance. Previously Cryostat would refuse this registration because the plugin appears to be a duplicate. Now we treat it as a replacement of the same plugin (since it has the same identity and passes our identification checks) and pass it back its ID and a fresh token.
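
The failure-tracking behaviour described in the first item can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the class name `PluginPingState`, the threshold of 5 consecutive failures, the 30-second base interval, and the doubling of `backoffMultiplier` on each failure are all assumptions chosen for the example. Only the five field names come from the description above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical per-plugin ping bookkeeping; not Cryostat's actual implementation. */
class PluginPingState {
    // Assumed values for illustration; the PR description does not state the real ones.
    static final int MAX_CONSECUTIVE_FAILURES = 5;
    static final Duration BASE_INTERVAL = Duration.ofSeconds(30);

    // Field names mirror the new DiscoveryPlugin columns named in the description.
    int consecutiveFailures = 0;
    int backoffMultiplier = 1;
    Instant lastSuccessfulPing;
    Instant lastFailedPing;
    Instant nextPingAt;

    /** A successful ping clears the failure streak and resets the backoff. */
    void recordSuccess(Instant now) {
        consecutiveFailures = 0;
        backoffMultiplier = 1;
        lastSuccessfulPing = now;
        nextPingAt = now.plus(BASE_INTERVAL);
    }

    /** A failed ping extends the streak and doubles the wait before the next attempt. */
    void recordFailure(Instant now) {
        consecutiveFailures++;
        backoffMultiplier *= 2;
        lastFailedPing = now;
        nextPingAt = now.plus(BASE_INTERVAL.multipliedBy(backoffMultiplier));
    }

    /** The plugin is pruned only after enough consecutive failures, not after one. */
    boolean shouldPrune() {
        return consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
    }
}
```

Under these assumed values, a single failed ping only pushes the next attempt out to 60 seconds, and one successful ping at any point resets the streak, so a brief network interruption does not cause the plugin to be pruned.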