[ANCHOR-1943]: rpc observer DoS via malformed soroban transfer event#1944
[ANCHOR-1943]: rpc observer DoS via malformed soroban transfer event#1944amandagonsalves wants to merge 13 commits into
Conversation
* add test coverage for empty scv_map event values * prevent crashes or stalls when processing an empty scv_map event * ensure the observer continues processing and updates the cursor successfully
* update `scv_u64` memo discriminant check * add error handling for invalid muxed account creation
* update exception handling for payment event processing to catch all exceptions * update error logging to include full exception string * refactor logging format for muxed account creation
* add try-catch block around event processing * update processing to fix potential observer crashes from single event errors
* refactor cursor saving to execute in a `finally` block * update metric and cursor persistence to always run after event processing * fix potential re-processing of events on error during processing
* add null and length checks for scv_map entries * fix parsing by validating scv_i128 amount discriminant * refactor contract event data extraction for robustness
* add tests for one-entry scv_map events * add tests for scv_map events with wrong first-entry type * add tests for contract-address recipient with scv_u64 memo * update helper function to mock varied poison responses
* update timeout for waiting for payment observer events. * refactor payment event assertions to explicitly check for null events. * add explicit failure assertion when payment event waiting times out.
* add new test file for stellar rpc payment observer * add tests to verify observer resilience against malformed soroban events * add scenarios where malformed events do not halt event processing * add checks to ensure observer health remains green after encountering poison events * add verification that the cursor advances correctly past mixed batches of events
* update the default timeout for event capturing to 60 seconds * allow more time for payment observer events to be processed and captured in integration tests
There was a problem hiding this comment.
Pull request overview
Hardens the Stellar RPC payment observer against denial-of-service conditions caused by malformed Soroban transfer events by validating event shapes, isolating per-event failures, and ensuring cursor advancement so poison events cannot be replayed indefinitely.
Changes:
- Make
fetchEvents()persist the RPC cursor in afinallyblock after a successfulgetEvents()call. - Add defensive parsing + broader exception handling in
shouldProcess()and isolate failures per event inprocessEvents(). - Add unit/integration-style tests covering multiple “poison event” variants and improve integration test assertions/timeouts.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| platform/src/main/java/org/stellar/anchor/platform/observer/stellar/StellarRpcPaymentObserver.java | Adds validation/exception hardening around Soroban event parsing; advances cursor even when event processing encounters bad events. |
| platform/src/test/kotlin/org/stellar/anchor/platform/observer/stellar/StellarRpcPaymentObserverTest.kt | Adds unit tests for malformed SCV_MAP variants and contract-address + SCV_U64 memo handling. |
| platform/src/test/kotlin/org/stellar/anchor/platform/observer/stellar/StellarRpcObserverPoisonResilienceTest.kt | Adds scheduler-based resilience tests to ensure observer remains RUNNING and cursor advances past poison events. |
| essential-tests/src/testFixtures/kotlin/org/stellar/anchor/platform/integrationtest/PaymentObserverTests.kt | Improves assertion clarity, increases wait timeout, and makes timeouts fail explicitly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
* update test assertion from `assertNotNull(null)` to `fail()` * refactor test failure logic for clarity and explicitness
* shut down the executor service when observer is stopped * prevent potential resource leaks
* delete junit assertions.fail import * refactor test timeout failure to directly throw assertionerror
| saveCursor(response.getCursor()); | ||
| metricLatestBlockProcessed.set(response.getLatestLedger()); |
There was a problem hiding this comment.
We need streamBackoffTimer.reset(); here after a successful saveCursor. This is reporter's Fix #4
Without this, an operator could still observe slow walks toward self-shutdown that have nothing to do with the malformed-event attack.
|
Note on report's Fix #5 (restrict EventFilter to known SAC addresses) — deferred With the fixes here, the residual exposure of accepting events from any contract is log spam + RPC quota burn, not a service outage — defense-in-depth rather than load-bearing. Tracking as a follow-up. When picked up, SAC addresses for Stellar assets can be derived deterministically from ( |
Description
Before this change, the Stellar RPC payment observer could be permanently shut down by any Soroban contract emitting a
transferevent with a malformedvaluefield.The
shouldProcessmethod indexedSCV_MAPentries at[0]and[1]without bounds or type validation, and the surroundingcatchonly handledIOException— lettingArrayIndexOutOfBoundsException,IllegalArgumentException, andNullPointerExceptionpropagate.Because
saveCursoris called afterprocessEvents, a single poison event stalled the cursor, causing the same event to be replayed on every 1-second tick until the exponential backoff timer maxed out (~10 min) and the observer calledshutdown(). All SEP-6/24/31 withdrawals stopped completing until the process was manually restarted.Changes
SCV_MAPshape inshouldProcessbefore indexing: null-checkgetMap(), requireentries.length >= 2, requireentries[0]discriminant isSCV_I128; return a skip result early otherwise.IllegalArgumentExceptionfromnew MuxedAccount(C…, u64)whentois a contract address; log a warning and fall back to the unmuxed address.catch (IOException)tocatch (Exception)inshouldProcessso all unchecked runtime exceptions are logged and skipped instead of propagating.processEventswith its owntry/catch(Exception)so one bad event cannot abort the entire batch.saveCursorinto afinallyblock infetchEventsso the cursor always advances whengetEventssucceeds, preventing infinite replay of a poison event.Acceptance Criteria
SCV_MAPevent does not crash the observer or setSTREAM_ERROR.SCV_MAPevent does not crash the observer.SCV_MAPwhose first entry is notSCV_I128does not crash the observer.SCV_U64memo logs a warning and completes without crashing.RUNNINGand the cursor advances past the poison event.Context
#1943
Testing
Four new unit tests in
StellarRpcPaymentObserverTest:fetchEvents skips empty SCV_MAP poison event without crashing or stalling(P1)fetchEvents skips one-entry SCV_MAP without crashing or stalling(P2)fetchEvents skips SCV_MAP with wrong first-entry type without crashing or stalling(P3)fetchEvents handles contract-address recipient with SCV_U64 memo without crashing(P4)Each test confirms
ObserverStatus.RUNNINGand cursor advancement after the poison event is processed.Integration tests
ObserverStatus.RUNNINGand cursor advance on the live schedulerGREENafter processing a poison eventwaitForEventsCoroutinedefault timeout from 10 s to 60 s to account for testnet ledger close time (~6 s) plus network latencywaitForEventsCoroutinenow fails with an explicit assertion message on timeout instead of silently returningassertEventsPayment/assertEventsPathPaymentuseassertNotNullbefore indexing so null events produce a readable failureDocumentation
N/A
Known limitations
N/A