[ANCHOR-1943]: rpc observer DoS via malformed soroban transfer event #1944

Open
amandagonsalves wants to merge 13 commits into develop from fix/empty-scvmap-dos

@amandagonsalves (Collaborator) commented May 20, 2026

Description

Before this change, the Stellar RPC payment observer could be permanently shut down by any Soroban contract emitting a transfer event with a malformed value field.

The shouldProcess method indexed SCV_MAP entries at [0] and [1] without bounds or type validation, and the surrounding catch only handled IOException — letting ArrayIndexOutOfBoundsException, IllegalArgumentException, and NullPointerException propagate.
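
As a rough illustration of the missing guard, the sketch below checks a map-shaped value before anything reads index 0 or 1. The `Val`, `Entry`, and `ValType` types and the `hasExpectedShape` helper are hypothetical stand-ins written for this example, not the Stellar SDK's XDR classes or the observer's actual code, and the entry keys are illustrative only.

```java
final class ScvMapShapeCheck {
    /** Hypothetical subset of value discriminants; a real SDK enum has more. */
    enum ValType { SCV_I128, SCV_U64, SCV_STRING, SCV_MAP }

    /** Hypothetical stand-in for an XDR value: a discriminant plus optional map entries. */
    static final class Val {
        final ValType type;
        final Entry[] map; // may be null even when type == SCV_MAP
        Val(ValType type, Entry[] map) { this.type = type; this.map = map; }
    }

    /** Hypothetical stand-in for one map entry. */
    static final class Entry {
        final String key;
        final Val val;
        Entry(String key, Val val) { this.key = key; this.val = val; }
    }

    /** Returns true only when it is safe to read entries[0] and entries[1]. */
    static boolean hasExpectedShape(Val value) {
        if (value == null || value.type != ValType.SCV_MAP) return false;
        Entry[] entries = value.map;
        if (entries == null || entries.length < 2) return false;
        Entry first = entries[0];
        // The first entry is expected to carry the amount as an i128.
        return first != null && first.val != null && first.val.type == ValType.SCV_I128;
    }

    public static void main(String[] args) {
        Val amount = new Val(ValType.SCV_I128, null);
        Val memo = new Val(ValType.SCV_U64, null);
        Val wellFormed = new Val(ValType.SCV_MAP,
            new Entry[] { new Entry("amount", amount), new Entry("memo", memo) });
        Val empty = new Val(ValType.SCV_MAP, new Entry[0]);
        System.out.println("well-formed accepted: " + hasExpectedShape(wellFormed));
        System.out.println("empty map accepted: " + hasExpectedShape(empty));
    }
}
```

The helper only covers the map branch; a caller would skip the event (rather than throw) whenever it returns false.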

Because saveCursor is called after processEvents, a single poison event stalled the cursor, causing the same event to be replayed on every 1-second tick until the exponential backoff timer maxed out (~10 min) and the observer called shutdown(). All SEP-6/24/31 withdrawals stopped completing until the process was manually restarted.

Changes

  • Validate SCV_MAP shape in shouldProcess before indexing: null-check getMap(), require entries.length >= 2, require entries[0] discriminant is SCV_I128; return a skip result early otherwise.
  • Catch IllegalArgumentException from new MuxedAccount(C…, u64) when to is a contract address; log a warning and fall back to the unmuxed address.
  • Broaden catch (IOException) to catch (Exception) in shouldProcess so all unchecked runtime exceptions are logged and skipped instead of propagating.
  • Wrap each event in processEvents with its own try/catch(Exception) so one bad event cannot abort the entire batch.
  • Move saveCursor into a finally block in fetchEvents so the cursor always advances when getEvents succeeds, preventing infinite replay of a poison event.
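
The last two changes can be pictured with a minimal, self-contained sketch: each event is handled inside its own try/catch, and the cursor is saved in a `finally` block once the batch has been fetched. `PoisonTolerantBatch`, `CursorStore`, and the method signatures here are assumptions made for illustration and do not mirror the real `StellarRpcPaymentObserver` API.

```java
import java.util.List;
import java.util.function.Consumer;

final class PoisonTolerantBatch {
    /** Hypothetical cursor store; stands in for whatever saveCursor persists to. */
    static final class CursorStore {
        String cursor;
        void save(String next) { cursor = next; }
    }

    /** Handles each event in isolation and returns how many events failed. */
    static <E> int processEvents(List<E> events, Consumer<E> handler) {
        int failed = 0;
        for (E event : events) {
            try {
                handler.accept(event);
            } catch (Exception e) {
                // One bad event is logged and skipped; the rest of the batch still runs.
                failed++;
                System.err.println("Skipping event after failure: " + e);
            }
        }
        return failed;
    }

    /**
     * Assumes the RPC fetch already succeeded and produced `events` and `nextCursor`.
     * The cursor is saved whether or not processing threw.
     */
    static <E> int fetchEvents(List<E> events, String nextCursor,
                               Consumer<E> handler, CursorStore store) {
        try {
            return processEvents(events, handler);
        } finally {
            store.save(nextCursor);
        }
    }
}
```

In this sketch a failed fetch would happen before the `try`, so the cursor is left untouched in that case and only advances when there is a fresh cursor to save.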

Acceptance Criteria

  • An empty SCV_MAP event does not crash the observer or set STREAM_ERROR.
  • A one-entry SCV_MAP event does not crash the observer.
  • An SCV_MAP whose first entry is not SCV_I128 does not crash the observer.
  • A contract-address recipient with SCV_U64 memo logs a warning and completes without crashing.
  • In all four cases above, the observer status remains RUNNING and the cursor advances past the poison event.
  • Existing SEP-24 / SEP-6 / SEP-31 happy paths and single-client callback flows continue to pass.

Context

#1943

Testing

Four new unit tests in StellarRpcPaymentObserverTest:

  • fetchEvents skips empty SCV_MAP poison event without crashing or stalling (P1)
  • fetchEvents skips one-entry SCV_MAP without crashing or stalling (P2)
  • fetchEvents skips SCV_MAP with wrong first-entry type without crashing or stalling (P3)
  • fetchEvents handles contract-address recipient with SCV_U64 memo without crashing (P4)

Each test confirms ObserverStatus.RUNNING and cursor advancement after the poison event is processed.

Integration tests

  • P1–P4 poison variants each confirm ObserverStatus.RUNNING and cursor advance on the live scheduler
  • Health check returns GREEN after processing a poison event
  • Mixed batch containing a poison and a valid event advances cursor past both
  • Increased waitForEventsCoroutine default timeout from 10 s to 60 s to account for testnet ledger close time (~6 s) plus network latency
  • waitForEventsCoroutine now fails with an explicit assertion message on timeout instead of silently returning
  • assertEventsPayment / assertEventsPathPayment use assertNotNull before indexing so null events produce a readable failure
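
The timeout behaviour described for waitForEventsCoroutine can be sketched in Java as a polling helper that throws instead of returning silently. The actual helper is a Kotlin coroutine in the test fixtures; `WaitForEvents.waitFor` below is a hypothetical equivalent written only to show the idea.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class WaitForEvents {
    /**
     * Polls until the supplier returns a non-null value. On timeout it throws an
     * AssertionError with an explicit message rather than returning null.
     */
    static <T> T waitFor(Supplier<T> poll, Duration timeout, Duration interval) {
        long start = System.nanoTime();
        while (true) {
            T result = poll.get();
            if (result != null) {
                return result;
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                throw new AssertionError(
                    "Timed out after " + timeout.toMillis() + " ms waiting for payment observer events");
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting for payment observer events", e);
            }
        }
    }
}
```

A test would call it with the 60 s default mentioned above, for example `waitFor(poll, Duration.ofSeconds(60), Duration.ofSeconds(1))`, where `poll` is whatever lookup returns the captured events.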

Documentation

N/A

Known limitations

N/A

* add test coverage for empty scv_map event values

* prevent crashes or stalls when processing an empty scv_map event

* ensure the observer continues processing and updates the cursor successfully
* update `scv_u64` memo discriminant check
* add error handling for invalid muxed account creation
* update exception handling for payment event processing to catch all exceptions

* update error logging to include full exception string

* refactor logging format for muxed account creation
* add try-catch block around event processing

* update processing to fix potential observer crashes from single event errors
* refactor cursor saving to execute in a `finally` block
* update metric and cursor persistence to always run after event processing
* fix potential re-processing of events on error during processing
* add null and length checks for scv_map entries

* fix parsing by validating scv_i128 amount discriminant

* refactor contract event data extraction for robustness
* add tests for one-entry scv_map events

* add tests for scv_map events with wrong first-entry type

* add tests for contract-address recipient with scv_u64 memo

* update helper function to mock varied poison responses
* update timeout for waiting for payment observer events

* refactor payment event assertions to explicitly check for null events

* add explicit failure assertion when payment event waiting times out
* add new test file for stellar rpc payment observer

* add tests to verify observer resilience against malformed soroban events

* add scenarios where malformed events do not halt event processing

* add checks to ensure observer health remains green after encountering poison events

* add verification that the cursor advances correctly past mixed batches of events
* update the default timeout for event capturing to 60 seconds

* allow more time for payment observer events to be processed and captured in integration tests

@amandagonsalves marked this pull request as ready for review May 21, 2026 04:22
Copilot AI review requested due to automatic review settings May 21, 2026 04:22
Contributor

Copilot AI left a comment

Pull request overview

Hardens the Stellar RPC payment observer against denial-of-service conditions caused by malformed Soroban transfer events by validating event shapes, isolating per-event failures, and ensuring cursor advancement so poison events cannot be replayed indefinitely.

Changes:

  • Make fetchEvents() persist the RPC cursor in a finally block after a successful getEvents() call.
  • Add defensive parsing + broader exception handling in shouldProcess() and isolate failures per event in processEvents().
  • Add unit/integration-style tests covering multiple “poison event” variants and improve integration test assertions/timeouts.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| platform/src/main/java/org/stellar/anchor/platform/observer/stellar/StellarRpcPaymentObserver.java | Adds validation/exception hardening around Soroban event parsing; advances cursor even when event processing encounters bad events. |
| platform/src/test/kotlin/org/stellar/anchor/platform/observer/stellar/StellarRpcPaymentObserverTest.kt | Adds unit tests for malformed SCV_MAP variants and contract-address + SCV_U64 memo handling. |
| platform/src/test/kotlin/org/stellar/anchor/platform/observer/stellar/StellarRpcObserverPoisonResilienceTest.kt | Adds scheduler-based resilience tests to ensure observer remains RUNNING and cursor advances past poison events. |
| essential-tests/src/testFixtures/kotlin/org/stellar/anchor/platform/integrationtest/PaymentObserverTests.kt | Improves assertion clarity, increases wait timeout, and makes timeouts fail explicitly. |

* update test assertion from `assertNotNull(null)` to `fail()`

* refactor test failure logic for clarity and explicitness
* shut down the executor service when observer is stopped

* prevent potential resource leaks
* delete junit assertions.fail import

* refactor test timeout failure to directly throw assertionerror

@amandagonsalves requested a review from JiahuiWho May 21, 2026 20:19
Comment on lines +133 to +134

```java
saveCursor(response.getCursor());
metricLatestBlockProcessed.set(response.getLatestLedger());
```

Contributor

We need streamBackoffTimer.reset(); here after a successful saveCursor. This is the reporter's Fix #4.

Without this, an operator could still observe slow walks toward self-shutdown that have nothing to do with the malformed-event attack.
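
To make the concern concrete, here is a minimal sketch of an exponential backoff timer whose delay only shrinks when `reset()` is called. `BackoffTimer` is a hypothetical class written for this illustration; it is not the project's timer implementation, and the delay values are arbitrary.

```java
final class BackoffTimer {
    private final long initialMs;
    private final long maxMs;
    private long currentMs;

    BackoffTimer(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.currentMs = initialMs;
    }

    /** Returns the delay to wait now and doubles the next delay, capped at maxMs. */
    long nextDelayMs() {
        long delay = currentMs;
        currentMs = Math.min(currentMs * 2, maxMs);
        return delay;
    }

    /** True once the next delay has reached the cap; in this sketch that is the shutdown signal. */
    boolean isExhausted() {
        return currentMs >= maxMs;
    }

    /** Called after a successful fetch-and-save so unrelated failures do not accumulate. */
    void reset() {
        currentMs = initialMs;
    }
}
```

Without the `reset()` call, occasional transient failures spread over hours would each double the delay and never give it back, which is the slow walk toward shutdown described above.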

@JiahuiWho
Contributor

Note on the report's Fix #5 (restrict EventFilter to known SAC addresses): deferred

With the fixes here, the residual exposure of accepting events from any contract is log spam and RPC quota burn, not a service outage. Restricting the filter would be defense-in-depth rather than load-bearing.

Tracking as a follow-up. When picked up, SAC addresses for Stellar assets can be derived deterministically from (asset code, issuer) via assetService.getStellarAssets() (no operator config). Custom Soroban tokens would need an opt-in stellar.soroban.token_contracts: [...] allowlist that falls back to current behavior when empty.
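
One possible shape for the opt-in allowlist, sketched under the assumption that the configured contract IDs arrive as a set of strings: an empty or missing allowlist accepts every contract (the current behavior), and a non-empty one accepts only listed contracts. `ContractAllowlist` and its method are hypothetical, and the contract IDs in the usage note are placeholders rather than real addresses.

```java
import java.util.Set;

final class ContractAllowlist {
    /**
     * Returns true when events from contractId should be processed.
     * An empty or null allowlist means "no restriction", matching current behavior.
     */
    static boolean accepts(Set<String> allowlist, String contractId) {
        if (allowlist == null || allowlist.isEmpty()) {
            return true;
        }
        return contractId != null && allowlist.contains(contractId);
    }
}
```

For example, with an allowlist containing only the placeholder `C_EXAMPLE_TOKEN`, an event from `C_OTHER` would be skipped, while an empty allowlist would let both through.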

Development

Successfully merging this pull request may close these issues.

Anchor Platform Stellar-RPC payment observer permanent DoS via crafted Soroban transfer event with empty SCV_MAP

3 participants