fix(deps): backport matrix-rust-sdk#6361 to stop idle sync loop #101
TigerInYourDream wants to merge 2 commits into main from
Pin matrix-sdk{, -base, -ui} to Project-Robius-China/matrix-rust-sdk
@ cb391f70 — which is 627563bb (our previous space_room_suggested
tip) plus a single cherry-pick of upstream matrix-org#6361's
production-code commit (c7573469b).
The `requires_timeout` closure in RoomListService was forcing
`timeout=0` for all post-init states, so idle clients kept re-sending
the same `pos` immediately instead of long-polling. The backport
restricts `timeout=0` to `State::Init` only and lets SettingUp /
Recovering / Running use `PollTimeout::Default`, so the server can
long-poll while idle (and still answer immediately when it has
pending changes).
Runtime delta is exactly one closure-body rewrite in
crates/matrix-sdk-ui/src/room_list_service/mod.rs — no public API,
type, or feature-flag changes.
Refs #30
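The state-to-timeout rule described in this commit message can be sketched as a small, self-contained model. The `State` and `PollTimeout` enums below are simplified stand-ins written for illustration only; they borrow the names used in this PR but are not the matrix-sdk-ui definitions, and `poll_timeout_after_fix` is a hypothetical helper, not the SDK's `requires_timeout` closure. The `Error` / `Terminated` arm follows the mapping listed in the PR description.

```rust
// Simplified stand-ins for illustration; NOT the matrix-sdk-ui types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Init,
    SettingUp,
    Recovering,
    Running,
    Error,
    Terminated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollTimeout {
    /// Explicit timeout; `Some(0)` asks the server to answer immediately.
    Some(u32),
    /// Use the default long-poll timeout, so the server may hold the request.
    Default,
}

/// Hypothetical helper modelling the post-fix rule from this PR:
/// only `Init` (plus the error/terminated states) forces `timeout=0`.
fn poll_timeout_after_fix(state: State) -> PollTimeout {
    match state {
        State::Init => PollTimeout::Some(0),
        State::SettingUp | State::Recovering | State::Running => PollTimeout::Default,
        State::Error | State::Terminated => PollTimeout::Some(0),
    }
}

fn main() {
    let states = [
        State::Init,
        State::SettingUp,
        State::Recovering,
        State::Running,
        State::Error,
        State::Terminated,
    ];
    for state in states {
        println!("{:?} -> {:?}", state, poll_timeout_after_fix(state));
    }
}
```

Under this model an idle client in `Running` sends a long-poll request rather than an immediate-return one, which is the behavior change the backport is after.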
Smoke-test verification against local Palpo
Ran the release binary (
Phase 1: catching up with backlog (~27s)
The fast cadence here is textbook long-poll "server returns early when it has pending changes", not the old bug.
Phase 2: idle,
| Metric | Bug (pre-fix) | Observed (post-fix) |
|---|---|---|
| `timeout=0` share | ~100% | 1/112 ≈ 0.9% (only the State::Init sync) |
| Idle request rate | >1000 req/min (tight loop) | 5.6 req/min |
| Same-`pos` spacing | <1 ms | ~30 s (= requested timeout) |
Verdict
- ✅ SDK-side fix (this PR) is working end-to-end
- ✅ No observable regression in sync behavior during ~10 minutes of idle + active usage
- As a bonus: Palpo is also holding the connection the full 30 s when truly idle, so the trigger conditions of palpo-im/palpo#72 ("fix: ignore static metadata when long-polling sliding sync") are not being hit in this test env either
Running binary confirmed to be built from this branch (mtime Apr 16 13:34 > commit d18631a3 at 13:11:43).
Real pre/post measurement from the 25-hour Palpo log
The same Palpo container has been running for 25 hours — covering both the buggy pre-fix binary and this fix. Aggregating docker logs --timestamps of robrix2-testenv-palpo-1 gives direct before/after numbers (not estimates):
MSC3575 /sync request volume per hour
| Hour (UTC, 2026-04-16) | Robrix binary | Requests | Rate |
|---|---|---|---|
| 02:00 | pre-fix (buggy) | 12,329 | ~205 req/min |
| 03:00 | pre-fix (buggy) | 21,409 | ~357 req/min (peak) |
| 04:00 | pre-fix (tail) | 3,839 | ~64 req/min |
| 05:00 | transition | 102 | — |
| 05:36 – 05:56 | post-fix | 84 total | ~4 req/min |
Busiest single minute across 25 hours vs. idle minute now
| Minute (UTC) | Binary | Requests | timeout=0 share |
|---|---|---|---|
| 2026-04-16T02:16 | pre-fix peak | 738 / min | 731 / 738 = 99.1% |
| 2026-04-16T05:55 | post-fix idle | 4 / min | 0 / 4 = 0% |
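Counts like the ones in these tables could be produced by bucketing timestamped log lines per minute. The sketch below is an assumption-heavy illustration: it relies on `docker logs --timestamps` prefixing each line with an RFC 3339 timestamp, and it assumes that each request line contains the substring `/sync` and a `timeout=<ms>` query parameter. Palpo's real log format is not shown in this PR, so `aggregate_per_minute`, `has_zero_timeout`, and the sample lines are all hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct MinuteStats {
    total: u32,
    zero_timeout: u32,
}

/// True when the first `timeout=` parameter on the line is exactly 0.
fn has_zero_timeout(line: &str) -> bool {
    line.split("timeout=")
        .nth(1)
        .map(|rest| {
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            digits == "0"
        })
        .unwrap_or(false)
}

/// Buckets assumed sync-request lines by minute ("YYYY-MM-DDTHH:MM").
fn aggregate_per_minute(log: &str) -> BTreeMap<String, MinuteStats> {
    let mut stats: BTreeMap<String, MinuteStats> = BTreeMap::new();
    for line in log.lines() {
        // Assumption: sync requests are recognizable by this substring.
        if !line.contains("/sync") {
            continue;
        }
        // The first 16 bytes of an RFC 3339 timestamp identify the minute.
        let Some(minute) = line.get(..16) else { continue };
        let entry = stats.entry(minute.to_string()).or_default();
        entry.total += 1;
        if has_zero_timeout(line) {
            entry.zero_timeout += 1;
        }
    }
    stats
}

fn main() {
    // Made-up sample lines in the assumed format, not real Palpo output.
    let log = "2026-04-16T02:16:01.000000000Z GET /sync?pos=5&timeout=0\n\
               2026-04-16T02:16:02.000000000Z GET /sync?pos=5&timeout=30000\n\
               2026-04-16T02:17:00.000000000Z GET /other\n";
    for (minute, s) in aggregate_per_minute(log) {
        println!("{minute}: {} requests, {} with timeout=0", s.total, s.zero_timeout);
    }
}
```

Summing the per-minute buckets by their hour prefix would give the hourly view; the `timeout=0` share per minute is `zero_timeout / total`.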
Reduction factor
- Request rate: 738 / 4 ≈ 184× reduction at the peak-vs-idle point
- Sustained hourly load: 21,409 / 240 (extrapolated 4 req/min × 60) ≈ 89× reduction
- `timeout=0` share: 99.1% → 0.9% ≈ 110× reduction
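The three ratios above are plain before/after quotients. As a quick check of the arithmetic, here is a minimal sketch using only the figures quoted in this comment; `reduction_factor` is a hypothetical helper name.

```rust
/// Ratio of the pre-fix value to the post-fix value.
fn reduction_factor(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // Busiest pre-fix minute vs. an idle post-fix minute (requests per minute).
    let peak_vs_idle = reduction_factor(738.0, 4.0);
    // Peak hour vs. 4 req/min extrapolated to a full hour (4 * 60 = 240).
    let sustained_hourly = reduction_factor(21_409.0, 4.0 * 60.0);
    // Share of requests carrying timeout=0, in percent.
    let zero_timeout_share = reduction_factor(99.1, 0.9);

    println!("peak vs idle:     {peak_vs_idle:.1}x");
    println!("sustained hourly: {sustained_hourly:.1}x");
    println!("timeout=0 share:  {zero_timeout_share:.1}x");
}
```

The quotients come out at about 184.5, 89.2 and 110.1, matching the rounded figures in the list.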
Server-side CPU impact (docker stats --no-stream)
| Container | CPU % now (post-fix, idle) |
|---|---|
| robrix2-testenv-palpo-1 | 0.05% |
| robrix2-testenv-palpo_postgres-1 | 0.32% |
Palpo is essentially idle on this route now. Before the fix, at ~357 req/min sustained, it was doing MSC3575 room-list diff computation + Postgres lookups hundreds of times per minute; that load is gone.
Bottom line
This is not a "theoretical cleanup" — it is an observable ~100× reduction in sync traffic and a corresponding drop in Palpo CPU load from the same local test env, captured from real logs spanning the binary swap.
dependencies = [
 "libc",
 "windows-sys 0.59.0",
 "windows-sys 0.61.1",
Need to keep the windows-sys version unchanged.
It seems that Project-robius Robrix does not have this issue.
Good catch, thanks for flagging this — you're right that cargo update -p matrix-sdk ... opportunistically re-resolved unrelated transitive deps. That was scope creep, not intentional.
Fixed in e3fbcae:
- Reset `Cargo.lock` to `origin/main` baseline.
- Surgically replaced only the 8 `matrix-sdk{, -base, -common, -crypto, -sqlite, -store-encryption, -ui, -indexeddb-stores}` `source =` URL lines to point at the backport fork at rev `cb391f70`.
- `windows-sys 0.59.0` at line 2037 (and all other transitive entries) now match `origin/main` byte-for-byte (verified: 8 / 8 occurrences identical).
- `cargo check --locked` passes in 34.63s with zero warnings.
The only extra diff line is robrix = 0.0.1-pre-alpha-4 (vs the lock's stale 0.1.0-pre-alpha-1) — that's Cargo auto-correcting a pre-existing inconsistency in origin/main's lockfile under --locked mode; unavoidable without a separate Cargo.toml version bump.
Final PR scope: 3 Cargo.toml lines + 8 Cargo.lock source-URL lines + 1 auto-corrected version field. Nothing else changed.
Happy to squash d18631a3 + e3fbcae8 into a single clean commit if you'd prefer that for merge.
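A check of the kind described in this comment (only `source =` lines differ from the baseline lockfile) could be sketched as a line-by-line comparison. This is an illustrative sketch, not the procedure actually used for this PR: `changed_lines`, `is_source_line`, and `only_source_lines_changed` are hypothetical helpers, the lockfile fragments and URLs are made-up placeholders, and the comparison assumes both files have the same number of lines.

```rust
/// Lines that differ, as (1-based line number, baseline line, patched line).
/// Only meaningful when both texts have the same number of lines.
fn changed_lines<'a>(baseline: &'a str, patched: &'a str) -> Vec<(usize, &'a str, &'a str)> {
    baseline
        .lines()
        .zip(patched.lines())
        .enumerate()
        .filter(|(_, (old, new))| old != new)
        .map(|(index, (old, new))| (index + 1, old, new))
        .collect()
}

fn is_source_line(line: &str) -> bool {
    line.trim_start().starts_with("source = ")
}

/// True when the files have equal length and every differing line is a `source =` line.
fn only_source_lines_changed(baseline: &str, patched: &str) -> bool {
    baseline.lines().count() == patched.lines().count()
        && changed_lines(baseline, patched)
            .iter()
            .all(|(_, old, new)| is_source_line(old) && is_source_line(new))
}

// Placeholder lockfile fragments; the URLs and revisions are not real.
const BASELINE: &str = r#"[[package]]
name = "matrix-sdk"
source = "git+https://example.invalid/upstream-fork#aaaaaaa"

[[package]]
name = "errno"
dependencies = [
 "libc",
 "windows-sys 0.59.0",
]
"#;

fn main() {
    let patched = BASELINE.replace("upstream-fork#aaaaaaa", "backport-fork#bbbbbbb");
    let drifted = patched.replace("windows-sys 0.59.0", "windows-sys 0.61.1");

    println!("source-only patch accepted: {}", only_source_lines_changed(BASELINE, &patched));
    println!("transitive drift accepted:  {}", only_source_lines_changed(BASELINE, &drifted));
    for (line_no, old, new) in changed_lines(BASELINE, &drifted) {
        println!("line {line_no}: {old} -> {new}");
    }
}
```

A strict check like this would also flag the auto-corrected `robrix` version field mentioned above, so that one line would need an explicit exception.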
Reviewer flagged that the previous `cargo update -p matrix-sdk ...`
opportunistically re-resolved unrelated transitive deps (windows-sys
0.59.0 → 0.61.1 for errno, etc.).
This commit resets Cargo.lock to `origin/main` and then surgically
patches only the 8 matrix-sdk{, -base, -common, -crypto, -sqlite,
-store-encryption, -ui, -indexeddb-stores} source URLs to point to
the backport fork at rev cb391f70. All other transitive entries now
exactly match `origin/main`.
The single extra line (`robrix = 0.0.1-pre-alpha-4` vs the lock's
stale `0.1.0-pre-alpha-1`) is Cargo auto-correcting a pre-existing
inconsistency in `origin/main` — unavoidable under `--locked`.
Verified with `cargo check --locked` (34.63s, zero warnings).
Refs #30, addresses review comment from @alanpoon on PR #101
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e3fbcae to 5b74337
Pre-fix (25h Palpo log, same container):
Post-fix (same binary, same container, idle):
≈ 180× fewer requests, ≈ 110× drop in
The bug window is
Summary
Backport of upstream matrix-org/matrix-rust-sdk#6361 onto our `space_room_suggested` base, to address the idle sliding-sync request loop tracked in #30.
Pin `matrix-sdk{, -base, -ui}` to a SHA on the Robius-China fork that is `627563bb` (our previous `space_room_suggested` tip) + one cherry-pick:
- `Project-Robius-China/matrix-rust-sdk@fix/room-list-long-poll-after-initial-sync`
- `cb391f70ade93aee295108e623f6bed3ef1cea53`
- `c7573469b` (the production-code commit of PR #6361; the test-assertion companion commit is intentionally omitted since we don't compile SDK test targets)
What the upstream fix does
`RoomListService::requires_timeout` was forcing `timeout=0` for all post-init states: `SettingUp`, `Recovering`, and `Running` (before `fully_loaded`). While idle, the client kept re-sending the same `pos` right away instead of long-polling, producing the `pos=<n>&timeout=0` spam tracked in #30.
After the fix:
- `State::Init` → `PollTimeout::Some(0)` (unchanged; first sync still returns immediately so the session establishes fast)
- `State::SettingUp | Recovering | Running` → `PollTimeout::Default` (server long-polls when idle; still responds immediately when it has pending changes)
- `State::Error { .. } | State::Terminated { .. }` → `PollTimeout::Some(0)`
Why minimum-risk
- Runtime delta is one closure-body rewrite in `crates/matrix-sdk-ui/src/room_list_service/mod.rs`. No public API / type / feature-flag changes, so zero call-site ripple in Robrix2.
- The `space_room_suggested` customizations (thread-subscriptions, ring crypto provider, `suggested` field on SpaceRoom, etc.) are all preserved on the same base commit `627563bb`.
- Pinned by SHA (`rev = "..."`), not branch-tracked, so the release build cannot silently absorb later drift on the fork.
- `space_room_suggested` modifies other parts of `room_list_service/mod.rs` but does not overlap the `requires_timeout` closure; pre-verified with `git merge-tree --write-tree`.
Test plan
- `cargo check --all-targets` passes locally (22.77s with incremental reuse)
- `pos=<n>&timeout=0` spam stops once the client reaches idle (verified 2026-04-16)
- `suggested` field behavior (the `space_room_suggested` customizations that sit alongside the fix)
Rollback plan
Once upstream #6361 merges and propagates onto `project-robius/space_room_suggested`:
- Revert the `Cargo.toml` lines to `project-robius/matrix-rust-sdk` `branch = "space_room_suggested"`
- `cargo update -p matrix-sdk -p matrix-sdk-base -p matrix-sdk-ui`
- `Project-Robius-China/matrix-rust-sdk@fix/room-list-long-poll-after-initial-sync` branch
Refs #30