feat(sdk): replace Meteor DDP transport with @rocket.chat/ddp-client#40301
Codecov Report
@@ Coverage Diff @@
## develop #40301 +/- ##
===========================================
+ Coverage 69.86% 69.99% +0.13%
===========================================
Files 3298 3300 +2
Lines 119347 120583 +1236
Branches 21530 21613 +83
===========================================
+ Hits 83377 84399 +1022
- Misses 32672 32915 +243
+ Partials 3298 3269 -29
Force-pushed from e0d347f to de8a406.
Force-pushed from 4dfa46d to 56c5e1e.
@copilot resolve the merge conflicts in this pull request
Force-pushed from d51c070 to c2c5bd8.
DDPDispatcher.dispatch() was pushing every payload (including the `connect`, `sub`/`unsub`, and `ping`/`pong` frames) into the serialized queue, which serializes wait blocks at its head. A login frame dispatched while the socket is still connecting therefore queues ahead of the `connect` frame ws.onopen later emits — the `connect` ends up wedged in a non-wait block behind the wait block and never flushes, leaving the socket open but DDP-unhandshaked. Bypass the queue for non-method frames so they go straight to the wire while wait-block ordering still applies to method calls.
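A minimal sketch of the bypass described above, assuming a much-simplified dispatcher. `SketchDispatcher`, `Frame`, and `ack` are hypothetical names for illustration and are not the real `DDPDispatcher` API; the sketch only shows non-method frames skipping the queue while a `wait` method still holds back later methods.

```typescript
// Hypothetical, simplified model; not the real DDPDispatcher.
type Frame = { msg: string; id?: string; wait?: boolean };

class SketchDispatcher {
  private pending: Frame[] = [];
  private blockedBy: string | undefined; // id of the in-flight wait method

  constructor(private readonly wire: (frame: Frame) => void) {}

  dispatch(frame: Frame): void {
    if (frame.msg !== 'method') {
      // connect, sub, unsub, ping and pong go straight to the wire
      this.wire(frame);
      return;
    }
    this.pending.push(frame);
    this.flush();
  }

  // Call when the result of a wait method has arrived.
  ack(id: string): void {
    if (this.blockedBy === id) {
      this.blockedBy = undefined;
      this.flush();
    }
  }

  private flush(): void {
    while (this.blockedBy === undefined && this.pending.length > 0) {
      const next = this.pending.shift() as Frame;
      if (next.wait) {
        this.blockedBy = next.id; // later methods wait for this one
      }
      this.wire(next);
    }
  }
}
```

With this shape, a `connect` frame dispatched while a `wait` login is outstanding is written immediately, and only method calls observe wait-block ordering.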
…fns crash
The hook called formatDate(time, String(undefined)) when the Message_DateFormat setting was momentarily unloaded, passing the literal string "undefined" to date-fns, which throws because it contains an unescaped 'n'. This is reachable any time the setting is mid-load (e.g. /admin/info mounted via dynamic import while public settings are still streaming in). Pass 'LL' as the fallback so the formatter never receives an invalid format string, and drop the now-redundant String() coercion.
Stand up a singleton DDPSDK instance (apps/meteor/client/lib/sdk/ddpSdk.ts) pointing at the current origin and keep its connection in sync with the authenticated session via a userIdStore subscription:
- ensureConnectedAndAuthenticated() drives DDPSDK.connect() + login-with-token and is awaited from runUserDataSync(uid) so subscribe-on-resume races resolve in the right order. Recognized auth-rejection reasons trigger Meteor.logout() so a server-side force-logout cleanly drains client state; a token-stable guard avoids that path firing on transient 401s where a parallel flow has already swapped the stored token.
- adoptAccountFromMeteorLoginResult() syncs DDPSDK.account from a Meteor login result so a downstream ensureConnectedAndAuthenticated() doesn't fire a second redundant login on the same socket.
- onLoggedIn now bridges off Accounts.onLogin AND userIdStore so the callback fires reliably when Meteor's autorun chain is wedged (logout → fresh login over the SDK socket).
- CachedStore version bumped (18 → 19) to invalidate caches written before the DDP wire encoding switched from JSON to EJSON, since fields like Date were stringified incorrectly in the JSON window.
Also wires the SDK module into client/main.ts and bumps @rocket.chat/ddp-client into apps/meteor's manifest. The streamerAdapter file gets the symbol the override layer (next commit) will consume.
Switch the live DDP transport from Meteor's bundled WebSocket to the DDPSDK socket, while preserving Meteor.connection's invoker/Accounts machinery so existing flows (login resume, methods with stubs, the Mongo.Collection registry) keep working unchanged:
- ddpOverREST: new `Meteor.connection._send` override routes method calls through DDPSDK.client.callAsync when the SDK is connected and the session is authenticated (or the call is the login that does the authenticating). Falls back to REST while DDPSDK is still booting. Login results are fed back through Meteor's invoker so Accounts state updates the same way it would over Meteor's own WS.
- ddpSdkCollectionBridge: re-feeds DDPSDK collection frames into Meteor.connection._streamHandlers so the Mongo.Collection registry keeps updating as the user logs in.
- subscribeViaSDK: routes Meteor.connection.subscribe through DDPSDK, with a recursion guard so the bridge above doesn't double-emit.
- killMeteorStream: permanently closes Meteor's own WS at boot now that DDPSDK owns the transport. Drains _outstandingMethodBlocks / _methodsBlockingQuiescence on logout so a logout invoker can't get stuck after Meteor's WS goes down.
- SDKClient (sdk.call): block on ensureConnectedAndAuthenticated() before dispatching, so cached-store gets fired on the SDK socket immediately after a re-login don't hit an unauthenticated session and persist empty arrays.
- Presence streamer: route notify-logged-user-presence subscriptions through DDPSDK and bridge frames back into the SDK.stream event bus.
- ServerProvider: combine Meteor.connection + DDPSDK status so the status indicator reflects the actual transport.
The helper waited on a REST response matching `/api/v1/method.call/sendMessage` to extract the new message's id. Once Meteor.connection._send routes through DDPSDK, sendMessage goes over DDP/WS instead and that REST waiter never resolves — every spec calling this helper times out at 60s. Wait for the optimistic list item to appear and reach a non-pending state instead. Use `>= before + 1` rather than `==` because some flows (e.g. just-created encrypted channels) drop additional list items in alongside the user's send.
The previous routing in ddpOverREST shipped methods over the DDPSDK
WebSocket whenever the SDK socket was up — including cached-store gets
that fire immediately after re-login, where the SDK session was briefly
unauthenticated and the server returned [] for everything. The earlier
mitigation (auth-gating) added complexity without resolving the deeper
mismatch with develop's transport choice. Realign on develop's logic
and only use DDPSDK for what it's specialised for:
- All non-bypassed DDP methods now route to REST (`method.call` /
`method.callAnon`), matching develop. `bypassMethods` is restored to
['setUserStatus', 'logout'] and `UserPresence:*` / `stream-*` keep
bypassing.
- `login` (resume + password) routes through the DDPSDK socket when
it's connected so the same login authenticates Meteor's session AND
the SDK socket in one hop. `adoptAccountFromMeteorLoginResult` syncs
`sdk.account` so a downstream `ensureConnectedAndAuthenticated`
short-circuits instead of firing a redundant second login.
- `sdk.call` (used by cached stores) now goes via `Meteor.callAsync`,
same as develop, so methods that previously bypassed ddpOverREST
(`permissions/get`, `subscriptions/get`, `private-settings/get`)
now hit REST too.
Two sub-fixes that fall out of this:
- ddpOverREST's REST error path was rethrowing the API middleware's
parsed JSON body (a plain string `error.message`) into
`processResult`. Meteor's stream handler couldn't parse it as a DDP
result frame, so the resume invoker never saw the rejection, the
stale token stayed in localStorage, and the user wedged on /home with
no main UI. Re-encode it as a proper DDP error result.
- `ensureConnectedAndAuthenticated` now drops the local credentials on
an authentication error (`Accounts._unstoreLoginToken` +
`Meteor.connection.setUserId(null)`) when the stored token didn't
change mid-flight. Keeps the dead-WS path off limits — the previous
fix using `Meteor.logout()` flaked in CI's parallel-shard runs by
racing fresh registration / re-auth.
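As an illustration of the first sub-fix, here is a minimal sketch of re-encoding a REST error body as a DDP `result` frame that carries an `error` object. `toDdpErrorResult` and the `restBody` shape are assumptions made for this example; the field names follow the DDP specification's error object and Meteor.Error's `reason [error]` message convention, not the actual ddpOverREST code.

```typescript
// Shape of a DDP method result frame carrying an error (per the DDP spec).
type DdpErrorResult = {
  msg: 'result';
  id: string;
  error: {
    isClientSafe: true;
    error: string | number;
    reason: string;
    message: string;
    errorType: 'Meteor.Error';
  };
};

// Hypothetical helper: `restBody` stands for the parsed JSON error body from REST.
function toDdpErrorResult(
  id: string,
  restBody: { error?: string | number; message?: string },
): DdpErrorResult {
  const code = restBody.error ?? 500;
  const reason = restBody.message ?? 'Internal server error';
  return {
    msg: 'result',
    id, // must match the id of the original method frame
    error: {
      isClientSafe: true,
      error: code,
      reason,
      message: `${reason} [${code}]`,
      errorType: 'Meteor.Error',
    },
  };
}
```

Because the frame has `msg: 'result'` and the original method id, a stream handler can match it to the pending invoker and reject it, rather than failing to parse the payload.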
Reverts `killMeteorStream` to leave Meteor's WS connected: the
permanent-disconnect path broke
`MethodInvoker.sendMessage()`'s `if (this.connection._stream._connected)
{ _send(...) }` gate, leaving every method invoker queued behind a
connection that never returned and ddpOverREST's `_send` wrapper never
firing.
Verified locally:
- Reload with valid token: `/home` renders, 18 REST method calls, 2 WS
frames (login resume on DDPSDK + the SDK login from
ensureConnectedAndAuthenticated).
- Reload with invalid token: localStorage cleared, userId null, login
form rendered.
- administration-settings.spec.ts:26 passes locally in 8.3s.
…nticated
Stream subscriptions fired immediately after re-login (notably the
SubscriptionsCachedStore listener that re-arms via onLoggedIn) hit the
SDK socket while it was still anonymous. The server rejected those
subs with `not-allowed`/`nosub`, the stream's `ready` promise
emitted an error, and the cached store never received subsequent
server events — its in-memory state stayed frozen at the boot
snapshot.
Visible failure: an agent that took a livechat chat after logging out and
back in saw the chat work (composer enabled, "joined" system
message), but the "Move to the queue" quick-action never appeared.
Tracing it back: in canMoveQueue (`!!routeConfig?.returnQueue && room?.u !== undefined`),
`room.u` is missing — there's a `// TODO: Solve u missing issue`
in app/livechat/server/lib/Helper.ts and livechat rooms in DB never
have `u`. RoomProvider compensates by spreading the subscription
into pseudoRoom (`{...sub, ...room}`), and the agent's subscription
brings its own `u`. With the missing notify-user/<uid>/subscriptions-changed
event, that subscription never lands in the client store, so the
spread leaves room.u undefined and the button hides.
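The dependency on the subscription can be sketched as follows. The types and `buildPseudoRoom` are simplified stand-ins for illustration; only the `canMoveQueue` expression and the `{...sub, ...room}` spread come from the description above.

```typescript
type User = { _id: string; username: string };
type Room = { _id: string; t: string; u?: User };
type Subscription = { rid: string; u: User };
type RouteConfig = { returnQueue?: boolean };

const canMoveQueue = (
  routeConfig: RouteConfig | undefined,
  room: { u?: User } | undefined,
): boolean => !!routeConfig?.returnQueue && room?.u !== undefined;

// Livechat rooms carry no `u`, so the only source of `u` is the subscription.
const buildPseudoRoom = (sub: Subscription | undefined, room: Room) => ({
  ...sub,
  ...room,
});

const livechatRoom: Room = { _id: 'r1', t: 'l' };
const agentSub: Subscription = { rid: 'r1', u: { _id: 'a1', username: 'agent' } };

// Subscription present: `u` survives the spread and the action is offered.
const withSub = canMoveQueue({ returnQueue: true }, buildPseudoRoom(agentSub, livechatRoom));
// Subscription never landed in the store: `u` is undefined and the action hides.
const withoutSub = canMoveQueue({ returnQueue: true }, buildPseudoRoom(undefined, livechatRoom));
```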
Wrap the DDPSDK subscribe call in createNewDdpSdkStream with
`await ensureConnectedAndAuthenticated()` so subscriptions are only
sent once the socket has the agent's identity. Same pattern as the
sdk.call gate; resolves the same race for streams. `stop()` becomes
nullable-safe because subscribe might still be pending when the
caller unsubscribes.
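A minimal sketch of the gate and the nullable-safe `stop()`, assuming the authentication step and the subscribe call are injected as functions. `createGatedStream` and its parameters are hypothetical; the real code lives in createNewDdpSdkStream and may differ.

```typescript
type Sub = { stop(): void };

function createGatedStream(
  ensureAuthenticated: () => Promise<void>,
  subscribe: () => Sub,
): { ready: Promise<void>; stop(): void } {
  let sub: Sub | undefined;
  let stopped = false;

  const ready = ensureAuthenticated().then(() => {
    if (stopped) {
      return; // caller unsubscribed while authentication was still pending
    }
    sub = subscribe(); // only sent once the socket carries the user's identity
  });

  return {
    ready,
    stop() {
      stopped = true;
      sub?.stop(); // safe when the subscribe has not happened yet
    },
  };
}
```

Calling `stop()` before `ready` settles never sends the subscription at all, which avoids leaking a subscription that nobody holds a handle to.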
…rvices
ddp-streamer-service (ee/apps/ddp-streamer/src/configureServer.ts) registers `login`, `logout`, `setUserStatus` and `UserPresence:*` as native methods on its own WebSocket; every other method delegates to the Meteor service via `MeteorService.callMethodWithToken`, paying an extra hop (client WS → ddp-streamer → Meteor service → response back). The develop `shouldBypass` is shaped to keep exactly those methods on the client's own DDP WS for the fast path, and route everything else through REST.
Our PR had aligned bypassMethods + UserPresence:* + stream-* but dropped the `login + resume` bypass, on the rationale that killMeteorStream tore down Meteor's WS and a bypass would deadlock. After we stopped disconnecting Meteor's stream, that constraint went away. Restoring the resume bypass routes the fast path back through ddp-streamer in CI's microservices runs and lifts the post-relogin slowness that was pushing several specs (auth.ts:9, login.ts:24) past the 5s toBeVisible timeout.
Verified locally: invalid-token reload still clears storage and shows the login form; admin-settings/login/presence/e2ee-encryption-decryption/omnichannel-manual-selection-logout (12 tests) all pass.
ddp-streamer-service in microservices CI only registers `login` natively
for the `{resume}` shape (see ee/apps/ddp-streamer/src/configureServer.ts).
Anything else — `{saml: true, credentialToken}`, `{user, password}`,
OAuth credentials — falls through to MeteorService.callMethodWithToken,
which is the slow extra-hop path the bypass list was designed to avoid.
The previous SDK route fired any `login` call through DDPSDK whenever
the socket was up. That meant SAML credential exchange went DDPSDK →
ddp-streamer → MeteorService.callMethodWithToken, and the success
handler then triggered `Meteor.loginWithToken(result.token)` which
queues a follow-up resume that goes via Meteor's WS bypass to
ddp-streamer again — two distinct logins for the same user, on
different sockets, with diverging account state.
The 5 SAML specs (Login, Allow password change, Logout × 2, User Merge)
in CI EE shard 5/5 all bailed at `getUserInfo` for samluser1 right
after the URL navigated to /home, even though the SAML credential
exchange completed: the user document and ensuing onLogin chain were
trapped in the cross-socket race.
Drop the SDK route for non-resume logins. They go through REST →
rocketchat-main (one hop, no extra delegation), the success handler's
`Meteor.loginWithToken` resume hits the existing bypass to Meteor's WS
→ ddp-streamer's native handler (fast path), and the SDK socket
authenticates via that resume's onLogin chain through
ensureConnectedAndAuthenticated. Login resume itself is still bypassed
in shouldBypass; this hunk just removes the dead pre-bypass SDK detour
that only fired for non-resume callers.
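The routing rule described in these commits can be sketched as a predicate. This is modeled on the prose only (the `bypassMethods` list, the `UserPresence:*` and `stream-*` prefixes, and the resume-only login bypass); the signature and parameter handling are assumptions, not the actual `shouldBypass` source.

```typescript
const bypassMethods = ['setUserStatus', 'logout'];

// True when the call should stay on the DDP WebSocket instead of going to REST.
function shouldBypass(method: string, params: unknown[]): boolean {
  if (method === 'login') {
    // Only the `{resume}` shape is handled natively by ddp-streamer.
    const [first] = params;
    return typeof first === 'object' && first !== null && 'resume' in first;
  }
  return (
    bypassMethods.includes(method) ||
    method.startsWith('UserPresence:') ||
    method.startsWith('stream-')
  );
}
```

Under this rule a SAML, password, or OAuth `login` goes to REST in one hop, and only the follow-up resume takes the WebSocket fast path.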
Sanity locally: admin-settings, login × 5, presence × 4,
e2ee-encryption-decryption, omnichannel-manual-selection-logout —
12/12 pass.
Replace Meteor.connection._stream with a stub that pretends to be
connected and forwards outbound DDP frames through the DDPSDK socket.
Goal: one WebSocket per page (the DDPSDK one), eliminating the second
auth roundtrip that was inflating boot time by ~1.5s in EE microservices.
- stubMeteorStream: disconnects Meteor's real stream, swaps in a stub
with currentStatus.connected=true, routes method/sub/unsub frames via
sdk.client.ddp.emit('send', ...) using Meteor's id namespace, and
answers heartbeat pings locally with synthetic pongs. Carries the
message/reset/disconnect listeners that Meteor registered before the
swap onto the stub. Synthesizes a `connected` frame after a microtask
if Meteor's WS hadn't finished its DDP handshake yet.
- ddpSdkCollectionBridge: also forwards `result` and `updated` frames so
bypassed methods routed through the SDK socket reach Meteor's
_methodInvokers (SDK-internal ids never collide with Meteor's numeric
ids, so the bridge is a no-op for SDK's own callers).
- overrides/index: import order now guarantees _send override and the
inbound bridge are wired before the stream swap.
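A rough sketch of the stub-stream idea, assuming plain JSON frames and an injected `forward` function in place of the SDK socket. `StubStream`, `onMessage`, and `deliver` are hypothetical names; Meteor's real stream interface has more events (reset, disconnect) and uses its own serialization.

```typescript
type DdpFrame = { msg: string; id?: string; [key: string]: unknown };

class StubStream {
  private listeners: Array<(raw: string) => void> = [];

  constructor(private readonly forward: (frame: DdpFrame) => void) {}

  // Always report a live connection so callers never queue behind a reconnect.
  status(): { status: string; connected: boolean; retryCount: number } {
    return { status: 'connected', connected: true, retryCount: 0 };
  }

  onMessage(listener: (raw: string) => void): void {
    this.listeners.push(listener);
  }

  send(raw: string): void {
    const frame = JSON.parse(raw) as DdpFrame;
    if (frame.msg === 'ping') {
      // Heartbeats are answered locally with a synthetic pong.
      this.deliver(frame.id !== undefined ? { msg: 'pong', id: frame.id } : { msg: 'pong' });
      return;
    }
    if (frame.msg === 'method' || frame.msg === 'sub' || frame.msg === 'unsub') {
      this.forward(frame); // goes out over the single shared socket
    }
  }

  // Feeds an inbound frame to whoever registered message listeners.
  deliver(frame: DdpFrame): void {
    const raw = JSON.stringify(frame);
    this.listeners.forEach((listener) => listener(raw));
  }
}
```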
Two follow-ups to the stub-stream prototype:
- ddpSdkCollectionBridge: result/updated frames are only forwarded to Meteor when their id is Meteor-shaped. Previously bridging unconditionally caused "No callback invoker for method rc-ddp-client-1" because Meteor's _livedata_data throws when the methodId in an `updated` frame isn't in _methodInvokers (document_processors.js:168). SDK-internal ids start with rc-ddp-client- so a simple prefix check isolates them.
- stubMeteorStream: when a `login` method frame goes through the stub to the SDK socket, register an onResult listener and call adoptAccountFromMeteorLoginResult on success. This populates sdk.account.uid/user/token from Meteor's login result so ensureConnectedAndAuthenticated short-circuits its own loginWithToken, eliminating the duplicate login on every page boot.
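The prefix check could look roughly like the sketch below. `frameForMeteor` is a hypothetical helper; the handling of `updated` frames that mix both id kinds (filtering the `methods` array) is an assumption made for the example, not something the commit message states.

```typescript
const SDK_ID_PREFIX = 'rc-ddp-client-';

type InboundFrame = { msg: string; id?: string; methods?: string[] };

const isMeteorId = (id?: string): boolean =>
  id !== undefined && !id.startsWith(SDK_ID_PREFIX);

// Returns the frame to hand to Meteor, or undefined when it belongs to the SDK only.
function frameForMeteor(frame: InboundFrame): InboundFrame | undefined {
  if (frame.msg === 'result') {
    return isMeteorId(frame.id) ? frame : undefined;
  }
  if (frame.msg === 'updated') {
    const methods = (frame.methods ?? []).filter((id) => isMeteorId(id));
    return methods.length > 0 ? { ...frame, methods } : undefined;
  }
  return frame; // collection frames and the rest pass through unchanged
}
```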
When ensureConnectedAndAuthenticated runs at boot (from the userIdStore subscriber) it can race Meteor's own resume login that's routed through stubMeteorStream. Both end up as `login` method frames on the SDK socket within the same tick. ddp-streamer's Account.login has no dedup, so each fires Accounts.onLogin → Presence.newConnection → a duplicate row in usersSessions for the same session id. The duplicate stays ONLINE while the active connection flips to AWAY on idle. processConnectionStatus prefers ONLINE over AWAY in the aggregate, so the user.status update is a no-op (modifiedCount=0) and the `presence.status` broadcast never fires; the navbar badge stays online even after `UserPresence:away` succeeds server-side.
Fixes:
- Drop the boot-time `ensureConnectedAndAuthenticated` call. The Meteor login resume going through stubMeteorStream (with adoptAccountFromMeteorLoginResult populating sdk.account on the result) is the only auth path needed at boot.
- Gate the userIdStore-subscriber path on `Accounts.loggingIn()` and `sdk.account.uid`: if Meteor's login is in flight (or has just finished and the adopt callback set sdk.account.uid), short-circuit instead of issuing a redundant loginWithToken.
- Single-flight `inflightLogin` so concurrent calls share one promise.
Verified: tests/e2e/omnichannel/omnichannel-rooms-forward.spec.ts "should be set to the queue" passes locally in 7.2s with IS_EE=true.
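The single-flight pattern named in the last fix can be sketched generically. `singleFlight` is a hypothetical helper written for this example; the PR keeps its promise in an `inflightLogin` variable, and the exact shape there may differ.

```typescript
// Wraps an async task so that overlapping calls share one in-flight promise.
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inflight: Promise<T> | undefined;
  return () => {
    if (!inflight) {
      inflight = task().finally(() => {
        inflight = undefined; // the next call after settlement starts a fresh run
      });
    }
    return inflight;
  };
}
```

Two callers that hit the wrapper in the same tick receive the same promise, so only one `login` frame reaches the socket.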
- new-cap: rename local 'tracker' → 'TrackerDependency' so 'new ...()' is OK
- prefer-template: use template string for thrown Error message
- prettier formatting in stubMeteorStream pong helper
- prefer-destructuring on sdk.client.ddp
- no-useless-return on the discard-only switch cases
- drop the leftover eslint-disable directive in the bridge
…bscriber Both `onLoggedIn` (accounts-base) and `userIdStore.subscribe` fire for the same uid on a successful login. runUserDataSync calls userSetUtcOffset which is rate-limited in CI/prod, so the unguarded second invocation returns 400 too-many-requests. Worse, follow-up REST calls (sessions/list etc.) start coming back 401 because the limiter throttles auth checks for the rest of the window — the user lands on 'Manage Devices' and sees 'Something went wrong' because /sessions/list got rejected. Use a single guarded gate (`syncOnce` with a shared `lastSyncedUid`) across both call sites and the boot fast-path.
…nnect
DDPSDK auto-fires loginWithToken on every `connected` event using the in-memory account.user.token (DDPSDK.create line 115-122). When the server force-logs the user out (resetUserE2EKey, account-manage-devices logout, admin device management, etc.), the flow on the server is:
1. broadcast 'user.forceLogout' → meteor.service listener closes the user's WebSocket sessions
2. Users.unsetLoginTokens(uid) wipes services.resume.loginTokens
The client sees the WS close, the SDK reconnects, and on the new `connected` it auto-retries loginWithToken with the now-dead token. DDPSDK calls this with `void` so the rejection is swallowed: `account.user` stays populated, `Meteor.userId()` stays set, the router never falls back to Login, and the navbar continues to render Home with a stale session.
Wrap account.loginWithToken to observe rejections from this auto-retry path: on auth error (and only when the token in localStorage is still the same one we tried, guarding against a concurrent fresh login that already rotated it), drop local credentials so the router goes to Login. Mirrors what ensureConnectedAndAuthenticated already does for its own loginWithToken call.
Verified: e2ee-key-reset and e2ee-passphrase-management now pass.
The SAML login flow exposed a third concurrent-login window: after ddpOverREST routes the SAML credential exchange via REST, line 98 fires `Meteor.loginWithToken(token)` (a fresh resume) which goes through stubMeteorStream → SDK socket. While that resume is in flight, `onLoggedIn` synchronously runs syncOnce → ensureConnectedAndAuthenticated. The Accounts.loggingIn() gate doesn't help here — Meteor's accounts package treats _loggingIn as a flag, not a counter, so the first login's onLogin callback resets it to false even though the resume from line 98 is still pending. ensureConnectedAndAuthenticated then proceeds and dispatches its own loginWithToken on the SDK socket, giving us TWO concurrent logins on the same socket again — which ends up creating duplicate connections in usersSessions and stalls the SAML test on PageLoading because synchronizeUserData waits on divergent auth state. Wire stubMeteorStream's outbound login frames into the same `inflightLogin` slot ensureConnectedAndAuthenticated checks. Now any Meteor-routed login (via the stub) holds the lock; the boot path awaits it (and then short-circuits because adopt has populated sdk.account.uid). One login on the SDK socket per page boot, regardless of whether the trigger is a resume, SAML, password, or OAuth flow.
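The flag-versus-counter point can be shown in isolation. Both classes below are hypothetical illustrations: `FlagGate` mimics a boolean `_loggingIn`-style flag, and `CounterGate` is an alternative shown only for contrast; the PR itself chose to share the `inflightLogin` slot rather than change Meteor's accounts package.

```typescript
// A boolean gate: the first completion clears it even if another login is pending.
class FlagGate {
  private loggingIn = false;
  begin(): void { this.loggingIn = true; }
  end(): void { this.loggingIn = false; }
  busy(): boolean { return this.loggingIn; }
}

// A counting gate: stays busy until every started login has finished.
class CounterGate {
  private pending = 0;
  begin(): void { this.pending += 1; }
  end(): void { this.pending = Math.max(0, this.pending - 1); }
  busy(): boolean { return this.pending > 0; }
}

// Two overlapping logins (credential exchange, then the follow-up resume),
// of which only the first has completed so far.
function overlappingLogins(gate: { begin(): void; end(): void; busy(): boolean }): boolean {
  gate.begin(); // first login starts
  gate.begin(); // resume starts while the first is in flight
  gate.end();   // first login completes
  return gate.busy();
}
```

With the flag, `overlappingLogins` reports idle while the resume is still pending, which is the window in which a second loginWithToken could be dispatched.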
…thToken wrap The previous `account.loginWithToken` wrap (commit 08a8238) was clearing local credentials whenever the SDK socket's auto-retry loginWithToken hit an auth error. That fixed the e2ee-key-reset force logout chain but broke SAML: in some flows the wrap was triggered by SDK's auto-relogin while a fresh login was concurrently completing, clearing the just-stored token and stranding the user mid-login. Move the cleanup to useForceLogout instead. When the `notify-user/<uid>/force_logout` stream fires, do an actual local logout (`Accounts._unstoreLoginToken` + `Meteor.connection.setUserId(null)`) on top of the existing `forceLogout` session flag. This is targeted to the actual force-logout signal rather than any auth error and doesn't race with normal login flows.
Two related issues showed up in SAML and post-logout-relogin tests (e2ee-passphrase-management, account-manage-devices in EE):
- The Accounts.loggingIn() gate in ensureConnectedAndAuthenticated could loop forever if Meteor's _loggingIn flag stayed set (e.g. while the SAML follow-up resume from ddpOverREST line 98 was still in flight on the SDK socket). Cap it at 2s so we don't wedge the page on PageLoading.
- syncOnce was deduping runUserDataSync per uid, but if the first call rejected (notify-user/userData stream sub coming back nosub because the SDK socket session wasn't auth'd yet), useUserDataSyncReady stayed false and useMainReady never flipped. Allow a retry by clearing lastSyncedUid on rejection, so the next userIdStore subscriber fire retries against a now-authenticated session.
Also drop the setInflightStubLogin shared lock: the stub-routed Meteor login can complete in scenarios where the response frame doesn't reach the SDK socket (e.g. server's force_logout listener closes the socket before the matching frame is delivered), and the shared lock would then hold ensureConnectedAndAuthenticated open indefinitely. The Accounts.loggingIn gate (now bounded) and the adopt short-circuit cover the common case.
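A minimal sketch of a per-uid dedupe that permits a retry after a rejection, assuming the sync routine is injected. `createSyncOnce` is a hypothetical factory; `lastSyncedUid` is the only name taken from the description above.

```typescript
function createSyncOnce(run: (uid: string) => Promise<void>): (uid: string) => Promise<void> {
  let lastSyncedUid: string | undefined;

  return (uid: string): Promise<void> => {
    if (lastSyncedUid === uid) {
      return Promise.resolve(); // already synced (or syncing) for this user
    }
    lastSyncedUid = uid;
    return run(uid).catch((err) => {
      // A failed sync must not count as done, otherwise the page never becomes ready.
      if (lastSyncedUid === uid) {
        lastSyncedUid = undefined;
      }
      throw err;
    });
  };
}
```

Both call sites (the login callback and the store subscriber) would share one such gate, so the second trigger for the same uid is a no-op unless the first attempt failed.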
Re-add the auto-relogin auth-error handler that fixes e2ee-key-reset and
device-management force-logout flows, with two extra guards to avoid
the SAML regression that the previous version caused:
- readStoredLoginToken() === token: nothing rotated the stored token
mid-flight (a concurrent SAML/password/OAuth login already wrote a
fresh one)
- sdk.account.uid === triedWithUid: the SDK account didn't get
refreshed by a successful adopt while we were awaiting (a parallel
Meteor-routed login completed and updated the in-memory state)
If both guards hold, the only plausible explanation is a true server
force-logout (Users.unsetLoginTokens), so we drop local credentials and
let the router fall back to Login.
Verified locally: login.spec, account-login, e2ee-key-reset and
omnichannel-rooms-forward all pass.
…sh logins
The previous wrap cleared local credentials synchronously when DDPSDK's
auto-relogin on `connected` came back with an auth error. That fixed
the e2ee-key-reset flow (server force-logs out, SDK reconnects with
dead token, wrap clears creds, user falls back to Login) but raced
with concurrent fresh logins on:
- e2ee-passphrase-management :76/:87 (loginByUserState +
_pollStoredLoginToken inject a fresh token while the auto-retry
with the dead one is still in flight)
- saml :307 SLO (post-logout redirect chain rotates state under us)
Defer the cleanup by 500ms and re-verify the guards at the deadline.
If a concurrent fresh login completed in the meantime it will have
rotated either the stored token or sdk.account.uid; the deferred
check then bails out instead of nuking the just-stored credentials.
For genuine force-logout flows nothing else touches the state, so
the cleanup runs as before — just half a second later, well within
test timeouts.
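The deferred, re-verified cleanup might be sketched as below. `AuthSnapshot`, `guardsHold`, and `scheduleCleanup` are hypothetical names, and the snapshot reader and the clear routine are injected because the real storage and account objects are not shown here.

```typescript
type AuthSnapshot = { storedToken: string | undefined; uid: string | undefined };

// Both guards: nothing rotated the stored token, and the SDK account was not refreshed.
function guardsHold(tried: AuthSnapshot, current: AuthSnapshot): boolean {
  return current.storedToken === tried.storedToken && current.uid === tried.uid;
}

// Re-checks the guards at the deadline instead of clearing right away.
function scheduleCleanup(
  tried: AuthSnapshot,
  readCurrent: () => AuthSnapshot,
  clearCredentials: () => void,
  delayMs = 500,
): ReturnType<typeof setTimeout> {
  return setTimeout(() => {
    if (guardsHold(tried, readCurrent())) {
      clearCredentials(); // nothing changed in the window: treat as a real force-logout
    }
  }, delayMs);
}
```

A concurrent fresh login that lands inside the window changes either the stored token or the uid, so the deferred check returns without touching the new credentials.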
The previous `Accounts.loggingIn?.()` gate threw at runtime
('Cannot read properties of undefined reading _loggingIn') because it
was called as a free reference and lost its `this` binding. The throw
was caught silently by runUserDataSync's try/catch, which meant the
gate never actually ran — boot timing happened to work because the
microtask between the throw and synchronizeUserData's stream subscribe
gave adopt enough time to populate sdk.account.
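The lost-`this` failure mode is easy to reproduce in isolation. `AccountsLike` below is a hypothetical stand-in, not Meteor's Accounts object; it only demonstrates that optional-call syntax does not preserve the receiver once the method has been detached.

```typescript
class AccountsLike {
  private state = { loggingIn: false };

  loggingIn(): boolean {
    return this.state.loggingIn; // needs `this` to be the instance
  }
}

const accountsLike = new AccountsLike();

// Detached reference: `?.()` only guards against the function being missing,
// it does not restore the receiver.
function callDetached(): string {
  const { loggingIn } = accountsLike;
  try {
    loggingIn?.();
    return 'ok';
  } catch (err) {
    return err instanceof TypeError ? 'TypeError' : 'other';
  }
}

// Calling through the object keeps the receiver intact.
function callThroughObject(): boolean {
  return accountsLike.loggingIn();
}
```

If a surrounding try/catch swallows that TypeError, the gate silently never runs, which matches the behaviour described above.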
Replace it with an explicit ~500ms poll on sdk.account.uid. If the
Meteor-routed resume login (via stubMeteorStream → SDK socket → adopt)
completes within the window, we short-circuit and avoid issuing a
duplicate loginWithToken on the same socket — the duplicate causes a
second Presence connection, the aggregate stays online instead of
flipping to away, and the omnichannel idle/away tests fail. If adopt
doesn't fire in time, fall back to our own loginWithToken (existing
inflightLogin gate keeps it idempotent).
Verified: login, account-login, e2ee-key-reset and the full
omnichannel-rooms-forward suite pass locally.
The loginWithToken wrap clears creds via Accounts._unstoreLoginToken() + Meteor.connection.setUserId(null), which does NOT fire Accounts.onLogout. Without resetting lastSyncedUid on the resulting uid=undefined transition, a subsequent re-login (e.g. e2ee-passphrase-management's loginByUserState of the same user) is deduped and runUserDataSync never runs — the page stays wedged on PageLoading and the Login button never hides. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… invoker After force-logout cycles (e.g. resetUserE2EKey → ws.terminate → SDK reconnect), stale result/updated frames for methods whose Meteor invoker already cleared can arrive over the SDK socket. _process_updated then throws "No callback invoker for method N" out of an async generator — the throw escapes the bridge's try/catch as an unhandled rejection and aborts Meteor's frame queue, so subsequent login result frames never land and the page stays wedged on /login. Gate the result/updated bridge on _methodInvokers[id] existing so stale frames are silently dropped instead of corrupting Meteor's frame processing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from dc87f2e to f6cd3b7.
…raining
Meteor's _streamHandlers.onMessage returns a Promise. Sync try/catch
around the bridge call doesn't catch throws inside _process_updated
("No callback invoker for method N" when a stale frame arrives) — the
throw escapes as an unhandled rejection and stops Meteor's frame queue,
so the next login's result never gets processed and the page stays
on /login.
Revert the prior _methodInvokers gating (it dropped legitimate login
result frames in the same cycle) and instead capture the async
rejection at the bridge boundary so individual bad frames don't poison
subsequent ones.
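A small sketch of catching the rejection at the boundary, assuming the handler may return a promise. `bridgeFrame` and `Handler` are hypothetical names; the point is that a synchronous try/catch alone does not observe a rejected promise, so the returned promise needs its own `.catch`.

```typescript
type Handler = (raw: string) => Promise<void> | void;

function bridgeFrame(
  handler: Handler,
  raw: string,
  onError: (err: unknown) => void,
): Promise<void> {
  try {
    // Promise.resolve normalizes sync and async handlers; .catch absorbs async throws.
    return Promise.resolve(handler(raw)).catch(onError);
  } catch (err) {
    onError(err); // the handler threw before returning a promise
    return Promise.resolve();
  }
}
```

Each frame gets its own error boundary, so one stale frame is reported and dropped while the frames that follow are still processed.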
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h-error The 500ms-deferred cleanup wasn't enough to handle e2ee-passphrase-management: the test's loginByUserState fires _pollStoredLoginToken with the same token already in localStorage, so Meteor's poller bails (cached token == current). By the time the wrap's setTimeout fires, the test has already injected the SAME token (mongo $addToSet re-added it server-side after unsetLoginTokens), but the wrap was clearing creds anyway, leaving the page stuck on /login with no follow-up login firing. Replace the unconditional clear with a Meteor.loginWithToken retry against whatever's in localStorage right now. If the token was rotated (or re-added to mongo concurrently), the retry succeeds; if it's truly stale (real force-logout, no concurrent recovery), Meteor's callback invokes forceClientLoggedOut to drive the user to /login as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…failure Add console.warn diagnostics around the wrap's deferred recovery so the trace shows whether the retry is firing, what the guard values are, and whether Meteor.loginWithToken throws/resolves/rejects. Will revert once we have a fix locked in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Meteor's accounts-base registers a per-call DDP.onReconnect handler in
callLoginMethod (accounts_client.js:292) that retries login with the
latest stored token and calls makeClientLoggedOut on failure. With our
stub keeping Meteor's connection permanently 'connected', that handler
never fires when the underlying SDK socket reconnects, so server-side
force-logouts (resetUserE2EKey → ws.terminate) leave the user with stale
credentials and no automatic recovery.
Listen for the SDK's 'connected' event in stubMeteorStream and fire
'reset' on every subsequent reconnect — Meteor's _streamHandlers.onReset
then drives _callOnReconnectAndSendAppropriateOutstandingMethods, which
in turn invokes the onReconnect callback. That covers both branches:
- if localStorage was rotated by a concurrent flow, the resume retry
succeeds and setUserId fires;
- if the token is genuinely stale, the resume retry fails and Meteor's
own makeClientLoggedOut clears creds and routes to /login.
Now that Meteor's flow handles the recovery, simplify the
sdk.account.loginWithToken wrap to just swallow the auth-error
rejection (so DDPSDK's `void` auto-relogin doesn't surface as an
unhandled rejection / pageError) — no more deferred cleanup, no
retry-through-Meteor duplication.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous full _streamHandlers.onReset path was too aggressive: it re-sent the outstanding method blocks, but those methods had already completed on the prior SDK socket session, and re-feeding their result/updated frames through the bridge yielded "method result but no methods outstanding" + "No callback invoker for method N" warnings. Drive only what we actually need on SDK reconnect — the DDP._reconnectHook callbacks. The accounts-base _reconnectStopper registered by callLoginMethod is in there, and that's what retries login with the latest stored token and calls makeClientLoggedOut on auth failure. DDPSDK already handles its own subscription resends on reconnect, so we don't need _resendSubscriptions either. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surgical onReconnect-only call broke message-actions, report-message, and e2ee-encryption-decryption — those tests fire methods exactly when the SDK socket churns from a force-logout cycle, and without the full reset's _callOnReconnectAndSendAppropriateOutstandingMethods step the methods stay marked sentMessage=true forever. The bridge's async catch already absorbs the "method result but no methods outstanding" warnings that the resent blocks generate, so the noise is harmless. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Log every SDK 'connected' event, the stored token at that moment, the Meteor userId, and every _pollStoredLoginToken call (with the current/last token and whether it would fire). Will revert once we have :87 nailed down — the trace currently shows nothing between the force-logout disconnect and the failing waitForLogin assertion, so we need eyes on what actually runs in that window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The microservices ddp-streamer was using ws.terminate() (TCP RST) for broadcast force-logouts. The monolith path uses session.connectionHandle.close() which is graceful and flushes the WS buffer first — letting the `notify-user/<uid>/force_logout` stream message (queued by apps/meteor/server/modules/listeners/listeners.module.ts:49) reach the client before the socket goes down. In EE that stream message races with the terminate, terminate wins, the client's useForceLogout hook never fires, and tests like e2e-encryption/e2ee-passphrase-management.spec.ts:87 are left with stale localStorage credentials and a Login button that never hides. Switch to ws.close() with a 5s setTimeout fallback to terminate() for unresponsive sockets — matches the graceful-close semantics the monolith already relies on without losing the safety net for zombies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
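The close-then-terminate fallback can be sketched against a minimal socket interface. `ClosableSocket` and `closeGracefully` are hypothetical; the three methods mirror what the `ws` library's WebSocket exposes (`close`, `terminate`, and EventEmitter's `once`), but the actual ddp-streamer change may be structured differently.

```typescript
interface ClosableSocket {
  close(): void;
  terminate(): void;
  once(event: 'close', listener: () => void): void;
}

function closeGracefully(socket: ClosableSocket, timeoutMs = 5000): void {
  // Safety net for sockets that never complete the closing handshake.
  const timer = setTimeout(() => socket.terminate(), timeoutMs);
  // A clean close cancels the fallback.
  socket.once('close', () => clearTimeout(timer));
  // Graceful close: frames already buffered get a chance to flush first.
  socket.close();
}
```

The listener is registered before `close()` is called so that a socket which closes immediately still cancels the timer.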
…tStopper
The full fire('reset') was firing accounts-base's _reconnectStopper, which
retries login with the captured `result.token` from the original
callLoginMethod scope. After force-logout, that token is the stale one
the server just invalidated. The retry runs on a wait:true block that
queues AHEAD of the test's own loginByUserState; the stopper's
userCallback then calls makeClientLoggedOut, which clobbers the
credentials the test just injected, and the test's queued login never
sends a frame.
With the ddp-streamer ws.close() change, useForceLogout now reliably
fires (the notify-user/<uid>/force_logout stream message arrives before
the socket closes), so we don't need accounts-base's reconnect-time
relogin retry at all. We still need to resend in-flight methods so that
tests like message-actions / report-message / e2ee-encryption-decryption
don't wedge.
Mirror onReset's _handleOutstandingMethodsOnReset +
_sendOutstandingMethodBlocksMessages + _resendSubscriptions directly,
skipping _callOnReconnectAndSendAppropriateOutstandingMethods (which is
where _reconnectStopper would fire).
Also drop the diagnostic logging now that we have a fix in mind.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
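The ordering problem in this commit message can be illustrated with a simplified model of wait-block queuing. This is a sketch, not Meteor's implementation: `MethodQueue`, `call`, and `finish` are hypothetical names, and the model assumes only what the message relies on, namely that a `wait: true` method gets its own block and that only the head block's methods are sent.

```typescript
// Simplified, hypothetical model of outstanding method blocks.
type Block = { wait: boolean; methods: string[] };

class MethodQueue {
	private blocks: Block[] = [];

	// Names of methods that have gone to the wire, in order.
	readonly sent: string[] = [];

	call(name: string, opts: { wait?: boolean } = {}): void {
		const last = this.blocks[this.blocks.length - 1];
		if (opts.wait) {
			// A wait method always gets a block of its own.
			this.blocks.push({ wait: true, methods: [name] });
		} else if (!last || last.wait) {
			this.blocks.push({ wait: false, methods: [name] });
		} else {
			last.methods.push(name);
		}
		// Only methods that landed in the head block are sent immediately.
		if (this.blocks.length === 1) {
			this.sent.push(name);
		}
	}

	// Mark a head-block method as finished; when the block drains, send the next block.
	finish(name: string): void {
		const head = this.blocks[0];
		if (!head) {
			return;
		}
		head.methods = head.methods.filter((m) => m !== name);
		if (head.methods.length > 0) {
			return;
		}
		this.blocks.shift();
		const next = this.blocks[0];
		if (next) {
			this.sent.push(...next.methods);
		}
	}
}
```

In this model, a stale-token relogin queued with `wait: true` before the test's own login holds that login back until the relogin finishes, which is the window in which the stopper's callback can clobber the injected credentials.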
The 'skip _reconnectStopper' approach broke force-logout coverage:
e2ee-key-reset, account-manage-devices, admin-device-management all
failed because, even with the ddp-streamer ws.close() server fix, the
notify-user/<uid>/force_logout stream message still races with the
close in microservices (it travels rocketchat-main → broker → ddp-streamer
→ WS, while the close listener fires directly on ddp-streamer). The
graceful close only flushes what's already in the WS buffer at that
moment — if the stream message is still in transit, it's still lost.
So the per-call _reconnectStopper from callLoginMethod IS the
load-bearing failsafe in microservices: it retries login with the
latest stored token and calls makeClientLoggedOut on auth failure,
which is what drives the user to /login when the stream message is
lost.
Restore the full fire('reset'). :87 is the only remaining EE 2/5
regression and a separate problem to solve.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
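The failsafe described in this commit message (retry login with the latest stored token, log the client out on auth failure) can be sketched as below. The names `ReloginDeps`, `readStoredToken`, `loginWithToken`, `makeLoggedOut`, and `reloginOnReconnect` are hypothetical stand-ins for illustration; they are not the accounts-base API.

```typescript
// Hypothetical dependencies; the real ones live in accounts-base and the app.
interface ReloginDeps {
	// Reads the token at reconnect time rather than capturing it earlier.
	readStoredToken(): string | null;
	loginWithToken(token: string, done: (error?: Error) => void): void;
	// Clears local credentials, which sends the user to the login page.
	makeLoggedOut(): void;
}

function reloginOnReconnect(deps: ReloginDeps): void {
	const token = deps.readStoredToken();
	if (!token) {
		return; // No stored session, nothing to resume.
	}
	deps.loginWithToken(token, (error) => {
		if (error) {
			// The server rejected the token, for example after a force-logout.
			deps.makeLoggedOut();
		}
	});
}
```

With a hook like this, a lost force_logout stream message is less critical, because the rejected resume attempt on reconnect still logs the client out.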
Summary
Migrates the frontend DDP transport from Meteor's WebSocket to our own
`@rocket.chat/ddp-client` SDK, while keeping the existing Meteor `Accounts` code as the auth anchor. Every client-side method call and subscription
now runs on a single WebSocket we own; Meteor's socket stays present
only to keep Accounts' in-memory state happy.
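A rough sketch of the routing this PR describes: method calls go to the SDK socket once it is connected, with REST as a transient fallback while the SDK is still handshaking. `SdkLike`, `RestFallback`, and `createMethodRouter` are hypothetical names, and the shape of the SDK object is an assumption; the actual `ddpSdk.client.callAsync` signature is not shown in this description.

```typescript
type ConnectionStatus = 'connecting' | 'connected' | 'disconnected';

// Hypothetical stand-in for the SDK surface used by the interceptor.
interface SdkLike {
	status(): ConnectionStatus;
	callAsync(method: string, ...params: unknown[]): Promise<unknown>;
}

// Hypothetical REST transport used only until the socket is ready.
type RestFallback = (method: string, params: unknown[]) => Promise<unknown>;

function createMethodRouter(sdk: SdkLike, rest: RestFallback) {
	return (method: string, ...params: unknown[]): Promise<unknown> => {
		if (sdk.status() === 'connected') {
			return sdk.callAsync(method, ...params);
		}
		// Socket not ready yet: fall back to REST for this call only.
		return rest(method, params);
	};
}
```

Checking the status per call, rather than once at startup, means the fallback stops being used as soon as the handshake completes.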
What moves onto the DDPSDK socket
- `sdk.call` / `sdk.publish` / `sdk.stream`: every consumer of `SDKClient` now dispatches against DDPSDK (no Meteor fallback).
- `ServerContext.callMethod` and `writeStream`: delegate through the same SDK.
- `Meteor.apply` / `Meteor.call` / `Meteor.callAsync`: intercepted in `ddpOverSDK` (formerly `ddpOverREST`) and routed to `ddpSdk.client.callAsync`. Includes login (password, SAML, LDAP, CAS, resume), `UserPresence:*`, `setUserStatus`, and `logout`. REST stays only as a transient fallback if the SDK is still handshaking.
- `Meteor.connection.subscribe` / `Meteor.subscribe`: intercepted in `subscribeViaSDK` and translated to `ddpSdk.client.subscribe`, so Accounts' `loginServiceConfiguration`, autoupdate and any stray internal publications ride our socket too.
- `stream-user-presence`: subscribed via DDPSDK and also listened to via a `streamerCentral` adapter bridging DDPSDK's `onMessage` into the existing `_stream.on('message')` contract.

Supporting changes
- until the WS handshake resolves and authenticates with Meteor's resume token as soon as `userIdStore` shows a uid.
- Dates and other EJSON extensions round-trip identically to Meteor's native frames.
- `ServerProvider.getStatus` flipped to DDPSDK-primary: `connected` / `status` now derive from `ddpSdk.connection.status`, with retry counters falling back to `Meteor.status()`. `ServerProvider.disconnect` / `reconnect` drive both transports.
- `CachedStore` version bumped to invalidate entries persisted before the EJSON switch (ISO-string dates would fail `.getTime()` on fields like `subscription.ls`).
- `Meteor.methods` stubs (`setReaction`, `sendMessage`) are converted into explicit `runOptimistic*` calls in their `flows/*` so the optimistic UI no longer relies on Meteor's stub-dispatch machinery.
- `@rocket.chat/ddp-client` promoted from a transitive dep (via `ui-contexts`) to a direct workspace dep of `apps/meteor`.

What still lives on Meteor
- design; removing is a separate phase.
- `Meteor.connection`'s WebSocket still opens on page load. Every outgoing message is intercepted, but the socket itself is kept so Meteor's internal reactive status dependencies (used by a few `Tracker.autorun`s outside this PR's scope) don't see a permanently "connecting" state. Neutralising that socket is the next step.
Test plan
- two sockets to `/websocket` are open: one Meteor, one DDPSDK. Every `method` / `sub` frame should appear on the DDPSDK socket; Meteor's is effectively idle.
- still work end-to-end. Session resume on reload takes the user straight into the app.
- appear in near-realtime.
- the DDPSDK socket via DevTools and verify the offline banner appears.