fix(desktop): gate DB-dependent services on RewindDatabase initialization (#6271)#6591
fix(desktop): gate DB-dependent services on RewindDatabase initialization (#6271)#6591
Conversation
…tion (#6271) - Add retry logic (up to 3 attempts with exponential backoff) to RewindDatabase.initialize() via performInitializationWithRetry() to handle transient I/O errors (WAL lock, disk busy at startup) - Make getDatabaseQueue() auto-trigger initialization if the DB is not yet initialized and no init is in progress, preventing the app from being permanently non-functional after a failed init - Track consecutive init failures to avoid infinite retry loops This addresses the startup race where AgentSync, ScreenActivitySync, and RewindIndexer read before DB initialization completes, and the permanent failure mode when init fails once with no recovery path.
Greptile SummaryThis PR gates DB-dependent services on
Confidence Score: 4/5Safe to merge after addressing the close() invariant break — the P1 is a narrow but real path that reintroduces the concurrent-init race on migration failures. One P1 finding: close() inside performInitializationWithRetry clears initializationTask while the task is still alive, allowing a second concurrent initialize() to fire during migration-failure retries. Actor serialization and guard dbQueue == nil prevent literal data corruption, but the initializationTask invariant is violated. The rest of the logic is correct. desktop/Desktop/Sources/Rewind/Core/RewindDatabase.swift — specifically the close() call inside performInitializationWithRetry and the try? on Task.sleep Important Files Changed
Sequence DiagramsequenceDiagram
participant S as Service
participant GDQ as getDatabaseQueue()
participant Init as initialize()
participant Retry as performInitializationWithRetry()
participant Perf as performInitialization()
S->>GDQ: getDatabaseQueue()
Note over GDQ: dbQueue==nil, initTask==nil
GDQ->>Init: Task { try? await initialize() }
GDQ-->>S: nil
Init->>Retry: performInitializationWithRetry()
loop attempt 0..2
Retry->>Perf: performInitialization()
alt success
Perf-->>Retry: dbQueue set
Retry-->>Init: return
else migrate() throws after dbQueue set
Perf-->>Retry: throw
Note over Retry: close() sets initTask=nil while running
Retry->>Retry: Task.sleep(backoff)
Note over GDQ: Sees initTask==nil, fires 2nd initialize()
Retry->>Perf: retry
end
end
Reviews (1): Last reviewed commit: "fix(desktop): gate DB-dependent services..." | Re-trigger Greptile |
| // Close before retry so performInitialization starts fresh | ||
| if dbQueue != nil { close() } |
There was a problem hiding this comment.
close() nullifies initializationTask while the retry task is still running
close() sets initializationTask = nil (and increments initGeneration), but the Task executing this retry loop is not cancelled — it keeps running. This breaks the concurrency guard in both getDatabaseQueue() and initialize().
Concrete failure path: migrate() throws after dbQueue = activeQueue (line 361 vs 365). The retry code at line 245 then calls close(), resetting initializationTask = nil. During the next await performInitialization() call, getDatabaseQueue() can observe dbQueue == nil && initializationTask == nil && consecutiveInitFailures < maxInitRetries and fire an additional Task { try? await self.initialize() } — launching a second concurrent initialize() while the first retry is still in-flight.
While actor serialization and the guard dbQueue == nil inside performInitialization() prevent a literal double-open, the two concurrent initialize() paths race on consecutiveInitFailures and the initializationTask != nil invariant is silently violated.
Fix: only reset the pool reference without touching the concurrency-control fields:
// Instead of close(), only reset the pool reference
dbQueue = nil| lastError = error | ||
| let delay = baseRetryDelay * UInt64(1 << attempt) // 0.5s, 1s, 2s | ||
| logError("RewindDatabase: Init attempt \(attempt + 1)/\(maxInitRetries) failed, retrying in \(delay / 1_000_000)ms: \(error.localizedDescription)") | ||
| try? await Task.sleep(nanoseconds: delay) |
There was a problem hiding this comment.
try? on Task.sleep discards CancellationError
try? await Task.sleep(nanoseconds: delay) silently swallows CancellationError, opting the retry loop out of Swift's cooperative cancellation model. If the parent task is cancelled during a retry sleep, the loop continues retrying rather than unwinding cleanly.
| try? await Task.sleep(nanoseconds: delay) | |
| try await Task.sleep(nanoseconds: delay) |
Summary
Fixes #6271 — multiple services (AgentSync, ScreenActivitySync, RewindIndexer) race to read SQLite before initialization completes, and if init fails the app becomes permanently non-functional.
Root Cause
At startup, several services call
getDatabaseQueue()orinitialize()concurrently. Ifinitialize()fails (WAL lock, disk I/O, etc.), there is no retry mechanism and the app stays broken until restarted.Changes
initialize()now retries up to 3 times with exponential backoff (0.5s, 1s, 2s) viaperformInitializationWithRetry(). Handles transient I/O errors at startup.getDatabaseQueue(): If the DB is not initialized and no init is in progress,getDatabaseQueue()now fires a background initialization attempt. Callers getnilon the current call but subsequent calls will succeed once init completes.maxInitRetries(3).Testing