
fix(desktop): gate DB-dependent services on RewindDatabase initialization (#6271) #6591

Open
kodjima33 wants to merge 1 commit into main from fix/sqlite-startup-race

Conversation

@kodjima33
Collaborator

Summary

Fixes #6271: multiple services (AgentSync, ScreenActivitySync, RewindIndexer) race to read SQLite before initialization completes, and if init fails, the app stays non-functional until it is restarted.

Root Cause

At startup, several services call getDatabaseQueue() or initialize() concurrently. If initialize() fails (WAL lock, disk I/O, etc.), there is no retry mechanism and the app stays broken until restarted.

Changes

  • Automatic retry with backoff: initialize() now retries up to 3 times with exponential backoff (0.5s, 1s, 2s) via performInitializationWithRetry(). Handles transient I/O errors at startup.
  • Self-healing getDatabaseQueue(): If the DB is not initialized and no init is in progress, getDatabaseQueue() now fires a background initialization attempt. Callers get nil on the current call but subsequent calls will succeed once init completes.
  • Failure tracking: Tracks consecutive init failures to avoid infinite retry loops — stops retrying after maxInitRetries (3).
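The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `backoffDelay`, `retryWithBackoff`, and `RetryExhausted` are hypothetical names, and the constants simply mirror the values quoted in the description (3 attempts, 0.5s base delay). The sketch also chooses not to sleep after the final failed attempt, which may differ from what `performInitializationWithRetry()` does.

```swift
// Hypothetical constants mirroring the values quoted in the PR description.
let maxInitRetries = 3
let baseRetryDelay: UInt64 = 500_000_000 // 0.5s, in nanoseconds

/// Thrown only if the loop runs zero times (for example, maxAttempts == 0).
struct RetryExhausted: Error {}

/// Delay before the next attempt: base * 2^attempt (0.5s, 1s, 2s for attempts 0, 1, 2).
func backoffDelay(attempt: Int, base: UInt64 = baseRetryDelay) -> UInt64 {
    base * UInt64(1 << attempt)
}

/// Runs `operation` up to `maxAttempts` times, sleeping between failed attempts.
func retryWithBackoff<T>(
    maxAttempts: Int = maxInitRetries,
    base: UInt64 = baseRetryDelay,
    operation: () async throws -> T
) async throws -> T {
    var lastError: Error?
    for attempt in 0..<maxAttempts {
        do {
            return try await operation()
        } catch {
            lastError = error
            // Assumption of this sketch: skip the sleep after the last attempt.
            guard attempt < maxAttempts - 1 else { break }
            // `try` (not `try?`) so cancellation during the sleep ends the loop.
            try await Task.sleep(nanoseconds: backoffDelay(attempt: attempt, base: base))
        }
    }
    throw lastError ?? RetryExhausted()
}
```

Keeping the delay computation in a small pure function makes the schedule easy to check in isolation from the async retry loop.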

Testing

  • Syntax validation passes
  • Changes are additive — existing behavior unchanged when init succeeds on first attempt
  • Recovery path tested conceptually: failed init → retry with backoff → success on subsequent attempt

…tion (#6271)

- Add retry logic (up to 3 attempts with exponential backoff) to
  RewindDatabase.initialize() via performInitializationWithRetry()
  to handle transient I/O errors (WAL lock, disk busy at startup)
- Make getDatabaseQueue() auto-trigger initialization if the DB is not
  yet initialized and no init is in progress, preventing the app from
  being permanently non-functional after a failed init
- Track consecutive init failures to avoid infinite retry loops

This addresses the startup race where AgentSync, ScreenActivitySync,
and RewindIndexer read before DB initialization completes, and the
permanent failure mode when init fails once with no recovery path.
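
A self-healing accessor of the kind described in this commit message might look something like the sketch below. Everything here is an assumption made for illustration: `DatabaseGate`, `getQueue()`, and the `String` placeholder for the queue are hypothetical and do not reflect the real `RewindDatabase` type. The sketch shows one way to let callers trigger initialization while coalescing concurrent callers onto a single in-flight task.

```swift
/// Hypothetical stand-in for the real database type; names are illustrative only.
actor DatabaseGate {
    private var queue: String?                        // placeholder for the real queue object
    private var initializationTask: Task<Void, Error>?
    private var consecutiveInitFailures = 0
    private let maxInitRetries = 3

    /// Returns the queue if ready; otherwise triggers a background init and returns nil.
    func getQueue() -> String? {
        if let queue = queue { return queue }
        if initializationTask == nil && consecutiveInitFailures < maxInitRetries {
            Task { try? await self.initialize() }
        }
        return nil
    }

    /// Coalesces concurrent callers onto a single in-flight initialization task.
    func initialize() async throws {
        if queue != nil { return }
        if let inFlight = initializationTask {
            return try await inFlight.value
        }
        let task = Task { try await self.performInitialization() }
        initializationTask = task
        defer { initializationTask = nil }
        do {
            try await task.value
            consecutiveInitFailures = 0
        } catch {
            consecutiveInitFailures += 1
            throw error
        }
    }

    private func performInitialization() async throws {
        // A real implementation would open the database and run migrations here.
        queue = "ready"
    }
}

let gate = DatabaseGate()
let first = await gate.getQueue()   // nil: initialization was only just triggered
try await gate.initialize()         // joins the in-flight init or starts one
let second = await gate.getQueue()  // non-nil once initialization has completed
```

Because `getQueue()` only schedules the background task, several calls in quick succession may each schedule one; in this sketch they all funnel through `initialize()`, which either returns early or awaits the task already in flight.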
@greptile-apps
Contributor

greptile-apps bot commented Apr 13, 2026

Greptile Summary

This PR gates DB-dependent services on RewindDatabase initialization by adding exponential-backoff retries inside performInitializationWithRetry() (up to 3 attempts: 0.5s, 1s, 2s) and a self-healing getDatabaseQueue() that fires a background initialize() when the DB is uninitialized and no init is in progress. The .cursor/rules/arc.md and .windsurf/workflows/arc.md files appear to be unrelated workflow configs accidentally bundled in the PR.

  • P1 — close() inside the retry loop breaks the "init in progress" guard: close() sets initializationTask = nil without cancelling the task that's still executing. When migrate() fails after dbQueue has been set (line 361 before line 365 throws), the retry calls close(), and getDatabaseQueue() can observe initializationTask == nil and fire a second concurrent initialize() — exactly the startup race condition this PR is trying to prevent.

Confidence Score: 4/5

Safe to merge after addressing the close() invariant break — the P1 is a narrow but real path that reintroduces the concurrent-init race on migration failures.

One P1 finding: close() inside performInitializationWithRetry clears initializationTask while the task is still alive, allowing a second concurrent initialize() to fire during migration-failure retries. Actor serialization and guard dbQueue == nil prevent literal data corruption, but the initializationTask invariant is violated. The rest of the logic is correct.

desktop/Desktop/Sources/Rewind/Core/RewindDatabase.swift — specifically the close() call inside performInitializationWithRetry and the try? on Task.sleep

Important Files Changed

| Filename | Overview |
| --- | --- |
| desktop/Desktop/Sources/Rewind/Core/RewindDatabase.swift | Adds automatic retry with exponential backoff and a self-healing getDatabaseQueue(); contains a P1 bug where close() inside the retry loop clears initializationTask while the retrying task is still alive, breaking the concurrent-init guard. |
| .cursor/rules/arc.md | New Korean-language workflow automation config for the arc MCP pipeline tool; unrelated to the database fix and likely accidentally included in this PR. |
| .windsurf/workflows/arc.md | Identical copy of .cursor/rules/arc.md for Windsurf IDE; same unrelated workflow config accidentally bundled in this PR. |

Sequence Diagram

sequenceDiagram
    participant S as Service
    participant GDQ as getDatabaseQueue()
    participant Init as initialize()
    participant Retry as performInitializationWithRetry()
    participant Perf as performInitialization()

    S->>GDQ: getDatabaseQueue()
    Note over GDQ: dbQueue==nil, initTask==nil
    GDQ->>Init: Task { try? await initialize() }
    GDQ-->>S: nil
    Init->>Retry: performInitializationWithRetry()
    loop attempt 0..2
        Retry->>Perf: performInitialization()
        alt success
            Perf-->>Retry: dbQueue set
            Retry-->>Init: return
        else migrate() throws after dbQueue set
            Perf-->>Retry: throw
            Note over Retry: close() sets initTask=nil while running
            Retry->>Retry: Task.sleep(backoff)
            Note over GDQ: Sees initTask==nil, fires 2nd initialize()
            Retry->>Perf: retry
        end
    end

Reviews (1): Last reviewed commit: "fix(desktop): gate DB-dependent services..."

Comment on lines +244 to +245
// Close before retry so performInitialization starts fresh
if dbQueue != nil { close() }

P1 close() nullifies initializationTask while the retry task is still running

close() sets initializationTask = nil (and increments initGeneration), but the Task executing this retry loop is not cancelled — it keeps running. This breaks the concurrency guard in both getDatabaseQueue() and initialize().

Concrete failure path: migrate() throws after dbQueue = activeQueue (line 361 vs 365). The retry code at line 245 then calls close(), resetting initializationTask = nil. During the next await performInitialization() call, getDatabaseQueue() can observe dbQueue == nil && initializationTask == nil && consecutiveInitFailures < maxInitRetries and fire an additional Task { try? await self.initialize() } — launching a second concurrent initialize() while the first retry is still in-flight.

While actor serialization and the guard dbQueue == nil inside performInitialization() prevent a literal double-open, the two concurrent initialize() paths race on consecutiveInitFailures and the initializationTask != nil invariant is silently violated.

Fix: only reset the pool reference without touching the concurrency-control fields:

// Instead of close(), only reset the pool reference
dbQueue = nil
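
The difference between the two reset strategies can be illustrated with a deliberately simplified, synchronous model. `InitGuard` and its members are hypothetical and are not part of `RewindDatabase`; the `consecutiveInitFailures` condition from the real guard is omitted for brevity.

```swift
/// Hypothetical, simplified model of the guard state; not the real RewindDatabase.
struct InitGuard {
    var hasQueue = false        // stands in for `dbQueue != nil`
    var initInProgress = false  // stands in for `initializationTask != nil`

    /// Condition under which the accessor would fire a background initialize().
    var wouldFireBackgroundInit: Bool { !hasQueue && !initInProgress }

    /// Models close(): drops the queue and also clears the in-progress marker.
    mutating func closeLike() {
        hasQueue = false
        initInProgress = false
    }

    /// Models the suggested fix: drop only the queue reference.
    mutating func resetQueueOnly() {
        hasQueue = false
    }
}

// State after migrate() throws: the queue was set and the init task is still running.
var a = InitGuard(hasQueue: true, initInProgress: true)
a.closeLike()
print(a.wouldFireBackgroundInit) // true: a second initialize() could start

var b = InitGuard(hasQueue: true, initInProgress: true)
b.resetQueueOnly()
print(b.wouldFireBackgroundInit) // false: the in-progress guard still holds
```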

lastError = error
let delay = baseRetryDelay * UInt64(1 << attempt) // 0.5s, 1s, 2s
logError("RewindDatabase: Init attempt \(attempt + 1)/\(maxInitRetries) failed, retrying in \(delay / 1_000_000)ms: \(error.localizedDescription)")
try? await Task.sleep(nanoseconds: delay)

P2 try? on Task.sleep discards CancellationError

try? await Task.sleep(nanoseconds: delay) silently swallows CancellationError, opting the retry loop out of Swift's cooperative cancellation model. If the parent task is cancelled during a retry sleep, the loop continues retrying rather than unwinding cleanly.

Suggested change
- try? await Task.sleep(nanoseconds: delay)
+ try await Task.sleep(nanoseconds: delay)
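
The effect of the two forms can be seen in a small standalone sketch. The function names `swallowing` and `propagating` are hypothetical and exist only for this illustration; the sketch relies on `Task.sleep(nanoseconds:)` throwing `CancellationError` when the surrounding task is cancelled.

```swift
/// Loop that swallows cancellation: it keeps iterating after the task is cancelled.
func swallowing(attempts: Int) async -> Int {
    var completed = 0
    for _ in 0..<attempts {
        try? await Task.sleep(nanoseconds: 200_000_000) // error discarded
        completed += 1
    }
    return completed
}

/// Loop that propagates cancellation: the first cancelled sleep ends the loop.
func propagating(attempts: Int) async throws -> Int {
    var completed = 0
    for _ in 0..<attempts {
        try await Task.sleep(nanoseconds: 200_000_000) // throws on cancellation
        completed += 1
    }
    return completed
}

let t1 = Task { await swallowing(attempts: 3) }
t1.cancel()
let swallowed = await t1.value // every iteration still runs despite cancellation

let t2 = Task { try await propagating(attempts: 3) }
t2.cancel()
let propagated = await t2.result // failure carrying CancellationError
```

In the swallowing variant each cancelled sleep returns immediately, so the loop not only ignores cancellation but also loses its backoff delay.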

Development

Successfully merging this pull request may close these issues.

Desktop: SQLite startup race — AgentSync, ScreenActivitySync, RewindIndexer read before DB is initialized
