Move Flare benchmark onto a dedicated Web Worker (#309)
Conversation
FlareEngine.load() is a synchronous WASM call that blocks the main
thread for the full 138 MB GGUF parse + GPU buffer upload. Running it
on the main thread made the tab unresponsive for the duration of the
load (2–10 s depending on cache state) and was responsible for the
"page freezes after clicking Run Benchmark" reports.
This change moves Flare into a dedicated Web Worker. The main thread
sends RPC-style {id, type, args} messages; the worker owns the single
FlareEngine instance and streams decoded tokens back via postMessage.
Matches how MLC and Transformers.js already work internally.
Key details:
- New examples/benchmark/flare-worker.js owns the FlareEngine
- index.html talks to the worker via a small Promise-based RPC helper
- GGUF bytes cross as a transferable (zero-copy)
- Streaming: worker posts one message per decoded token; main-thread
client accumulates into the output string
- Prefill profile read is deferred until after the first prefill
(previously called before any inference, always returned seq_len=0)
- src/engines/flare-engine-wrapper.ts: fix same enable/read/disable
ordering bug so `generateText` logs a real profile snapshot
Library-level worker migration for FlareEngineWrapper is a follow-up —
this PR is scoped to fixing the benchmark's freeze.
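As a rough illustration of the RPC helper described above, the main-thread side of the protocol could look like the following. This is a simplified sketch, not the actual code in `index.html` — the pending map, sequence counter, and mock worker here are illustrative, though the `flareCall(type, args, transfer, onStream)` shape matches the PR description:

```javascript
// Sketch of the main-thread RPC client: one pending slot per request id,
// stream messages routed to an onStream callback, the final message resolves.
function createFlareClient(worker) {
  const pending = new Map();
  let seq = 0;

  worker.onmessage = ({ data }) => {
    const slot = pending.get(data.id);
    if (!slot) return;
    if (data.stream) return slot.onStream?.(data.stream); // one per decoded token
    pending.delete(data.id);
    data.error ? slot.reject(new Error(data.error.message)) : slot.resolve(data.result);
  };

  return function flareCall(type, args = {}, transfer = [], onStream = null) {
    const id = ++seq;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject, onStream });
      worker.postMessage({ id, type, args }, transfer); // transfer list => zero-copy
    });
  };
}
```

Accumulating the streamed tokens on the caller side is then just `let out = ''; await flareCall('stream', prompt, [], (s) => { out += s.tokenText; });`.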
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info — ⚙️ Configuration used: Organization UI · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (3); ✅ 2 skipped from review due to trivial changes; 🚧 1 skipped as similar to previous changes
📝 Walkthrough

FlareEngine was moved into a dedicated Web Worker (new examples/benchmark/flare-worker.js).
Sequence diagram

```mermaid
sequenceDiagram
    participant Main as Main Thread
    participant Worker as Flare Worker
    participant Engine as FlareEngine
    Main->>Worker: {id, type: "init", args}
    Worker->>Engine: import & initialize module
    Worker-->>Main: {id, result}
    Main->>Worker: {id, type: "load", args: [modelBytes]}
    Worker->>Engine: create engine from bytes
    Worker-->>Main: {id, result: {flareArch}}
    Main->>Worker: {id, type: "init_gpu"}
    Worker->>Engine: init GPU / backend
    Worker-->>Main: {id, result: backendInfo}
    Main->>Worker: {id, type: "stream", args: prompt}
    Worker->>Engine: generate tokens
    Worker-->>Main: {id, stream: {tokenText, tokenId}} (repeated per token)
    Worker-->>Main: {id, result: finalOutput}
    Main->>Worker: {id, type: "dispose"}
    Worker->>Engine: cleanup
    Worker-->>Main: {id, result}
```
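The worker side of that flow can be sketched as a minimal dispatcher. This is an illustration only — the engine object and its method names (`load`, `generate`) are placeholders, not the real FlareEngine API, and the real flare-worker.js differs:

```javascript
// Minimal worker-side dispatcher: each incoming {id, type, args} message maps
// to an engine method; 'stream' posts one message per token plus a final result.
function createDispatcher(engine, post) {
  return async function onMessage({ id, type, args }) {
    try {
      if (type === 'stream') {
        let out = '';
        for (const tok of engine.generate(args)) {
          post({ id, stream: { tokenText: tok.text, tokenId: tok.id } });
          out += tok.text;
        }
        post({ id, result: out });
      } else {
        // load / init_gpu / dispose etc. map straight onto engine methods
        post({ id, result: await engine[type](args) });
      }
    } catch (e) {
      post({ id, error: { message: e.message, stack: e.stack } });
    }
  };
}
```

In a real worker, `post` would be `self.postMessage` and `onMessage` would be wired to `self.onmessage`.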
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/engines/flare-engine-wrapper.ts (1)
Line 1: ⚠️ Potential issue | 🟡 Minor. Run Prettier for this file.
The Build and Lint job reports a Prettier check failure for src/engines/flare-engine-wrapper.ts; please run the formatter before merging:

```shell
npx prettier --write "src/**/*.ts"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engines/flare-engine-wrapper.ts` at line 1, The file src/engines/flare-engine-wrapper.ts fails Prettier formatting; run the project formatter and commit the changes to satisfy CI. Run npx prettier --write "src/**/*.ts" (or format just src/engines/flare-engine-wrapper.ts), verify that the exported symbols in flare-engine-wrapper.ts remain unchanged, then stage and commit the formatted file so the Build and Lint job passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/benchmark/index.html`:
- Around line 725-741: Move the timing capture (compute totalTime and decodeTime
from t0 and firstTokenTime) to immediately after the streaming operation
resolves (right after the promise that yields the final stream result), before
calling flareCall('prefill_profile_json') or doing JSON.parse/log/UI logging;
this ensures window.__flareProfileLogged/profile parsing/console.log/log(...) do
not affect benchmark metrics. Keep the existing guard
window.__flareProfileLogged and the try/catch around
flareCall('prefill_profile_json') but run them after capturing
totalTime/decodeTime; after logging the profile, invoke the backend toggle via
flareCall('disable_prefill_profiling') (or the existing
disable_prefill_profiling RPC) to match backend support. Ensure variable names
totalTime, decodeTime, t0, firstTokenTime, flareCall('prefill_profile_json'),
and window.__flareProfileLogged are used as in the diff so the change is
localized and clear.
- Around line 613-616: The worker error handler currently just logs errors,
leaving any awaiting flareCall() promises unresolved; update the 'error' and
'messageerror' handlers on flareWorker to iterate over the pending RPC
bookkeeping (e.g., the map/obj used by flareCall to store resolvers — reference
that structure by name in your code), call each stored reject callback with the
received error (or a new Error describing worker failure), clear the pending
map, remove/cleanup worker event listeners, terminate the dead worker
(flareWorker.terminate()), and perform any UI/disabled-state cleanup so buttons
are re-enabled; do the same cleanup path for both 'error' and 'messageerror'
handlers to avoid leaked promises.
In `@src/engines/flare-engine-wrapper.ts`:
- Around line 390-397: After capturing a non-empty prefill profile in
FlareEngineWrapper (the block using (this.engine as unknown as {
prefill_profile_json(): string }).prefill_profile_json() and setting
FlareEngineWrapper.profileLogged = true), explicitly disable engine-level
profiling so subsequent calls aren't instrumented: after setting profileLogged =
true, detect and call the engine's profiling-disable API (e.g. check for and
invoke methods like disable_profiling(), setProfilingEnabled(false),
enableProfiling(false) or similar on this.engine), falling back to no-op if none
exist; ensure this logic is added in the same conditional that logs the profile
so profiling stops immediately after the first valid snapshot.
ℹ️ Review info
⚙️ Run configuration: Organization UI · Review profile: CHILL · Plan: Pro · Run ID: dc5389f4-dcda-4a68-b853-a637ab1b9b6a
📒 Files selected for processing (3):
- examples/benchmark/flare-worker.js
- examples/benchmark/index.html
- src/engines/flare-engine-wrapper.ts
```js
flareWorker.addEventListener('error', (e) => {
  console.error('[flare-worker] error:', e);
  log(`Flare worker error: ${e.message || e}`, 'error');
});
```
🧩 Analysis chain — scripts executed to verify:

```shell
rg -nP -C4 "flareRpcPending\\.(set|delete|clear)|addEventListener\\('(error|messageerror)'|postMessage\\(" examples/benchmark/index.html

# Get the complete ensureFlareWorker function and surrounding context
sed -n '580,650p' examples/benchmark/index.html | cat -n

# Search for all references to flareRpcPending in the file
rg -n "flareRpcPending" examples/benchmark/index.html

# Check if there's any other error handling or cleanup related to flareWorker
rg -n -A3 -B3 "flareWorker\s*=\s*null|flareWorker\s*=\s*undefined|terminate\(\)" examples/benchmark/index.html
```
Reject pending Flare RPCs when the worker errors.
If the worker fails during init/import/load, the current error handler only logs; any awaited flareCall() remains pending indefinitely and the benchmark hangs with buttons disabled. Reject and clear pending calls on error and messageerror, and clean up the dead worker.
Proposed fix
```diff
+ function rejectFlarePending(err) {
+   for (const slot of flareRpcPending.values()) {
+     slot.reject(err);
+   }
+   flareRpcPending.clear();
+ }
+
  function ensureFlareWorker() {
    if (flareWorker) return flareWorker;
    flareWorker = new Worker(new URL('./flare-worker.js', import.meta.url), {
      type: 'module',
    });
@@
    flareWorker.addEventListener('error', (e) => {
      console.error('[flare-worker] error:', e);
      log(`Flare worker error: ${e.message || e}`, 'error');
+     rejectFlarePending(new Error(e.message || 'Flare worker error'));
+     flareWorker?.terminate();
+     flareWorker = null;
    });
+   flareWorker.addEventListener('messageerror', (e) => {
+     console.error('[flare-worker] messageerror:', e);
+     rejectFlarePending(new Error('Flare worker message error'));
+     flareWorker?.terminate();
+     flareWorker = null;
+   });
    return flareWorker;
  }

  function flareCall(type, args = {}, transfer = [], onStream = null) {
    const id = ++flareRpcSeq;
    return new Promise((resolve, reject) => {
      flareRpcPending.set(id, { resolve, reject, onStream });
-     flareWorker.postMessage({ id, type, args }, transfer);
+     try {
+       flareWorker.postMessage({ id, type, args }, transfer);
+     } catch (err) {
+       flareRpcPending.delete(id);
+       reject(err);
+     }
    });
  }
```
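In isolation, the reject-all behavior this fix introduces can be demonstrated with a bare Map of pending resolvers. The names mirror the proposed fix, but this is a standalone sketch, not the benchmark's actual code:

```javascript
// Every in-flight RPC promise settles as rejected when the worker dies,
// so callers' await/try-catch paths unblock instead of hanging forever.
const flareRpcPending = new Map();

function flareCallMock(id) {
  return new Promise((resolve, reject) => {
    flareRpcPending.set(id, { resolve, reject });
  });
}

function rejectFlarePending(err) {
  for (const slot of flareRpcPending.values()) slot.reject(err);
  flareRpcPending.clear();
}
```

Without this, a dead worker leaves every awaiting caller pending indefinitely, which is exactly the "benchmark hangs with buttons disabled" symptom described above.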
```js
// Read per-phase profile from the completed prefill once per page load.
if (!window.__flareProfileLogged) {
  try {
    const profileStr = await flareCall('prefill_profile_json');
    const profile = JSON.parse(profileStr);
    if (profile && profile.seq_len > 0) {
      console.log('[Flare] prefill profile:', profile);
      log(`Flare prefill profile: ${JSON.stringify(profile)}`, 'info');
      window.__flareProfileLogged = true;
    }
  } catch (e) {
    console.warn('[Flare] prefill profile read failed:', e);
  }
}

const totalTime = performance.now() - t0;
const decodeTime = totalTime - firstTokenTime;
```
🧩 Analysis chain — scripts executed to verify:

```shell
rg -nP -C5 'prefill_profile_json|disable_prefill_profiling|const totalTime = performance\.now\(\) - t0' examples/benchmark/index.html examples/benchmark/flare-worker.js

sed -n '720,745p' examples/benchmark/index.html

rg -n 'enable_prefill_profiling' examples/benchmark/index.html examples/benchmark/flare-worker.js

sed -n '675,690p' examples/benchmark/index.html

rg -n 'disable_prefill_profiling' examples/benchmark/index.html
```
Move timing capture before profiling operations to exclude diagnostic overhead from Flare benchmark metrics.
totalTime and decodeTime are currently captured after prefill_profile_json RPC, JSON parsing, console logging, and UI logging. This includes diagnostic overhead in the benchmark. Move the timing capture to immediately after stream resolves, then read and log the profile separately. The backend already supports disable_prefill_profiling; call it after profile logging.
Proposed fix
```diff
- // Read per-phase profile from the completed prefill once per page load.
+ const totalTime = performance.now() - t0;
+ const decodeTime = totalTime - firstTokenTime;
+
+ // Read per-phase profile from the completed prefill once per page load.
+ // Keep this outside measured inference time.
  if (!window.__flareProfileLogged) {
    try {
      const profileStr = await flareCall('prefill_profile_json');
      const profile = JSON.parse(profileStr);
      if (profile && profile.seq_len > 0) {
        console.log('[Flare] prefill profile:', profile);
        log(`Flare prefill profile: ${JSON.stringify(profile)}`, 'info');
        window.__flareProfileLogged = true;
+       await flareCall('disable_prefill_profiling');
      }
    } catch (e) {
      console.warn('[Flare] prefill profile read failed:', e);
    }
  }
- const totalTime = performance.now() - t0;
- const decodeTime = totalTime - firstTokenTime;
-
  return {
```
```ts
// First-run prefill profile snapshot (if profiling is enabled).
if (!FlareEngineWrapper.profileLogged) {
  try {
    const profile = JSON.parse((this.engine as unknown as { prefill_profile_json(): string }).prefill_profile_json());
    if (profile && profile.seq_len > 0) {
      console.log('[Flare] prefill profile:', profile);
      FlareEngineWrapper.profileLogged = true;
    }
```
🧩 Analysis chain — script executed to verify:

```shell
rg -nP -C3 '\b(enable_prefill_profiling|prefill_profile_json|disable_prefill_profiling)\b' src/engines/flare-engine-wrapper.ts examples/benchmark/flare-worker.js examples/benchmark/index.html
```
Disable prefill profiling after the first valid snapshot.
profileLogged suppresses repeated console output, but the engine-level profiler remains enabled for subsequent generation calls. Disable profiling once a non-empty profile has been captured to avoid unnecessary instrumentation overhead.
Proposed fix
```diff
  if (profile && profile.seq_len > 0) {
    console.log('[Flare] prefill profile:', profile);
    FlareEngineWrapper.profileLogged = true;
+   this.engine.disable_prefill_profiling();
  }
```
…s) (#311)

PR #309 moved Flare into a dedicated Worker to keep the UI responsive during the 138 MB GGUF parse. That fixed the freeze, but silently dropped WebGPU to CPU fallback — then the next release (flare-web 0.2.12) fixed the WebGPU-in-worker detection, and the benchmark immediately deadlocked on the first inference run.

Root cause: flare-gpu's dispatch_and_readback does

```rust
slice.map_async(Read, |r| sender.send(r));
device.poll(Wait);  // no-op on wasm32
receiver.recv();    // blocks the worker forever
```

The WebGPU map_async callback is serviced by browser-internal microtasks that only drain on the main thread. In a Worker, the synchronous recv() call deadlocks — we hung for 240+ s on the warmup run.

Main-thread load still briefly freezes the UI, but that's the lesser evil compared to CPU fallback at 20 tok/s or a hung tab. A proper fix requires a worker-safe async readback path in flare-web; tracked separately. The flare-worker.js helper is removed since nothing else uses it.
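The event-loop mechanics behind this deadlock can be modeled in plain JavaScript. This is a toy model, not flare-gpu's actual code — the map completion is simulated with a timer — but it shows the core problem: a completion callback delivered by the event loop can never fire while the same thread is waiting synchronously.

```javascript
// Toy model of the readback deadlock: the "map" completion arrives as an
// event-loop task, so a synchronous wait on the same thread never observes it.
function simulatedMapAsync(callback) {
  setTimeout(() => callback('mapped'), 0); // stand-in for WebGPU's map_async
}

function syncReadback() {
  let result = null;
  simulatedMapAsync((r) => { result = r; });
  // Equivalent of receiver.recv(): inspect the result without ever yielding.
  // The callback cannot have run yet; a real blocking wait here spins forever.
  return result;
}

function asyncReadback() {
  // Worker-safe form: yield to the event loop and let the callback resolve us.
  return new Promise((resolve) => simulatedMapAsync(resolve));
}
```

This is why the worker-safe fix has to be async end-to-end: any synchronous wait between map_async and its callback recreates the hang.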
Summary
Move Flare in the benchmark HTML onto a dedicated Web Worker so the main thread stays responsive during the 138 MB GGUF parse, GPU buffer upload, and decode. Matches how MLC and Transformers.js already work internally.
Why
FlareEngine.load(bytes) is a synchronous WASM call. For a 138 MB SmolLM2-135M Q8_0 GGUF it holds the main thread for 2–10 s depending on cache state — long enough that Chromium declares the tab unresponsive ("page crashing / can't scroll" reports). Running it in a worker removes that dead time from the UI thread entirely.

What changed
- examples/benchmark/flare-worker.js (new, 145 lines) — owns the single FlareEngine instance. Handles RPC-style {id, type, args} messages and streams decoded tokens back via postMessage.
- examples/benchmark/index.html (~170 lines rewritten) — main-thread Promise-based RPC helper (flareCall(type, args, transfer, onStream)); replaces the direct flareLib.FlareEngine.load(bytes) call with worker communication.
- src/engines/flare-engine-wrapper.ts (+24 lines, 3 removed) — fixes the enable/read/disable ordering bug so the first generateText now logs a real prefill profile snapshot (previously seq_len: 0 because the read happened before any prefill).

Design
- Model bytes cross the boundary as a Uint8Array with transfer: [bytes.buffer] — zero-copy.
- Streaming: one {id, stream: {tokenText, tokenId}} message per decoded token; the final {id, result} carries completion metadata. The main-thread client hooks an onStream callback per request.
- Errors: {id, error: {message, stack}}; the main-thread RPC rejects the matching Promise.
- navigator.gpu is available in dedicated workers in all current-gen browsers (and @sauravpanda/flare@0.2.11 ships a WorkerGlobalScope.performance fallback, so the profiler reads valid numbers from the worker too).

Test plan
- npx jest — 62 / 62 passing
- npm run build — ESM + CJS + DTS all green
- npx eslint — no new warnings (9 pre-existing console.log warnings unchanged)
- Manual check: the prefill profile logs real seq_len and per-phase ms values

Out of scope (follow-up)
Library-level FlareEngineWrapper still loads on the main thread. Moving it behind the same worker is a larger change — it breaks the currently-sync reset() / encode_text() methods — and belongs in its own PR.