Move Flare benchmark onto a dedicated Web Worker (#309)
Conversation
FlareEngine.load() is a synchronous WASM call that blocks the main
thread for the full 138 MB GGUF parse + GPU buffer upload. Running it
on the main thread made the tab unresponsive for the duration of the
load (2–10 s depending on cache state) and was responsible for the
"page freezes after clicking Run Benchmark" reports.
This change moves Flare into a dedicated Web Worker. The main thread
sends RPC-style {id, type, args} messages; the worker owns the single
FlareEngine instance and streams decoded tokens back via postMessage.
Matches how MLC and Transformers.js already work internally.
Key details:
- New examples/benchmark/flare-worker.js owns the FlareEngine
- index.html talks to the worker via a small Promise-based RPC helper
- GGUF bytes cross as a transferable (zero-copy)
- Streaming: worker posts one message per decoded token; main-thread
client accumulates into the output string
- Prefill profile read is deferred until after the first prefill
(previously called before any inference, always returned seq_len=0)
- src/engines/flare-engine-wrapper.ts: fix same enable/read/disable
ordering bug so `generateText` logs a real profile snapshot
Library-level worker migration for FlareEngineWrapper is a follow-up —
this PR is scoped to fixing the benchmark's freeze.
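As a rough illustration of the RPC helper described above, the main-thread side of the protocol could look like the following. This is a simplified sketch, not the actual code in `index.html` — the pending map, sequence counter, and mock worker here are illustrative, though the `flareCall(type, args, transfer, onStream)` shape matches the PR description:

```javascript
// Sketch of the main-thread RPC client: one pending slot per request id,
// stream messages routed to an onStream callback, the final message resolves.
function createFlareClient(worker) {
  const pending = new Map();
  let seq = 0;

  worker.onmessage = ({ data }) => {
    const slot = pending.get(data.id);
    if (!slot) return;
    if (data.stream) return slot.onStream?.(data.stream); // one per decoded token
    pending.delete(data.id);
    data.error ? slot.reject(new Error(data.error.message)) : slot.resolve(data.result);
  };

  return function flareCall(type, args = {}, transfer = [], onStream = null) {
    const id = ++seq;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject, onStream });
      worker.postMessage({ id, type, args }, transfer); // transfer list => zero-copy
    });
  };
}
```

Accumulating the streamed tokens on the caller side is then just `let out = ''; await flareCall('stream', prompt, [], (s) => { out += s.tokenText; });`.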
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info — ⚙️ Configuration used: Organization UI · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (3); ✅ 2 skipped from review due to trivial changes; 🚧 1 skipped as similar to previous changes
📝 Walkthrough

FlareEngine was moved into a dedicated Web Worker (new examples/benchmark/flare-worker.js).
Sequence diagram

```mermaid
sequenceDiagram
    participant Main as Main Thread
    participant Worker as Flare Worker
    participant Engine as FlareEngine
    Main->>Worker: {id, type: "init", args}
    Worker->>Engine: import & initialize module
    Worker-->>Main: {id, result}
    Main->>Worker: {id, type: "load", args: [modelBytes]}
    Worker->>Engine: create engine from bytes
    Worker-->>Main: {id, result: {flareArch}}
    Main->>Worker: {id, type: "init_gpu"}
    Worker->>Engine: init GPU / backend
    Worker-->>Main: {id, result: backendInfo}
    Main->>Worker: {id, type: "stream", args: prompt}
    Worker->>Engine: generate tokens
    Worker-->>Main: {id, stream: {tokenText, tokenId}} (repeated per token)
    Worker-->>Main: {id, result: finalOutput}
    Main->>Worker: {id, type: "dispose"}
    Worker->>Engine: cleanup
    Worker-->>Main: {id, result}
```
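The worker side of that flow can be sketched as a minimal dispatcher. This is an illustration only — the engine object and its method names (`load`, `generate`) are placeholders, not the real FlareEngine API, and the real flare-worker.js differs:

```javascript
// Minimal worker-side dispatcher: each incoming {id, type, args} message maps
// to an engine method; 'stream' posts one message per token plus a final result.
function createDispatcher(engine, post) {
  return async function onMessage({ id, type, args }) {
    try {
      if (type === 'stream') {
        let out = '';
        for (const tok of engine.generate(args)) {
          post({ id, stream: { tokenText: tok.text, tokenId: tok.id } });
          out += tok.text;
        }
        post({ id, result: out });
      } else {
        // load / init_gpu / dispose etc. map straight onto engine methods
        post({ id, result: await engine[type](args) });
      }
    } catch (e) {
      post({ id, error: { message: e.message, stack: e.stack } });
    }
  };
}
```

In a real worker, `post` would be `self.postMessage` and `onMessage` would be wired to `self.onmessage`.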
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/engines/flare-engine-wrapper.ts (1)
Line 1: ⚠️ Potential issue | 🟡 Minor. Run Prettier for this file.
The Build and Lint job reports a Prettier check failure for src/engines/flare-engine-wrapper.ts; please run the formatter before merging:

```shell
npx prettier --write "src/**/*.ts"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engines/flare-engine-wrapper.ts` at line 1, The file src/engines/flare-engine-wrapper.ts fails Prettier formatting; run the project formatter and commit the changes to satisfy CI. Run npx prettier --write "src/**/*.ts" (or format just src/engines/flare-engine-wrapper.ts), verify that the exported symbols in flare-engine-wrapper.ts remain unchanged, then stage and commit the formatted file so the Build and Lint job passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/benchmark/index.html`:
- Around line 725-741: Move the timing capture (compute totalTime and decodeTime
from t0 and firstTokenTime) to immediately after the streaming operation
resolves (right after the promise that yields the final stream result), before
calling flareCall('prefill_profile_json') or doing JSON.parse/log/UI logging;
this ensures window.__flareProfileLogged/profile parsing/console.log/log(...) do
not affect benchmark metrics. Keep the existing guard
window.__flareProfileLogged and the try/catch around
flareCall('prefill_profile_json') but run them after capturing
totalTime/decodeTime; after logging the profile, invoke the backend toggle via
flareCall('disable_prefill_profiling') (or the existing
disable_prefill_profiling RPC) to match backend support. Ensure variable names
totalTime, decodeTime, t0, firstTokenTime, flareCall('prefill_profile_json'),
and window.__flareProfileLogged are used as in the diff so the change is
localized and clear.
- Around line 613-616: The worker error handler currently just logs errors,
leaving any awaiting flareCall() promises unresolved; update the 'error' and
'messageerror' handlers on flareWorker to iterate over the pending RPC
bookkeeping (e.g., the map/obj used by flareCall to store resolvers — reference
that structure by name in your code), call each stored reject callback with the
received error (or a new Error describing worker failure), clear the pending
map, remove/cleanup worker event listeners, terminate the dead worker
(flareWorker.terminate()), and perform any UI/disabled-state cleanup so buttons
are re-enabled; do the same cleanup path for both 'error' and 'messageerror'
handlers to avoid leaked promises.
In `@src/engines/flare-engine-wrapper.ts`:
- Around line 390-397: After capturing a non-empty prefill profile in
FlareEngineWrapper (the block using (this.engine as unknown as {
prefill_profile_json(): string }).prefill_profile_json() and setting
FlareEngineWrapper.profileLogged = true), explicitly disable engine-level
profiling so subsequent calls aren't instrumented: after setting profileLogged =
true, detect and call the engine's profiling-disable API (e.g. check for and
invoke methods like disable_profiling(), setProfilingEnabled(false),
enableProfiling(false) or similar on this.engine), falling back to no-op if none
exist; ensure this logic is added in the same conditional that logs the profile
so profiling stops immediately after the first valid snapshot.
ℹ️ Review info
⚙️ Run configuration: Organization UI · Review profile: CHILL · Plan: Pro · Run ID: dc5389f4-dcda-4a68-b853-a637ab1b9b6a
📒 Files selected for processing (3):
- examples/benchmark/flare-worker.js
- examples/benchmark/index.html
- src/engines/flare-engine-wrapper.ts
```js
flareWorker.addEventListener('error', (e) => {
  console.error('[flare-worker] error:', e);
  log(`Flare worker error: ${e.message || e}`, 'error');
});
```
🧩 Analysis chain — scripts executed to verify:

```shell
rg -nP -C4 "flareRpcPending\\.(set|delete|clear)|addEventListener\\('(error|messageerror)'|postMessage\\(" examples/benchmark/index.html

# Get the complete ensureFlareWorker function and surrounding context
sed -n '580,650p' examples/benchmark/index.html | cat -n

# Search for all references to flareRpcPending in the file
rg -n "flareRpcPending" examples/benchmark/index.html

# Check if there's any other error handling or cleanup related to flareWorker
rg -n -A3 -B3 "flareWorker\s*=\s*null|flareWorker\s*=\s*undefined|terminate\(\)" examples/benchmark/index.html
```
Reject pending Flare RPCs when the worker errors.
If the worker fails during init/import/load, the current error handler only logs; any awaited flareCall() remains pending indefinitely and the benchmark hangs with buttons disabled. Reject and clear pending calls on error and messageerror, and clean up the dead worker.
Proposed fix
```diff
+ function rejectFlarePending(err) {
+   for (const slot of flareRpcPending.values()) {
+     slot.reject(err);
+   }
+   flareRpcPending.clear();
+ }
+
  function ensureFlareWorker() {
    if (flareWorker) return flareWorker;
    flareWorker = new Worker(new URL('./flare-worker.js', import.meta.url), {
      type: 'module',
    });
@@
    flareWorker.addEventListener('error', (e) => {
      console.error('[flare-worker] error:', e);
      log(`Flare worker error: ${e.message || e}`, 'error');
+     rejectFlarePending(new Error(e.message || 'Flare worker error'));
+     flareWorker?.terminate();
+     flareWorker = null;
    });
+   flareWorker.addEventListener('messageerror', (e) => {
+     console.error('[flare-worker] messageerror:', e);
+     rejectFlarePending(new Error('Flare worker message error'));
+     flareWorker?.terminate();
+     flareWorker = null;
+   });
    return flareWorker;
  }

  function flareCall(type, args = {}, transfer = [], onStream = null) {
    const id = ++flareRpcSeq;
    return new Promise((resolve, reject) => {
      flareRpcPending.set(id, { resolve, reject, onStream });
-     flareWorker.postMessage({ id, type, args }, transfer);
+     try {
+       flareWorker.postMessage({ id, type, args }, transfer);
+     } catch (err) {
+       flareRpcPending.delete(id);
+       reject(err);
+     }
    });
  }
```
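In isolation, the reject-all behavior this fix introduces can be demonstrated with a bare Map of pending resolvers. The names mirror the proposed fix, but this is a standalone sketch, not the benchmark's actual code:

```javascript
// Every in-flight RPC promise settles as rejected when the worker dies,
// so callers' await/try-catch paths unblock instead of hanging forever.
const flareRpcPending = new Map();

function flareCallMock(id) {
  return new Promise((resolve, reject) => {
    flareRpcPending.set(id, { resolve, reject });
  });
}

function rejectFlarePending(err) {
  for (const slot of flareRpcPending.values()) slot.reject(err);
  flareRpcPending.clear();
}
```

Without this, a dead worker leaves every awaiting caller pending indefinitely, which is exactly the "benchmark hangs with buttons disabled" symptom described above.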
```js
// Read per-phase profile from the completed prefill once per page load.
if (!window.__flareProfileLogged) {
  try {
    const profileStr = await flareCall('prefill_profile_json');
    const profile = JSON.parse(profileStr);
    if (profile && profile.seq_len > 0) {
      console.log('[Flare] prefill profile:', profile);
      log(`Flare prefill profile: ${JSON.stringify(profile)}`, 'info');
      window.__flareProfileLogged = true;
    }
  } catch (e) {
    console.warn('[Flare] prefill profile read failed:', e);
  }
}

const totalTime = performance.now() - t0;
const decodeTime = totalTime - firstTokenTime;
```
🧩 Analysis chain — scripts executed to verify:

```shell
rg -nP -C5 'prefill_profile_json|disable_prefill_profiling|const totalTime = performance\.now\(\) - t0' examples/benchmark/index.html examples/benchmark/flare-worker.js

sed -n '720,745p' examples/benchmark/index.html

rg -n 'enable_prefill_profiling' examples/benchmark/index.html examples/benchmark/flare-worker.js

sed -n '675,690p' examples/benchmark/index.html

rg -n 'disable_prefill_profiling' examples/benchmark/index.html
```
Move timing capture before profiling operations to exclude diagnostic overhead from Flare benchmark metrics.
totalTime and decodeTime are currently captured after prefill_profile_json RPC, JSON parsing, console logging, and UI logging. This includes diagnostic overhead in the benchmark. Move the timing capture to immediately after stream resolves, then read and log the profile separately. The backend already supports disable_prefill_profiling; call it after profile logging.
Proposed fix
```diff
- // Read per-phase profile from the completed prefill once per page load.
+ const totalTime = performance.now() - t0;
+ const decodeTime = totalTime - firstTokenTime;
+
+ // Read per-phase profile from the completed prefill once per page load.
+ // Keep this outside measured inference time.
  if (!window.__flareProfileLogged) {
    try {
      const profileStr = await flareCall('prefill_profile_json');
      const profile = JSON.parse(profileStr);
      if (profile && profile.seq_len > 0) {
        console.log('[Flare] prefill profile:', profile);
        log(`Flare prefill profile: ${JSON.stringify(profile)}`, 'info');
        window.__flareProfileLogged = true;
+       await flareCall('disable_prefill_profiling');
      }
    } catch (e) {
      console.warn('[Flare] prefill profile read failed:', e);
    }
  }
- const totalTime = performance.now() - t0;
- const decodeTime = totalTime - firstTokenTime;
-
  return {
```
```ts
// First-run prefill profile snapshot (if profiling is enabled).
if (!FlareEngineWrapper.profileLogged) {
  try {
    const profile = JSON.parse((this.engine as unknown as { prefill_profile_json(): string }).prefill_profile_json());
    if (profile && profile.seq_len > 0) {
      console.log('[Flare] prefill profile:', profile);
      FlareEngineWrapper.profileLogged = true;
    }
```
🧩 Analysis chain — script executed to verify:

```shell
rg -nP -C3 '\b(enable_prefill_profiling|prefill_profile_json|disable_prefill_profiling)\b' src/engines/flare-engine-wrapper.ts examples/benchmark/flare-worker.js examples/benchmark/index.html
```
Disable prefill profiling after the first valid snapshot.
profileLogged suppresses repeated console output, but the engine-level profiler remains enabled for subsequent generation calls. Disable profiling once a non-empty profile has been captured to avoid unnecessary instrumentation overhead.
Proposed fix
```diff
  if (profile && profile.seq_len > 0) {
    console.log('[Flare] prefill profile:', profile);
    FlareEngineWrapper.profileLogged = true;
+   this.engine.disable_prefill_profiling();
  }
```
…s) (#311)

PR #309 moved Flare into a dedicated Worker to keep the UI responsive during the 138 MB GGUF parse. That fixed the freeze, but silently dropped WebGPU to CPU fallback — then the next release (flare-web 0.2.12) fixed the WebGPU-in-worker detection, and the benchmark immediately deadlocked on the first inference run.

Root cause: flare-gpu's dispatch_and_readback does

```rust
slice.map_async(Read, |r| sender.send(r));
device.poll(Wait);  // no-op on wasm32
receiver.recv();    // blocks the worker forever
```

The WebGPU map_async callback is serviced by browser-internal microtasks that only drain on the main thread. In a Worker, the synchronous recv() call deadlocks — we hung for 240+ s on the warmup run.

Main-thread load still briefly freezes the UI, but that's the lesser evil compared to CPU fallback at 20 tok/s or a hung tab. A proper fix requires a worker-safe async readback path in flare-web; tracked separately. The flare-worker.js helper is removed since nothing else uses it.
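The event-loop mechanics behind this deadlock can be modeled in plain JavaScript. This is a toy model, not flare-gpu's actual code — the map completion is simulated with a timer — but it shows the core problem: a completion callback delivered by the event loop can never fire while the same thread is waiting synchronously.

```javascript
// Toy model of the readback deadlock: the "map" completion arrives as an
// event-loop task, so a synchronous wait on the same thread never observes it.
function simulatedMapAsync(callback) {
  setTimeout(() => callback('mapped'), 0); // stand-in for WebGPU's map_async
}

function syncReadback() {
  let result = null;
  simulatedMapAsync((r) => { result = r; });
  // Equivalent of receiver.recv(): inspect the result without ever yielding.
  // The callback cannot have run yet; a real blocking wait here spins forever.
  return result;
}

function asyncReadback() {
  // Worker-safe form: yield to the event loop and let the callback resolve us.
  return new Promise((resolve) => simulatedMapAsync(resolve));
}
```

This is why the worker-safe fix has to be async end-to-end: any synchronous wait between map_async and its callback recreates the hang.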
Summary
Move Flare in the benchmark HTML onto a dedicated Web Worker so the main thread stays responsive during the 138 MB GGUF parse, GPU buffer upload, and decode. Matches how MLC and Transformers.js already work internally.
Why
FlareEngine.load(bytes) is a synchronous WASM call. For a 138 MB SmolLM2-135M Q8_0 GGUF it holds the main thread for 2–10 s depending on cache state — long enough that Chromium declares the tab unresponsive ("page crashing / can't scroll" reports). Running it in a worker removes that dead time from the UI thread entirely.

What changed
- examples/benchmark/flare-worker.js (new, 145 lines) — owns the single FlareEngine instance. Handles RPC-style {id, type, args} messages and streams decoded tokens back via postMessage.
- examples/benchmark/index.html (~170 lines rewritten) — main-thread Promise-based RPC helper (flareCall(type, args, transfer, onStream)); replaces the direct flareLib.FlareEngine.load(bytes) call with worker communication.
- src/engines/flare-engine-wrapper.ts (+24 lines, 3 removed) — fixes the enable/read/disable ordering bug so the first generateText now logs a real prefill profile snapshot (previously seq_len: 0 because the read happened before any prefill).

Design
- Model bytes cross the boundary as a Uint8Array with transfer: [bytes.buffer] — zero-copy.
- Streaming: one {id, stream: {tokenText, tokenId}} message per decoded token; the final {id, result} carries completion metadata. The main-thread client hooks an onStream callback per request.
- Errors: {id, error: {message, stack}}; the main-thread RPC rejects the matching Promise.
- navigator.gpu is available in dedicated workers in all current-gen browsers (and @sauravpanda/flare@0.2.11 ships a WorkerGlobalScope.performance fallback, so the profiler reads valid numbers from the worker too).

Test plan
- npx jest — 62 / 62 passing
- npm run build — ESM + CJS + DTS all green
- npx eslint — no new warnings (9 pre-existing console.log warnings unchanged)
- Manual check: the prefill profile logs real seq_len and per-phase ms values

Out of scope (follow-up)
Library-level FlareEngineWrapper still loads on the main thread. Moving it behind the same worker is a larger change — it breaks the currently-sync reset() / encode_text() methods — and belongs in its own PR.