
Move Flare benchmark onto a dedicated Web Worker#309

Merged
sauravpanda merged 2 commits into main from flare-web-worker on Apr 20, 2026

Conversation

@sauravpanda (Owner) commented Apr 20, 2026

Summary

Move Flare in the benchmark HTML onto a dedicated Web Worker so the main thread stays responsive during the 138 MB GGUF parse, GPU buffer upload, and decode. Matches how MLC and Transformers.js already work internally.

Why

FlareEngine.load(bytes) is a synchronous WASM call. For a 138 MB SmolLM2-135M Q8_0 GGUF it holds the main thread for 2–10 s depending on cache state — long enough that Chromium declares the tab unresponsive ("page crashing / can't scroll" reports). Running it in a worker removes that dead time from the UI thread entirely.

What changed

  • examples/benchmark/flare-worker.js (new, 145 lines) — owns the single FlareEngine instance. Handles RPC-style {id, type, args} messages and streams decoded tokens back via postMessage.
  • examples/benchmark/index.html (~170 lines rewritten) — adds a main-thread Promise-based RPC helper (flareCall(type, args, transfer, onStream)) and replaces the direct flareLib.FlareEngine.load(bytes) call with worker communication.
  • src/engines/flare-engine-wrapper.ts (+24 lines, 3 removed) — fixes the enable/read/disable ordering bug so the first generateText now logs a real prefill profile snapshot (previously seq_len: 0 because the read happened before any prefill).
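The wrapper fix follows an enable → prefill → read → disable ordering with a once-per-load latch. A minimal sketch of that ordering — the fakeEngine, the prefill() method, and the helper below are illustrative stand-ins, not the real Flare API (only the snake_case profiling method names appear in this PR):

```javascript
// Sketch of the corrected ordering. Bug: reading the profile before any
// prefill always yields seq_len: 0. Fix: enable up front, run the prefill,
// THEN read — and disable once a valid snapshot has been captured.
function readFirstPrefillProfile(engine, state) {
  engine.enable_prefill_profiling();
  engine.prefill("hello world"); // first real prefill
  const profile = JSON.parse(engine.prefill_profile_json());
  if (profile.seq_len > 0 && !state.profileLogged) {
    state.profileLogged = true;          // once-per-page-load latch
    engine.disable_prefill_profiling();  // stop instrumenting later calls
  }
  return profile;
}

// Minimal fake engine (hypothetical) to exercise the ordering.
function makeFakeEngine() {
  let profiling = false;
  let seqLen = 0;
  return {
    enable_prefill_profiling: () => { profiling = true; },
    disable_prefill_profiling: () => { profiling = false; },
    prefill: (text) => { if (profiling) seqLen = text.split(" ").length; },
    prefill_profile_json: () => JSON.stringify({ seq_len: seqLen }),
  };
}
```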

Design

  • Transferables: GGUF bytes cross as Uint8Array with transfer: [bytes.buffer] — zero-copy.
  • Streaming: worker posts one {id, stream: {tokenText, tokenId}} message per decoded token; final {id, result} carries completion metadata. Main-thread client hooks an onStream callback per request.
  • Error bridging: worker catches exceptions in its dispatch and replies {id, error: {message, stack}}; main-thread RPC rejects the matching Promise.
  • WebGPU in worker: navigator.gpu is available in dedicated workers in all current-gen browsers (and @sauravpanda/flare@0.2.11 ships a WorkerGlobalScope.performance fallback, so the profiler reads valid numbers from the worker too).
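The {id, type, args} protocol above can be sketched as a small main-thread client. This is an illustrative reconstruction from the message shapes in this PR, not the actual index.html code; `worker` is any object with postMessage/onmessage (a real dedicated Worker in the benchmark):

```javascript
// Minimal RPC bridge: one pending slot per request id; {id, stream} messages
// are routed to the per-request onStream callback, and the final
// {id, result} / {id, error} message settles the Promise.
function makeRpcClient(worker) {
  const pending = new Map(); // id -> { resolve, reject, onStream }
  let seq = 0;

  worker.onmessage = (e) => {
    const { id, stream, result, error } = e.data;
    const slot = pending.get(id);
    if (!slot) return;
    if (stream) {               // one message per decoded token
      slot.onStream?.(stream);
      return;
    }
    pending.delete(id);         // final message settles the Promise
    if (error) slot.reject(new Error(error.message));
    else slot.resolve(result);
  };

  // Mirrors the flareCall(type, args, transfer, onStream) helper in the PR.
  return function flareCall(type, args = {}, transfer = [], onStream = null) {
    const id = ++seq;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject, onStream });
      worker.postMessage({ id, type, args }, transfer);
    });
  };
}
```

With this shape, a load call would look roughly like `flareCall('load', { bytes }, [bytes.buffer])` so the GGUF buffer is transferred rather than copied (argument shape illustrative).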

Test plan

  • npx jest — 62 / 62 passing
  • npm run build — ESM + CJS + DTS all green
  • npx eslint — no new warnings (9 pre-existing console.log warnings unchanged)
  • Manual benchmark run (reviewer): fresh tab, scroll during Flare load — page should stay responsive throughout
  • Verify prefill profile JSON arrives with non-zero seq_len and per-phase ms values

Out of scope (follow-up)

Library-level FlareEngineWrapper still loads on the main thread. Moving it behind the same worker is a larger change — breaks the currently-sync reset()/encode_text() methods — and belongs in its own PR.

Summary by CodeRabbit

  • New Features

    • Engine runs in a background worker for smoother UI; streaming token updates with live progress (tokens/sec) and token callbacks.
    • GPU initialization and profiling accessible via the background worker; one-time prefill profile logging per page load.
  • Chores

    • Refactored model loading, inference, and disposal to use the worker for better responsiveness and resource cleanup.
  • Tests

    • Minor test formatting updates (no behavioral changes).

FlareEngine.load() is a synchronous WASM call that blocks the main
thread for the full 138 MB GGUF parse + GPU buffer upload.  Running it
on the main thread made the tab unresponsive for the duration of the
load (2-10s depending on cache state), and was responsible for the
"page freezes after clicking Run Benchmark" reports.

This change moves Flare into a dedicated Web Worker.  The main thread
sends RPC-style {id, type, args} messages; the worker owns the single
FlareEngine instance and streams decoded tokens back via postMessage.
Matches how MLC and Transformers.js already work internally.

Key details:
- New examples/benchmark/flare-worker.js owns the FlareEngine
- index.html talks to the worker via a small Promise-based RPC helper
- GGUF bytes cross as a transferable (zero-copy)
- Streaming: worker posts one message per decoded token; main-thread
  client accumulates into the output string
- Prefill profile read is deferred until after the first prefill
  (previously called before any inference, always returned seq_len=0)
- src/engines/flare-engine-wrapper.ts: fix same enable/read/disable
  ordering bug so `generateText` logs a real profile snapshot

Library-level worker migration for FlareEngineWrapper is a follow-up —
this PR is scoped to fixing the benchmark's freeze.
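The zero-copy transfer noted above can be observed directly: transferring a buffer detaches it on the sending side. A small illustration using structuredClone, which follows the same transfer semantics as postMessage(value, transfer) (Node 17+ / modern browsers); the bytes here are just a stand-in for the GGUF payload:

```javascript
// Transferring moves the backing ArrayBuffer instead of copying it.
const bytes = new Uint8Array([0x47, 0x47, 0x55, 0x46]); // "GGUF" magic
const sizeBefore = bytes.buffer.byteLength;             // 4

// Same detach behavior as worker.postMessage(bytes, [bytes.buffer]).
const received = structuredClone(bytes, { transfer: [bytes.buffer] });

// The sender is left with a detached, zero-length buffer.
const sizeAfter = bytes.buffer.byteLength;              // 0 after detach
```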

coderabbitai bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f46573a7-2828-4aed-b9cf-66cda0f95bb5

📥 Commits

Reviewing files that changed from the base of the PR and between 624b9c1 and 8370f0e.

📒 Files selected for processing (3)
  • src/core/llm/browserai.test.ts
  • src/engines/flare-engine-wrapper.test.ts
  • src/engines/flare-engine-wrapper.ts
✅ Files skipped from review due to trivial changes (2)
  • src/engines/flare-engine-wrapper.test.ts
  • src/core/llm/browserai.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/engines/flare-engine-wrapper.ts

📝 Walkthrough

Walkthrough

FlareEngine was moved into a dedicated Web Worker (new examples/benchmark/flare-worker.js) with an RPC-style message protocol; the benchmark page (examples/benchmark/index.html) was updated to call the worker and transfer model bytes. Prefill profiling is deferred and logged once per page load via a latch in the wrapper.

Changes

Cohort / File(s) — Summary

Worker Implementation (examples/benchmark/flare-worker.js)
Added a new Web Worker implementing an RPC protocol (message ids, types: init, load, init_gpu, stream, dispose, etc.), streaming token responses, error propagation, and engine lifecycle management.

Main Thread Integration (examples/benchmark/index.html)
Refactored benchmark to communicate with the worker via flareCall() RPC bridge, send model bytes as transferables, handle streamed token callbacks, compute throughput/timing from streams, and terminate/cleanup worker state on dispose.

Engine Profiling Latch (src/engines/flare-engine-wrapper.ts)
Added a process-wide static latch to log the prefill profile JSON only once per page load; profiling snapshot reading is deferred to the first streaming generation and protected with try/catch for older builds.

Tests / Minor Formatting (src/core/llm/browserai.test.ts, src/engines/flare-engine-wrapper.test.ts)
Small test assertion formatting and whitespace normalization; no behavior changes.

Sequence Diagram

sequenceDiagram
    participant Main as Main Thread
    participant Worker as Flare Worker
    participant Engine as FlareEngine

    Main->>Worker: {id, type: "init", args}
    Worker->>Engine: import & initialize module
    Worker-->>Main: {id, result}

    Main->>Worker: {id, type: "load", args: [modelBytes]}
    Worker->>Engine: create engine from bytes
    Worker-->>Main: {id, result: {flareArch}}

    Main->>Worker: {id, type: "init_gpu"}
    Worker->>Engine: init GPU / backend
    Worker-->>Main: {id, result: backendInfo}

    Main->>Worker: {id, type: "stream", args: prompt}
    Worker->>Engine: generate tokens
    Worker-->>Main: {id, stream: {tokenText, tokenId}}  /* repeated */
    Worker-->>Main: {id, result: finalOutput}

    Main->>Worker: {id, type: "dispose"}
    Worker->>Engine: cleanup
    Worker-->>Main: {id, result}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size/L

🐰 I hopped a thread to move engines away,
Tokens drip like raindrops, chunk by chunk,
One profile log, no echoing fray,
Worker hums while the benchmark runs,
Hop, stream, repeat — joyous run! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning — Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
Description Check — ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed — The title 'Move Flare benchmark onto a dedicated Web Worker' directly and clearly describes the primary change across the changeset: moving benchmark operations from the main thread to a dedicated Web Worker.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/engines/flare-engine-wrapper.ts (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Run Prettier for this file.

The Build and Lint job reports a Prettier check failure for src/engines/flare-engine-wrapper.ts; please run the formatter before merging.

npx prettier --write "src/**/*.ts"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/flare-engine-wrapper.ts` at line 1, The file
src/engines/flare-engine-wrapper.ts fails Prettier formatting; run the project
formatter and commit the changes to satisfy CI. Run npx prettier --write
"src/**/*.ts" (or format just src/engines/flare-engine-wrapper.ts), verify that
the exported symbols in flare-engine-wrapper.ts remain unchanged, then stage and
commit the formatted file so the Build and Lint job passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/benchmark/index.html`:
- Around line 725-741: Move the timing capture (compute totalTime and decodeTime
from t0 and firstTokenTime) to immediately after the streaming operation
resolves (right after the promise that yields the final stream result), before
calling flareCall('prefill_profile_json') or doing JSON.parse/log/UI logging;
this ensures window.__flareProfileLogged/profile parsing/console.log/log(...) do
not affect benchmark metrics. Keep the existing guard
window.__flareProfileLogged and the try/catch around
flareCall('prefill_profile_json') but run them after capturing
totalTime/decodeTime; after logging the profile, invoke the backend toggle via
flareCall('disable_prefill_profiling') (or the existing
disable_prefill_profiling RPC) to match backend support. Ensure variable names
totalTime, decodeTime, t0, firstTokenTime, flareCall('prefill_profile_json'),
and window.__flareProfileLogged are used as in the diff so the change is
localized and clear.
- Around line 613-616: The worker error handler currently just logs errors,
leaving any awaiting flareCall() promises unresolved; update the 'error' and
'messageerror' handlers on flareWorker to iterate over the pending RPC
bookkeeping (e.g., the map/obj used by flareCall to store resolvers — reference
that structure by name in your code), call each stored reject callback with the
received error (or a new Error describing worker failure), clear the pending
map, remove/cleanup worker event listeners, terminate the dead worker
(flareWorker.terminate()), and perform any UI/disabled-state cleanup so buttons
are re-enabled; do the same cleanup path for both 'error' and 'messageerror'
handlers to avoid leaked promises.

In `@src/engines/flare-engine-wrapper.ts`:
- Around line 390-397: After capturing a non-empty prefill profile in
FlareEngineWrapper (the block using (this.engine as unknown as {
prefill_profile_json(): string }).prefill_profile_json() and setting
FlareEngineWrapper.profileLogged = true), explicitly disable engine-level
profiling so subsequent calls aren't instrumented: after setting profileLogged =
true, detect and call the engine's profiling-disable API (e.g. check for and
invoke methods like disable_profiling(), setProfilingEnabled(false),
enableProfiling(false) or similar on this.engine), falling back to no-op if none
exist; ensure this logic is added in the same conditional that logs the profile
so profiling stops immediately after the first valid snapshot.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dc5389f4-dcda-4a68-b853-a637ab1b9b6a

📥 Commits

Reviewing files that changed from the base of the PR and between 119866b and 624b9c1.

📒 Files selected for processing (3)
  • examples/benchmark/flare-worker.js
  • examples/benchmark/index.html
  • src/engines/flare-engine-wrapper.ts

Comment on lines +613 to +616

    flareWorker.addEventListener('error', (e) => {
      console.error('[flare-worker] error:', e);
      log(`Flare worker error: ${e.message || e}`, 'error');
    });

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP -C4 "flareRpcPending\\.(set|delete|clear)|addEventListener\\('(error|messageerror)'|postMessage\\(" examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 1478


🏁 Script executed:

#!/bin/bash
# Get the complete ensureFlareWorker function and surrounding context
sed -n '580,650p' examples/benchmark/index.html | cat -n

Repository: sauravpanda/BrowserAI

Length of output: 3323


🏁 Script executed:

#!/bin/bash
# Search for all references to flareRpcPending in the file
rg -n "flareRpcPending" examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 354


🏁 Script executed:

#!/bin/bash
# Check if there's any other error handling or cleanup related to flareWorker
rg -n -A3 -B3 "flareWorker\s*=\s*null|flareWorker\s*=\s*undefined|terminate\(\)" examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 664


Reject pending Flare RPCs when the worker errors.

If the worker fails during init/import/load, the current error handler only logs; any awaited flareCall() remains pending indefinitely and the benchmark hangs with buttons disabled. Reject and clear pending calls on error and messageerror, and clean up the dead worker.

Proposed fix
+      function rejectFlarePending(err) {
+        for (const slot of flareRpcPending.values()) {
+          slot.reject(err);
+        }
+        flareRpcPending.clear();
+      }
+
       function ensureFlareWorker() {
         if (flareWorker) return flareWorker;
         flareWorker = new Worker(new URL('./flare-worker.js', import.meta.url), {
           type: 'module',
         });
@@
         flareWorker.addEventListener('error', (e) => {
           console.error('[flare-worker] error:', e);
           log(`Flare worker error: ${e.message || e}`, 'error');
+          rejectFlarePending(new Error(e.message || 'Flare worker error'));
+          flareWorker?.terminate();
+          flareWorker = null;
         });
+        flareWorker.addEventListener('messageerror', (e) => {
+          console.error('[flare-worker] messageerror:', e);
+          rejectFlarePending(new Error('Flare worker message error'));
+          flareWorker?.terminate();
+          flareWorker = null;
+        });
         return flareWorker;
       }
 
       function flareCall(type, args = {}, transfer = [], onStream = null) {
         const id = ++flareRpcSeq;
         return new Promise((resolve, reject) => {
           flareRpcPending.set(id, { resolve, reject, onStream });
-          flareWorker.postMessage({ id, type, args }, transfer);
+          try {
+            flareWorker.postMessage({ id, type, args }, transfer);
+          } catch (err) {
+            flareRpcPending.delete(id);
+            reject(err);
+          }
         });
       }

Comment on lines +725 to +741

    // Read per-phase profile from the completed prefill once per page load.
    if (!window.__flareProfileLogged) {
      try {
        const profileStr = await flareCall('prefill_profile_json');
        const profile = JSON.parse(profileStr);
        if (profile && profile.seq_len > 0) {
          console.log('[Flare] prefill profile:', profile);
          log(`Flare prefill profile: ${JSON.stringify(profile)}`, 'info');
          window.__flareProfileLogged = true;
        }
      } catch (e) {
        console.warn('[Flare] prefill profile read failed:', e);
      }
    }

    const totalTime = performance.now() - t0;
    const decodeTime = totalTime - firstTokenTime;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP -C5 'prefill_profile_json|disable_prefill_profiling|const totalTime = performance\\.now\\(\\) - t0' examples/benchmark/index.html examples/benchmark/flare-worker.js

Repository: sauravpanda/BrowserAI

Length of output: 2031


🏁 Script executed:

sed -n '720,745p' examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 969


🏁 Script executed:

rg -n 'enable_prefill_profiling' examples/benchmark/index.html examples/benchmark/flare-worker.js

Repository: sauravpanda/BrowserAI

Length of output: 307


🏁 Script executed:

sed -n '675,690p' examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 747


🏁 Script executed:

rg -n 'disable_prefill_profiling' examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 47


Move timing capture before profiling operations to exclude diagnostic overhead from Flare benchmark metrics.

totalTime and decodeTime are currently captured after prefill_profile_json RPC, JSON parsing, console logging, and UI logging. This includes diagnostic overhead in the benchmark. Move the timing capture to immediately after stream resolves, then read and log the profile separately. The backend already supports disable_prefill_profiling; call it after profile logging.

Proposed fix
-        // Read per-phase profile from the completed prefill once per page load.
+        const totalTime = performance.now() - t0;
+        const decodeTime = totalTime - firstTokenTime;
+
+        // Read per-phase profile from the completed prefill once per page load.
+        // Keep this outside measured inference time.
         if (!window.__flareProfileLogged) {
           try {
             const profileStr = await flareCall('prefill_profile_json');
             const profile = JSON.parse(profileStr);
             if (profile && profile.seq_len > 0) {
               console.log('[Flare] prefill profile:', profile);
               log(`Flare prefill profile: ${JSON.stringify(profile)}`, 'info');
               window.__flareProfileLogged = true;
+              await flareCall('disable_prefill_profiling');
             }
           } catch (e) {
             console.warn('[Flare] prefill profile read failed:', e);
           }
         }
 
-        const totalTime = performance.now() - t0;
-        const decodeTime = totalTime - firstTokenTime;
-
         return {

Comment on lines +390 to +397

    // First-run prefill profile snapshot (if profiling is enabled).
    if (!FlareEngineWrapper.profileLogged) {
      try {
        const profile = JSON.parse((this.engine as unknown as { prefill_profile_json(): string }).prefill_profile_json());
        if (profile && profile.seq_len > 0) {
          console.log('[Flare] prefill profile:', profile);
          FlareEngineWrapper.profileLogged = true;
        }

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP -C3 '\b(enable_prefill_profiling|prefill_profile_json|disable_prefill_profiling)\b' src/engines/flare-engine-wrapper.ts examples/benchmark/flare-worker.js examples/benchmark/index.html

Repository: sauravpanda/BrowserAI

Length of output: 4706


Disable prefill profiling after the first valid snapshot.

profileLogged suppresses repeated console output, but the engine-level profiler remains enabled for subsequent generation calls. Disable profiling once a non-empty profile has been captured to avoid unnecessary instrumentation overhead.

Proposed fix
           if (profile && profile.seq_len > 0) {
             console.log('[Flare] prefill profile:', profile);
             FlareEngineWrapper.profileLogged = true;
+            this.engine.disable_prefill_profiling();
           }

@sauravpanda sauravpanda merged commit cd29114 into main Apr 20, 2026
10 checks passed
@sauravpanda sauravpanda deleted the flare-web-worker branch April 20, 2026 19:43
sauravpanda added a commit that referenced this pull request Apr 20, 2026
…s) (#311)

PR #309 moved Flare into a dedicated Worker to keep the UI responsive
during the 138 MB GGUF parse.  That fixed the freeze, but dropped
WebGPU silently to CPU fallback — then the next release (flare-web
0.2.12) fixed the WebGPU-in-worker detection, and the benchmark
immediately deadlocked on the first inference run.

Root cause: flare-gpu's dispatch_and_readback does

  slice.map_async(Read, |r| sender.send(r));
  device.poll(Wait);        // no-op on wasm32
  receiver.recv();          // blocks the worker forever

The WebGPU map_async callback is serviced by browser-internal
microtasks that only drain on the main thread.  In a Worker, the sync
recv() call deadlocks — we hung for 240+ s on the warmup run.

Main-thread load still briefly freezes the UI, but that's the
lesser evil compared to CPU-fallback-at-20-tok/s or a hung tab.
Proper fix requires a worker-safe async readback path in flare-web.
Tracked separately.

The flare-worker.js helper is removed since nothing else uses it.
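The worker-safe async readback path the commit calls for can be sketched in JS terms. `fakeMapAsync` below stands in for WebGPU's GPUBuffer.mapAsync(); it is not the flare-web API. The point: the completion is delivered via the event loop, so a synchronous recv()-style wait in a worker can never observe it — the worker must yield (await) instead:

```javascript
// Completion arrives via the event loop, like the browser-serviced
// map_async callback in the commit message above.
function fakeMapAsync() {
  return new Promise((resolve) =>
    setTimeout(() => resolve(new Float32Array([1.5, 2.5])), 0)
  );
}

// Deadlock-prone shape (the wasm32 code quoted above):
//   slice.map_async(Read, |r| sender.send(r));
//   device.poll(Wait);   // no-op on wasm32
//   receiver.recv();     // blocks the worker; callback never serviced
//
// Worker-safe shape: awaiting returns control to the event loop, lets the
// completion callback run, then resumes with the mapped data.
async function readbackAsync() {
  const data = await fakeMapAsync();
  return Array.from(data);
}
```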
