269 changes: 269 additions & 0 deletions .agents/skills/testing-livepeer-fal-deploy/SKILL.md
---
name: testing-livepeer-fal-deploy
description: End-to-end test harness for Scope's Livepeer cloud path against a deployed fal.ai app — the only supported cloud path going forward (the old cloud-relay / direct mode using `fal_app.py` + `CloudConnectionManager` is being deprecated). Primary path is a Playwright browser test that drives the full UI flow (camera → local scope WebRTC → livepeer trickle → fal runner → back), producing every session-lifecycle Kafka event. Secondary path is `test-cloud-connect.sh` — a bash/curl smoke test for the `/api/v1/cloud/connect` path only. TRIGGER any time a user says "test cloud", "test the fal deploy", "test cloud streaming", "run the e2e test", "run playwright", "verify cloud connect", "verify kafka events", "diagnose fal", "debug fal deploy", "did my stream work", "deploy-staging.sh", OR pastes any of these errors — "All orchestrators failed (N tried)", "ACCESS_DENIED", "did not receive ready message from websocket", "discover_orchestrators requires discovery_url", "cold start" — OR has just changed `src/scope/cloud/livepeer_fal_app.py` / `src/scope/cloud/livepeer_app.py` / `src/scope/server/livepeer.py` / `src/scope/server/livepeer_client.py`. Use `testing-livepeer` instead for a fully-local livepeer stack (prebuilt go-livepeer binary, no fal involvement).
---

# Testing Livepeer fal Deploy

## When to use

Use when testing the **deployed** livepeer path end-to-end — local Scope
client → daydream orchestrator → deployed fal app. This exercises:

- The wrapper in `src/scope/cloud/livepeer_fal_app.py` that fal runs
- The runner in `src/scope/cloud/livepeer_app.py` that spawns inside the
fal container
- The orchestrator → fal handshake (headers, auth, cold start)
- Kafka event publishing across wrapper + runner (full lifecycle)

**Two paths, pick the right one:**

- **Playwright (primary)** — real browser drives the Perform-mode UI
with a synthetic camera, streams through, verifies the output video
comes back from the cloud. This is the only path that exercises the
full livepeer trickle round-trip and produces every lifecycle Kafka
event (`pipeline_loaded`, `session_created`, `stream_started`,
`stream_heartbeat`, `session_closed`). Takes 2–5 minutes.
- **`test-cloud-connect.sh` (secondary, HTTP-only)** — bash script that
POSTs `/api/v1/cloud/connect` and polls `/api/v1/cloud/status`. Only
verifies the `websocket_connected` / `websocket_disconnected` pair at
the wrapper layer. Useful as a fast smoke test ("did the container
come up?") or in `git bisect run` against cloud-connect regressions.
Does not produce pipeline/session/stream events.

Do **not** use this skill for local-only livepeer testing — that's
`testing-livepeer` (prebuilt go-livepeer + local runner, no fal).

## One-time setup

1. **`.env.local`**: copy `.env.example` to `.env.local` (gitignored)
and fill in real values:
- `SCOPE_CLOUD_APP_ID` — your fal app URL. For the default `main`
env, the URL does **not** include a `--main` suffix (e.g.
`daydream/scope-livepeer-emran/ws`). Non-default envs do include
the suffix (e.g. `--preview/ws`).
- `SCOPE_CLOUD_API_KEY` — daydream cloud API key (sk_...). Without
this the scope client can't hit `signer.daydream.live` and fails
with `discover_orchestrators requires discovery_url or signer_url`.
- `SCOPE_USER_ID` — daydream user id. The runner's
`validate_user_access` rejects with `ACCESS_DENIED` when missing.
Find it in `~/.daydream-scope/logs/scope-logs-*.log` after a
successful UI connect, or in devtools Network on
`/api/v1/cloud/connect`.
- (Optional) `LIVEPEER_DEBUG=1` — surfaces per-orchestrator
rejection reasons in scope.log; essential for diagnosing
`All orchestrators failed (N tried)`.
2. **Frontend rebuild with baked-in auth** (once per local workspace):
```bash
source .env.local
cd frontend && VITE_DAYDREAM_API_KEY="$SCOPE_CLOUD_API_KEY" npm run build
cd ..
```
This bakes the API key into the dist bundle so the app appears
signed-in (otherwise Playwright hits the login screen).
3. **Playwright setup** (once per machine):
```bash
cd e2e
npm install
npx playwright install chromium
```
Then install Chromium's system deps (sudo required — one-time):
```bash
sudo apt-get install -y libnss3 libnspr4 libasound2t64
# or the Playwright-managed superset:
sudo npx playwright install-deps chromium
```
Without these the browser fails to launch with
`error while loading shared libraries: libnspr4.so`.

## Running the Playwright test (primary)

When the user says "test cloud" (or any trigger in the description),
**always deploy their current working tree before running Playwright**.
Otherwise the test runs against whatever stale code was last deployed
and can false-positive on their change.

### Step 0 — Ask the user where to deploy

Before anything else, confirm the deploy target. Use AskUserQuestion
(or plain text prompts) and persist answers for the session:

1. **Fal app name** — required. If `SCOPE_FAL_APP_NAME` is set in
`.env.local`, show that value and ask the user to confirm or
override. Otherwise ask outright (e.g. `scope-livepeer-<name>`).
2. **Fal env** — defaults to `main`. If `SCOPE_FAL_ENV` is set in
`.env.local`, show and offer to override. Non-default envs (e.g.
`preview`) change the URL suffix in `SCOPE_CLOUD_APP_ID` — see
below.

Once confirmed, export both for the current shell, and derive /
overwrite `SCOPE_CLOUD_APP_ID`:

| Env | `SCOPE_CLOUD_APP_ID` |
|---|---|
| `main` | `daydream/<app>/ws` (no suffix) |
| anything else | `daydream/<app>--<env>/ws` (with suffix) |

This is a fal convention — the default `main` env is exposed without
a suffix; all other envs include `--<env>` in the URL. Getting this
wrong produces `did not receive ready message from websocket`.
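The derivation in the table above can be sketched as a small shell helper. The variable names come from `.env.example`; the function itself is illustrative, not part of any script in the repo:

```bash
# Derive SCOPE_CLOUD_APP_ID from app name + env, per the fal convention above:
# the default "main" env gets no suffix, all other envs get "--<env>".
derive_app_id() {  # $1 = app name, $2 = env (defaults to main)
  app="$1"; env="${2:-main}"
  if [ "$env" = "main" ]; then
    echo "daydream/${app}/ws"
  else
    echo "daydream/${app}--${env}/ws"
  fi
}

# Usage:
#   export SCOPE_CLOUD_APP_ID="$(derive_app_id "$SCOPE_FAL_APP_NAME" "$SCOPE_FAL_ENV")"
```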

### Step 1 — Sanity-check `.env.local`

- `SCOPE_CLOUD_API_KEY` must be set (otherwise:
`discover_orchestrators requires discovery_url or signer_url`)
- `SCOPE_USER_ID` must be set (otherwise the runner's
`validate_user_access` rejects with `ACCESS_DENIED`)

If either is missing, stop and ask the user before deploying.
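A minimal pre-flight check for these two variables could look like the following sketch (the `require_env` helper is illustrative, not an existing script):

```bash
# Fail fast when a required cloud-test variable is missing from the environment.
require_env() {
  missing=""
  for v in "$@"; do
    eval "val=\${$v:-}"
    [ -n "$val" ] || missing="$missing $v"
  done
  [ -z "$missing" ] || { echo "MISSING:$missing" >&2; return 1; }
}

require_env SCOPE_CLOUD_API_KEY SCOPE_USER_ID \
  || echo "stop and ask the user before deploying"
```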

### Step 2 — Kill any scope already on :8000

If another scope process is bound to the port, stop it (or ask the
user) before continuing. The `run-app.sh` instance started in Step 4
must be the only scope on :8000, so the test exercises the code under
test.
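One way to free the port is the sketch below (`lsof -t` prints bare pids; the flags are the common Linux ones, so adjust per OS):

```bash
# Free a port before starting run-app.sh; the actual kill is left commented.
free_port() {  # $1 = port
  pids=$(lsof -ti ":$1" 2>/dev/null || true)
  if [ -n "$pids" ]; then
    echo "Stopping process on :$1 (pids: $pids)"
    kill $pids
  else
    echo "port $1 is free"
  fi
}

# free_port 8000
```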

### Step 3 — Deploy

```bash
SCOPE_FAL_APP_NAME=<app> SCOPE_FAL_ENV=<env> ./deploy-staging.sh
```

Abort with a clear error if this fails — don't run Playwright against
stale deployed code. Common failure: the `{git-short-sha}-cloud`
Docker base image isn't built yet (CI for the current commit is still
running). If that's the case, either wait for CI or have the user
confirm they want to deploy against an older base image.
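The deploy-then-abort rule can be encoded in a small wrapper like this sketch (`deploy_or_abort` is hypothetical; `deploy-staging.sh` is the script named above):

```bash
# Deploy the working tree; refuse to continue (and thus to run Playwright)
# when the deploy fails, so the test never runs against stale code.
deploy_or_abort() {  # $1 = app name, $2 = env
  SCOPE_FAL_APP_NAME="$1" SCOPE_FAL_ENV="$2" ./deploy-staging.sh \
    || { echo "deploy failed; not running Playwright against stale code" >&2; return 1; }
}
```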

### Step 4 — Start scope and run Playwright

```bash
# Terminal 1 — scope (port 8000)
SCOPE_CLOUD_APP_ID=<derived-url> ./run-app.sh

# Terminal 2 — test
cd e2e && npx playwright test
```

Expected on success (≤5 min cold, ~20 s warm):

```
Enabling cloud mode... ✅
Waiting for cloud connection... ✅
Selecting passthrough model... ✅
Switching input source to Camera... ✅
Starting stream... ✅
Verifying output stream processing... ✅ Output frames flowing
Stopping stream... ✅
1 passed
```

**What the test does in livepeer terms:**

1. Navigates to `localhost:8000`, switches the UI to Perform mode.
2. Opens settings, flips Remote Inference on, waits for Connection ID
(proves the fal WebSocket handshake completed and
`websocket_connected` fired in Kafka).
3. Selects the `passthrough` pipeline — triggers `pipeline/load`, which
runs on the fal runner and emits `pipeline_load_start` +
`pipeline_loaded`.
4. Switches the input source to Camera — Playwright's launch args
`--use-fake-device-for-media-stream` and
`--use-fake-ui-for-media-stream` (configured in
`e2e/playwright.config.ts`) give `getUserMedia()` a synthetic feed.
This is essential: without a real MediaStream, the browser↔local
scope WebRTC ICE never completes, `CloudTrack._start()` is never
called, and the runner never gets `start_stream`.
5. Clicks the play overlay (`[data-testid="start-stream-button"]`).
Frames flow via livepeer trickle through the orchestrator to the
fal runner; the runner emits `session_created` and `stream_started`.
6. Waits 15 s so at least one `stream_heartbeat` fires on the runner.
7. Asserts the **output** `<video>` inside the "Video Output" card is
actively playing (`currentTime > 0`). Checking any `<video>` would
false-positive on the local input preview.
8. Stops the stream. Runner emits `session_closed` and eventually
`websocket_disconnected` when the session is reaped.

## Running the quick HTTP smoke (secondary)

```bash
./test-cloud-connect.sh [flags]
```

Flags: `--skip-push`, `--skip-build-wait`, `--skip-deploy`,
`--keep-scope`, `--port N`. Env overrides:
`TIMEOUT_CONNECT`, `TIMEOUT_HEALTH`, `TIMEOUT_CI`, etc.

Exit codes (bisect-friendly — `git bisect run` works):

| Code | Meaning |
|---|---|
| 0 | Connected to cloud |
| 1 | Cloud reported an `error` in `/cloud/status` |
| 2 | Timed out waiting for connect |
| 3 | Infra failure (push / CI / deploy / scope startup) |

This only hits `POST /api/v1/cloud/connect` and polls status — it does
**not** start a stream, load a pipeline on the cloud, or produce the
session/stream events. If those are what you're after, use Playwright.
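For a wrapper or CI step, the exit codes from the table above can be mapped like this (the `classify` helper is illustrative; the codes themselves are from the table):

```bash
# Map test-cloud-connect.sh exit codes to human-readable outcomes.
# git bisect treats exit 1-124 as "bad", so these codes are bisect-safe.
classify() {
  case "$1" in
    0) echo "connected" ;;
    1) echo "cloud reported an error in /cloud/status" ;;
    2) echo "timed out waiting for connect" ;;
    3) echo "infra failure (push / CI / deploy / scope startup)" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Usage:
#   ./test-cloud-connect.sh --skip-push; classify $?
```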

A `--full-session` flag exists but hits a known gap: `/api/v1/session/start`
is not livepeer-compatible (TODO at `src/scope/server/mcp_router.py:252`)
and will error with `Pipeline X not loaded` in livepeer mode. The
Playwright path is the supported way to exercise a full session.

## Logs

- `/tmp/test-cloud-connect/scope.log` — local scope stdout/stderr
(grep for `livepeer_gateway` when `LIVEPEER_DEBUG=1`)
- `~/.daydream-scope/logs/scope-logs-*.log` — scope's rolling app logs
- `e2e/test-results/` — Playwright screenshots + traces on failure
- fal dashboard — runner stdout/stderr, including `[Kafka] Published
event: …` lines from `scope.server.kafka_publisher` in the runner.
Not accessible via CLI; open <https://fal.ai/dashboard/logs>.
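A quick filter over scope.log can surface the rejection reasons; note the grep pattern below is an assumption about what the debug lines contain, not an exact log format:

```bash
# Surface per-orchestrator rejection reasons (requires LIVEPEER_DEBUG=1).
scope_rejections() {  # $1 = path to scope.log
  grep -E "livepeer_gateway.*(ACCESS_DENIED|ready message|failed)" "$1" 2>/dev/null \
    || echo "no rejections found (is LIVEPEER_DEBUG=1 set?)"
}

# Usage:
#   scope_rejections /tmp/test-cloud-connect/scope.log
```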

## Common failure signatures

- **`All orchestrators failed (N tried)`** — set `LIVEPEER_DEBUG=1` to
get the per-orchestrator reason. Typical root causes:
- `did not receive ready message from websocket` → fal URL wrong
(e.g. stray `--main` suffix) or container cold-starting.
- `serverless handshake failed (ACCESS_DENIED)` → runner's
`validate_user_access` rejected (missing `SCOPE_USER_ID`, or
daydream API couldn't find the user).
- **`discover_orchestrators requires discovery_url or signer_url`** →
`SCOPE_CLOUD_API_KEY` not set; signer fallback isn't configured.
- **Playwright: `error while loading shared libraries: libnspr4.so`** →
Chromium system deps missing; run the `sudo apt-get install`
command from setup.
- **Playwright: test passes but ClickHouse only has
`websocket_connected`** — the test probably clicked stop before ICE
completed. Confirm the fake-device launch args are set and the
Camera input was selected (not File).
- **Playwright: `FrameProcessor failed to start: Pipeline X not
loaded`** — you're running the HTTP script's `--full-session` flag,
not the Playwright test. Switch to `npx playwright test`.

## What "round-trip verified" looks like in ClickHouse

After a successful Playwright run, `scope_cloud_events` filtered by
your `user_id` and the `connection_id` from the `websocket_connected`
row should contain:

```
websocket_connected (wrapper)
pipeline_load_start (runner)
pipeline_loaded (runner)
session_created (runner)
stream_started (runner)
stream_heartbeat × 1..N (runner, ~every 10 s)
stream_stopped (runner)
session_closed (runner)
websocket_disconnected (wrapper, on session reap)
```

All rows share the same `user_id` and `connection_id` (= `manifest_id`).
If any runner-emitted row is missing, something in
`src/scope/cloud/livepeer_app.py` regressed — check the FrameProcessor
construction around the `start_stream` handler and the explicit
`publish_event` calls for `session_created` / `session_closed`.
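Given an exported list of `event_type` values for one `connection_id` (e.g. copied out of a ClickHouse query result), completeness can be checked with a sketch like the following (`check_lifecycle` is illustrative):

```bash
# Assert every lifecycle event from the list above appears at least once.
EXPECTED="websocket_connected pipeline_load_start pipeline_loaded session_created \
stream_started stream_heartbeat stream_stopped session_closed websocket_disconnected"

check_lifecycle() {  # $1 = newline-separated event_type values
  ok=1
  for ev in $EXPECTED; do
    printf '%s\n' "$1" | grep -qx "$ev" || { echo "MISSING: $ev"; ok=0; }
  done
  [ "$ok" -eq 1 ] && echo "all lifecycle events present"
}
```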
40 changes: 40 additions & 0 deletions .env.example
# Copy this file to `.env.local` (gitignored) and fill in real values.
# Used by run-app.sh, deploy-staging.sh, and test-cloud-connect.sh.

# --- Client-side (the local scope process) ---

# Required — fal app URL for your livepeer deployment.
# Format: daydream/<app-name>/ws (no --main suffix for the default env;
# for non-default envs the URL includes the env, e.g. --preview/ws).
# This MUST match SCOPE_FAL_APP_NAME + SCOPE_FAL_ENV below — the skill
# derives it for you when it asks which app + env to test against.
export SCOPE_CLOUD_APP_ID=daydream/<your-app>/ws

# Required — daydream cloud API key (used to auth with signer.daydream.live).
export SCOPE_CLOUD_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Required for the full automated test — your daydream user id.
# Find it via the Scope UI cloud-connect request body, or in
# ~/.daydream-scope/logs/scope-logs-*.log after a UI-driven connect.
export SCOPE_USER_ID=user_xxxxxxxxxxxxxxxxxxxxxxxxx

# --- Deploy-side (what deploy-staging.sh pushes to) ---

# Optional default app name for deploy-staging.sh. If unset, the skill
# asks the user. Example: scope-livepeer-<your-name>
export SCOPE_FAL_APP_NAME=scope-livepeer-<your-name>

# Optional default env for deploy-staging.sh. Defaults to "main". For
# non-default envs remember that the URL in SCOPE_CLOUD_APP_ID includes
# a --<env> suffix (e.g. "daydream/scope-livepeer-foo--preview/ws").
# export SCOPE_FAL_ENV=main

# Optional — auth mode for the fal deploy. Defaults to "public".
# export SCOPE_FAL_AUTH=public

# --- Optional ---

# Enable DEBUG logs from livepeer_gateway so per-orchestrator rejection
# reasons appear in scope.log (e.g. "ACCESS_DENIED", "did not receive
# ready message from websocket").
# export LIVEPEER_DEBUG=1
52 changes: 52 additions & 0 deletions CLAUDE.md
This documentation can be used to understand the architecture of the project:

## Local Cloud Testing

> **DEPRECATED.** This section describes the old direct/cloud-relay
> mode (`SCOPE_CLOUD_MODE=direct`, `fal_app.py`,
> `CloudConnectionManager`) which is being removed. For all new
> cloud testing, use the `testing-livepeer-fal-deploy` skill (see the
> "Cloud testing — use this skill" section above). This section is
> kept only for in-flight work on the legacy path.

Test the cloud relay flow locally by running two Scope instances — one acting as the "cloud" relay server.

**Environment variables:**
SourceManager reads video files
reads from per-sink queues → _last_frames_by_sink
```

## Cloud testing — use this skill

**Livepeer cloud mode is the only supported cloud path going forward.**
The older direct/cloud-relay mode (`fal_app.py` +
`CloudConnectionManager` + `SCOPE_CLOUD_MODE=direct`) is being
deprecated.

**Whenever a user says "test cloud", "test the fal deploy", "verify
cloud streaming", "run the e2e test", or pastes any cloud-connect
error (`All orchestrators failed`, `ACCESS_DENIED`, `did not receive
ready message`, `discover_orchestrators requires discovery_url`),
route to the `testing-livepeer-fal-deploy` skill at
`.agents/skills/testing-livepeer-fal-deploy/SKILL.md`.** Also
route there for changes to `src/scope/cloud/livepeer_fal_app.py`,
`src/scope/cloud/livepeer_app.py`, or the cloud-connect flow on the
client side (`src/scope/server/livepeer.py`,
`src/scope/server/livepeer_client.py`).

The skill provides two paths:

- **Playwright e2e test** (`e2e/tests/cloud-streaming.spec.ts`) —
primary. Drives the real Perform-mode UI with a synthetic camera
and verifies the full trickle round-trip. Produces every lifecycle
Kafka event (`websocket_connected`, `pipeline_loaded`,
`session_created`, `stream_started`, `stream_heartbeat`,
`session_closed`, `websocket_disconnected`).
- **`test-cloud-connect.sh`** at the repo root — fast bash/curl smoke
test for `/api/v1/cloud/connect` only. Useful in `git bisect run`
or for "did the fal container come up?". Does not produce
pipeline/session/stream events.

**Do NOT use the `Local Cloud Testing` or `MCP Server Testing with
Local Cloud Dev` sections below for general cloud testing — those
describe the deprecated direct-mode path and are kept only to
unblock in-flight work on that legacy path until it's removed.**

For a fully-local livepeer stack (prebuilt go-livepeer + local
runner, no fal involved), use the separate `testing-livepeer` skill
instead.

## MCP Server Testing

When asked to test Scope via MCP tools (e.g., with a workflow JSON), follow this sequence directly — do not read source code to figure out the API. Use the HTTP API directly (not MCP tools) because restarting Scope kills the MCP server connection.
for name, color in [('test', (0,0,255)), ('test1', (0,255,0)), ('test2', (255,0,

## MCP Server Testing with Local Cloud Dev

> **DEPRECATED.** Describes MCP testing against the old direct-mode
> two-instance setup. For cloud MCP testing going forward, combine
> the `testing-livepeer-fal-deploy` skill with the MCP patterns
> above. Kept for in-flight work on the legacy path only.

**Only use this section when the user explicitly asks for local cloud / two-instance testing.**

Test the cloud relay flow locally by running two Scope instances — one acting as the "cloud" relay server. This is for testing the cloud WebRTC relay path specifically.