feat: add survey app #422

Open
zackpollard wants to merge 69 commits into main from feat/survey-builder

Conversation

@zackpollard
Member

No description provided.

@zackpollard zackpollard force-pushed the feat/survey-builder branch from 4998e4b to 062f136 Compare March 31, 2026 02:01
@github-actions

github-actions bot commented Mar 31, 2026

Preview Deployments (6fa0cd1)

  App                             Preview URL
  ---                             -----------
  api.immich.app                  api.pr-422.dev.immich.app
  awesome.immich.app              awesome.pr-422.dev.immich.app
  buy.immich.app                  buy.pr-422.dev.immich.app
  datasets.immich.app             datasets.pr-422.dev.immich.app
  futo-backups-survey.immich.app  futo-backups-survey.pr-422.dev.immich.app
  get.immich.app                  get.pr-422.dev.immich.app
  my.immich.app                   my.pr-422.dev.immich.app
  root.immich.app                 pr-422.dev.immich.app
  survey.immich.app               survey.pr-422.dev.immich.app

@zackpollard zackpollard force-pushed the feat/survey-builder branch 23 times, most recently from d4e0eb0 to c2f88e0 Compare April 1, 2026 03:22
@zackpollard zackpollard force-pushed the feat/survey-builder branch 5 times, most recently from 57bfedf to b3e5da3 Compare April 7, 2026 18:23

Node 22's built-in WebSocket couldn't scale to 2000 concurrent
connections: ~15% of connects failed with a generic "WS error" and
the p99 upgrade latency was 22 seconds. Those weren't server-side
failures; the client was the bottleneck. With the same server code
and a different client, the numbers changed completely:

  metric            built-in   ws pkg
  ---               --------   ------
  total errors         412          0
  ws connect errs      312          0
  ws connect p90    11973ms      302ms
  ws connect p99    21924ms      397ms
  throughput         83 r/s      99 r/s

Also fixes the "extra respondents" issue: the built-in WebSocket
can't send Cookie headers, so returning users always created fresh
respondents on reconnect. With the ws package we can set headers
and capture Set-Cookie from the upgrade response, so reconnects
reuse the same rid. Server-side respondent count now matches the
expected user count exactly.

Changes:
  - add ws + @types/ws devDependencies
  - createWsClient uses ws package with Error-typed callbacks and
    HTTP 'unexpected-response' event for real diagnostics (status
    codes, error messages) instead of a generic 'WS error' label
  - capture rid_{slug} cookie from upgrade response, reuse on all
    subsequent reconnects in the same client
  - returning user flow passes the cookie on explicit reconnect so
    they come back as the same respondent
  - load test queries the server's /results endpoint at the end and
    prints "Server-side respondents: X total ✓" to verify the
    expected user count matches persisted state
  - remove debug logging from cookie capture
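
The cookie reuse described above comes down to pulling one name=value pair out of the upgrade response's Set-Cookie headers. A minimal sketch, assuming those headers arrive as an array of strings; `extractRidCookie` and the `demo` slug are hypothetical names, not taken from this PR:

```typescript
// Hypothetical helper: find the rid_{slug} cookie among Set-Cookie header values.
// Returns "name=value", ready to send back in a Cookie header, or null if absent.
function extractRidCookie(setCookie: string[] | undefined, slug: string): string | null {
  const name = `rid_${slug}`;
  for (const header of setCookie ?? []) {
    // A Set-Cookie value looks like "name=value; Path=/; HttpOnly"; only the first pair matters.
    const pair = header.split(";")[0].trim();
    const eq = pair.indexOf("=");
    if (eq > 0 && pair.slice(0, eq) === name) {
      return pair;
    }
  }
  return null;
}

// First connect: capture the cookie. Later reconnects: send it back.
const captured = extractRidCookie(
  ["other=1; Path=/", "rid_demo=abc123; Path=/; HttpOnly; SameSite=Lax"],
  "demo",
);
const reconnectHeaders: Record<string, string> = captured ? { Cookie: captured } : {};
```

Keeping the parsing in a pure function makes it easy to unit test without opening a socket.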

Researched how leading survey tools (typeform, surveymonkey, jotform,
tally, google forms, microsoft forms) present results per question
type. The previous results page had several well-known anti-patterns:
pie charts as a toggle, word clouds for text, chart.js for simple bars,
no low-sample guards, and a single generic bar chart for every choice
type regardless of what was actually being measured.

Per-type visualisations:

  ChoiceResult (radio/dropdown/checkbox)
    - horizontal bars sorted by count
    - "most common" / "most selected" callout
    - checkbox mode uses respondent-based percentages (not response-
      based) and shows "avg selections per respondent" — the stat
      everyone actually wants
    - top answer is visually highlighted

  RatingResult
    - big mean score with rendered stars
    - "satisfaction" top-box % (rated 4+ of 5)
    - distribution bars ordered high-to-low
    - low-sample notice below n=5

  NpsResult
    - big NPS score colour-coded by industry bands
    - four stats: promoters/passives/detractors/responses
    - classic 3-segment stacked bar
    - per-score distribution (0-10) showing the underlying shape
    - "need 10+ responses for a meaningful NPS" notice

  LikertResult
    - diverging stacked bar centred on the neutral midpoint (the gold
      standard for agreement scales — biggest differentiator vs most
      survey tools)
    - agree/neutral/disagree/mean headline stats
    - inline legend

  NumberResult
    - histogram with auto-bucketing (integer-aware)
    - mean/median/min/max/count stat strip, with median tonally
      highlighted because it's less skewed by outliers
    - raw dot list for low-sample data

  TextResult (text/textarea/email)
    - n-gram frequency (bigrams + trigrams, stopword filtered) —
      replaces the word cloud which is decorative rather than
      analytic. actual phrases with counts are much more useful.
    - email mode shows unique/duplicate counts and top domain
      breakdown instead of n-grams
    - paginated response list with client-side search
    - textarea responses truncate past 200 chars with "show more"
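
As an illustration of the n-gram counting behind TextResult, here is a minimal sketch. The stopword list, the tokenisation regex and the name `countNgrams` are assumptions; the PR's computeNgrams may differ. Note that this sketch drops stopwords before forming n-grams, so a phrase can join words that were not adjacent in the original text.

```typescript
// Hypothetical stopword list; a real one would be much longer.
const STOPWORDS = new Set(["the", "a", "an", "is", "it", "to", "and", "of", "was"]);

// Count n-grams of the given sizes across all responses, most frequent first.
function countNgrams(responses: string[], sizes: number[] = [2, 3]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const response of responses) {
    const words = response
      .toLowerCase()
      .split(/[^a-z0-9']+/)
      .filter((w) => w.length > 0 && !STOPWORDS.has(w));
    for (const n of sizes) {
      for (let i = 0; i + n <= words.length; i++) {
        const gram = words.slice(i, i + n).join(" ");
        counts.set(gram, (counts.get(gram) ?? 0) + 1);
      }
    }
  }
  // Sort by count descending, then alphabetically for a stable order.
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}

const top = countNgrams(["The backup was slow", "Backup slow on mobile", "slow backup"]);
```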

Shared:
  - HorizontalBar component (pure CSS, no chart.js) — faster to
    render and gives much finer layout control for row-per-option
  - StatStrip for consistent stat display
  - LowSampleNotice with configurable threshold
  - analytics-utils gains computeRating, computeLikert, computeNumber,
    bucketNumbers, computeNgrams, computeTextStats, computeEmailStats,
    computeCheckboxStats, npsDistribution
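
The helpers listed above are small pure functions over the raw answers. As one example, the NPS stats can be sketched as below, using the standard bands (0-6 detractor, 7-8 passive, 9-10 promoter) and the 10-response threshold mentioned earlier. `summariseNps` and its return shape are hypothetical and are not the PR's actual API.

```typescript
interface NpsSummary {
  promoters: number;
  passives: number;
  detractors: number;
  responses: number;
  score: number | null; // null when there are no responses
  lowSample: boolean;
}

// NPS = % promoters minus % detractors, rounded to a whole number.
function summariseNps(scores: number[], minSample = 10): NpsSummary {
  let promoters = 0;
  let passives = 0;
  let detractors = 0;
  for (const s of scores) {
    if (s >= 9) promoters++;
    else if (s >= 7) passives++;
    else detractors++;
  }
  const responses = scores.length;
  const score =
    responses === 0 ? null : Math.round(((promoters - detractors) / responses) * 100);
  return { promoters, passives, detractors, responses, score, lowSample: responses < minSample };
}

const nps = summariseNps([10, 9, 9, 8, 7, 6, 3, 10, 9, 5]);
```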

Results page header:
  - 4-up kpi strip (total / completed / completion / live) replacing
    the previous 3-card layout
  - gradients and progress bar for visual clarity

Removed obsolete components: BarChart, PieChart, ChartTypeToggle,
WordCloud, NpsScoreCard. Chart.js is no longer a dependency for the
per-question results (still used for Timeline/Dropoff overviews).

QuestionResult.svelte is now a thin dispatcher that picks the right
component based on question.type.

Tests: 381 unit + 81 e2e passing, production build clean, 0 svelte-
check errors.

follow the practice used by typeform, tally and jotform: the overview
card shows a small sample (top 5 by frequency, then length) with a link
to the dedicated responses tab, instead of duplicating the full browser
and search that already exist there.

for email questions the actual addresses ARE the value, unlike free
text. split out a new EmailResult component with:

- lead-quality split (corporate vs free vs disposable) from hardcoded
  domain lists
- disposable/role-based/invalid flag badges per address
- gmail-aware normalisation (strip dots and +tags) so duplicates get
  properly grouped
- filter chips + search + pagination over the full deduped list
- copy-filtered-list and mailto:?bcc= actions for quick lead export
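
The gmail-aware normalisation in the list above can be sketched as follows. This is an illustration under stated assumptions: `normaliseGmailAware` is a hypothetical name, only the gmail.com domain is special-cased, and validation is reduced to "has a local part and a domain".

```typescript
// Lower-case the address and, for Gmail, strip dots and "+tag" suffixes from the
// local part so that variants of one mailbox group together. Returns null if invalid.
function normaliseGmailAware(raw: string): string | null {
  const email = raw.trim().toLowerCase();
  const at = email.lastIndexOf("@");
  if (at <= 0 || at === email.length - 1) return null; // no local part or no domain
  let local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (domain === "gmail.com") {
    local = local.split("+")[0].replace(/\./g, "");
  }
  if (local.length === 0) return null; // e.g. "+tag@gmail.com"
  return `${local}@${domain}`;
}
```

With this, `J.Doe+news@Gmail.com` and `jdoe@gmail.com` map to the same key, while dots in non-Gmail addresses are left alone.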

clicking a search row fetches the respondent and shows every answer
inline, with the matching question highlighted so it's obvious where
the search hit landed.

…rework

fast tier (5s) stays pure in-memory as before. the new slow tier fires
once per minute and runs the sql-backed aggregations (timeline, dropoff,
completion times) ONCE per DO, then fans the result out to every
connected viewer in a single push, so n viewers never become n queries.
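
A minimal sketch of the two-tier idea, assuming the slow tier is driven by counting fast ticks (12 ticks of 5s make one minute). `TieredBroadcaster`, `Viewer` and the message shape are hypothetical; the point is only that the expensive aggregation runs once per cycle regardless of viewer count.

```typescript
interface Viewer {
  send(payload: string): void;
}

// Assumed cadence: the fast tier ticks every 5s, so 12 ticks make one 60s slow cycle.
const SLOW_TICKS_PER_CYCLE = 12;

class TieredBroadcaster {
  private tickCount = 0;
  private readonly viewers: Viewer[];
  private readonly fastSnapshot: () => unknown; // cheap, in-memory
  private readonly slowAggregates: () => unknown; // expensive (SQL-backed in the real system)

  constructor(viewers: Viewer[], fastSnapshot: () => unknown, slowAggregates: () => unknown) {
    this.viewers = viewers;
    this.fastSnapshot = fastSnapshot;
    this.slowAggregates = slowAggregates;
  }

  // Call once per fast interval, e.g. from a timer or alarm.
  onTick(): void {
    this.tickCount++;
    this.fanOut("fast", this.fastSnapshot());
    if (this.tickCount % SLOW_TICKS_PER_CYCLE === 0) {
      // Computed once, then the same serialised payload goes to every viewer.
      this.fanOut("slow", this.slowAggregates());
    }
  }

  private fanOut(type: string, data: unknown): void {
    const payload = JSON.stringify({ type, data });
    for (const viewer of this.viewers) viewer.send(payload);
  }
}
```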

timeline:
- add minute granularity (backend + wire op + ws protocol)
- auto-pick initial granularity from the observed timestamp span
- client-side gap fill anchored to now so a single-point survey renders
  as a proper line across the time axis instead of a dot on the left

dropoff:
- pure-css funnel (no more chart.js for this one) — bar width is share
  of the starting cohort, rhs label is drop from previous question
- live updates via slow analytics push

completion-time histogram:
- brand new chart. server groups durations into 8 fixed buckets with
  running mean/median/min/max. green bar is the bucket containing the
  median. live updates via slow analytics push.

…s for histogram

- revert the 2-col grid that made both charts cramped; back to full-width stacked cards
- completion-time histogram now uses chart.js (same dep as timeline, already loaded
  on the page) so the bars actually render. the previous pure-css approach relied on
  percentage heights on flex-1 children, which collapse when the column has no
  intrinsic cross-axis height

enter-to-advance: QuestionCard's keydown handler used to only fire for
input[type=text], so email/number/etc didn't advance on enter. now it
advances on enter for every focus target except textarea (newline),
button and link (native activation).
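
The rule above fits in a small predicate. A sketch, assuming the handler has the pressed key and the focused element's tag name; `shouldAdvanceOnEnter` is a hypothetical name, not QuestionCard's actual code:

```typescript
// Decide whether Enter should advance to the next question for a given focus target.
// tagName follows the DOM convention of upper-case names ("INPUT", "TEXTAREA", ...).
function shouldAdvanceOnEnter(key: string, tagName: string): boolean {
  if (key !== "Enter") return false;
  switch (tagName.toUpperCase()) {
    case "TEXTAREA": // Enter inserts a newline
    case "BUTTON": // Enter activates the button natively
    case "A": // Enter follows the link natively
      return false;
    default:
      return true;
  }
}
```

Listing the exceptions rather than the allowed input types means a new input type advances on Enter without touching the handler.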

per-question timing:
- new answer_ms column on answers (migration + DO schema + additive
  alter for existing DO instances)
- shared AnswerInput + client PendingSave carry an optional answerMs
- the survey loader stamps 'question shown at' in a plain non-reactive
  object via a $effect on engine.currentQuestion.id; handleAnswer
  computes elapsed and puts it on the buffered save
- ws submit-answers and the http service layer clamp answerMs to
  [0, 24h], then store it alongside the answer row
- new getAnswerDurationsByQuestion repo query + getQuestionTimings
  service method compute median/mean per question in js
- slow-tier analytics broadcast now includes question timings so the
  chart updates once a minute without fan-out
- new QuestionTimingChart (chart.js horizontal bar) showing median
  seconds per question with tooltip showing median/mean/sample size

question timings:
- backend extended to compute p5/p25/p50/p75/p95 + min/max per question
  via nearest-rank percentile on the sorted duration array
- QuestionTimingChart rewritten as a pure-css horizontal box plot:
  whiskers p5-p95, iqr box p25-p75, median tick, caps at p5 and p95
- shared axis anchored to max-of-p95 x 1.05 so a single slow outlier
  doesn't squash every other row
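
For reference, nearest-rank picks the value at 1-based rank ceil(p/100 × n) in the sorted array, with no interpolation. A minimal sketch; `percentileNearestRank` and the sample durations are illustrative, not the PR's helper or data:

```typescript
// Nearest-rank percentile over an ascending-sorted array.
function percentileNearestRank(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error("percentile of empty array");
  if (p <= 0) return sortedAsc[0];
  const rank = Math.ceil((p / 100) * sortedAsc.length); // 1-based rank
  return sortedAsc[Math.min(rank, sortedAsc.length) - 1];
}

// Box-plot inputs for one question, from made-up durations in milliseconds.
const durations = [1200, 800, 15000, 2300, 950, 3100, 1800, 2750].sort((a, b) => a - b);
const box = {
  p5: percentileNearestRank(durations, 5),
  p25: percentileNearestRank(durations, 25),
  p50: percentileNearestRank(durations, 50),
  p75: percentileNearestRank(durations, 75),
  p95: percentileNearestRank(durations, 95),
};
```

In this sample the single 15000ms outlier only shows up at p95, which is why the box (p25 to p75) stays readable.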

number histogram:
- NumberResult had the same broken flex-1 percentage-height bar pattern
  that the completion-time chart used to have; explicit pixel heights
  plus a fixed-height parent fix the invisible bars

unit (analytics-utils.test.ts): normalizeEmail (gmail dots/+tags, validation),
classifyDomain, computeEmailSummary (dedupe, flags, topDomains),
computeNgrams, computeTextStats, computeCheckboxStats, computeRating,
computeLikert, computeNumber + bucketNumbers. 52 new tests bring the
unit test count from 381 to 433.

e2e:
- Enter-to-advance: text/email/number inputs submit on Enter, textarea
  keeps Enter as a newline
- Search tab: clicking a result row expands the full respondent detail,
  shows the match chip on the originating question, and collapses again
  on re-click
- Overview tab: completion-time and per-question-timing charts render

- new shared constants.ts entries: MAX_ANSWER_MS, clampAnswerMs,
  percentile, BROADCAST_FAST_INTERVAL_MS, BROADCAST_SLOW_TICKS_PER_CYCLE,
  COMPLETION_TIME_BUCKETS. The same magic numbers were previously
  scattered across ws-handler, ws-broadcaster and respondent.service.
- respondent.service.getQuestionTimings / getCompletionTimes now share
  a single nearest-rank percentile helper instead of re-declaring a
  local closure in each method. The completion-time bucket definition
  lives in constants so it's testable and the service just clones +
  counts it.
- ws-handler replaces three ADMIN_OPS/EDITOR_OPS/VIEWER_OPS sets + three
  inline hasMinRole checks with a single declarative OP_ROLES map.
  Ops not listed are public (respondent survey-taking); any new op just
  adds one line to the table. The dispatch becomes a single conditional.
- ws-handler's inline answer_ms clamping is replaced with the shared
  helper so the WS fast-path and the HTTP service path validate
  identically.
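
The declarative table can be sketched like this. The op names and the numeric role ranking are hypothetical; only the shape (one map, unlisted ops are public, one conditional) follows the description above.

```typescript
type Role = "viewer" | "editor" | "admin";

const ROLE_RANK: Record<Role, number> = { viewer: 1, editor: 2, admin: 3 };

// Hypothetical op names. Ops missing from the table are public.
const OP_ROLES = new Map<string, Role>([
  ["get-results", "viewer"],
  ["update-question", "editor"],
  ["delete-survey", "admin"],
]);

function hasMinRole(actual: Role | null, required: Role): boolean {
  return actual !== null && ROLE_RANK[actual] >= ROLE_RANK[required];
}

// The whole dispatch check is a single conditional over the table.
function isAllowed(op: string, role: Role | null): boolean {
  const required = OP_ROLES.get(op);
  return required === undefined || hasMinRole(role, required);
}
```

A Map is used instead of a plain object so that op names such as "constructor" cannot accidentally resolve to inherited properties.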

…lidate slug cache on mutate

Two fixes from the DO auth review:

1) Strip X-WS-Role / X-Respondent-Id / X-Authenticated from inbound
   client requests before forwarding to the DO. The worker is the only
   thing allowed to set these headers — previously, `new Headers(request.headers)`
   copied client-supplied values, and X-Respondent-Id was only
   overwritten when the worker found a valid rid cookie. A client
   sending the header with no cookie could claim an arbitrary
   respondent ID on the WS upgrade and the HTTP path. Fixed by
   deleting all three internal headers up front in stripInternalHeaders()
   and then explicitly setting the ones the worker has verified.

2) Invalidate the per-isolate slug→id cache when a survey is deleted
   or its slug/password_hash mutates. Previously the 60s TTL could
   serve a stale route after delete, and if the slug got reused by a
   fresh survey the cache would still point at the old DO. Now
   invalidateSlugCacheBySurveyId() drops matching entries as part of
   the delete path and the catalog-sync path. Other isolates still
   rely on TTL — keeping that short (60s) bounds the window.
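
The second fix can be pictured as a TTL cache with an extra invalidate-by-value operation. A minimal sketch, assuming the cache maps slug to survey id and that callers pass in the current time; the `SlugCache` class and its method names are hypothetical:

```typescript
interface SlugEntry {
  surveyId: string;
  expiresAt: number; // ms timestamp
}

class SlugCache {
  private readonly entries = new Map<string, SlugEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
  }

  set(slug: string, surveyId: string, now: number): void {
    this.entries.set(slug, { surveyId, expiresAt: now + this.ttlMs });
  }

  get(slug: string, now: number): string | undefined {
    const entry = this.entries.get(slug);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(slug); // expired: fall through to a fresh lookup
      return undefined;
    }
    return entry.surveyId;
  }

  // Drop every slug that points at the given survey; returns how many were removed.
  invalidateBySurveyId(surveyId: string): number {
    let removed = 0;
    this.entries.forEach((entry, slug) => {
      if (entry.surveyId === surveyId) {
        this.entries.delete(slug);
        removed++;
      }
    });
    return removed;
  }
}
```

Invalidating by survey id rather than by slug covers the rename case, where the old slug is the stale key.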

e2e: added a regression test that sends a forged X-Respondent-Id with
no cookie and verifies the DO ignores it and allocates a fresh
respondent.
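
The header fix follows a strip-then-set pattern. A sketch using the standard Fetch API `Headers` class; the signature of `stripInternalHeaders` and the `buildForwardHeaders` helper are assumptions, since the PR text only names the function:

```typescript
const INTERNAL_HEADERS = ["X-WS-Role", "X-Respondent-Id", "X-Authenticated"];

// Copy the inbound headers, then drop every internal header the client may have supplied.
function stripInternalHeaders(inbound: Headers): Headers {
  const headers = new Headers(inbound);
  for (const name of INTERNAL_HEADERS) headers.delete(name);
  return headers;
}

// Hypothetical forwarding step: X-Respondent-Id is only ever set from a value
// the worker verified itself (e.g. from a valid rid cookie).
function buildForwardHeaders(inbound: Headers, verifiedRid: string | null): Headers {
  const headers = stripInternalHeaders(inbound);
  if (verifiedRid !== null) headers.set("X-Respondent-Id", verifiedRid);
  return headers;
}

// A forged header with no cookie never reaches the DO.
const forged = new Headers({ "X-Respondent-Id": "victim", Accept: "*/*" });
const forwarded = buildForwardHeaders(forged, null);
```

Deleting unconditionally up front means the safe outcome no longer depends on a later branch remembering to overwrite the value.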

The chained Enter-key presses raced with the 200ms question transition
animation and the QuestionCard keydown listener teardown/remount cycle:
locators were firing while the old card was unmounting and the new
one hadn't mounted yet. Switched to role-based heading locators,
added waitForTransition() between each advance, and factored the
boilerplate into a startSurveyAtFirstQuestion() helper. The textarea
test now uses the Next button to get to the textarea question so the
feature-under-test (Enter behaviour) is exercised in isolation.

…ator

TextareaQuestion was not passing question.placeholder down to the
Textarea component, so any placeholder configured on a textarea
question was silently ignored. The enter-to-advance e2e test hit this
when trying to locate the textarea by placeholder and timed out.

Fixed the component to forward the placeholder prop and switched the
e2e test to a role-based textarea locator so it no longer depends on
the placeholder being rendered.

text input debounce:
- TextQuestion, TextareaQuestion, EmailQuestion, NumberQuestion all
  debounce their oninput callback by 300ms (clearTimeout on every
  keystroke, fire once the user pauses). onDestroy flushes the last
  value so advancing via Enter or clicking Next doesn't lose input.
- client.ts bufferAnswer no longer increments unflushedCount when the
  same questionId is already buffered — repeated updates to a text
  field were reaching the FLUSH_THRESHOLD of 4 after just 4 keystrokes
  and triggering an unnecessary network flush mid-typing.
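
The bufferAnswer change can be sketched as below: only a question that is not already buffered counts towards the threshold. `AnswerBuffer` and its fields are a simplified stand-in for client.ts, with the FLUSH_THRESHOLD of 4 taken from the description above.

```typescript
const FLUSH_THRESHOLD = 4;

class AnswerBuffer {
  private readonly pending = new Map<string, string>();
  private unflushedCount = 0;
  flushes = 0; // stand-in for "a network flush happened"

  bufferAnswer(questionId: string, value: string): void {
    // Repeated updates to the same question replace the value without counting again.
    if (!this.pending.has(questionId)) this.unflushedCount++;
    this.pending.set(questionId, value);
    if (this.unflushedCount >= FLUSH_THRESHOLD) this.flush();
  }

  flush(): void {
    this.flushes++;
    this.pending.clear();
    this.unflushedCount = 0;
  }
}
```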

dashboard pagination + search:
- backend GET /api/surveys now accepts search, offset, limit params.
  returns { surveys, total } instead of a bare array. repository uses
  LIKE on title for search, standard offset/limit for pagination,
  COUNT(*) for the total.
- frontend dashboard uses listSurveysPaginated with 12 per page.
  debounced search bar (300ms), paginator with prev/next, empty-state
  distinguishes 'no surveys yet' from 'no matching surveys'.
- existing listSurveys() updated to parse the new response shape so
  callers that don't need pagination still work.

the survey page had a branch gap: if loading failed (API error, network
issue) the error was shown in a floating toast with a dismiss button.
dismissing hid the error but the engine/showWelcome/etc were never
initialized, so none of the other template branches matched → blank
page.

fixes:
- error toast only shows for non-fatal errors (i.e. while the user is
  actively answering and a save fails). fatal load errors are handled
  by the main if/else chain which shows the error inline with a 'try
  again' reload button.
- added a catch-all {:else} branch at the bottom of the template that
  shows 'something went wrong' + retry. this makes it impossible for
  the page to ever render as completely blank regardless of state.

the 'Loading...' text was nearly invisible and the user reported a
blank page. adds a proper animated spinner + 'Loading survey...' text
so the loading state is clearly visible while the onMount async
completes (fetch survey + WS connect + resume).

when a respondent answered every question but never hit submit (tab
crash, browser close, network drop), the resume returns a
nextQuestionIndex equal to the total question count. the engine was
initialized with an out-of-bounds index, so currentQuestion was
undefined, the progress bar rendered but no question card or section
header — producing the blank-with-blue-bar state.

now if nextQuestionIndex >= questions.length, the loader calls
postComplete() automatically since the answers are already saved and
there's nothing left to show.

auto-completing was wrong — the user might have been mid-answer on a
free-text field when their tab crashed. now we cap the resume index to
the last valid question so they land back on their final answer and can
review before hitting submit.

the 300ms input debounce I added earlier created a gap: if the user
typed within the last 300ms before navigating away, the pending timer
hadn't fired yet so the latest value wasn't in the answer buffer when
flushBufferSync sent the beacon.

fix: new useDebouncedAnswer() helper that:
- debounces oninput by 300ms (same as before)
- registers a pre-flush hook via svelte context so the survey loader
  can call it BEFORE flushBufferSync on beforeunload
- flushes on component destroy (normal question-to-question navigation)

all 4 debounced components (text, textarea, email, number) now use the
shared helper instead of inline setTimeout/onDestroy boilerplate.
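
The core of such a helper is a debounce that can also be flushed on demand. A framework-free sketch; `createDebouncedAnswer` is a hypothetical name, and the Svelte context registration and onDestroy wiring that useDebouncedAnswer() adds are left out:

```typescript
interface DebouncedAnswer {
  input(value: string): void; // call on every keystroke
  flush(): void; // commit any pending value immediately
}

function createDebouncedAnswer(commit: (value: string) => void, delayMs = 300): DebouncedAnswer {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: string | undefined;

  const flush = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (pending !== undefined) {
      const value = pending;
      pending = undefined; // clear first so a second flush is a no-op
      commit(value);
    }
  };

  const input = (value: string): void => {
    pending = value;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(flush, delayMs);
  };

  return { input, flush };
}
```

In the flow described above, the component would call `flush()` on destroy, and the survey loader would call it through the pre-flush hook before flushBufferSync runs on beforeunload.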

the resume logic scanned questions in order and returned the first one
without an answer. if the user skipped an optional question mid-survey
and answered later ones, they'd be sent back to the skipped question
on refresh — losing their place entirely.

now both the WS handler and the HTTP service find the LAST answered
question and resume from the one after it. the client-side loader
already clamps out-of-bounds indices so all-answered respondents still
land on the final question.
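
Both rules together (resume after the last answered question, clamp to the final valid index) can be sketched in one function. In the PR the clamp lives in the client-side loader and the scan on the server; `resumeIndex` is a hypothetical name that merges them for illustration.

```typescript
// Resume from the question after the LAST answered one, clamped to the final valid index.
function resumeIndex(questionIds: string[], answered: Set<string>): number {
  let lastAnswered = -1;
  for (let i = 0; i < questionIds.length; i++) {
    if (answered.has(questionIds[i])) lastAnswered = i;
  }
  const lastValid = Math.max(questionIds.length - 1, 0);
  return Math.min(lastAnswered + 1, lastValid);
}
```

For questions q1..q4 with q1 and q3 answered (q2 skipped), this resumes at index 3 (q4) rather than sending the respondent back to q2.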

SurveyShell's seenSectionIds started empty on every mount, so resuming
mid-survey always showed the section interstitial screen before the
question — even for sections the user had already passed through.

now buildInitialSeenSections() marks every section up to and including
the current question's section as 'seen' at mount time. fresh starts
(index 0, no answers) still show the first section header normally.
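
A sketch of what such a helper could look like. The signature, the `SectionedQuestion` shape and the `hasAnswers` flag are assumptions; the PR text only gives the function name and the intended behaviour.

```typescript
interface SectionedQuestion {
  id: string;
  sectionId: string;
}

// Mark every section up to and including the current question's section as seen.
// A fresh start (index 0, no answers) returns an empty set so the first header still shows.
function buildInitialSeenSections(
  questions: SectionedQuestion[],
  currentIndex: number,
  hasAnswers: boolean,
): Set<string> {
  const seen = new Set<string>();
  if (currentIndex === 0 && !hasAnswers) return seen;
  for (let i = 0; i <= currentIndex && i < questions.length; i++) {
    seen.add(questions[i].sectionId);
  }
  return seen;
}
```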

single shared validation module (shared/answer-validation.ts) imported
by both client and server — no duplication, identical rules on both
sides.

built-in validation per question type:
- text/textarea: minLength, maxLength, minWords, maxWords, custom
  regex pattern with custom error message
- email: format check + optional allowed-domains list
- number: valid number, min/max range, integerOnly, step (multiples)
- rating: integer 1..scaleMax
- nps: integer 0..10
- likert: must be one of the 5 valid labels
- radio: must be a defined option, Other requires otherText
- checkbox: valid options, minSelections/maxSelections, Other text
- dropdown: must be a defined option

client: QuestionCard.handleNext() calls validateAnswer instead of the
old required-only check. email question now passes raw input so the
format error surfaces properly (was swallowing invalid emails as '').

server: ws-handler submit-answers and respondent.service submitBatch
both validate every answer before INSERT. invalid answers get a 400
with a descriptive error message.

builder: QuestionEditor gains config sections for all user-settable
rules (text minLength/words/pattern, email allowedDomains, number
integerOnly/step, checkbox min/maxSelections).

50 unit tests cover every type + edge case.
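
To illustrate the shared-module idea, here is a reduced sketch covering three of the numeric types from the list above. The `DemoQuestion` type, the error strings and this signature of `validateAnswer` are assumptions; the real module handles every question type and may be shaped differently.

```typescript
type DemoQuestion =
  | { type: "nps" }
  | { type: "rating"; scaleMax: number }
  | { type: "number"; min?: number; max?: number; integerOnly?: boolean };

// Returns an error message, or null when the answer is valid.
// Because it is a pure function, client and server can import the same code.
function validateAnswer(q: DemoQuestion, value: unknown): string | null {
  if (typeof value !== "number" || !Number.isFinite(value)) return "Must be a valid number";
  switch (q.type) {
    case "nps":
      return Number.isInteger(value) && value >= 0 && value <= 10
        ? null
        : "NPS must be an integer from 0 to 10";
    case "rating":
      return Number.isInteger(value) && value >= 1 && value <= q.scaleMax
        ? null
        : `Rating must be an integer from 1 to ${q.scaleMax}`;
    case "number":
      if (q.integerOnly && !Number.isInteger(value)) return "Must be a whole number";
      if (q.min !== undefined && value < q.min) return `Must be at least ${q.min}`;
      if (q.max !== undefined && value > q.max) return `Must be at most ${q.max}`;
      return null;
  }
}
```

Returning a message instead of throwing lets the client show it inline and lets the server place the same text in its error response.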

moved the validation config into a collapsible section matching the
skip logic pattern — both are now chevron toggle buttons sitting side
by side in a single row above the question's bottom border. validation
toggle only appears for question types that have configurable rules
(text, textarea, email, number, checkbox).
2 participants