diff --git a/documentation/configuration/configuration-utils/_cairo.config.json b/documentation/configuration/configuration-utils/_cairo.config.json
index bed859b73..5b95a8393 100644
--- a/documentation/configuration/configuration-utils/_cairo.config.json
+++ b/documentation/configuration/configuration-utils/_cairo.config.json
@@ -319,6 +319,10 @@
"default": "0",
"description": "SampleBy default alignment behaviour. true corresponds to ALIGN TO CALENDAR, false corresponds to ALIGN TO FIRST OBSERVATION."
},
+ "cairo.sql.subsample.max.rows": {
+ "default": "100000000",
+ "description": "Maximum number of input rows SUBSAMPLE will buffer. Exceeding this limit returns an error. Must be between 1 and 2,147,483,647."
+ },
"cairo.date.locale": {
"default": "en",
"description": "The locale to handle date types."
diff --git a/documentation/query/sql/subsample.md b/documentation/query/sql/subsample.md
new file mode 100644
index 000000000..54c2a8da5
--- /dev/null
+++ b/documentation/query/sql/subsample.md
@@ -0,0 +1,496 @@
+---
+title: SUBSAMPLE keyword
+sidebar_label: SUBSAMPLE
+description: SUBSAMPLE SQL keyword reference for time-series downsampling using the LTTB, M4, MinMax, uniform, and cadence algorithms.
+---
+
+`SUBSAMPLE` reduces the number of rows in a query result while preserving the
+visual shape of the data. It selects the most representative points from a
+time-ordered dataset, making it ideal for rendering charts at screen resolution
+without transferring millions of rows to the client.
+
+Unlike [SAMPLE BY](/docs/query/sql/sample-by/), which computes new aggregate
+values at synthetic bucket boundaries, `SUBSAMPLE` selects actual rows from
+the input. Every output row exists in the source table with its original
+timestamp and values. This means output timestamps match real rows (useful for
+joins), and users can drill down to the exact source record behind any point
+on a chart.
+
+Requires a [designated timestamp](/docs/concepts/designated-timestamp/) column.
+
+## Syntax
+
+```questdb-sql title="Value-based algorithms"
+SUBSAMPLE lttb(valueColumn, targetPoints [, gapThreshold])
+SUBSAMPLE { m4 | minmax }(valueColumn, targetPoints)
+```
+
+```questdb-sql title="Position-based algorithms"
+SUBSAMPLE uniform(targetPoints)
+SUBSAMPLE cadence(stride [, seed])
+```
+
+Where:
+
+- **`valueColumn`** - the numeric column used to decide which points are
+ visually significant. Required for `lttb`, `m4`, and `minmax`. Not used
+ by `uniform` or `cadence`.
+- **`targetPoints`** - target number of output rows. Supports integer
+ literals, [DECLARE](/docs/query/sql/declare/) variables, and bind
+ variables (`$1`). Must be at least 2. Maximum is 2,147,483,647.
+- **`stride`** - (`cadence` only) step distance between emitted rows. This
+ is not an output count: `cadence(500)` emits one row out of every 500.
+- **`seed`** - (`cadence` only) optional integer seed or `NULL`. See
+ [cadence](#cadence---every-nth-row).
+- **`gapThreshold`** - (`lttb` only) optional interval that enables
+ gap-preserving mode. See [gap-preserving LTTB](#gap-preserving-lttb).
+
+### Execution order
+
+`SUBSAMPLE` runs after `SAMPLE BY`, `GROUP BY`, and window functions, but
+before `ORDER BY` and `LIMIT`. All value computations are complete before
+downsampling decides which rows to keep. `SUBSAMPLE` only selects rows - it
+never modifies computed values.
+
+All five algorithms execute serially. `SUBSAMPLE` buffers its entire input,
+runs the selected algorithm, then emits the chosen rows. It does not block
+upstream parallel execution - for example, a parallel `SAMPLE BY` completes
+before `SUBSAMPLE` buffers its output.
+
+### Supported value types
+
+The value column must be a numeric type: `DOUBLE`, `FLOAT`, `INT`, `LONG`,
+`SHORT`, or `BYTE`. `NULL` values in the value column are skipped during
+downsampling.
+
+## Algorithms
+
+Five algorithms are available. The first three (`lttb`, `minmax`, `m4`)
+inspect values to decide which rows are visually significant. The last two
+(`uniform`, `cadence`) ignore values and select rows purely by position -
+they are cheaper and useful when the input is dense or as a baseline.
+
+All five select real rows from the input - no values are ever interpolated
+or computed. The diagrams below use a 24-point series as input (think 24
+hourly bars over one day):
+
+
+
+### lttb - Largest Triangle Three Buckets
+
+Divides the data into equal-sized row-count buckets and selects the point in
+each bucket that forms the largest triangle with its neighbors. The idea is
+that points where the line changes direction sharply (a spike, a valley, a
+sudden trend shift) form large triangles and get kept, while points in the
+middle of a smooth trend form small triangles and get dropped. The first and
+last points are always kept. Output is exactly N points, where N is
+`targetPoints`.
+
+Best for line charts where the visual shape matters most - a chart drawn
+from the LTTB output looks nearly identical to one drawn from the full
+dataset, despite using far fewer points.
+
+
+
+How it works:
+
+1. First and last points are always selected.
+2. Remaining data is divided into N-2 equal-sized buckets by row count.
+3. For each bucket, the point creating the largest triangle area with the
+ previously selected point and the average of the next bucket is chosen.
+4. Output preserves the original timestamp order.
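The steps above can be sketched in Python as follows. This is a minimal illustration of textbook LTTB, not QuestDB's implementation: the name `lttb_indices`, the integer bucket boundaries, and the tie-breaking rule (the first largest area wins) are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def lttb_indices(points: Sequence[Point], target: int) -> List[int]:
    """Row positions kept by a textbook LTTB pass (illustrative sketch)."""
    n = len(points)
    if target < 2:
        raise ValueError("target must be at least 2")
    if n <= target:
        return list(range(n))           # fewer rows than the target: keep all
    if target == 2:
        return [0, n - 1]

    def boundary(b: int) -> int:
        # Start row of interior bucket b; rows 1..n-2 are split by row count.
        return 1 + b * (n - 2) // (target - 2)

    kept = [0]                          # the first point is always kept
    for b in range(target - 2):
        lo, hi = boundary(b), boundary(b + 1)
        # The "next bucket" is the following interior bucket, or the last row.
        nlo, nhi = (hi, boundary(b + 2)) if b < target - 3 else (n - 1, n)
        avg_x = sum(p[0] for p in points[nlo:nhi]) / (nhi - nlo)
        avg_y = sum(p[1] for p in points[nlo:nhi]) / (nhi - nlo)
        ax, ay = points[kept[-1]]       # previously selected point
        best, best_area = lo, -1.0
        for j in range(lo, hi):
            bx, by = points[j]
            # Area of the triangle (previous pick, candidate, next-bucket average).
            area = abs((ax - avg_x) * (by - ay) - (ax - bx) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area
        kept.append(best)
    kept.append(n - 1)                  # the last point is always kept
    return kept
```

On a flat 10-row series with a single spike at row 6, `lttb_indices(points, 4)` returns `[0, 4, 6, 9]`: the spike survives because it forms the largest triangle in its bucket.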
+
+```questdb-sql title="Aggregate to hourly bars, then pick the 8 most representative" demo
+SELECT timestamp, avg(price) avg_price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+ AND timestamp IN '$today'
+SAMPLE BY 1h
+SUBSAMPLE lttb(avg_price, 8)
+```
+
+#### Gap-preserving LTTB
+
+Standard LTTB divides data by row count, so it connects across time gaps. An
+optional third parameter sets a gap threshold:
+
+```questdb-sql
+SUBSAMPLE lttb(price, 12, '6h')
+```
+
+When specified, LTTB scans for gaps where consecutive timestamps are further
+apart than the threshold. Gaps below the threshold are ignored - the data is
+treated as continuous. Gaps above the threshold split the data into separate
+segments, each downsampled independently with its proportional share of the
+target points.
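A rough Python sketch of the segmentation step is shown below. The splitting rule follows the description above; the allocation rule in `allocate` (row-count proportion, rounded, with a floor of 2 per segment) is an assumption for illustration, as are both function names. Timestamps and the threshold are plain integers in the same unit (hours here).

```python
from typing import List, Sequence


def split_on_gaps(ts: Sequence[int], threshold: int) -> List[range]:
    """Split row positions wherever consecutive timestamps differ by more than threshold."""
    if not ts:
        return []
    segments, start = [], 0
    for i in range(1, len(ts)):
        if ts[i] - ts[i - 1] > threshold:
            segments.append(range(start, i))
            start = i
    segments.append(range(start, len(ts)))
    return segments


def allocate(segments: Sequence[range], target: int) -> List[int]:
    """Assumed rule: proportional share per segment, at least 2 (first and last)."""
    total = sum(len(s) for s in segments)
    return [min(len(s), max(2, round(target * len(s) / total))) for s in segments]


# Hourly rows 0..10, 14..23 and 48..68, as in the gap diagrams.
hours = list(range(0, 11)) + list(range(14, 24)) + list(range(48, 69))
segments = split_on_gaps(hours, 6)
print([len(s) for s in segments], allocate(segments, 12))  # [21, 21] [6, 6]
```

With a 6-hour threshold the small gap does not split the data, so two segments of 21 rows each receive 6 target points. Because every segment gets at least 2 points, many short segments can push the total above the target, which is the soft-target behavior.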
+
+The diagrams below show a dataset with two gaps - a small one (3 hours) and
+a large one (24 hours):
+
+
+
+Without gap detection, LTTB treats all points as continuous and connects
+across both gaps:
+
+
+
+With a threshold of `'6h'`, the small gap (3h) is below the threshold so
+segments A and B are treated as continuous. The large gap (24h) exceeds the
+threshold, so segment C is downsampled separately and the gap is preserved:
+
+
+
+Supported interval units: `s` (seconds), `m` (minutes), `h` (hours),
+`d` (days).
+
+Examples: `'30s'`, `'5m'`, `'1h'`, `'7d'`
+
+```questdb-sql title="Preserve gaps larger than 6 hours in the output" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, 12, '6h')
+```
+
+:::note
+
+Gap-preserving LTTB uses a soft target. Each segment receives at least its
+first and last points. When many segments are detected, the total output may
+exceed `targetPoints`. This is by design so that the same query does not fail
+for one time range and succeed for another. Non-gap LTTB, M4, and MinMax
+treat `targetPoints` as a hard maximum.
+
+:::
+
+### minmax - Min/Max per time interval
+
+Divides the time range into equal time intervals and selects up to 2 points
+per interval: the row with the minimum value and the row with the maximum
+value. This creates a visual envelope - at any point on the chart, you can
+see the full range the data covered during that interval. No spike or drop
+is ever hidden, even under heavy compression. Empty intervals produce no
+output, naturally preserving data gaps.
+
+
+
+How it works:
+
+1. The total time range is divided into N/2 equal time intervals.
+2. For each interval, up to 2 points are selected: min, max.
+3. Duplicate points are removed (if min and max are the same row).
+4. Empty intervals produce no output.
+
+Output is up to N points (N/2 buckets, up to 2 points each).
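The selection can be sketched in Python as below. This is an illustration, not the engine's code: `minmax_indices` is a hypothetical name, timestamps are integers, and the bucketing rule (equal-width intervals over the first-to-last timestamp range, with the last timestamp clamped into the final interval) is an assumption.

```python
from typing import List, Optional, Sequence


def minmax_indices(ts: Sequence[int],
                   values: Sequence[Optional[float]],
                   target: int) -> List[int]:
    """Row positions kept by a MinMax-style pass over time-ordered input."""
    n = len(ts)
    if n <= target:
        return list(range(n))                    # nothing to reduce
    buckets = max(1, target // 2)                # N/2 equal time intervals
    t0, span = ts[0], ts[-1] - ts[0]
    lo: List[Optional[int]] = [None] * buckets   # row of the minimum per interval
    hi: List[Optional[int]] = [None] * buckets   # row of the maximum per interval
    for i, (t, v) in enumerate(zip(ts, values)):
        if v is None:
            continue                             # NULL values are skipped
        b = min(buckets - 1, (t - t0) * buckets // span) if span else 0
        if lo[b] is None or v < values[lo[b]]:
            lo[b] = i
        if hi[b] is None or v > values[hi[b]]:
            hi[b] = i
    # A set drops the duplicate when min and max are the same row;
    # intervals that saw no rows contribute nothing.
    return sorted({i for i in lo + hi if i is not None})
```

Applied to the 24-point series from the diagrams with a target of 8, this sketch keeps rows 0, 5, 6, 11, 12, 15, 18, and 22, two per 6-hour interval.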
+
+```questdb-sql title="Hourly bars reduced to 8 with MinMax - min/max per bucket" demo
+SELECT timestamp, avg(price) avg_price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+ AND timestamp IN '$today'
+SAMPLE BY 1h
+SUBSAMPLE minmax(avg_price, 8)
+```
+
+### m4 - Min/Max/First/Last per time interval
+
+Builds on MinMax by also capturing the first and last rows in each time
+interval. Where MinMax shows you the range of values in a bucket, M4 also
+shows you where the data entered and exited - the opening and closing levels.
+This matters when trends within a bucket are important: a price that opens
+high, dips, then recovers looks different from one that opens low and climbs.
+MinMax would show the same min/max range for both; M4 distinguishes them.
+
+Empty intervals produce no output, naturally preserving data gaps.
+
+
+
+How it works:
+
+1. The total time range is divided into N/4 equal time intervals.
+2. For each interval, up to 4 points are selected: first, last, min, max.
+3. When multiple roles resolve to the same physical row (e.g., the minimum
+ value is also the first row), duplicates are removed. A bucket emits
+ between 1 and 4 rows depending on the data.
+4. Empty intervals produce no output.
+
+Output is up to N points (N/4 buckets, up to 4 points each). In the diagram
+above, compare the right side with MinMax: M4 captures the exit at i=23
+(the pullback after the late spike), while MinMax ends at the peak. M4
+gives a more faithful picture of where the data actually settled.
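A compact Python sketch of the per-interval role tracking follows. It is illustrative only: `m4_indices` is a hypothetical name, values are assumed non-NULL, and the equal-width bucketing rule is an assumption rather than the engine's exact arithmetic.

```python
from typing import Dict, List, Sequence


def m4_indices(ts: Sequence[int], values: Sequence[float], target: int) -> List[int]:
    """Row positions kept by an M4-style pass over time-ordered input."""
    n = len(ts)
    if n <= target:
        return list(range(n))
    buckets = max(1, target // 4)                # N/4 equal time intervals
    t0, span = ts[0], ts[-1] - ts[0]
    roles: Dict[int, Dict[str, int]] = {}        # interval -> role -> row position
    for i, (t, v) in enumerate(zip(ts, values)):
        b = min(buckets - 1, (t - t0) * buckets // span) if span else 0
        r = roles.setdefault(b, {"first": i, "last": i, "min": i, "max": i})
        r["last"] = i                            # input is time-ordered
        if v < values[r["min"]]:
            r["min"] = i
        if v > values[r["max"]]:
            r["max"] = i
    kept = set()                                 # a set merges roles on the same row
    for r in roles.values():
        kept.update(r.values())
    return sorted(kept)
```

On the 24-point diagram series with a target of 8, the sketch emits 7 rows (0, 5, 11, 12, 15, 22, 23): in the first interval the minimum coincides with the last row, so that interval contributes 3 rows instead of 4.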
+
+```questdb-sql title="Hourly bars reduced to 8 with M4 - captures entry/exit levels" demo
+SELECT timestamp, avg(price) avg_price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+ AND timestamp IN '$today'
+SAMPLE BY 1h
+SUBSAMPLE m4(avg_price, 8)
+```
+
+:::tip
+
+When sizing `targetPoints` against a chart's pixel width, remember that N/4
+gives the number of time buckets. On a 1920-pixel-wide chart,
+`SUBSAMPLE m4(col, 1920)` yields 480 time buckets (one per 4 pixels) with up
+to 4 points each.
+
+:::
+
+### uniform - Evenly spaced rows
+
+Selects a target number of rows spaced evenly across the input. First and
+last rows are always kept, interior rows are picked at regular positions
+between them. Unlike the previous algorithms, `uniform` does not inspect
+values - it reduces row count purely by position in the time-ordered input.
+
+Use `uniform` when the input is dense and you care about reducing transfer
+size more than preserving spikes or troughs. For a line chart where visual
+fidelity matters, `lttb` or `m4` produce better results at the same target
+count. For a heatmap, scatter plot, or tabular display where every row looks
+similar, `uniform` is faster and its output is usually hard to tell apart
+from that of value-aware methods.
+
+
+
+How it works:
+
+1. First and last rows are always selected.
+2. Remaining `targetPoints - 2` rows are selected at evenly spaced positions
+ between first and last.
+3. Output is exactly `targetPoints` rows when the input is larger than the
+ target, otherwise all input rows are returned unchanged.
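As a sketch, the positions can be computed with a single rounding expression. The formula below mirrors the one used by the diagram generator script in this change; whether the engine rounds the same way is an assumption, and `uniform_indices` is a hypothetical name.

```python
from typing import List


def uniform_indices(n_rows: int, target: int) -> List[int]:
    """Evenly spaced row positions with the first and last rows pinned."""
    if target < 2:
        raise ValueError("target must be at least 2")
    if n_rows <= target:
        return list(range(n_rows))      # input not larger than target: keep all
    # Position i of target maps to a proportional position in 0..n_rows-1.
    return [round(i * (n_rows - 1) / (target - 1)) for i in range(target)]


print(uniform_indices(24, 8))  # [0, 3, 7, 10, 13, 16, 20, 23]
```

Because the input is larger than the target, consecutive positions are more than one row apart, so rounding never produces a duplicate and the output has exactly `target` rows.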
+
+```questdb-sql title="500 evenly spaced rows from a dense tick table" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE uniform(500)
+```
+
+### cadence - Every Nth row
+
+Selects one row out of every N, starting from a configurable offset. Like
+`uniform`, `cadence` does not inspect values - it reduces row count by
+stepping through the input at a fixed rhythm. An optional second parameter
+sets the starting offset, either as a fixed seed for reproducible results or
+as `NULL` for a fresh random offset each run.
+
+The `stride` parameter is the step distance, not the output count. To keep
+500 rows, use `uniform(500)` or `lttb(col, 500)`. `cadence(500)` emits one
+row out of every 500, which is a different (and input-dependent) number.
+
+
+
+How it works:
+
+1. First and last rows are always selected (except when stride exceeds the
+ input size, in which case only the first row is emitted).
+2. From the offset position, emit one row every `stride` rows.
+3. Output is in timestamp-ascending order.
+
+| Form | Behavior |
+|------|----------|
+| `cadence(N)` | Every Nth row, deterministic, offset 0 |
+| `cadence(N, seed)` | Random offset in [0, N), reproducible given seed |
+| `cadence(N, NULL)` | Random offset in [0, N), fresh each run |
+
+The seeded and NULL forms exist to avoid phase-lock with periodic signals.
+If the input has a 1000-row period and you stride by 1000 with offset 0,
+every emitted row hits the same phase of the period and the chart loses the
+periodic structure. A random offset breaks this alignment.
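The three forms and the phase-lock effect can be sketched in Python. The names `pick_offset` and `cadence_indices` are hypothetical, and the random generator is a stand-in: the engine's actual seeding scheme is not described here.

```python
import random
from typing import List

_ABSENT = object()   # marks the one-argument form, cadence(N)


def pick_offset(stride: int, seed=_ABSENT) -> int:
    """Starting offset for cadence(N), cadence(N, seed) and cadence(N, NULL)."""
    if seed is _ABSENT:
        return 0                                  # deterministic, offset 0
    # seed=None models NULL: Random(None) seeds itself from fresh entropy.
    return random.Random(seed).randrange(stride)  # offset in [0, stride)


def cadence_indices(n_rows: int, stride: int, offset: int = 0) -> List[int]:
    """Every stride-th row from offset, with first and last rows pinned."""
    if n_rows == 0:
        return []
    if stride > n_rows:
        return [0]                                # only the first row is emitted
    kept = {0, n_rows - 1}
    kept.update(range(offset, n_rows, stride))
    return sorted(kept)


# A signal with a 4-row period, sampled with stride 4 and offset 0:
signal = [i % 4 for i in range(41)]
print({signal[i] for i in cadence_indices(41, 4)})  # {0}
```

Every emitted row lands on phase 0, so the output looks flat even though the input oscillates. `cadence_indices(24, 3)` gives the nine rows shown in the diagram: 0, 3, 6, ..., 21, plus the pinned last row 23.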
+
+:::note
+
+Randomizing the offset helps with aliasing on periodic signals, but it does
+not make `cadence` a statistical sampler. It does not produce unbiased
+estimates of aggregates like mean or percentile. For those, use
+[SAMPLE BY](/docs/query/sql/sample-by/) with the appropriate aggregate
+function.
+
+:::
+
+```questdb-sql title="Every 1000th row - simple decimation" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE cadence(1000)
+```
+
+```questdb-sql title="Anti-aliasing with reproducible seed"
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE cadence(1000, 42)
+```
+
+### Algorithm comparison
+
+| Property | lttb | minmax | m4 | uniform | cadence |
+|----------|------|--------|-----|---------|---------|
+| Parameter | targetPoints | targetPoints | targetPoints | targetPoints | stride |
+| Inspects values | Yes | Yes | Yes | No | No |
+| Bucket type | Equal row count | Equal time intervals | Equal time intervals | Equal row spacing | Fixed row stride |
+| Points per bucket | Exactly 1 | Up to 2 (min, max) | Up to 4 (first, last, min, max) | N/A | N/A |
+| Output count | Exactly N (or all rows if fewer) | Up to N | Up to N | Exactly N (or all rows if fewer) | ~rowCount/stride |
+| Gap handling | Connects across (use threshold) | Naturally preserves | Naturally preserves | Connects across | Connects across |
+| Best use case | Line charts | Value range overview | Dashboards, SLA | Dense uniform data | Decimation, anti-aliasing |
+| Relative cost | Higher: triangle area per point | Low: min/max per bucket | Medium: first/last/min/max per bucket | Lowest: position arithmetic | Lowest: stride arithmetic |
+
+## Examples
+
+### Chart-ready downsampling
+
+```questdb-sql title="LTTB: 500 representative points for a line chart" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, 500)
+```
+
+```questdb-sql title="LTTB with gap detection: preserve gaps larger than 1 hour" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, 500, '1h')
+```
+
+```questdb-sql title="M4: pixel-accurate envelope for a 1920px-wide chart" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE m4(price, 1920)
+```
+
+```questdb-sql title="MinMax: lightweight envelope at half the output of M4" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE minmax(price, 500)
+```
+
+```questdb-sql title="Uniform: 500 evenly spaced rows for a dense table" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE uniform(500)
+```
+
+```questdb-sql title="Cadence: every 1000th row for quick decimation" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE cadence(1000)
+```
+
+### Composing with SAMPLE BY
+
+```questdb-sql title="Aggregate to 1-minute bars, then downsample" demo
+SELECT timestamp, avg(price) avg_price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SAMPLE BY 1m
+SUBSAMPLE lttb(avg_price, 500)
+```
+
+`SAMPLE BY` computes aggregate values at bucket boundaries. `SUBSAMPLE` then
+selects the most representative rows from that output. The two operations
+complement each other: aggregate first, then reduce for display.
+
+### Multiple columns pass through
+
+Because `SUBSAMPLE` selects real rows rather than computing new ones, every
+column in the output carries its original value from the source table. In
+the query below, `side` and `quantity` are not involved in the downsampling
+decision, but each output row is a real trade with the actual side and
+quantity that occurred at that timestamp.
+
+```questdb-sql title="LTTB selects rows by price; all columns emit" demo
+SELECT timestamp, symbol, side, price, quantity
+FROM fx_trades
+WHERE symbol = 'GBPUSD'
+SUBSAMPLE lttb(price, 500)
+```
+
+### After window functions
+
+```questdb-sql title="Window functions see all rows before SUBSAMPLE selects" demo
+SELECT timestamp, price,
+ avg(price) OVER (ROWS 10 PRECEDING) ma
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, 500)
+```
+
+Window functions compute on the full dataset. `SUBSAMPLE` then selects from
+the result, so the moving average values are accurate.
+
+### With DECLARE variable
+
+```questdb-sql title="Parameterized target point count" demo
+DECLARE @points := 500
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, @points)
+```
+
+### With bind variable
+
+```questdb-sql title="Programmatic integration - target as bind variable"
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, $1)
+```
+
+### With ORDER BY and LIMIT
+
+```questdb-sql title="Downsample, then sort by price" demo
+SELECT timestamp, price
+FROM fx_trades
+WHERE symbol = 'EURUSD'
+SUBSAMPLE lttb(price, 100)
+ORDER BY price DESC
+LIMIT 10
+```
+
+### Inside subqueries
+
+```questdb-sql title="SUBSAMPLE works inside parenthesized subqueries" demo
+SELECT count() FROM (
+ SELECT timestamp, price
+ FROM fx_trades
+ WHERE symbol = 'EURUSD'
+ SUBSAMPLE lttb(price, 500)
+)
+```
+
+## Behavior notes
+
+- If the input has fewer rows than the target, all rows are returned unchanged.
+- Output rows are always in timestamp-ascending order.
+- All columns from the `SELECT` clause pass through for selected rows.
+- `SUBSAMPLE` works with `WHERE`, `SAMPLE BY`, `GROUP BY`, CTEs, subqueries,
+ `ORDER BY`, and `LIMIT`.
+- `SUBSAMPLE` inside a parenthesized subquery applies to that subquery only,
+  not to the outer query.
+
+## Configuration
+
+| Property | Default | Description |
+|----------|---------|-------------|
+| `cairo.sql.subsample.max.rows` | 100,000,000 | Maximum input rows SUBSAMPLE will buffer. Exceeding this limit returns an error. |
+
+`SUBSAMPLE` buffers its entire input before running the algorithm. For direct
+table scans, memory usage is 24 bytes per row. For queries involving
+`SAMPLE BY`, `GROUP BY`, or subqueries, memory also scales with the projected
+row width. At the default limit, the base buffer is approximately 2.4 GB.
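The base buffer size is simple arithmetic on the figures above. The sketch below covers only the 24-bytes-per-row base cost for direct table scans; the extra cost that scales with projected row width is not modeled, and the helper name is hypothetical.

```python
BYTES_PER_ROW = 24               # base cost per buffered row for direct table scans
DEFAULT_MAX_ROWS = 100_000_000   # default of cairo.sql.subsample.max.rows


def base_buffer_bytes(rows: int) -> int:
    """Base SUBSAMPLE buffer size in bytes for a given number of input rows."""
    return rows * BYTES_PER_ROW


print(base_buffer_bytes(DEFAULT_MAX_ROWS) / 1e9)  # 2.4 (decimal GB)
print(base_buffer_bytes(1_000_000) / 1e6)         # 24.0 (decimal MB)
```

A query that feeds one million rows into `SUBSAMPLE` therefore needs about 24 MB of base buffer, which can help when choosing a lower limit for memory-constrained deployments.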
+
+## See also
+
+- [SAMPLE BY](/docs/query/sql/sample-by/) - time-based aggregation
+ (computes new values at bucket boundaries, while `SUBSAMPLE` selects
+ existing rows)
+- [Designated timestamp](/docs/concepts/designated-timestamp/) - required
+ for `SUBSAMPLE` to operate
+- [Steinarsson, S. (2013). "Downsampling Time Series for Visual Representation"](https://github.com/sveinn-steinarsson/flot-downsample) -
+ the original LTTB algorithm and thesis reference
+- [Jugel, U. et al. (2014). "M4: A Visualization-Oriented Time Series Data Aggregation"](https://www.vldb.org/pvldb/vol7/p797-jugel.pdf) -
+ the M4 paper
diff --git a/documentation/sidebars.js b/documentation/sidebars.js
index bf99c48ad..02ccb0ba2 100644
--- a/documentation/sidebars.js
+++ b/documentation/sidebars.js
@@ -429,6 +429,7 @@ module.exports = {
"query/sql/order-by",
"query/sql/pivot",
"query/sql/sample-by",
+ "query/sql/subsample",
"query/sql/unnest",
"query/sql/where",
"query/sql/window-join",
diff --git a/scripts/gen_subsample_svgs.py b/scripts/gen_subsample_svgs.py
new file mode 100644
index 000000000..c9cdd681d
--- /dev/null
+++ b/scripts/gen_subsample_svgs.py
@@ -0,0 +1,376 @@
+"""Generate SVG diagrams for the SUBSAMPLE documentation page.
+
+Uses @media (prefers-color-scheme) for light/dark theme support since SVGs
+loaded via <img> tags don't inherit CSS from the parent document.
+ViewBox width is ~600 to match typical content width so 1 unit ~ 1px.
+"""
+
+import os
+
+OUT_DIR = os.path.join(os.path.dirname(__file__), "..", "static", "images", "docs", "subsample")
+
+# QuestDB palette
+PINK = "#e289a4" # algorithm lines
+CYAN = "#0cc0df"  # titles, M4 first/last role dots
+GRAY = "#888" # default dots (real rows from raw data)
+
+# Segment A: 24 points, i=0..23 (represents 24 hourly bars)
+# Late spike at i=22 (0.65) with pullback at i=23 (0.60) makes M4 visibly
+# better than MinMax: M4 captures the exit at 0.60, MinMax only sees the peak.
+SEG_A = [
+ 0.50, 0.55, 0.60, 0.65, 0.70, 0.95, 0.85, 0.70, 0.60, 0.55,
+ 0.50, 0.45, 0.55, 0.35, 0.28, 0.20, 0.25, 0.30, 0.35, 0.40,
+ 0.45, 0.50, 0.65, 0.60,
+]
+
+# Gap dataset: 3 data segments, 1 small gap (3h), 1 big gap (24h)
+# Seg A: i=0..10, Seg B: i=14..23 (small gap 11-13), Seg C: i=48..68 (big gap 24-47)
+GAP_SEG_A_I = list(range(0, 11))
+GAP_SEG_A_V = [0.50, 0.55, 0.60, 0.65, 0.70, 0.95, 0.85, 0.70, 0.60, 0.55, 0.50]
+
+GAP_SEG_B_I = list(range(14, 24))
+GAP_SEG_B_V = [0.42, 0.38, 0.35, 0.28, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
+
+GAP_SEG_C_I = list(range(48, 69))
+GAP_SEG_C_V = [
+ 0.45, 0.50, 0.55, 0.58, 0.60, 0.65, 0.70, 0.75, 0.70, 0.55,
+ 0.40, 0.25, 0.15, 0.25, 0.40, 0.55, 0.60, 0.62, 0.60, 0.58, 0.55,
+]
+
+# Single panel layout (viewBox units - keep at 600 for good proportions)
+W = 600
+H = 300
+XL, XR = 10, 590 # plot x range - use full width
+YT, YB = 60, 240 # plot y range (180px tall)
+LY = 275 # legend baseline
+
+# Gap SVG layout (three panels: raw, no-gap LTTB, gap LTTB)
+GH = 600
+G0T, G0B = 50, 140 # panel 0: raw data with gap
+G1T, G1B = 200, 310 # panel 1: LTTB without gap detection
+G2T, G2B = 370, 480 # panel 2: LTTB with gap detection
+GLY = 520 # legend
+
+# Intrinsic pixel width - set larger than container so max-width:100% fills it
+PX_W = 1400
+
+# Sizes (viewBox units - rendered ~1.3x on screen)
+TITLE_SZ = 12
+LEG_SZ = 11
+REF_SW = 1.0
+ALGO_SW = 2.0
+DOT_R = 4.5
+LEG_DOT = 3.5
+BK_SW = 0.8
+
+STYLE = f""""""
+
+
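+def xp(i, imin, imax):
+    # Map a row index in [imin, imax] to an x coordinate in [XL, XR].
+    if imax == imin:
+        return (XL + XR) / 2
+    return XL + (i - imin) / (imax - imin) * (XR - XL)
+
+
+def yp(v, yt, yb):
+    # Map a value in [0, 1] to a y coordinate; SVG y grows downward.
+    return yt + (1 - v) * (yb - yt)
+
+
+def pl(ii, vv, imin, imax, yt, yb):
+    # Build an "x,y x,y ..." coordinate string for the given points.
+    return " ".join(f"{xp(i,imin,imax):.1f},{yp(v,yt,yb):.1f}" for i, v in zip(ii, vv))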
+
+
+def cd(ii, vv, imin, imax, yt, yb, fill):
+    return "\n".join(
+        f''
+        for i, v in zip(ii, vv))
+
+
+def cdm(pcs, imin, imax, yt, yb):
+    return "\n".join(
+        f''
+        for i, v, c in pcs)
+
+
+def rpl(ii, vv, imin, imax, yt, yb):
+    return f''
+
+
+def bkl(bounds, imin, imax, yt, yb):
+    return "\n".join(
+        f''
+        for b in bounds)
+
+
+def hdr(w, h, title, desc):
+ px_h = int(h * PX_W / w)
+ return (f'"""
+
+
+def gen_lttb():
+    N = len(SEG_A)
+    im, ix = 0, N - 1
+    ri = list(range(N))
+    # LTTB target 8: first + last always kept, 6 interior buckets
+    si = [0, 4, 5, 11, 15, 18, 22, 23]
+    sv = [SEG_A[i] for i in si]
+    return f"""{hdr(W, H, "LTTB downsampling", "LTTB selects 8 points from 24.")}
+LTTB: 24 hourly bars reduced to 8
+{rpl(ri, SEG_A, im, ix, YT, YB)}
+
+{cd(si,sv,im,ix,YT,YB,GRAY)}
+
+Raw data
+
+Selected points (8 of 24)
+"""
+
+
+def gen_m4():
+    N = len(SEG_A)
+    im, ix = 0, N - 1
+    ri = list(range(N))
+    # M4 target 8 -> 2 time buckets (0..11, 12..23)
+    # Bucket 1: first=0(.50), last=11(.45), min=11(.45)->dup last, max=5(.95) -> 3 pts
+    # Bucket 2: first=12(.55), last=23(.60), min=15(.20), max=22(.65) -> 4 pts
+    # Key: M4 catches the exit at i=23 (0.60) that MinMax misses
+    m4 = [
+        (0,.50,CYAN),(5,.95,GRAY),(11,.45,CYAN),
+        (12,.55,CYAN),(15,.20,GRAY),(22,.65,GRAY),(23,.60,CYAN),
+    ]
+    mi = [p[0] for p in m4]
+    mv = [p[1] for p in m4]
+    return f"""{hdr(W, H, "M4 downsampling", "M4 selects 7 points from 24.")}
+M4: target 8, emitted 7 (2 time buckets)
+{bkl([12], im, ix, YT, YB)}
+{rpl(ri, SEG_A, im, ix, YT, YB)}
+
+{cdm(m4, im, ix, YT, YB)}
+
+Raw data
+
+First / Last
+
+Min / Max
+
+Bucket boundary
+"""
+
+
+def gen_minmax():
+    N = len(SEG_A)
+    im, ix = 0, N - 1
+    ri = list(range(N))
+    # MinMax target 8 -> 4 time buckets of 6 (0..5, 6..11, 12..17, 18..23)
+    # Bucket 1: min=0(.50), max=5(.95)
+    # Bucket 2: min=11(.45), max=6(.85)
+    # Bucket 3: min=15(.20), max=12(.55)
+    # Bucket 4: min=18(.35), max=22(.65) -- misses the exit at i=23 (0.60)
+    mi = [0, 5, 6, 11, 12, 15, 18, 22]
+    mv = [.50, .95, .85, .45, .55, .20, .35, .65]
+    return f"""{hdr(W, H, "MinMax downsampling", "MinMax selects 8 points from 24.")}
+MinMax: target 8, emitted 8 (4 time buckets)
+{bkl([6, 12, 18], im, ix, YT, YB)}
+{rpl(ri, SEG_A, im, ix, YT, YB)}
+
+{cd(mi,mv,im,ix,YT,YB,GRAY)}
+
+Raw data
+
+Selected points (8 of 24)
+
+Bucket boundary
+"""
+
+
+def gen_uniform():
+    N = len(SEG_A)
+    im, ix = 0, N - 1
+    ri = list(range(N))
+    # uniform(8): evenly spaced, first and last pinned
+    # positions: round(i * 23 / 7) for i in 0..7 = 0, 3, 7, 10, 13, 16, 20, 23
+    si = [round(i * (N - 1) / 7) for i in range(8)]
+    sv = [SEG_A[i] for i in si]
+    return f"""{hdr(W, H, "Uniform downsampling", "Uniform selects 8 evenly spaced points from 24.")}
+Uniform: 8 evenly spaced from 24
+{rpl(ri, SEG_A, im, ix, YT, YB)}
+
+{cd(si,sv,im,ix,YT,YB,GRAY)}
+
+Raw data
+
+Selected points (8 of 24)
+"""
+
+
+def gen_cadence():
+    N = len(SEG_A)
+    im, ix = 0, N - 1
+    ri = list(range(N))
+    # cadence(3): every 3rd row from offset 0, plus last row pinned
+    # positions: 0, 3, 6, 9, 12, 15, 18, 21, 23(pinned)
+    stride = 3
+    si = list(range(0, N, stride))
+    if si[-1] != N - 1:
+        si.append(N - 1)
+    sv = [SEG_A[i] for i in si]
+    return f"""{hdr(W, H, "Cadence downsampling", "Cadence selects every 3rd row from 24.")}
+Cadence: stride 3, emitted {len(si)} from 24
+{rpl(ri, SEG_A, im, ix, YT, YB)}
+
+{cd(si,sv,im,ix,YT,YB,GRAY)}
+
+Raw data
+
+Selected points ({len(si)} of 24)
+"""
+
+
+def _gap_helpers():
+    """Shared helpers for the three gap SVGs."""
+    im, ix = 0, 68
+    raw_color = "#888"
+    small_gap_mid = 12
+    big_gap_mid = 35.5
+
+    def raw_pls(yt, yb):
+        return (f"{rpl(GAP_SEG_A_I, GAP_SEG_A_V, im, ix, yt, yb)}\n"
+                f"{rpl(GAP_SEG_B_I, GAP_SEG_B_V, im, ix, yt, yb)}\n"
+                f"{rpl(GAP_SEG_C_I, GAP_SEG_C_V, im, ix, yt, yb)}")
+
+    def raw_dots_str(yt, yb):
+        parts = []
+        for si, sv in [(GAP_SEG_A_I, GAP_SEG_A_V),
+                       (GAP_SEG_B_I, GAP_SEG_B_V),
+                       (GAP_SEG_C_I, GAP_SEG_C_V)]:
+            parts.extend(
+                f''
+                for i, v in zip(si, sv))
+        return "\n".join(parts)
+
+    def raw_lines_str(yt, yb):
+        return (
+            f'\n'
+            f'\n'
+            f'')
+
+    return im, ix, raw_color, small_gap_mid, big_gap_mid, raw_pls, raw_dots_str, raw_lines_str
+
+
+def gen_gap_raw():
+    """Raw data with gaps - shows where the gaps are."""
+    im, ix, raw_color, sg, bg, _, raw_dots_str, raw_lines_str = _gap_helpers()
+    total = len(GAP_SEG_A_V) + len(GAP_SEG_B_V) + len(GAP_SEG_C_V)
+    return f"""{hdr(W, H, "Raw data with gaps", "42 points with a small and large gap.")}
+Raw data: {total} points, small gap (3h) and large gap (24h)
+{bkl([sg, bg], im, ix, YT, YB)}
+{raw_lines_str(YT, YB)}
+{raw_dots_str(YT, YB)}
+
+Data points ({total})
+
+Gap boundary
+"""
+
+
+def gen_gap_no_detect():
+    """LTTB without gap detection - connects across all gaps."""
+    im, ix, _, sg, bg, raw_pls, _, _ = _gap_helpers()
+    ng_i = [0, 4, 5, 10, 18, 23, 51, 55, 60, 64, 67, 68]
+    # Values are the raw values at ng_i, so every dot sits on a real data point.
+    ng_v = [.50, .70, .95, .50, .20, .45, .58, .75, .15, .60, .58, .55]
+    return f"""{hdr(W, H, "LTTB without gap detection", "LTTB connects across all gaps.")}
+LTTB without gap detection: connects across all gaps
+{raw_pls(YT, YB)}
+
+{cd(ng_i,ng_v,im,ix,YT,YB,GRAY)}
+
+Raw data
+
+Selected points (12 of {len(GAP_SEG_A_V)+len(GAP_SEG_B_V)+len(GAP_SEG_C_V)})
+"""
+
+
+def gen_gap_detect():
+    """LTTB with gap detection - small gap connected, large gap preserved."""
+    im, ix, _, sg, bg, raw_pls, _, _ = _gap_helpers()
+    g_ab_i = [0, 5, 10, 18, 22, 23]
+    g_ab_v = [.50, .95, .50, .20, .40, .45]
+    g_c_i = [48, 55, 58, 60, 65, 68]
+    # Values are the raw values at g_c_i, so every dot sits on a real data point.
+    g_c_v = [.45, .75, .40, .15, .62, .55]
+    return f"""{hdr(W, H, "LTTB with gap detection", "Small gap connected, large gap preserved.")}
+LTTB with gap threshold '6h': small gap connected, large gap preserved
+{bkl([bg], im, ix, YT, YB)}
+{raw_pls(YT, YB)}
+
+
+{cd(g_ab_i,g_ab_v,im,ix,YT,YB,GRAY)}
+{cd(g_c_i,g_c_v,im,ix,YT,YB,GRAY)}
+
+Raw data
+
+Selected points (12)
+
+Gap boundary
+"""
+
+
+if __name__ == "__main__":
+    os.makedirs(OUT_DIR, exist_ok=True)
+    for name, fn in [("raw.svg", gen_raw), ("lttb.svg", gen_lttb),
+                     ("minmax.svg", gen_minmax), ("m4.svg", gen_m4),
+                     ("uniform.svg", gen_uniform), ("cadence.svg", gen_cadence),
+                     ("gap-raw.svg", gen_gap_raw),
+                     ("gap-no-detect.svg", gen_gap_no_detect),
+                     ("gap-detect.svg", gen_gap_detect)]:
+        path = os.path.join(OUT_DIR, name)
+        with open(path, "w") as f:
+            f.write(fn())
+        print(f"Wrote {path}")
diff --git a/src/css/_global.css b/src/css/_global.css
index 0f5ce0d7c..c82e2a1f9 100644
--- a/src/css/_global.css
+++ b/src/css/_global.css
@@ -485,3 +485,8 @@ html[data-theme="dark"] .DocSearch {
font-family: SegoeUI, -apple-system, BlinkMacSystemFont, Ubuntu, sans-serif;
font-size: var(--font-size-small);
}
+
+/* Make subsample diagram SVGs fill the content width */
+article img[src*="/subsample/"] {
+ width: 100%;
+}
diff --git a/static/images/docs/subsample/cadence.svg b/static/images/docs/subsample/cadence.svg
new file mode 100644
index 000000000..1666f2dc8
--- /dev/null
+++ b/static/images/docs/subsample/cadence.svg
@@ -0,0 +1,38 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/gap-detect.svg b/static/images/docs/subsample/gap-detect.svg
new file mode 100644
index 000000000..8afadec02
--- /dev/null
+++ b/static/images/docs/subsample/gap-detect.svg
@@ -0,0 +1,47 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/gap-no-detect.svg b/static/images/docs/subsample/gap-no-detect.svg
new file mode 100644
index 000000000..9d9c03e23
--- /dev/null
+++ b/static/images/docs/subsample/gap-no-detect.svg
@@ -0,0 +1,43 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/gap-raw.svg b/static/images/docs/subsample/gap-raw.svg
new file mode 100644
index 000000000..26bf617a6
--- /dev/null
+++ b/static/images/docs/subsample/gap-raw.svg
@@ -0,0 +1,74 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/lttb.svg b/static/images/docs/subsample/lttb.svg
new file mode 100644
index 000000000..03d7ee4c8
--- /dev/null
+++ b/static/images/docs/subsample/lttb.svg
@@ -0,0 +1,37 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/m4.svg b/static/images/docs/subsample/m4.svg
new file mode 100644
index 000000000..3e2ca661b
--- /dev/null
+++ b/static/images/docs/subsample/m4.svg
@@ -0,0 +1,41 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/minmax.svg b/static/images/docs/subsample/minmax.svg
new file mode 100644
index 000000000..6e4acc54e
--- /dev/null
+++ b/static/images/docs/subsample/minmax.svg
@@ -0,0 +1,42 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/raw.svg b/static/images/docs/subsample/raw.svg
new file mode 100644
index 000000000..8d9cb1add
--- /dev/null
+++ b/static/images/docs/subsample/raw.svg
@@ -0,0 +1,50 @@
+
\ No newline at end of file
diff --git a/static/images/docs/subsample/uniform.svg b/static/images/docs/subsample/uniform.svg
new file mode 100644
index 000000000..adcf661bb
--- /dev/null
+++ b/static/images/docs/subsample/uniform.svg
@@ -0,0 +1,37 @@
+
\ No newline at end of file