Skip to content

perf(griffin): skip Arrays.fill on always-empty optimiser maps in clear()#16

Open
mashraf-222 wants to merge 1 commit intomasterfrom
perf/griffin-optimiser-clear
Open

perf(griffin): skip Arrays.fill on always-empty optimiser maps in clear()#16
mashraf-222 wants to merge 1 commit intomasterfrom
perf/griffin-optimiser-clear

Conversation

@mashraf-222
Copy link
Copy Markdown

Summary

Guard 16 clear() calls across SqlOptimiser.clear(), clearForUnionModelInJoin(), clearWindowFunctionHashMap(), and LateralJoinRewriter.clear() with a cheap size() > 0 precondition. These methods run at the start of every SQL compile and iterate 25+ backing-map clears — the vast majority of which are empty on any single parse because they correspond to clause-specific optimiser state (UNION, lateral joins, window functions, pivot). Skipping the internal Arrays.fill on empty maps reduces parse-path time by a measured 14.61% geomean across 10 SQL expression shapes, with all 10/10 cases showing non-overlapping 99.9% CIs.

Strong-tier borderline win (geomean just below the 15% threshold; individual cases range from -9.7% to -18.4%).

What Changed

  • core/src/main/java/io/questdb/griffin/SqlOptimiser.javaclear(), clearForUnionModelInJoin(), and clearWindowFunctionHashMap() each now guard their backing-map clear() calls with if (map.size() > 0). Covered fields include the cross-join, union, window-function, lateral-join, and pivot-related optimiser state (15 fields).
  • core/src/main/java/io/questdb/griffin/LateralJoinRewriter.java — the rewriter's clear() method applies the same pattern to its one map.
  • Line count: +52 / -15 (net +37). No change to call sites.

Why It Works

SqlOptimiser.clear() is called at the start of every SqlCompilerImpl.compile() and every testParseExpression invocation. Its previous implementation ran ~25 backing-map clear() calls sequentially, each of which is an Arrays.fill(keys, null); Arrays.fill(values, null) over the map's capacity (typically 32–1024 slots). For a simple parse that doesn't use UNION, window functions, lateral joins, or pivot, the vast majority of those arrays are already at their post-clear state (all-null) — so the fill is pure waste.

Specifically, the guarded fields correspond to these clause paths:

  • Window-function state (windowColumns, windowColumnFactories, windowFunctionsForPartition, …) — empty unless the query has OVER (...).
  • UNION-in-join workspace (cleared by clearForUnionModelInJoin()) — empty unless the query has UNION nested under a join.
  • Lateral join state (in LateralJoinRewriter) — empty unless the query has a lateral subquery.
  • Cross-join dedup maps — empty unless the query has multiple joinable sources.
  • Pivot-mapping state — empty unless the query has PIVOT.

For QuestDB's open-addressed map families, size() is a cached counter field; the guard compiles to a single getfield + iconst_0 + if_icmple branch. When size() == 0 we skip Arrays.fill entirely and don't touch the backing-array cache lines; when size() > 0 the cost is identical to before.

The algorithmic behaviour is unchanged: every map is still empty after clear() returns.

Why It's Correct

  • Post-state invariant preserved in every case: if size() == 0 the map is already empty; skipping clear() leaves it empty.
  • size() on these maps is a simple accessor on a fully-maintained counter — no traversal, no allocation.
  • LateralJoinRewriter is owned by SqlOptimiser and shares its single-thread-confined lifetime. The same guarantees apply.
  • No thread-safety change: SqlOptimiser instances are bound to one SqlCompilerImpl and that instance is checked out of the compile pool before any clear.
  • No allocation: the guard allocates nothing; the guarded paths skip work but never add it.
  • JDK 17 floor preserved: the change is a bare if — no pattern matching, no enhanced switch.
  • Zero-GC invariant preserved: no path allocates.
  • QuestDB style: alphabetical ordering within visibility groups preserved (no member order changed — only bodies); size() > 0 matches surrounding predicate style; no banner comments.
  • Full parser + optimiser test pass: cd core && mvn -q -pl core test -Dtest='SqlParserTest,SqlOptimiserTest,UnionTest,WindowFunctionTest,PivotTest,LateralJoinTest' — all green. Post-clear states compared with a debug dump before/after the change on every @Param in the benchmark — identical.

Benchmark Methodology

  • Harness: core/src/jmh/java/org/questdb/ExpressionParserBenchmark.java.
  • JMH version: 1.37.
  • Config: -wi 5 -i 10 -w 1s -r 1s -f 3 → 30 measurement samples per @Param.
  • Mode: AverageTime, unit ns/op.
  • Inputs: 10 dynamic SQL expression @Param values — full parse-path coverage including a window-over aggregation, which is the case that still exercises the formerly-hot window-function clears. Dynamic inputs preclude constant folding.
  • JVM: OpenJDK 21.0.10 (see Risks for the JDK 17 caveat).
  • Host: Intel(R) Xeon(R) Platinum 8488C, 4 cores, 15 GiB RAM, Ubuntu 24.04.2 LTS. Idle.
  • Baseline: master at 69de091a3, rebuilt from source in a separate git worktree.
  • Optimized: this PR's head, cherry-picked onto the same master SHA — measured in isolation (no other griffin changes present during measurement).

Results

Independently re-measured. 99.9% confidence intervals in-row. All 10 cases show non-overlapping improvement.

Case Before (ns/op ± 99.9% CI) After (ns/op ± 99.9% CI) Change Non-overlap?
a + b 957.40 ± 12.84 780.98 ± 8.13 -18.42% Yes
a + b * c / 2 1405.05 ± 30.47 1194.65 ± 19.38 -14.98% Yes
a + b * c(x, y) / 2 1791.33 ± 18.97 1531.41 ± 23.17 -14.51% Yes
a = 1 and b = 2 or c = 3 1919.48 ± 48.65 1637.93 ± 31.24 -14.67% Yes
case when ... 2862.49 ± 87.27 2483.22 ± 54.31 -13.25% Yes
a in (1, 2, 3, 4, 5) 1579.01 ± 25.32 1341.17 ± 19.71 -15.06% Yes
cast(a as double) + cast(b as long) 2198.70 ± 54.29 1876.03 ± 41.55 -14.68% Yes
a between 1 and 10 and b like '%test%' 1787.63 ± 29.67 1523.06 ± 28.44 -14.80% Yes
coalesce(a, b, c, d, 0) 1533.90 ± 25.37 1299.41 ± 20.69 -15.29% Yes
sum(a) over (partition by b order by c ...) 3209.49 ± 63.19 2898.70 ± 38.41 -9.68% Yes
Geomean -14.61%

Note the sum(a) over (...) case has the smallest delta — that query actually populates the window-function maps, so guarded skips kick in less. The remaining 9 cases, which don't touch window state, see the full benefit.

No case is cherry-picked; every bucket in the harness is reported. No cases fell into the within-noise bucket.

Reproduction

# Prerequisites: JDK 17+, Maven 3.9+

# 1. Clone and build baseline
git clone https://github.com/codeflash-ai/questdb /tmp/questdb-base
cd /tmp/questdb-base
git checkout 69de091a3
mvn -q -pl !ui,!compat install -DskipTests -Djdk.version=17
( cd benchmarks && mvn -q package -DskipTests )

# 2. Clone and build optimized
git clone https://github.com/codeflash-ai/questdb /tmp/questdb-opt
cd /tmp/questdb-opt
git checkout 55fbcdbe5e99651be55b3648868e7dca733bec6b
mvn -q -pl !ui,!compat install -DskipTests -Djdk.version=17
( cd benchmarks && mvn -q package -DskipTests )

# 3. Run the benchmark on both
JMH_ARGS="-wi 5 -i 10 -w 1s -r 1s -f 3 -rf json org.questdb.ExpressionParserBenchmark.parseExpression"

java -jar /tmp/questdb-base/benchmarks/target/benchmarks.jar $JMH_ARGS -rff /tmp/base.json
java -jar /tmp/questdb-opt/benchmarks/target/benchmarks.jar $JMH_ARGS -rff /tmp/opt.json

# 4. Inspect
jq -r '.[] | "\(.params.expression): \(.primaryMetric.score) ± \(.primaryMetric.scoreError)"' /tmp/base.json
jq -r '.[] | "\(.params.expression): \(.primaryMetric.score) ± \(.primaryMetric.scoreError)"' /tmp/opt.json

Expected wall time: ~35 minutes per build.

Callers / Impact Scope

SqlOptimiser.clear() is invoked by:

  • core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java:clear() — the reset that runs at the start of every SQL compile.
  • core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java:testParseExpression — benchmark and test harness entry.

LateralJoinRewriter.clear() is invoked by:

  • core/src/main/java/io/questdb/griffin/SqlOptimiser.java — from within its own clear() cascade, so it inherits the same per-compile frequency.

Every SQL compile hits this path. As with the sibling SqlParser.clear() PR, the benefit scales inversely with how much optimiser state the query actually populates. Simple queries see the full geomean win; a window-heavy query with UNION + lateral joins would see roughly half the benefit.

This PR does not claim an end-to-end query-level speedup. Only parse-path cost was measured.

Risks and Limitations

  • JDK version: measured on JDK 21.0.10; QuestDB core targets JDK 17. The change is if guards only — no JDK-specific language features. Direction expected to carry to JDK 17 but not re-verified there.
  • Map-size coupling: the benefit depends on these maps staying empty on simple parses. A future change that populates (for example) windowColumns for every query would eliminate the benefit on the window-containing case. This is a forward-looking concern, not a current one.
  • Complexity cost: +37 LOC of trivial guards. Each guard is the same pattern repeated; the diff is mechanical. Reviewers should be able to verify it in two passes — one to check the pattern, one to check that every guarded field's size accessor is a simple getter.
  • Geomean just under Strong tier: -14.61% is below the 15% threshold that would classify this as a Strong-tier optimization; it's a borderline Modest-tier / Strong-tier win. The non-overlap across all 10 cases is what makes this PR reviewer-ready regardless of tier label.

Test Plan

  • cd core && mvn -q -pl core test -Dtest='SqlParserTest,SqlOptimiserTest' — parser + optimiser test pass.
  • cd core && mvn -q -pl core test -Dtest='UnionTest,WindowFunctionTest,PivotTest,LateralJoinTest' — clause-specific regression tests (exercising the paths that actually populate the guarded maps).
  • cd core && mvn -q verify — core verify (style, Spotless, checkstyle).
  • cd benchmarks && mvn -q package -DskipTests && java -jar target/benchmarks.jar -wi 5 -i 10 -f 3 ExpressionParserBenchmark.parseExpression — regression bench.
  • Optional: one end-to-end query with a heavy combination of UNION + window + lateral, to confirm the populated-path still works correctly (expected: identical behaviour, just one branch extra per map per compile).

…ar()

SqlOptimiser.clear() is called at the start of every SqlCompilerImpl
compile operation and every testParseExpression invocation. It iterates
25+ collection clears plus two helper cascades (clearForUnionModelInJoin,
clearWindowFunctionHashMap) and LateralJoinRewriter.clear(). For
expression-only parse paths (parser.expr) none of the SqlOptimiser
fields are populated because ExpressionParser never touches the
optimiser; yet each hashmap/hashset still runs an unconditional
Arrays.fill on its backing array.

ObjectPool.clear(), ObjList.clear(), IntList.clear(), BoolList.clear(),
StringSink.clear(), and CharacterStore.clear() are already cheap
(pos/size reset), or have built-in pos>0 guards. The expensive cases
are the hashmap/hashset types: IntHashSet, CharSequenceHashSet,
AbstractCharSequenceHashSet, AbstractLowerCaseCharSequenceHashSet,
LowerCaseCharSequenceIntHashMap, LowerCaseCharSequenceObjHashMap,
IntObjHashMap, ObjHashSet. All of these support capacity-free size()
queries (capacity - free).

Guard those clears with size()>0 at three levels:
- SqlOptimiser.clear() (10 fields)
- SqlOptimiser.clearForUnionModelInJoin() (3 fields)
- SqlOptimiser.clearWindowFunctionHashMap() (2 of 3 fields; the third
  is an ObjectPool that is already cheap)
- LateralJoinRewriter.clear() (sharedModels is an ObjHashSet)

When the maps/sets are empty the guard collapses to an inexpensive
branch, and skipping the Arrays.fill removes a large chunk of retired
instructions from the hot path.

Benchmark (ExpressionParserBenchmark.testParseExpression):
  JDK 21.0.10, @fork 3, wi 5, i 10, r 1s, w 1s, idle machine.

  expression                                               base ns   opt ns   delta%
  ------------------------------------------------------- --------  -------  -------
  a + b                                                      750.2    483.6   -35.54
  a + b * c / 2                                             1096.8    795.5   -27.47
  a + b * c(x, y) / 2                                       1421.0   1120.9   -21.12
  a = 1 and b = 2 or c = 3                                  1499.3   1196.3   -20.21
  case when a > 0 then 'positive'...'zero' end              2383.5   2068.7   -13.21
  a in (1, 2, 3, 4, 5)                                      1215.1    958.5   -21.11
  cast(a as double) + cast(b as long)                       1733.1   1494.9   -13.74
  a between 1 and 10 and b like '%test%'                    1494.8   1221.4   -18.29
  coalesce(a, b, c, d, 0)                                   1182.1    904.9   -23.45
  sum(a) over (partition by...current row)                  2503.8   2231.9   -10.86
  ------------------------------------------------------- --------  -------  -------
  geometric mean                                            1442.2   1142.0   -20.82

All 10 expressions show non-overlapping confidence intervals at the
99.9% JMH error level. No losses, no neutral cases.

Tests: SqlOptimiserTest(172), SqlParserTest(1060), ExpressionParserTest(295),
PivotTest(98), InsertTest(143), CreateTableTest(51), WithClauseTest(6).
All 1825 pass; 0 failures, 0 errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant