[2.43] fix: Fixes in analytics SQL engine for ClickHouse compatibility [DHIS2-21417]#24265

Merged
luciano-fiandesio merged 5 commits into 2.43 from DHIS2_21417_CH_FIXES_2.43-2
Jun 29, 2026

Conversation

@luciano-fiandesio

Backport from da96d3b

@luciano-fiandesio luciano-fiandesio changed the title from "fix: Fixes in analytics SQL engine for ClickHouse compatibility [DHIS2-21417]" to "[2.43] fix: Fixes in analytics SQL engine for ClickHouse compatibility [DHIS2-21417]" Jun 22, 2026
@luciano-fiandesio luciano-fiandesio added the labels run-api-analytics-tests (Enables analytics e2e tests) and run-api-analytics-tests-doris (Enables analytics e2e tests on Doris) Jun 22, 2026
@maikelarabori maikelarabori self-requested a review June 22, 2026 14:25
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS2-21381_METHOD_EXTRACTION_2.43 branch from b619c55 to 703b2a5 Compare June 23, 2026 12:03
Base automatically changed from DHIS2-21381_METHOD_EXTRACTION_2.43 to 2.43 June 24, 2026 05:39
…2-21417] (#23810)

* fix: [DHIS2-21418] lowercase period column names in event SQL emission

PeriodTypeEnum.getName() returns display-cased names (e.g. "Monthly"); the analytics tables are created with lowercase period columns. Postgres folds unquoted identifiers, Doris is case-insensitive, ClickHouse is not.

Lowercase with Locale.ROOT at every emission point so the reference matches the table column on every engine.
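
A minimal sketch of the lowercasing described above. The class and method names are hypothetical and are not the DHIS2 emission code; the double-quoted identifier style is also an assumption. `Locale.ROOT` keeps the result independent of the JVM's default locale.

```java
import java.util.Locale;

/** Hypothetical helper (not a DHIS2 class) that builds a period column reference. */
final class PeriodColumnSketch {

  // Lowercases the display-cased period type name so the reference matches
  // the lowercase column created in the analytics table.
  static String periodColumn(String tableAlias, String periodTypeName) {
    String column = periodTypeName.toLowerCase(Locale.ROOT);
    return tableAlias + ".\"" + column + "\"";
  }

  public static void main(String[] args) {
    // "Monthly" is the display-cased example given in the commit message.
    System.out.println(periodColumn("ax", "Monthly"));
  }
}
```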

* fix: [DHIS2-21419] lowercase program UID in analytics_event/_enrollment table names

DHIS2 UIDs are mixed-case, but the analytics tables themselves are created
with lowercased names (analytics_event_iphinat79uw). Multiple emission sites
concatenated "analytics_event_" + program.getUid() without lowercasing.
This is invisible on Postgres (case-folded) and Doris
(lower_case_table_names=1), but breaks on ClickHouse (UNKNOWN_TABLE).

Apply .toLowerCase() at every emission site.
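
As an illustration only, the table-name construction could be funnelled through one small helper so that no call site can forget the lowercasing. The class name, the method signatures taking a plain `String`, and the mixed-case sample UID are assumptions made for this sketch.

```java
import java.util.Locale;

/** Hypothetical sketch: one place that derives analytics table names from a program UID. */
final class TableNameSketch {

  // Event analytics table name, always lowercased.
  static String eventTable(String programUid) {
    return "analytics_event_" + programUid.toLowerCase(Locale.ROOT);
  }

  // Enrollment analytics table name, always lowercased.
  static String enrollmentTable(String programUid) {
    return "analytics_enrollment_" + programUid.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    // Assumed mixed-case UID whose lowercase form is the table suffix quoted above.
    System.out.println(eventTable("IpHINAT79UW"));
    System.out.println(enrollmentTable("IpHINAT79UW"));
  }
}
```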

* fix: [DHIS2-21421] emit JOIN form for date-period-structure lookup on ClickHouse

DateFieldPeriodBucketColumnResolver already supports two emission shapes for the `analytics_rs_dateperiodstructure` lookup, switched on
sqlBuilder.useJoinForDatePeriodStructureLookup(): a correlated scalar subquery (Postgres) or a LEFT JOIN (Doris). ClickHouse's analyzer rejects the correlated form ("Resolve identifier 'eb.lastupdated' from parent scope only supported for constants and CTE"), so override the flag to true.

The JOIN form works on every engine, so this is a strict improvement with no regression risk on Postgres/Doris.

* fix: [DHIS2-21422] identity-alias prefixed enrollment column in aggregate CTEs

ClickHouse's analyzer keeps the original table prefix attached to a
projected column when no explicit output alias is given. A CTE projecting
ax.enrollment therefore produces a column the analyzer still binds to ax, and
the outer query's eb.enrollment reference (through the enrollment_aggr_base
CTE alias) is then unresolvable. Postgres and Doris implicitly drop the
prefix and re-scope the column under the CTE alias.

Add explicit identity aliases at two emission sites in
JdbcEnrollmentAnalyticsManager: the enrollment_aggr_base CTE projection and
the inline derived-table evf event-date subquery. The alias re-binds the
column under the CTE / derived-table alias on ClickHouse and is a no-op for Postgres/Doris.
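
A rough sketch of what an identity alias looks like when emitted from Java. The helper below is hypothetical and only shows the shape of the projection, not the actual code in JdbcEnrollmentAnalyticsManager.

```java
/** Hypothetical helper that emits "<alias>.<column> as <column>". */
final class IdentityAliasSketch {

  // Re-exposes a prefixed column under its bare name so an outer query can
  // reference it through the CTE or derived-table alias.
  static String identityAlias(String tableAlias, String column) {
    return tableAlias + "." + column + " as " + column;
  }

  public static void main(String[] args) {
    System.out.println(identityAlias("ax", "enrollment"));
  }
}
```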

* fix: [DHIS2-21423] NULL-safe date cast for analytics joins

ClickHouse's toDate(NULL) and CAST(NULL AS Date) raise
CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, and toDateOrNull only accepts String
input. Aggregate queries that join to analytics_rs_dateperiodstructure on a
nullable data-element column (e.g. cast(ax."<deUid>" as date)) crash.

Introduce AnalyticsSqlBuilder.castAsDate(String). Default emits ANSI `cast(... as date)` - correct for Postgres and Doris. ClickHouse overrides to
toDateOrNull(toString(...)), which is both NULL-safe and accepts every input type the cast might receive (Date / DateTime / DateTime64 / String / Nullable variants).

JdbcEventAnalyticsManager's hasTimeField LEFT JOIN now routes through sqlBuilder.castAsDate. Output for Postgres/Doris is byte-identical.
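
The default-plus-override arrangement described above might look roughly like the following. The interface and class names are stand-ins invented for this sketch; only the two emitted SQL shapes come from the commit message.

```java
/** Hypothetical stand-ins for the SQL builder hierarchy. */
final class CastAsDateSketch {

  interface SqlBuilder {
    // ANSI form, described above as correct for Postgres and Doris.
    default String castAsDate(String expression) {
      return "cast(" + expression + " as date)";
    }
  }

  // Engines that are happy with the ANSI cast inherit the default.
  static final class AnsiBuilder implements SqlBuilder {}

  // ClickHouse form: convert to String first so toDateOrNull accepts the
  // input and yields NULL instead of raising an error.
  static final class ClickHouseBuilder implements SqlBuilder {
    @Override
    public String castAsDate(String expression) {
      return "toDateOrNull(toString(" + expression + "))";
    }
  }

  public static void main(String[] args) {
    String column = "ax.somecol"; // placeholder column reference
    System.out.println(new AnsiBuilder().castAsDate(column));
    System.out.println(new ClickHouseBuilder().castAsDate(column));
  }
}
```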

* fix: [DHIS2-21428] align dimension columns at SELECT for ClickHouse

Two fixes for the same family of case-sensitivity issue:

1) AggregatedRowBuilder.addDimensionData lowercases the dimension name with Locale.ROOT before the result-set lookup. Period-column SQL emission was
   already lowercased; the row builder still asked the result set for
   "Quarterly" instead of "quarterly", which Postgres/MySQL JDBC normalised
   silently but ClickHouse JDBC rejected with InvalidResultSetAccessException.

2) Add explicit "as <col>" alias to dimension projections in the SELECT context so the result-set column has a canonical name regardless of how the engine reports the underlying expression. ClickHouse JDBC retains the table prefix on unaliased prefixed columns (ax.uidlevel2) in result-set metadata; Postgres/MySQL strip it.

* fix: [DHIS2-21430] generate legend-set columns on ClickHouse via inline CASE

JdbcEventAnalyticsTableManager.getColumnFromDataElementWithLegendSet
early-returned an empty list on engines without correlated-subquery support
(ClickHouse). The companion <deUid>_<lsUid> column was never created in the
ClickHouse analytics tables, but the query path still emitted SQL referring
to it — every aggregate query grouped or filtered by legend bucket failed
with UNKNOWN_IDENTIFIER.

Replace the early-return with a per-row CASE expression built from the static legend ranges. Same final value populated into the column, no correlated subquery, ClickHouse-friendly. Postgres and Doris are unaffected:
supportsCorrelatedSubquery() returns true on both, so the existing correlated-subquery emission path runs unchanged.
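
To make the inline CASE idea concrete, here is a sketch that turns a static list of legend ranges into a single expression. The `Legend` class, the inclusive-start/exclusive-end comparison, and the sample UIDs are assumptions for illustration; the real column expression may differ.

```java
import java.util.List;

/** Hypothetical sketch: build a per-row CASE from static legend ranges. */
final class LegendCaseSketch {

  static final class Legend {
    final String uid;
    final double start;
    final double end;

    Legend(String uid, double start, double end) {
      this.uid = uid;
      this.start = start;
      this.end = end;
    }
  }

  // Emits "case when col >= start and col < end then 'uid' ... end".
  // The range bounds (start inclusive, end exclusive) are an assumption.
  static String legendCase(String column, List<Legend> legends) {
    if (legends.isEmpty()) {
      return "null"; // no ranges: nothing can match
    }
    StringBuilder sql = new StringBuilder("case");
    for (Legend legend : legends) {
      sql.append(" when ").append(column).append(" >= ").append(legend.start)
          .append(" and ").append(column).append(" < ").append(legend.end)
          .append(" then '").append(legend.uid).append("'");
    }
    return sql.append(" end").toString();
  }

  public static void main(String[] args) {
    List<Legend> legends =
        List.of(new Legend("lowUid", 0, 10), new Legend("highUid", 10, 20));
    System.out.println(legendCase("v", legends));
  }
}
```

Because the ranges are known when the table is populated, the expression needs no subquery at all, which is why it suits an engine that rejects the correlated form.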

* fix: [DHIS2-21431] populate OU-name support column on ClickHouse via JOIN

Populate the <deUid>_name support column on ClickHouse via a LEFT JOIN to
organisationunit instead of a correlated subquery (which ClickHouse rejects).
The column was previously not created at all, causing UNKNOWN_IDENTIFIER
errors in aggregate queries that group by an org-unit-typed data element.

* Fix formatting

* Fix failing unit test

* More Unit Test fixes (lower case program)

* refactor: [DHIS2-21420] centralise analytics_event/_enrollment table names

Introduce AnalyticsTableNames in dhis-support-sql with eventTable(Program)
and enrollmentTable(Program) helpers. Replace every inline
"analytics_event_" + program.getUid().toLowerCase() and equivalent
"analytics_enrollment_" + ... concatenation with a call through the helper.

* fix: [DHIS2-21422] identity-alias prefixed enrollment column in PI value CTE

Same ClickHouse-analyzer issue as the prior fix in JdbcEnrollmentAnalyticsManager:
a CTE projecting `subax.enrollment` without an explicit output alias keeps the
column bound to `subax` in ClickHouse's scope, so the outer query's
`<alias>.enrollment` reference cannot be resolved. Postgres and Doris re-scope
the column under the CTE alias implicitly.

Add an explicit identity alias on the projection in
DefaultProgramIndicatorSubqueryBuilder so the column is re-bound under the CTE
alias on ClickHouse. No-op for Postgres/Doris.

* remove doc for tracking issues

* fix: [DHIS2-21441] fix rejected correlated scalar subqueries in Events query

* fix: [DHIS2-21486] fix event PI queries by replacing correlated subqueries with event-keyed CTE joins

* fix: [DHIS2-21487] Fix enrollment stage display-name projections

* fix: [DHIS2-21488] fix relationship count CTE SQL for enrollment PIs

* fix: [DHIS2-21493] Fix ClickHouse dateDiff literal casting

* fix: [DHIS2-21495] Fix option-set stage CTE value projection

* Use correlated subqueries when engine supports it

* fix: scope ClickHouse event PI CTEs to candidate events

Add an internal event_pi_candidates CTE for EVENT analytics queries when
the database does not support correlated subqueries and EVENT
program-indicator CTEs are generated.

The candidate CTE mirrors the outer event query scope, including base
filters such as period, org unit, stage, and non-PI query item filters.
EVENT PI CTEs now read from this candidate source instead of scanning
the full analytics event table. PI filters stay outside the candidate
CTE to avoid circular SQL dependencies.

Postgres and other correlated-subquery paths remain unchanged.
Enrollment PI CTEs keep their existing source behavior.

* fix: keep event query enrollment PIs on CTE path

Narrow the Event analytics correlated-subquery fallback so it only
applies to EVENT-type program indicators. ENROLLMENT-type program
indicators used from the Event endpoint still need the CTE path because
it expands generated placeholders such as `FUNC_CTE_VAR`,
`__PSDE_CTE_PLACEHOLDER__`, and `__D2FUNC__`.

* refactor: [DHIS2-21488] replace relationship-count regex with d2:relationshipCount placeholder pipeline

* fix: use ClickHouse age() for year/month/week date differences

ClickHouse dateDiff() counts boundary crossings, so program indicators
using d2:yearsBetween, d2:monthsBetween and d2:weeksBetween returned
values that differed from PostgreSQL, which counts completed units via
age(). For example dateDiff('year', '2023-12-31', '2024-01-01') is 1
where PostgreSQL reports 0 completed years.

Switch the years, months and weeks cases of
ClickHouseSqlBuilder.dateDifference to age() (available since ClickHouse
23.1), keeping the same operand order. Days and minutes stay on
dateDiff() because PostgreSQL computes them with date subtraction and
elapsed time, which already match.
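
The difference between the two counting rules can be reproduced in plain Java with `java.time`. This is an analogy for the semantics only, not the SQL that ClickHouseSqlBuilder emits; the class and method names are made up for the sketch.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical illustration of boundary-crossing vs completed-unit counting. */
final class DateDiffSemanticsSketch {

  // Boundary-crossing count: how many year boundaries lie between the dates.
  static long yearBoundariesCrossed(String start, String end) {
    return LocalDate.parse(end).getYear() - LocalDate.parse(start).getYear();
  }

  // Completed-unit count: how many whole years have elapsed.
  static long completedYears(String start, String end) {
    return ChronoUnit.YEARS.between(LocalDate.parse(start), LocalDate.parse(end));
  }

  public static void main(String[] args) {
    // Dates from the example above: one day apart, across a year boundary.
    System.out.println(yearBoundariesCrossed("2023-12-31", "2024-01-01")); // 1
    System.out.println(completedYears("2023-12-31", "2024-01-01")); // 0
  }
}
```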

* fix: normalize empty text to NULL in ClickHouse enrollment aggregate CTEs

Enrollment aggregate queries with program-stage data element dimensions
returned more groups on ClickHouse than on Postgres. ClickHouse
analytics tables store empty strings for absent text values where
Postgres stores NULL, so GROUP BY split empty and NULL into separate
buckets.

Add AnalyticsSqlBuilder.nullIfEmpty, a no-op by default and overridden
in `ClickHouseAnalyticsSqlBuilder` to wrap the column in `nullif(column, '')`.
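
A sketch of the no-op default and the ClickHouse override, using invented stand-in types rather than the real AnalyticsSqlBuilder hierarchy:

```java
/** Hypothetical stand-ins showing a no-op default with an engine-specific override. */
final class NullIfEmptySketch {

  interface SqlBuilder {
    // Default: engines that already store NULL need no wrapping.
    default String nullIfEmpty(String column) {
      return column;
    }
  }

  static final class DefaultBuilder implements SqlBuilder {}

  // ClickHouse: fold '' into NULL so GROUP BY produces a single bucket.
  static final class ClickHouseBuilder implements SqlBuilder {
    @Override
    public String nullIfEmpty(String column) {
      return "nullif(" + column + ", '')";
    }
  }

  public static void main(String[] args) {
    String column = "ax.somecol"; // placeholder column reference
    System.out.println(new DefaultBuilder().nullIfEmpty(column));
    System.out.println(new ClickHouseBuilder().nullIfEmpty(column));
  }
}
```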

* fix: make ClickHouse safeConcat null-safe for analytics display names

* fix: align ClickHouse event query columns and join semantics with Postgres

* fix: treat ClickHouse empty-string coordinates as null in coordinatesOnly filter

* fix: fold ClickHouse empty-string text dimensions to NULL in event aggregate

* fix: alias wrapped ClickHouse aggregate text columns to keep result-set label

* fix: join ClickHouse relationship-count CTE by alias, not registry key
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS2_21417_CH_FIXES_2.43-2 branch from 655e401 to 5f6919e Compare June 26, 2026 11:49
@luciano-fiandesio luciano-fiandesio merged commit c2cd1d1 into 2.43 Jun 29, 2026
17 of 18 checks passed
@luciano-fiandesio luciano-fiandesio deleted the DHIS2_21417_CH_FIXES_2.43-2 branch June 29, 2026 20:51