[2.43] fix: Fixes in analytics SQL engine for ClickHouse compatibility [DHIS2-21417] #24265
Merged
maikelarabori
approved these changes
Jun 22, 2026
maikelarabori
approved these changes
Jun 23, 2026
b619c55 to
703b2a5
…2-21417] (#23810)

* fix: [DHIS2-21418] lowercase period column names in event SQL emission

  PeriodTypeEnum.getName() returns display-cased names (e.g. "Monthly"); the analytics tables are created with lowercase period columns. Postgres folds unquoted identifiers to lowercase, Doris is case-insensitive, ClickHouse is not. Lowercase with Locale.ROOT at every emission point so the reference matches the table column on every engine.

* fix: [DHIS2-21419] lowercase program UID in analytics_event/_enrollment table names

  DHIS2 UIDs are mixed-case, but the analytics tables themselves are created with lowercased names (analytics_event_iphinat79uw). Multiple emission sites concatenated "analytics_event_" + program.getUid() without lowercasing. Invisible on Postgres (case-folded) and Doris (lower_case_table_names=1), broken on ClickHouse (UNKNOWN_TABLE). Apply .toLowerCase() at every emission site.

* fix: [DHIS2-21421] emit JOIN form for date-period-structure lookup on ClickHouse

  DateFieldPeriodBucketColumnResolver already supports two emission shapes for the `analytics_rs_dateperiodstructure` lookup, switched on sqlBuilder.useJoinForDatePeriodStructureLookup(): a correlated scalar subquery (Postgres) or a LEFT JOIN (Doris). ClickHouse's analyzer rejects the correlated form ("Resolve identifier 'eb.lastupdated' from parent scope only supported for constants and CTE"), so override the flag to true. The JOIN form works on every engine, so this is a strict improvement with no regression risk on Postgres/Doris.

* fix: [DHIS2-21422] identity-alias prefixed enrollment column in aggregate CTEs

  ClickHouse's analyzer keeps the original table prefix attached to a projected column when no explicit output alias is given. So a CTE projecting ax.enrollment produces a column the analyzer still binds to ax, and the outer query's eb.enrollment reference (through the enrollment_aggr_base CTE alias) is then unresolvable. Postgres and Doris implicitly drop the prefix and re-scope under the CTE alias.

  Add explicit identity aliases at two emission sites in JdbcEnrollmentAnalyticsManager: the enrollment_aggr_base CTE projection and the inline derived-table evf event-date subquery. The alias re-binds the column under the CTE / derived-table alias on ClickHouse and is a no-op for Postgres/Doris.

* fix: [DHIS2-21423] NULL-safe date cast for analytics joins

  ClickHouse's toDate(NULL) and CAST(NULL AS Date) raise CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, and toDateOrNull only accepts String input. Aggregate queries that join to analytics_rs_dateperiodstructure on a nullable data-element column (e.g. cast(ax."<deUid>" as date)) crash.

  Introduce AnalyticsSqlBuilder.castAsDate(String). Default emits ANSI `cast(... as date)`, which is correct for Postgres and Doris. ClickHouse overrides to toDateOrNull(toString(...)), which is both NULL-safe and accepts every input type the cast might receive (Date / DateTime / DateTime64 / String / Nullable variants). JdbcEventAnalyticsManager's hasTimeField LEFT JOIN now routes through sqlBuilder.castAsDate. Output for Postgres/Doris is byte-identical.

* fix: [DHIS2-21428] align dimension columns at SELECT for ClickHouse

  Two fixes for the same family of case-sensitivity issue:

  1) AggregatedRowBuilder.addDimensionData lowercases the dimension name with Locale.ROOT before the result-set lookup. Period-column SQL emission was already lowercased; the row builder still asked the result set for "Quarterly" instead of "quarterly", which Postgres/MySQL JDBC normalised silently but ClickHouse JDBC rejected with InvalidResultSetAccessException.

  2) Add explicit "as <col>" alias to dimension projections in the SELECT context so the result-set column has a canonical name regardless of how the engine reports the underlying expression. ClickHouse JDBC retains the table prefix on unaliased prefixed columns (ax.uidlevel2) in result-set metadata; Postgres/MySQL strip it.
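The dialect split described for DHIS2-21423 can be pictured as a default method with one engine-specific override. This is only a minimal sketch: `castAsDate` and the two emitted SQL shapes come from the commit message, while the interface and class names below are hypothetical stand-ins and not the actual DHIS2 types.

```java
// Hypothetical sketch: only castAsDate and the emitted SQL shapes are taken
// from the commit message; every type name here is invented for illustration.
interface DateCastSketch {
  /** Default: ANSI cast, described in the commit as correct for Postgres and Doris. */
  default String castAsDate(String expression) {
    return "cast(" + expression + " as date)";
  }
}

/** Stand-in for an engine that keeps the default ANSI form. */
class AnsiDateCast implements DateCastSketch {}

/** Stand-in for the ClickHouse override: stringify first, then parse NULL-safely. */
class ClickHouseDateCast implements DateCastSketch {
  @Override
  public String castAsDate(String expression) {
    return "toDateOrNull(toString(" + expression + "))";
  }
}

class DateCastDemo {
  public static void main(String[] args) {
    String column = "ax.\"deUid\""; // placeholder column reference
    // prints: cast(ax."deUid" as date)
    System.out.println(new AnsiDateCast().castAsDate(column));
    // prints: toDateOrNull(toString(ax."deUid"))
    System.out.println(new ClickHouseDateCast().castAsDate(column));
  }
}
```

Keeping the ANSI form as the default means engines that do not override the method emit exactly the SQL they emitted before, which matches the "byte-identical" claim for Postgres/Doris.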
* fix: [DHIS2-21430] generate legend-set columns on ClickHouse via inline CASE

  JdbcEventAnalyticsTableManager.getColumnFromDataElementWithLegendSet early-returned an empty list on engines without correlated-subquery support (ClickHouse). The companion <deUid>_<lsUid> column was never created in the ClickHouse analytics tables, but the query path still emitted SQL referring to it; every aggregate query grouped or filtered by legend bucket failed with UNKNOWN_IDENTIFIER.

  Replace the early-return with a per-row CASE expression built from the static legend ranges. Same final value populated into the column, no correlated subquery, ClickHouse-friendly. Postgres and Doris are unaffected: supportsCorrelatedSubquery() returns true on both, so the existing correlated-subquery emission path runs unchanged.

* fix: [DHIS2-21431] populate OU-name support column on ClickHouse via JOIN

  Populate the <deUid>_name support column on ClickHouse via a LEFT JOIN to organisationunit instead of a correlated subquery (which ClickHouse rejects). The column was previously not created at all, causing UNKNOWN_IDENTIFIER errors in aggregate queries that group by an org-unit-typed data element.

* Fix formatting

* Fix failing unit test

* More Unit Test fixes (lower case program)

* refactor: [DHIS2-21420] centralise analytics_event/_enrollment table names

  Introduce AnalyticsTableNames in dhis-support-sql with eventTable(Program) and enrollmentTable(Program) helpers. Replace every inline "analytics_event_" + program.getUid().toLowerCase() and equivalent "analytics_enrollment_" + ... concatenation with a call through the helper.

* fix: [DHIS2-21422] identity-alias prefixed enrollment column in PI value CTE

  Same ClickHouse-analyzer issue as the prior fix in JdbcEnrollmentAnalyticsManager: a CTE projecting `subax.enrollment` without an explicit output alias keeps the column bound to `subax` in ClickHouse's scope, so the outer query's `<alias>.enrollment` reference cannot be resolved. Postgres and Doris re-scope the column under the CTE alias implicitly. Add an explicit identity alias on the projection in DefaultProgramIndicatorSubqueryBuilder so the column is re-bound under the CTE alias on ClickHouse. No-op for Postgres/Doris.

* remove doc for tracking issues

* fix: [DHIS2-21441] fix rejected correlated scalar subqueries in Events query

* fix: [DHIS2-21486] fix event PI queries by replacing correlated subqueries with event-keyed CTE joins

* fix: [DHIS2-21487] Fix enrollment stage display-name projections

* fix: [DHIS2-21488] fix relationship count CTE SQL for enrollment PIs

* fix: [DHIS2-21493] Fix ClickHouse dateDiff literal casting

* fix: [DHIS2-21495] Fix option-set stage CTE value projection

* Use correlated subqueries when engine supports it

* fix: scope ClickHouse event PI CTEs to candidate events

  Add an internal event_pi_candidates CTE for EVENT analytics queries when the database does not support correlated subqueries and EVENT program-indicator CTEs are generated. The candidate CTE mirrors the outer event query scope, including base filters such as period, org unit, stage, and non-PI query item filters. EVENT PI CTEs now read from this candidate source instead of scanning the full analytics event table. PI filters stay outside the candidate CTE to avoid circular SQL dependencies. Postgres and other correlated-subquery paths remain unchanged. Enrollment PI CTEs keep their existing source behavior.

* fix: keep event query enrollment PIs on CTE path

  Narrow the Event analytics correlated-subquery fallback so it only applies to EVENT-type program indicators. ENROLLMENT-type program indicators used from the Event endpoint still need the CTE path because it expands generated placeholders such as `FUNC_CTE_VAR`, `__PSDE_CTE_PLACEHOLDER__`, and `__D2FUNC__`.
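The "per-row CASE expression built from the static legend ranges" mentioned under DHIS2-21430 might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `LegendRange` type, the start-inclusive / end-exclusive bounds, the legend UID as the stored value, and the absence of an `else` branch are not taken from the DHIS2 source.

```java
import java.util.List;

// Hypothetical sketch of building an inline CASE from static legend ranges.
// None of these names exist in DHIS2; bounds and stored value are assumptions.
class LegendCaseSketch {

  /** Illustrative legend range: start inclusive, end exclusive (assumed). */
  static final class LegendRange {
    final String uid;
    final double start;
    final double end;

    LegendRange(String uid, double start, double end) {
      this.uid = uid;
      this.start = start;
      this.end = end;
    }
  }

  /** Emits one WHEN branch per range; no ELSE, so unmatched values yield NULL. */
  static String legendCase(String column, List<LegendRange> ranges) {
    StringBuilder sql = new StringBuilder("case");
    for (LegendRange r : ranges) {
      sql.append(" when ").append(column).append(" >= ").append(r.start)
          .append(" and ").append(column).append(" < ").append(r.end)
          .append(" then '").append(r.uid).append("'");
    }
    return sql.append(" end").toString();
  }

  public static void main(String[] args) {
    List<LegendRange> ranges =
        List.of(new LegendRange("lgA", 0, 10), new LegendRange("lgB", 10, 20));
    System.out.println(legendCase("ax.\"deUid\"", ranges));
  }
}
```

Because the ranges are known when the table is generated, the expression needs no lookup against another table at query time, which is what lets it replace the correlated subquery on ClickHouse.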
* refactor: [DHIS2-21488] replace relationship-count regex with d2:relationshipCount placeholder pipeline

* fix: use ClickHouse age() for year/month/week date differences

  ClickHouse dateDiff() counts boundary crossings, so program indicators using d2:yearsBetween, d2:monthsBetween and d2:weeksBetween returned values that differed from PostgreSQL, which counts completed units via age(). For example dateDiff('year', '2023-12-31', '2024-01-01') is 1 where PostgreSQL reports 0 completed years.

  Switch the years, months and weeks cases of ClickHouseSqlBuilder.dateDifference to age() (available since ClickHouse 23.1), keeping the same operand order. Days and minutes stay on dateDiff() because PostgreSQL computes them with date subtraction and elapsed time, which already match.

* fix: normalize empty text to NULL in ClickHouse enrollment aggregate CTEs

  Enrollment aggregate queries with program-stage data element dimensions returned more groups on ClickHouse than on Postgres. ClickHouse analytics tables store empty strings for absent text values where Postgres stores NULL, so GROUP BY split empty and NULL into separate buckets.

  Add AnalyticsSqlBuilder.nullIfEmpty, a no-op by default and overridden in `ClickHouseAnalyticsSqlBuilder` to wrap the column in `nullif(column, '')`.

* fix: make ClickHouse safeConcat null-safe for analytics display names

* fix: align ClickHouse event query columns and join semantics with Postgres

* fix: treat ClickHouse empty-string coordinates as null in coordinatesOnly filter

* fix: fold ClickHouse empty-string text dimensions to NULL in event aggregate

* fix: alias wrapped ClickHouse aggregate text columns to keep result-set label

* fix: join ClickHouse relationship-count CTE by alias, not registry key
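The difference between counting boundary crossings and counting completed units, which motivates the age() change above, can be reproduced outside SQL. The sketch below uses plain `java.time` to model the two semantics on the example dates from the commit message; it does not reproduce `ClickHouseSqlBuilder.dateDifference`, and the class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Illustrative model of the two counting semantics; not DHIS2 code.
class YearDifferenceSketch {

  /** dateDiff-style: number of year boundaries crossed between the two dates. */
  static long yearBoundariesCrossed(LocalDate start, LocalDate end) {
    return end.getYear() - start.getYear();
  }

  /** age-style: number of whole years that have elapsed. */
  static long completedYears(LocalDate start, LocalDate end) {
    return ChronoUnit.YEARS.between(start, end);
  }

  public static void main(String[] args) {
    LocalDate start = LocalDate.of(2023, 12, 31);
    LocalDate end = LocalDate.of(2024, 1, 1);
    System.out.println(yearBoundariesCrossed(start, end)); // prints 1
    System.out.println(completedYears(start, end)); // prints 0
  }
}
```

One day apart across New Year gives 1 under the boundary-crossing rule and 0 under the completed-units rule, which is the mismatch the commit describes for d2:yearsBetween.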
655e401 to
5f6919e
Backport from da96d3b