perf: major speed up when querying jobs by tags #429
Open
michaeladler wants to merge 12 commits into siemens:main from
Codecov Report: ❌ Patch coverage

Coverage Diff:

| | main | #429 | +/- |
|---|---|---|---|
| Coverage | 73.90% | 73.73% | -0.18% |
| Files | 96 | 96 | |
| Lines | 4055 | 4059 | +4 |
| Hits | 2997 | 2993 | -4 |
| Misses | 828 | 839 | +11 |
| Partials | 230 | 227 | -3 |
Force-pushed from ef187e2 to 56b1473.
stormc reviewed on Apr 14, 2026.
stormc reviewed on Apr 16, 2026 (three reviews).
Enable logging of all SQL queries when the log level is set to trace. This is useful for identifying slow or inefficient queries during development and debugging, e.g. to analyze N+1 query problems.
Signed-off-by: Michael Adler <michael.adler@siemens.com>
Previously this was ui/priv, now it's in ui/dist.
Signed-off-by: Michael Adler <michael.adler@siemens.com>
Introduce a 'populate' command to easily fill the database with sample data. This is useful for reproducing performance issues or testing scenarios that require a non-empty database.
Signed-off-by: Michael Adler <michael.adler@siemens.com>
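A rough sketch of how a populate-style seeder might shape its data. `SampleJob`, `sampleJobs`, and `batches` are hypothetical names; the actual command's flags and schema are not shown in this PR page. Each job gets a fixed number of distinct tags from a small pool (the benchmark below uses two tags per job), and jobs are grouped into chunks so each chunk can be written with one bulk insert rather than one round trip per job.

```go
package main

import "fmt"

// SampleJob is a hypothetical row shape for seeding; the real schema differs.
type SampleJob struct {
	ID   string
	Tags []string
}

// sampleJobs deterministically builds n jobs. Each job gets tagsPerJob distinct
// tags picked round-robin from pool (requires a non-empty pool and
// tagsPerJob <= len(pool)).
func sampleJobs(n, tagsPerJob int, pool []string) []SampleJob {
	jobs := make([]SampleJob, n)
	for i := range jobs {
		tags := make([]string, tagsPerJob)
		for k := range tags {
			tags[k] = pool[(i+k)%len(pool)]
		}
		jobs[i] = SampleJob{ID: fmt.Sprintf("job-%d", i), Tags: tags}
	}
	return jobs
}

// batches splits jobs into chunks of at most size elements, preserving order.
func batches(jobs []SampleJob, size int) [][]SampleJob {
	if size <= 0 {
		size = len(jobs) // treat a non-positive size as "one single batch"
	}
	var out [][]SampleJob
	for size < len(jobs) {
		// The right-hand side is evaluated before assignment, so both
		// expressions see the same, not-yet-advanced jobs slice.
		jobs, out = jobs[size:], append(out, jobs[:size:size])
	}
	if len(jobs) > 0 {
		out = append(out, jobs)
	}
	return out
}

func main() {
	jobs := sampleJobs(5, 2, []string{"TAG1", "TAG2", "TAG3"})
	for _, b := range batches(jobs, 2) {
		fmt.Println(len(b), b[0].ID, b[0].Tags)
	}
}
```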
The old ent-generated code looped over each tag and added a separate
HasTagsWith predicate, producing one correlated IN subquery per tag:

    WHERE job.id IN (SELECT tag_jobs.job_id FROM tag_jobs
                     JOIN tag ON ... WHERE tag.name = 'TAG1')
      AND job.id IN (SELECT tag_jobs.job_id FROM tag_jobs
                     JOIN tag ON ... WHERE tag.name = 'TAG2')

Each subquery performs an independent scan of the tag_jobs table, which
is expensive when the jobs table is large.

Replace this with a single explicit JOIN on tag_jobs and tags, filtering
all requested tags in one IN clause:

    FROM job
    JOIN tag_jobs ON job.id = tag_jobs.job_id
    JOIN tag ON tag_jobs.tag_id = tag.id
    WHERE tag.name IN ('TAG1', 'TAG2')

The JOIN allows the database to resolve the tag filter in a single pass.
Add a database index on tag_jobs(job_id) for MySQL, PostgreSQL, and
SQLite so the join can use an index lookup instead of a sequential scan.
Use DISTINCT to deduplicate rows introduced by the join.
Signed-off-by: Michael Adler <michael.adler@siemens.com>
Add a `pagination` boolean query parameter to the GET /jobs and GET /workflows endpoints. When not set (default: false), the pagination object is omitted from the response, reducing payload size for clients that don't need it.
Signed-off-by: Michael Adler <michael.adler@siemens.com>
Force-pushed from b552c74 to 7bfc656.
This removes the retry loops for storage initialization and creating network listeners. These are unnecessary in both common scenarios:

- Developer use: fast failure with a clear error is more useful than silently retrying for minutes.
- Production: service managers (systemd, k8s) already handle restarts with proper backoff and observability.

Fail fast and let the caller decide how to recover.
Signed-off-by: Michael Adler <michael.adler@siemens.com>
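The fail-fast behaviour can be sketched with the standard library alone. `listen` is a hypothetical helper, not wfx code: it makes a single attempt, wraps the error with context, and leaves any retry policy to the caller, which here simply exits non-zero.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listen is a hypothetical helper: it makes exactly one attempt and returns a
// wrapped error instead of looping, leaving any restart policy to the caller.
func listen(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("listen on %s: %w", addr, err)
	}
	return ln, nil
}

func main() {
	ln, err := listen("127.0.0.1:0") // port 0: let the OS pick a free port
	if err != nil {
		// Fail fast: report the error and exit non-zero so a service manager
		// (or the developer) sees the failure immediately.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

With this shape, restart and backoff policy lives in one place outside the process, e.g. systemd's `Restart=on-failure` with `RestartSec=`, or a Kubernetes pod's restart policy.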
Description
Computing the `total` field of the `/jobs` pagination result was very inefficient: its cost grew with the number of queried tags. This patch series contains the following two optimizations:

1. Pagination metadata is now opt-in: clients pass `pagination=true` to include it in responses.
2. The old ent-generated code looped over each tag and added a separate `HasTagsWith` predicate, producing one correlated IN subquery per tag. Each subquery performs an independent scan of the `tag_jobs` table, which is expensive when the jobs table is large. Replace this with a single explicit `JOIN` on `tag_jobs` and `tags`, filtering all requested tags in a single IN clause. The `JOIN` allows the database to resolve the tag filter in a single pass. Add a database index on `tag_jobs(job_id)` so the join can use an index lookup instead of a sequential scan.

Benchmarks
I used the enhanced wfx-loadtest to populate a locally running PostgreSQL database with 1 million jobs, each having two tags.
Issues Addressed
List and link all the issues addressed by this PR.
Change Type
Please select the relevant options:
Checklist