feat(temporal): add composite metric (backlog + running workflow count) #7460

Open
Sanil2108 wants to merge 1 commit into kedacore:main from Sanil2108:feature/temporal-scaler-composite-metric

Conversation

@Sanil2108

@Sanil2108 Sanil2108 commented Feb 18, 2026

  • Add includeRunningWorkflowCount (default: true) to use backlog + running workflow count as the scaling metric, avoiding premature scale-down when workers are fast and backlog is often zero.
  • Add optional workflowTaskQueueForCount to scope running count by workflow task queue (e.g. when scaling activity workers).
  • getRunningWorkflowCount() uses CountWorkflow with visibility query ExecutionStatus = 'Running' AND TaskQueue = '<task queue>'.
  • On CountWorkflow failure, fall back to backlog-only and log at V(1).
  • Add tests for new metadata and default composite behavior.
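
The composite behavior described in the list above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `compositeMetric`, `countFunc`, `fixedCount`, and `failingCount` are hypothetical names, and the print statement only stands in for the V(1) log line mentioned in the description.

```go
package main

import (
	"errors"
	"fmt"
)

// countFunc stands in for the scaler's running-workflow lookup
// (getRunningWorkflowCount in the PR). The signature is an assumption.
type countFunc func() (int64, error)

// fixedCount returns a countFunc that always reports n running workflows.
func fixedCount(n int64) countFunc {
	return func() (int64, error) { return n, nil }
}

// failingCount simulates a CountWorkflow failure.
func failingCount() (int64, error) {
	return 0, errors.New("visibility unavailable")
}

// compositeMetric returns backlog + running workflows when the option is
// enabled. If the count lookup fails, it falls back to the backlog alone
// so that a transient error does not fail the whole metric.
func compositeMetric(backlog int64, includeRunning bool, count countFunc) int64 {
	if !includeRunning {
		return backlog
	}
	running, err := count()
	if err != nil {
		// Hypothetical stand-in for the low-verbosity log described in the PR.
		fmt.Println("count failed, using backlog only:", err)
		return backlog
	}
	return backlog + running
}

func main() {
	// Empty backlog, three workflows still running: the metric stays above zero.
	fmt.Println(compositeMetric(0, true, fixedCount(3)))
	// Count lookup fails: only the backlog is reported.
	fmt.Println(compositeMetric(5, true, failingCount))
	// Option disabled: running workflows are ignored.
	fmt.Println(compositeMetric(5, false, fixedCount(3)))
}
```

The first call shows the case this PR targets: fast workers drain the backlog to zero, but the running workflows keep the metric non-zero.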

Provide a description of what has been changed

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • I have verified that my change is according to the deprecations & breaking changes policy
  • Tests have been added (if applicable)
  • Ensure make generate-scalers-schema has been run to update any outdated generated files
  • Changelog has been updated and is aligned with our changelog requirements, only when the change impacts end users
  • A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)

Fixes #7459
Docs: kedacore/keda-docs#1714

@keda-automation keda-automation requested a review from a team February 18, 2026 11:07
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@snyk-io

snyk-io Bot commented Feb 18, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@rickbrouwer
Member

rickbrouwer commented Feb 26, 2026

I think this is mainly solved by running this command from the repository root so the schemas are generated automatically:

make generate-scalers-schema

And the DCO of course :)

@Sanil2108 Sanil2108 force-pushed the feature/temporal-scaler-composite-metric branch from 782db31 to 3f6bb6c Compare March 8, 2026 03:01
@Sanil2108 Sanil2108 marked this pull request as ready for review March 8, 2026 03:07
@Sanil2108 Sanil2108 force-pushed the feature/temporal-scaler-composite-metric branch from 3f6bb6c to 6d5c27f Compare March 8, 2026 03:56
@Sanil2108
Author

Relevant Keda docs PR - kedacore/keda-docs#1714

Comment thread pkg/scalers/temporal_scaler.go Outdated
Comment thread pkg/scalers/temporal_scaler.go
Comment thread pkg/scalers/temporal_scaler.go Outdated
Comment thread pkg/scalers/temporal_scaler.go Outdated
@keda-automation keda-automation requested a review from a team March 15, 2026 11:39
@Sanil2108 Sanil2108 force-pushed the feature/temporal-scaler-composite-metric branch 6 times, most recently from cc169f8 to bfe999b Compare March 15, 2026 14:18
@Sanil2108
Author

The tests are failing, is it possible they are flaky? They sometimes pass and sometimes fail

@rickbrouwer
Member

The tests are failing, is it possible they are flaky? They sometimes pass and sometimes fail

Yeah, that is a flaky test. It would be nice if it were solved soon.

=== Failed
=== FAIL: pkg/scalers TestWaitForState (6.01s)
    external_scaler_test.go:301: waitForState should be get connectivity.Shutdown.

=== FAIL: pkg/scalers TestWaitForState (re-run 1) (6.01s)
    external_scaler_test.go:301: waitForState should be get connectivity.Shutdown.

=== FAIL: pkg/scalers TestWaitForState (re-run 2) (6.01s)
    external_scaler_test.go:301: waitForState should be get connectivity.Shutdown.

@Sanil2108
Author

The tests are failing, is it possible they are flaky? They sometimes pass and sometimes fail

Yeah, that is a flaky test. It would be nice if it were solved soon.

=== Failed
=== FAIL: pkg/scalers TestWaitForState (6.01s)
    external_scaler_test.go:301: waitForState should be get connectivity.Shutdown.

=== FAIL: pkg/scalers TestWaitForState (re-run 1) (6.01s)
    external_scaler_test.go:301: waitForState should be get connectivity.Shutdown.

=== FAIL: pkg/scalers TestWaitForState (re-run 2) (6.01s)
    external_scaler_test.go:301: waitForState should be get connectivity.Shutdown.

@rickbrouwer Understood. I can't rerun it either, so what should I do here?

@rickbrouwer
Member

@rickbrouwer Understood. I can't rerun it either, so what should I do here?

I will keep re-running it until it's green

@Sanil2108
Author

@rickbrouwer I think they are all green now, PTAL

@saifxhatem

Is there anything blocking this PR from being merged? It would be really helpful for me and I've been watching it every day for 2 weeks :/

Member

@rickbrouwer rickbrouwer left a comment

Can you add a validation in Validate() that returns an error when workflowTaskQueueForCount is set but includeRunningWorkflowCount is false?

Without this, a user who sets workflowTaskQueueForCount but forgets to enable includeRunningWorkflowCount will silently get no effect from the parameter.
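
A cross-field check of the kind requested above might look like the sketch below. The struct is a hypothetical, trimmed-down stand-in for the scaler's metadata (the real type in pkg/scalers/temporal_scaler.go has more fields), and the error text is an assumption rather than the wording used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// temporalMetadata mirrors only the two fields discussed in this review.
// Field names and types are assumptions made for illustration.
type temporalMetadata struct {
	IncludeRunningWorkflowCount bool
	WorkflowTaskQueueForCount   string
}

// Validate rejects a configuration where workflowTaskQueueForCount is set
// but includeRunningWorkflowCount is disabled, because the queue name
// would otherwise be silently ignored.
func (m *temporalMetadata) Validate() error {
	if m.WorkflowTaskQueueForCount != "" && !m.IncludeRunningWorkflowCount {
		return errors.New("workflowTaskQueueForCount requires includeRunningWorkflowCount to be true")
	}
	return nil
}

func main() {
	misconfigured := &temporalMetadata{WorkflowTaskQueueForCount: "orders"}
	fmt.Println("misconfigured:", misconfigured.Validate())

	valid := &temporalMetadata{IncludeRunningWorkflowCount: true, WorkflowTaskQueueForCount: "orders"}
	fmt.Println("valid:", valid.Validate())
}
```

Failing at validation time surfaces the mistake when the trigger is parsed, instead of leaving the user to wonder why the parameter has no effect.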

Comment thread pkg/scalers/temporal_scaler.go
@mwelk

mwelk commented Apr 9, 2026

We're also waiting on this — we run Temporal workers on EKS with KEDA and hit the same premature scale-down issue during long-running sequential workflows. The queueTypes: "workflow,activity" parameter helps for parallel activities, but for sequential workflows where the queue appears empty between activities, includeRunningWorkflowCount is the missing piece. Would love to see this merged. Thanks for the great work @Sanil2108!

@rickbrouwer rickbrouwer added the waiting-author-response All PR's or Issues where we are waiting for a response from the author label Apr 9, 2026
@keda-automation keda-automation requested a review from a team April 9, 2026 12:26
@rickbrouwer rickbrouwer removed the waiting-author-response All PR's or Issues where we are waiting for a response from the author label Apr 9, 2026
@Sanil2108 Sanil2108 force-pushed the feature/temporal-scaler-composite-metric branch from 2882049 to 55d8a82 Compare April 16, 2026 20:24
@Sanil2108
Author

Can you add a validation in Validate() that returns an error when workflowTaskQueueForCount is set but includeRunningWorkflowCount is false?

Without this, a user who sets workflowTaskQueueForCount but forgets to enable includeRunningWorkflowCount will silently get no effect from the parameter.

@rickbrouwer Done

@Sanil2108
Author

Hi @rickbrouwer — just checking in. I've addressed all the review feedback (including the validation in Validate() for workflowTaskQueueForCount without includeRunningWorkflowCount). CI is green (the flaky test aside). Is there anything else needed from my side to get this ready for merge? Happy to make any further changes. Thanks!

    if err != nil {
        s.logger.Error(err, "failed to get running workflow count, skipping for activity check")
    } else {
        isActive = runningCount > 0

Member

is this intentionally 0 or should it be > s.metadata.ActivationTargetQueueSize?
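
To make the two options in this question concrete, here is a small sketch that contrasts them. Both function names are hypothetical, and the threshold parameter only stands in for `s.metadata.ActivationTargetQueueSize`. Which behavior the scaler should use is the open question in the review, not something this sketch decides.

```go
package main

import "fmt"

// isActiveAnyRunning matches the snippet under review: a single running
// workflow is enough to keep the scaler active.
func isActiveAnyRunning(runningCount int64) bool {
	return runningCount > 0
}

// isActiveAboveThreshold is the alternative raised in the question:
// compare against the configured activation threshold instead of zero.
func isActiveAboveThreshold(runningCount, activationTargetQueueSize int64) bool {
	return runningCount > activationTargetQueueSize
}

func main() {
	const running, activation = 2, 5

	fmt.Println(isActiveAnyRunning(running))                 // true
	fmt.Println(isActiveAboveThreshold(running, activation)) // false
}
```

With two running workflows and an activation threshold of five, the first variant keeps the workload active while the second would let it scale to zero.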

Member

@rickbrouwer rickbrouwer left a comment

We're almost there. I see that DCO is still failing and that some unit tests are still failing. Could you fix this?

Add support for including running workflow count alongside task queue backlog
as the scaling metric, so workers don't get prematurely scaled down when the
backlog is frequently drained but workflows are still executing.

New metadata fields:
- includeRunningWorkflowCount (default: false) — when true, scale by
  backlog + number of running workflow executions.
- workflowTaskQueueForCount (optional) — scope the running-count query by
  workflow task queue, useful when scaling activity workers whose task
  queue differs from the workflow task queue.

Running-count query uses CountWorkflow with visibility filter
ExecutionStatus = 'Running' [ AND TaskQueue = '<workflowTaskQueueForCount>' ].
On CountWorkflow failure the scaler falls back to backlog-only to avoid
scale-to-zero on transient errors.

Validate() rejects workflowTaskQueueForCount without
includeRunningWorkflowCount so the parameter can't be silently ignored.

Unit tests cover new metadata parsing, the composite path, and the
validation rule.

Ref: kedacore#7459
Signed-off-by: Sanil2108 <sanilkhurana7@gmail.com>
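
The visibility filter described in the commit message above could be assembled roughly as follows. `buildRunningQuery` is a hypothetical helper name, and the sketch does not escape the queue name, which real code would likely need to consider. In the Temporal Go SDK, a string like this would be passed as the query of a CountWorkflow request.

```go
package main

import "fmt"

// buildRunningQuery returns a visibility query that matches running
// workflow executions. When workflowTaskQueue is non-empty, the query is
// additionally scoped to that task queue, mirroring the optional
// "[ AND TaskQueue = '...' ]" clause from the commit message.
// Note: the queue name is not escaped in this sketch.
func buildRunningQuery(workflowTaskQueue string) string {
	query := "ExecutionStatus = 'Running'"
	if workflowTaskQueue != "" {
		query += fmt.Sprintf(" AND TaskQueue = '%s'", workflowTaskQueue)
	}
	return query
}

func main() {
	fmt.Println(buildRunningQuery(""))
	fmt.Println(buildRunningQuery("orders"))
}
```

Scoping by task queue matters when activity workers are being scaled and their task queue differs from the one the workflows run on.
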
@Sanil2108 Sanil2108 force-pushed the feature/temporal-scaler-composite-metric branch from 09fdd74 to 3d8526d Compare April 24, 2026 09:37
@keda-automation keda-automation requested a review from a team April 24, 2026 09:37
@Sanil2108
Author

@rickbrouwer apologies for the churn — I rebased onto latest main and squashed into a single clean commit:

  • Fixes DCO (all commits now signed-off)
  • Pulls in latest main (branch was 54 commits behind)
  • Regenerated schema/generated/scalers-schema.{json,yaml}
  • Added the Validate() cross-field check that was lost during an earlier revert: workflowTaskQueueForCount without includeRunningWorkflowCount now returns a clear error

Local go test ./pkg/scalers/ -run TestTemporal is green. Please re-trigger CI when you get a chance. Thanks for the patience!

@rickbrouwer rickbrouwer added the waiting-on-other-pr All PR's that are waiting for an other PR which must be merged first label Apr 24, 2026
@rickbrouwer
Member

To keep the merge order right, we will first wait until #7672 has merged.

@Sanil2108
Author

Sounds good, @rickbrouwer — will hold off until #7672 lands. Let me know if I should rebase after it merges, or if you'd prefer I leave the branch as-is.

Labels

waiting-on-other-pr All PR's that are waiting for an other PR which must be merged first

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance the Temporal scaler to use a composite metric that considers both task queue backlog and running workflows

4 participants