feat: add operational counters for sampling, requirements, and tools (#467) #883
ajbozarth wants to merge 2 commits into generative-computing:main from
Conversation
…enerative-computing#467) Adds six new OpenTelemetry counters giving operators visibility into retry behaviour, validation failure rates, and tool call health: mellea.sampling.attempts/successes/failures, mellea.requirement.checks/failures, and mellea.tool.calls. Follows the established lazy-init globals + record_* helpers + Plugin hooks pattern. Extends SamplingIterationPayload and SamplingLoopEndPayload with a strategy_name field so plugins can tag counters by strategy class.
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
PeriodicExportingMetricReader's background thread (60 s default export interval) would fire after pytest had already closed stdout whenever the suite ran longer than 60 s, causing "I/O operation on closed file" and OTLP UNAVAILABLE errors.
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
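A stdlib-only sketch of the failure mode described above (not mellea's actual code): a periodic exporter thread keeps waking up after the test runner has torn down stdio, so anything it prints raises "I/O operation on closed file". Stopping the thread deterministically before teardown, the way the commit does for the metric reader, avoids it. All names here are illustrative.

```python
import threading

class PeriodicExporter:
    """Toy stand-in for a PeriodicExportingMetricReader background thread."""

    def __init__(self, interval: float):
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(interval,), daemon=True
        )

    def start(self) -> None:
        self._thread.start()

    def _run(self, interval: float) -> None:
        # Wakes up every `interval` seconds until shutdown is requested.
        while not self._stop.wait(interval):
            print("exporting metrics")  # would fail once stdout is closed

    def shutdown(self) -> None:
        # The fix: stop and join the thread before the process closes stdio,
        # e.g. from a session-scoped pytest fixture's teardown.
        self._stop.set()
        self._thread.join()

exporter = PeriodicExporter(interval=0.01)
exporter.start()
exporter.shutdown()
```

In the real suite the same shape applies: call the provider/reader's shutdown in test teardown so the export thread can never outlive pytest's captured streams.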
This is parallel work to #882, which adds the other remaining metrics telemetry; whichever PR merges second will need a manual merge to resolve the overlapping diffs.
This looks good to me as the implementation of issue #467. I do wonder how useful these counters will be: they are grouped only by strategy and requirement, and may not give much insight without finer-grained groupings.
akihikokuroda
left a comment
LGTM. I have a general comment about the issue that this PR implements.
@jakelorocco @psschwei I actually wondered about this while implementing. I thought the idea was interesting and valuable, but my work did not create an example of how it could be used (which maybe I should add?)
Type of PR
Misc PR
Description
Adds six new OpenTelemetry counters as part of the telemetry epic (#443),
giving operators visibility into retry behaviour, validation failure rates,
and tool call health alongside the existing LLM-level metrics:
- mellea.sampling.attempts / mellea.sampling.successes / mellea.sampling.failures — tagged by strategy class name
- mellea.requirement.checks / mellea.requirement.failures — tagged by requirement class name and failure reason
- mellea.tool.calls — tagged by tool name and status ("success"/"failure")

Follows the established plugin pattern: lazy-init counter globals + record_* helpers in metrics.py, consumed by three new Plugin classes in metrics_plugins.py (SamplingMetricsPlugin, RequirementMetricsPlugin, ToolMetricsPlugin) that hook into existing hook events in FIRE_AND_FORGET mode. No metric calls in business-logic files.
Also extends SamplingIterationPayload and SamplingLoopEndPayload with a strategy_name field (backward-compatible, defaults to "") so plugins can tag counters with the strategy class name.
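A hedged sketch of why the payload extension is backward-compatible: a new field with a "" default means old construction sites that never pass strategy_name keep working unchanged. The iteration field is an illustrative stand-in for the payload's pre-existing fields, not the actual definition.

```python
from dataclasses import dataclass

@dataclass
class SamplingIterationPayload:
    iteration: int = 0       # illustrative pre-existing field
    strategy_name: str = ""  # new field: lets plugins tag counters by strategy

# Old call sites that predate the field still construct fine:
old_style = SamplingIterationPayload(iteration=3)
assert old_style.strategy_name == ""

# New call sites can tag the payload with the strategy class name:
new_style = SamplingIterationPayload(iteration=3, strategy_name="RejectionSamplingStrategy")
```

Defaulting to "" rather than None keeps the field's type a plain str, so plugins can use it directly as a metric attribute without a null check.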
Testing
Attribution