
feat: simplify evaluation schema to flat score/reasoning shape#1286

Open
jsonbailey wants to merge 3 commits into feat/ai-sdk-next-release from jb/aic-2253/simplify-eval-schema

Conversation

@jsonbailey
Contributor

@jsonbailey jsonbailey commented Apr 16, 2026

Summary

  • Removed the metric key from the structured output schema. EvaluationSchemaBuilder.build() no longer takes an evaluationMetricKey parameter. Since there is only ever a single evaluation metric key per judge config, it does not need to be embedded in the schema sent to the LLM.
  • Flattened the schema to a top-level {score, reasoning} shape. The old nested structure ({evaluations: {metricKey: {score, reasoning}}}) is replaced with a simple {score: number, reasoning: string} object. This is easier for LLMs to produce correctly and matches the Python SDK (fix: Remove evaluation metric key from schema which failed on some LLMs python-server-sdk-ai#105).
  • Updated parsing in Judge.ts. _parseEvaluationResponse now reads score and reasoning directly from the top-level response data. The metric key is still sourced from the judge config's evaluationMetricKey and used to key the result — it just no longer appears in the schema or LLM response.
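The schema change above can be sketched as follows (illustrative TypeScript; the interface and function names are assumptions for this sketch, not the SDK's actual exports):

```typescript
// Old nested shape: the metric key was embedded in the structured-output schema.
interface NestedEvaluationResponse {
  evaluations: {
    [metricKey: string]: { score: number; reasoning: string };
  };
}

// New flat shape: the LLM returns score and reasoning at the top level.
interface FlatEvaluationResponse {
  score: number;
  reasoning: string;
}

// Parsing reads score/reasoning directly from the top level and keys the
// result by the judge config's evaluationMetricKey, which no longer appears
// in the schema or in the LLM response.
function parseEvaluation(
  data: FlatEvaluationResponse,
  evaluationMetricKey: string,
): { [key: string]: { score: number; reasoning: string } } {
  return {
    [evaluationMetricKey]: { score: data.score, reasoning: data.reasoning },
  };
}
```

With this shape, `parseEvaluation({ score: 0.9, reasoning: 'Relevant' }, 'relevance')` yields `{ relevance: { score: 0.9, reasoning: 'Relevant' } }`.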

Test plan

  • All 144 existing tests pass (yarn workspace @launchdarkly/server-sdk-ai test)
  • Lint passes (yarn workspace @launchdarkly/server-sdk-ai lint)
  • Test mocks updated to use new flat response shape
  • _parseEvaluationResponse unit tests updated for simplified signature and data shape

🤖 Generated with Claude Code


Note

Medium Risk
Changes the wire format expected from the AI provider for judge evaluations, so any callers/providers still producing the old nested evaluations shape will now fail parsing and return unsuccessful results.

Overview
Judge structured-output evaluation is simplified to a fixed, flat schema: the LLM is now asked to return top-level score and reasoning instead of {evaluations: {<metricKey>: ...}}, and response parsing is updated accordingly.

This removes the dynamic EvaluationSchemaBuilder and tightens failure handling/logging when the structured response cannot be parsed; tests are updated to reflect the new response shape and malformed/empty-response behavior.
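A minimal sketch of what a fixed, module-level schema and a strict parse guard could look like (the JSON-Schema-style object below is an assumption; the SDK's actual schema representation may differ):

```typescript
// Fixed schema replacing the dynamic EvaluationSchemaBuilder: every judge
// now requests the same flat { score, reasoning } object from the LLM.
const EVALUATION_SCHEMA = {
  type: 'object',
  properties: {
    score: { type: 'number' },
    reasoning: { type: 'string' },
  },
  required: ['score', 'reasoning'],
  additionalProperties: false,
} as const;

// Guard used at parse time: anything that is not a flat { score, reasoning }
// object (including the old nested `evaluations` shape) is rejected, which
// is where the tightened failure handling and warning logging kick in.
function isFlatEvaluation(
  data: unknown,
): data is { score: number; reasoning: string } {
  if (typeof data !== 'object' || data === null) return false;
  const record = data as Record<string, unknown>;
  return typeof record.score === 'number' && typeof record.reasoning === 'string';
}
```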

Reviewed by Cursor Bugbot for commit d81b202. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 25623 bytes
Compressed size limit: 29000
Uncompressed size: 125843 bytes

@github-actions
Contributor

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 31655 bytes
Compressed size limit: 34000
Uncompressed size: 112792 bytes

@github-actions
Contributor

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 179375 bytes
Compressed size limit: 200000
Uncompressed size: 829982 bytes

@github-actions
Contributor

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 37169 bytes
Compressed size limit: 38000
Uncompressed size: 204305 bytes

jsonbailey and others added 2 commits April 16, 2026 16:07
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete EvaluationSchemaBuilder.ts and define EVALUATION_SCHEMA as a
module-level const in Judge.ts. Remove per-field warnings from
_parseEvaluationResponse (keep it pure) and emit a single warning in
evaluate() that includes the judge key and raw response data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 16, 2026 21:55
@jsonbailey jsonbailey requested a review from a team as a code owner April 16, 2026 21:55

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



    this._logger?.warn(
    -  'Judge evaluation did not return the expected evaluation',
    -  tracker.getTrackData(),
    +  `Could not parse evaluation response for judge "${this._aiConfig.key}": ${JSON.stringify(response.data)}`,
    );


Parse-failure warning drops tracker context data

Low Severity

The new warn call at the parse-failure point no longer passes tracker.getTrackData() as a second argument, unlike the other two warn calls in the same method (for missing metric key and missing messages), which still include it. The track data contains runId, variationKey, version, modelName, and providerName — operational context useful for correlating warnings in production. Since tracker is available in scope, this appears to be an accidental omission during the refactor.
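The suggested fix can be sketched as restoring the track-data argument to the parse-failure warning (the names, types, and signature below are assumptions based on the report, not the SDK's actual code):

```typescript
// Minimal stand-ins for the SDK types involved (assumed shapes).
type TrackData = { runId: string; variationKey: string };

interface Logger {
  warn(...args: unknown[]): void;
}

// Parse-failure warning that, like the other warn calls in the same method,
// passes the tracker's track data as operational context for correlation.
function warnParseFailure(
  logger: Logger,
  judgeKey: string,
  trackData: TrackData,
  responseData: unknown,
): void {
  logger.warn(
    `Could not parse evaluation response for judge "${judgeKey}": ${JSON.stringify(responseData)}`,
    trackData, // restored context: runId, variationKey, etc.
  );
}
```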



