feat: dx improvements for optimization package #139
Open
andrewklatzke wants to merge 1 commit into aklatze/AIC-2178/verify-runs-endpoint
Requirements
Describe the solution you've provided
Improves the developer experience when using the SDK and fixes a bug where the global model was being ignored for judges.
Describe alternatives you've considered
This is a QoL change for folks consuming this SDK method. No real alternatives were considered.
Additional context
The TL;DR here is that when implementing this against multiple frameworks, I found myself falling into the pattern of specifying the same handler for both agents and judges. Since that's the case, I've updated it so that `handle_judge_call` is optional and defaults to `handle_agent_call` if it's not specified. With this change, the optimization config when using an LD-built config no longer needs a separate judge handler.
Additionally, this adds an `is_evaluation` flag as the final argument for `handle_agent_call`, so that if you're using the single handler you can still discern which is which if necessary.
Note
Medium Risk
Public callback signatures change (extra `is_evaluation` arg) and judge-model selection behavior is corrected, which can break existing integrations or alter evaluation results if consumers relied on per-judge model overrides.
Overview
Improves the optimization SDK callback ergonomics by making `handle_judge_call` optional (defaults to `handle_agent_call`) and adding an `is_evaluation` boolean argument to both agent/judge call handlers so a shared implementation can differentiate evaluation vs generation.
Fixes judge execution to always use the globally configured `judge_model` (while still forwarding judge-flag model parameters like temperature/tools) and routes judge calls through a single internal `_judge_call` fallback. Tests are updated to reflect the new callback signature and defaulting behavior.
Reviewed by Cursor Bugbot for commit 7074cfa. Bugbot is set up for automated code reviews on this repo.
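For reviewers who want a concrete picture of the behavior described above, here is a rough sketch of the handler defaulting, the `is_evaluation` flag, and the global judge model taking precedence. This is not the SDK's code: `OptimizationOptions`, `resolved_judge_handler`, `run_judge`, the handler argument order before `is_evaluation`, and the idea that judge-flag parameters arrive as a dict with an optional `"model"` key are all assumptions made for illustration. Only `handle_agent_call`, `handle_judge_call`, `judge_model`, and `is_evaluation` come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical handler shape: (prompt, model, params, is_evaluation) -> text.
# Only the trailing is_evaluation argument is taken from the PR description.
Handler = Callable[[str, str, Dict[str, Any], bool], str]


@dataclass
class OptimizationOptions:
    """Illustrative stand-in for the optimization config, not the SDK class."""

    handle_agent_call: Handler
    judge_model: str
    handle_judge_call: Optional[Handler] = None

    def resolved_judge_handler(self) -> Handler:
        # Fall back to the agent handler when no judge handler is given.
        return self.handle_judge_call or self.handle_agent_call


def run_judge(options: OptimizationOptions, prompt: str,
              judge_flag_params: Dict[str, Any]) -> str:
    # Assumed shape: drop any per-judge model override, but keep the other
    # parameters (for example temperature or tools) and forward them.
    params = {k: v for k, v in judge_flag_params.items() if k != "model"}
    handler = options.resolved_judge_handler()
    # Judge calls always use the globally configured judge_model.
    return handler(prompt, options.judge_model, params, True)


def shared_handler(prompt: str, model: str, params: Dict[str, Any],
                   is_evaluation: bool) -> str:
    # A single handler can branch on is_evaluation when it needs to.
    role = "judge" if is_evaluation else "agent"
    return f"{role}:{model}:{sorted(params)}"


options = OptimizationOptions(
    handle_agent_call=shared_handler,
    judge_model="global-judge-model",
)
result = run_judge(
    options,
    "Score this answer",
    {"model": "per-judge-model", "temperature": 0.2},
)
print(result)  # judge:global-judge-model:['temperature']
```

In this sketch, passing an explicit `handle_judge_call` still overrides the default, so integrations that already supply both handlers would keep their routing and only need to accept the extra argument.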