Support vector-valued combined_score for multi-objective optimization #458
yuxuan-z19 wants to merge 3 commits into algorithmicsuperintelligence:main from
Conversation
Pull request overview
Adds first-class support for vector-valued combined_score (e.g., tuple[float, ...] / list[float]) to enable multi-objective optimization semantics across scoring, logging, early stopping, and island statistics.
Changes:
- Generalize `combined_score` handling (fitness extraction, formatting, and prompt rendering) to support scalar and vector scores.
- Update comparison/delta and aggregation paths to use NumPy operations for non-scalar scores.
- Add unit tests and a MoE load-balancing example workload demonstrating tuple-based scoring.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_tuple_score.py | Adds tests validating `get_fitness_score()` behavior for float/list/tuple `combined_score`. |
| tests/test_database.py | Adds tests for island stats aggregation with float/list/tuple scores; minor formatting cleanups. |
| openevolve/utils/metrics_utils.py | Returns list/tuple `combined_score` as-is; adds `broadcast_value()` helper. |
| openevolve/utils/format_utils.py | Introduces `format_score()` and uses NumPy subtraction for improvement formatting. |
| openevolve/prompts/defaults/fragments.json | Removes scalar-only `:.4f` formatting placeholders for fitness fragments. |
| openevolve/prompt/sampler.py | Uses `format_score()` in prompt rendering; adds NumPy `allclose` for "stable" fitness; broadcasts thresholds for vector scores. |
| openevolve/process_parallel.py | Updates early-stopping logging/threshold handling to support vector-valued scores. |
| openevolve/database.py | Uses `format_score()` for logging; uses `np.subtract` for score diffs; computes island averages element-wise via NumPy. |
| openevolve/api.py | Widens `EvolutionResult.best_score` type and updates `__repr__` formatting via `format_score()`. |
| examples/moe_lb/run.sh | Adds a runnable script for the MoE load-balancing example. |
| examples/moe_lb/requirements.txt | Adds dependencies for the MoE load-balancing example. |
| examples/moe_lb/initial_program.py | Provides a baseline implementation for the MoE load-balancing task. |
| examples/moe_lb/evaluator.py | Adds an evaluator emitting tuple `combined_score` for multi-objective scoring. |
| examples/moe_lb/config.yaml | Adds an example config emphasizing lexicographic multi-objective scoring. |
| examples/moe_lb/archives/task_config.yaml.bak | Stores archived upstream task config reference. |
| examples/moe_lb/archives/prompt_multi_obj.md | Archived prompt text for the multi-objective variant. |
| examples/moe_lb/archives/prompt_lp.md | Archived prompt text for the LP-focused variant. |
| examples/moe_lb/archives/prompt_common.md | Archived prompt text for the common/heuristic variant. |
| examples/moe_lb/archives/loongflow_best_lp.py | Archived "best" LP solution reference implementation. |
| examples/moe_lb/archives/loongflow_best_common.py | Archived "best" heuristic solution reference implementation. |
| .pre-commit-config.yaml | Updates isort and black hook revisions. |
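For context on the diff below, here is a plausible sketch of the new `broadcast_value()` helper described for `openevolve/utils/metrics_utils.py` above; this is illustrative only, and the PR's actual implementation may differ:

```python
def broadcast_value(value, score):
    """Broadcast a scalar threshold to match a (possibly vector) score.

    Sketch only: if the score is a tuple or list, repeat the scalar to
    the same length (preserving the container type); otherwise return
    the scalar unchanged.
    """
    if isinstance(score, (list, tuple)):
        return type(score)(value for _ in score)
    return value
```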
```diff
+HIGH_SCORE = broadcast_value(0.8, score)
+MED_SCORE = broadcast_value(0.6, score)
+LOW_SCORE = broadcast_value(0.4, score)

 # Classify based on score ranges
-if score >= 0.8:
+if score >= HIGH_SCORE:
```
For vector-valued score, the score >= HIGH_SCORE / MED_SCORE / LOW_SCORE checks rely on Python sequence comparisons, which are lexicographic rather than element-wise. This can misclassify programs (e.g., a high first objective but very low secondary objectives still counts as “high performer”). If the intent is element-wise thresholds, use NumPy comparisons (e.g., np.all(np.greater_equal(score, HIGH_SCORE)) after coercing to arrays); if the intent is lexicographic ordering, consider comparing only the primary objective explicitly to make the rule unambiguous.
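If element-wise semantics are intended, a minimal sketch of the suggested fix (the helper name `meets_threshold` is illustrative, not from the PR):

```python
import numpy as np

def meets_threshold(score, threshold):
    """Element-wise check: every objective must reach its threshold.

    Works for scalar scores and tuple/list vector scores alike, and
    avoids Python's lexicographic sequence comparison.
    """
    return bool(np.all(np.greater_equal(np.asarray(score), np.asarray(threshold))))

# A high first objective with a weak second one is NOT a "high performer"
# under element-wise semantics, but passes a lexicographic tuple compare:
print(meets_threshold((0.9, 0.2), (0.8, 0.8)))  # element-wise -> False
print((0.9, 0.2) >= (0.8, 0.8))                 # lexicographic -> True
```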
```diff
 else:
     # Event-based early stopping
-    if current_score == self.config.convergence_threshold:
+    if np.allclose(current_score, convergence_threshold):
         best_score = current_score
```
convergence_threshold is defined only inside the early_stopping_patience > 0 branch, but it’s referenced in the event-based early stopping branch. If early_stopping_patience <= 0, this will raise an UnboundLocalError at runtime. Define/broadcast convergence_threshold before the patience check (or reference self.config.convergence_threshold directly) so both modes work.
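One way to restructure, as a sketch: resolve the threshold once before branching so the event-based path never touches an unbound local. The function and attribute names follow the comment above but are illustrative, not the PR's actual code:

```python
import numpy as np

def should_stop(current_score, config, patience_exhausted):
    """Convergence check covering both early-stopping modes.

    The threshold is read once, up front, so the event-based branch
    cannot raise UnboundLocalError when early_stopping_patience <= 0.
    """
    threshold = config.convergence_threshold  # defined for BOTH modes
    if config.early_stopping_patience > 0:
        # Patience-based mode: stop only once patience has run out
        return patience_exhausted and bool(np.allclose(current_score, threshold))
    # Event-based mode: stop as soon as the threshold is reached
    return bool(np.allclose(current_score, threshold))
```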
This PR introduces support for vector-valued `combined_score` (e.g., `tuple[float, ...]`) and generalizes its handling across the OpenEvolve pipeline.

Related issues:
- `combined_score` to support multi-objective optimization via Lexicographical ordering (#453)

Changes
Fitness Representation
- `combined_score` now supports `float | tuple | list` of `float`

Comparison Semantics
- Score deltas use `np.subtract`
- Equality/stability checks use `np.allclose`

Early Stopping
- Uses `np.allclose()` for convergence checks

Logging and Formatting
- Adds `format_score()` for unified scalar/vector formatting
- Removes scalar-only format specifiers (`%.4f`, `:.4f`)

Aggregation / Statistics
- Island statistics are aggregated element-wise via NumPy

Pipeline Integration
- Scoring, logging, early stopping, and island statistics consistently handle scalar and vector scores

Testing
- Unit tests cover float/list/tuple `combined_score`

Backward Compatibility
- Scalar `combined_score` remains fully supported
- Comparison logic is extended to support vector semantics when applicable
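As a quick illustration of the vector comparison semantics described above (a sketch; the PR's internal call sites may differ):

```python
import numpy as np

old_score = (0.72, 0.40)  # (primary objective, secondary objective)
new_score = (0.75, 0.41)

# Element-wise delta, as used when formatting score improvements
delta = np.subtract(new_score, old_score)

# Stability check: scores equal within tolerance count as "no change"
stable = np.allclose(new_score, old_score)

print(delta)   # element-wise improvement per objective
print(stable)  # False: the scores differ beyond tolerance
```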
Fallback Behavior

When `combined_score` is absent, the existing fallback is preserved: `safe_numeric_average(p.metrics)`. This remains scalar (no implicit broadcasting). Broadcasting this value to match vector-valued comparisons was considered but not included, to avoid introducing implicit behavior. This can be revisited if stricter consistency is preferred.
Summary

This PR extends `combined_score` from a scalar assumption to a general vector-compatible representation, with consistent behavior across evaluation, comparison, logging, and aggregation.

Regards,
@yuxuan-z19 (w/ Baidu FM-Agent @baidubce)