Preflight memory by mike-ferguson · Pull Request #2366 · brain-score/vision

mike-ferguson · 2026-04-16T15:34:23Z

Summary

Adds a pre-flight memory estimation system that catches OOM errors before committing to a multi-hour benchmark run.

Problem

Scoring large models on neural benchmarks can take 6+ hours. If the job runs out of RAM, you lose all that time with nothing to show for it. There was no way to know upfront whether a model would OOM on a given benchmark.

Solution: Metric-aware memory formula

Before scoring begins, the system estimates total RAM needed using a formula matched to each benchmark's metric type:

| Benchmark type | Formula | Notes |
| --- | --- | --- |
| RDM/RSA | `activation_gb × 3` | Pairwise distance computation passes through the full activation matrix, so overhead scales with features, not stimulus count. Validated: ≤4% error across alexnet/resnet50/ViT. |
| Ridge/RidgeCV (`n_features ≤ n_stimuli`, calibrated) | `activation_gb + fixed_benchmark_cost_gb` | sklearn primal solver: the gram matrix is `n_stimuli×n_stimuli`, model-independent. Accurate when features are smaller than stimulus count (e.g. alexnet on most benchmarks). Validated: ≤1% error. |
| Ridge/RidgeCV (`n_features > n_stimuli`) | `activation_gb × 6` | sklearn switches to SVD of X, with overhead ≈ 5× activation, so the calibrated fixed cost (measured on small models) severely underestimates. Falls back to ×6 so pre-flight raises `MemoryError` cleanly before the OS kills the container. Validated: −1.5% error for resnet50 and ViT on Gifford2022. |
| Ridge/RidgeCV (`n_features ≤ n_stimuli`, no calibration entry) | `activation_gb + n_stimuli²×4B` | Formula fallback using gram matrix size. |
| PLS (`-pls`, `-reverse_pls`) | `activation_gb × 7 + fixed_benchmark_cost_gb` | PLS cross-covariance matrices scale with `num_features`, so overhead is not model-independent. Multiplier calibrated on a 3-model × 2-benchmark grid; worst miss: −12.7% (within 15%). Warning printed at runtime. |
| Fallback | `activation_gb × 6` | Used when benchmark type is unrecognised. |

- `activation_gb`: model-dependent. Measured by running a single forward pass (the "probe") on 1 stimulus and extrapolating: `num_stimuli × num_features × num_timebins × 4 bytes`.
- `fixed_benchmark_cost_gb`: benchmark-dependent, model-independent. Stored in `benchmark_costs.json`.
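
As a rough illustration of how the table and the two definitions above fit together, here is a minimal sketch. It is not the code in `memory.py`: the function names, the `kind` labels, the choice of decimal gigabytes, and the treatment of a missing PLS calibration entry as zero are all assumptions made for the example.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_GB = 1e9  # assumption: decimal GB; the PR does not state which unit memory.py uses


def activation_gb(num_stimuli, num_features, num_timebins=1):
    """Extrapolate full activation size from the shape observed in a 1-stimulus probe."""
    return num_stimuli * num_features * num_timebins * BYTES_PER_FLOAT32 / BYTES_PER_GB


def estimate_total_gb(kind, act_gb, n_features, n_stimuli, fixed_cost_gb=None):
    """Apply the formula from the table; `kind` is a hypothetical label for the metric type."""
    if kind == "rdm":
        return act_gb * 3
    if kind == "pls":
        # assumption: a missing calibration entry contributes no fixed cost
        return act_gb * 7 + (fixed_cost_gb or 0.0)
    if kind == "ridge":
        if n_features > n_stimuli:
            return act_gb * 6  # wide case: calibrated cost would underestimate
        if fixed_cost_gb is not None:
            return act_gb + fixed_cost_gb  # calibrated entry available
        # no calibration entry: add the n_stimuli x n_stimuli float32 gram matrix
        return act_gb + n_stimuli ** 2 * BYTES_PER_FLOAT32 / BYTES_PER_GB
    return act_gb * 6  # unrecognised benchmark type
```

For example, under these assumptions `activation_gb(10_000, 100_000)` gives 4.0 GB, and an RDM benchmark on those activations would be estimated at 12.0 GB.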

Changes

New file: `brainscore_vision/benchmark_helpers/benchmark_costs.json`

Calibration table with fixed overhead costs for 49 neural benchmarks. Used for ridge/ridgecv and PLS benchmarks. RDM benchmarks do not require calibration entries.

`brainscore_vision/benchmark_helpers/memory.py`

- `preallocate_memory(model, benchmark)`: probes the model with 1 stimulus, dispatches to the correct formula based on benchmark type, and raises `MemoryError` if the estimate exceeds available RAM.
- Metric-type detection: three new helpers, `_is_pls_minimum`, `_is_rdm_benchmark`, `_is_ridge_benchmark`. Detection is by identifier suffix (`-pls`, `-reverse_pls`, `-temporal-pls`, `-rdm`, `-ridge`, `-ridgecv`) and/or `isinstance` check for `RSABenchmark`.
- `RSABenchmark` support: previously raised `TypeError` for RSA/RDM benchmarks; now fully handled with the model-independent n_stimuli² formula.
- `MemoryEstimate` dataclass: added `formula_type` field (`'pls'`, `'rdm'`, `'ridge_formula'`, `'calibrated'`, `'fallback'`) and `rdm_overhead_gb` field; `__str__` renders the correct formula per type.
- PLS overhead multiplier: reduced from ×10 to ×7; a warning is now printed when the PLS formula is used: `WARNING: PLS overhead multiplier (×7) is approximate. Actual usage can vary significantly depending on model feature count and convergence.`
- `load_calibration()` / `save_calibration()`: unchanged; automatically loads the bundled `benchmark_costs.json`; falls back to `~/.brainscore/benchmark_costs.json`.
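
The suffix-based detection and the OOM check described above could look roughly like the sketch below. Everything here is illustrative: `detect_metric_type`, `MemoryEstimateSketch`, and `check_or_raise` are hypothetical names, the `isinstance(RSABenchmark)` path is left out, and available RAM is passed in as a plain number rather than queried from the system.

```python
from dataclasses import dataclass

PLS_SUFFIXES = ("-pls", "-reverse_pls", "-temporal-pls")
RDM_SUFFIXES = ("-rdm",)
RIDGE_SUFFIXES = ("-ridge", "-ridgecv")


def detect_metric_type(identifier):
    """Classify a benchmark by the suffix of its identifier."""
    if identifier.endswith(PLS_SUFFIXES):
        return "pls"
    if identifier.endswith(RDM_SUFFIXES):
        return "rdm"
    if identifier.endswith(RIDGE_SUFFIXES):
        return "ridge"
    return "fallback"


@dataclass
class MemoryEstimateSketch:
    """Hypothetical stand-in for the PR's MemoryEstimate dataclass."""
    formula_type: str
    estimated_gb: float
    available_gb: float

    @property
    def fits(self):
        return self.estimated_gb <= self.available_gb

    def __str__(self):
        status = "OK" if self.fits else "OOM LIKELY"
        return (f"[{self.formula_type}] need {self.estimated_gb:.1f} GB, "
                f"have {self.available_gb:.1f} GB: {status}")


def check_or_raise(estimate):
    """Fail before scoring starts instead of letting the OS kill the job later."""
    if not estimate.fits:
        raise MemoryError(str(estimate))
    return estimate
```

Raising `MemoryError` with the rendered estimate as the message keeps the failure readable in logs, which is the behaviour the PR describes for the pre-flight check.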

`brainscore_vision/__init__.py`

- Pre-flight runs automatically on every `score()` call via `score_benchmark`.
- Improved `AssertionError` message for stale activations cache, with the exact `rm` command to fix it.

`scripts/preflight_check.py` (new)

Integration test script for a single (model, benchmark) pair.

`scripts/mem_profile_suite.py`

- Added `--calibrate` mode: runs alexnet on all neural benchmarks to produce the fixed-cost table.
- Saves incrementally after each benchmark (crash-safe).
- Added `--resume-from BENCHMARK_ID` to pick up after a crash.
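
One way to get crash-safe incremental saves with a resume point is sketched below. This is not the script's implementation: the function names are hypothetical, results are stored as a JSON mapping of benchmark ID to cost (the real script takes a `--csv` path), and resuming is assumed to rerun the named benchmark rather than start after it.

```python
import json
import os
import tempfile


def save_atomically(path, results):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(results, f, indent=2)
    os.replace(tmp_path, path)  # the target is never left half-written


def benchmarks_to_run(all_ids, resume_from=None):
    """Skip everything before resume_from; run all benchmarks when it is None."""
    if resume_from is None:
        return list(all_ids)
    return list(all_ids[all_ids.index(resume_from):])


def calibrate(all_ids, measure_cost_gb, path, resume_from=None):
    """Measure each benchmark and persist after every one."""
    results = {}
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)  # keep entries from the run that crashed
    for benchmark_id in benchmarks_to_run(all_ids, resume_from):
        results[benchmark_id] = measure_cost_gb(benchmark_id)
        save_atomically(path, results)  # a crash loses at most the current benchmark
    return results
```

Writing to a temporary file and renaming it means an interrupted save cannot corrupt the results already on disk.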

`scripts/validation.py` (new)

Runs a 3-model × 3-benchmark grid to validate how accurately the pre-flight estimator predicts actual peak RSS. Reports over/under estimates per pair and writes results to validation_results.jsonl.
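
The over/under reporting can be illustrated with a small sketch. The record fields and function names are hypothetical, and the sign convention (negative means the estimator predicted less than the measured peak) is an assumption consistent with how misses are quoted elsewhere in this PR.

```python
import json


def signed_error_pct(estimated_gb, actual_peak_gb):
    """Negative means under-prediction, the direction that risks a real OOM."""
    return (estimated_gb - actual_peak_gb) / actual_peak_gb * 100.0


def validation_record(model, benchmark, estimated_gb, actual_peak_gb):
    """Build one result row for a (model, benchmark) pair."""
    error = signed_error_pct(estimated_gb, actual_peak_gb)
    return {
        "model": model,
        "benchmark": benchmark,
        "estimated_gb": estimated_gb,
        "actual_peak_gb": actual_peak_gb,
        "error_pct": round(error, 1),
        "direction": "over" if error >= 0 else "under",
    }


def append_jsonl(path, record):
    """Append one JSON object per line so partial runs stay readable."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Appending one JSON object per line means each pair's result is on disk as soon as it finishes, without rewriting earlier rows.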

How to extend the calibration table

```bash
# Run alexnet on all benchmarks and save results
python scripts/mem_profile_suite.py --calibrate --csv ~/calibration.csv

# Resume after a crash
python scripts/mem_profile_suite.py --calibrate --csv ~/calibration.csv \
    --resume-from MajajHong2015public.IT-pls
```

## Tests (`tests/test_plugin_management/test_memory_precheck.py`)

- **`TestCalibrationIO`** — load/save roundtrip, missing file, corrupt JSON, directory creation
- **`TestCalibratedFormula`** — calibrated, RDM formula, ridge formula fallback, PLS ×7, fallback ×6; OOM detection for each path
- **`TestMemoryEstimateStr`** — `__str__` renders correct formula label per type; `OK` / `OOM LIKELY` status
- **`TestCalibratedIntegration`** — full roundtrip with mock model and benchmark; confirms `score_benchmark` calls `preallocate_memory` before scoring

---

## Still ToDo

- [x] Complete remaining benchmark estimates via EC2 runs (in progress) to populate JSON

Note: This PR is dependent on changes in `core` — specifically PR #168

KartikP

Additional change required in an unmodified file

`brainscore_vision.benchmarks.Benchmark` is a separate ABC from `brainscore_core.benchmarks.Benchmark`, so the new no-op from Core doesn't propagate. Most if not all behavioral/engineering benchmarks will likely crash with an `AttributeError` at `_run_score`'s `score_benchmark(benchmark, model)` call.

To resolve this, you probably just have to add the same `def preallocate_memory(self, candidate): pass` to `Benchmark` in `brainscore_vision/benchmarks/__init__.py`.
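
A minimal sketch of that fix follows, using a stand-in class rather than the real `brainscore_vision.benchmarks.Benchmark`. The abstract `__call__` and the subclass are included only so the example runs on its own; they are assumptions about the surrounding class, not a copy of it.

```python
from abc import ABC, abstractmethod


class Benchmark(ABC):
    """Stand-in for the vision Benchmark ABC; only the relevant parts are shown."""

    @abstractmethod
    def __call__(self, candidate):
        ...

    def preallocate_memory(self, candidate):
        """No-op default: benchmarks without a memory formula skip the pre-flight check."""
        pass


class BehavioralBenchmarkSketch(Benchmark):  # hypothetical subclass with no override
    def __call__(self, candidate):
        return 0.0


benchmark = BehavioralBenchmarkSketch()
benchmark.preallocate_memory(candidate=None)  # returns None instead of raising AttributeError
```

Because the method is concrete rather than abstract, existing behavioral and engineering benchmarks inherit it without any change to their own code.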

Actual review

Outside of that, I think the rest of the comments will help resolve this PR. Depending on the follow-up from integrating with gated scoring, another recommendation might be to structure the failure signal so it can be parsed directly rather than recovered by string parsing. Edit: Following up on this, how does an OOM from a behavioral benchmark trigger the retry policy in gated scoring? Since behavioral and engineering benchmarks do not run through this path, if they run OOM on the smallest EC2 instances (as per gated scoring), will they just be moved up one level to the next queue?
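
To make the structured-signal suggestion concrete, here is one possible shape for it. The sentinel prefix, the payload fields, and both function names are hypothetical; nothing here is taken from the PR or from the gated scoring code.

```python
import json

SENTINEL = "PREFLIGHT_RESULT "  # hypothetical prefix, chosen only for this example


def emit_failure(estimated_gb, available_gb, formula_type):
    """Render a single log line that downstream tooling can find and decode."""
    payload = {
        "status": "oom_predicted",
        "estimated_gb": estimated_gb,
        "available_gb": available_gb,
        "formula_type": formula_type,
    }
    return SENTINEL + json.dumps(payload, sort_keys=True)


def parse_log(log_text):
    """Return the first structured pre-flight payload in a log, or None."""
    for line in log_text.splitlines():
        if line.startswith(SENTINEL):
            return json.loads(line[len(SENTINEL):])
    return None
```

A consumer would then branch on `payload["status"]` and the numeric fields instead of matching against the wording of an error message.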

Additionally, this PR adds a lot of changed lines, largely for the core Brain-Score developer team to maintain (and likely unseen by regular users). If we continue to use this approach, we should consider logging every run in a table so we can refine `benchmark_costs.json` over time.

Moving forward, we should aim to resolve common causes of OOM, e.g., switching to dual ridge when `n_features > n_stimuli`, or using incremental/chunked solvers when the feature count is large.
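
For reference, the dual ridge idea rests on the identity $(X^\top X + \alpha I)^{-1} X^\top y = X^\top (X X^\top + \alpha I)^{-1} y$, so the linear system solved is `n_stimuli × n_stimuli` instead of `n_features × n_features`. The NumPy sketch below checks that both forms give the same weights; it omits the intercept and cross-validation, and it is not how sklearn or Brain-Score implement ridge.

```python
import numpy as np


def ridge_primal(X, y, alpha):
    """Solve (X^T X + alpha I) w = X^T y; the system is n_features x n_features."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def ridge_dual(X, y, alpha):
    """Solve (X X^T + alpha I) a = y, then w = X^T a; the system is n_stimuli x n_stimuli."""
    n_stimuli = X.shape[0]
    dual_coef = np.linalg.solve(X @ X.T + alpha * np.eye(n_stimuli), y)
    return X.T @ dual_coef


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))  # 20 stimuli, 200 features: the wide case
y = rng.standard_normal(20)

w_primal = ridge_primal(X, y, alpha=1.0)
w_dual = ridge_dual(X, y, alpha=1.0)
print(np.allclose(w_primal, w_dual))
```

In the wide case the dual form only ever materialises a 20 × 20 matrix here, whereas the primal form builds a 200 × 200 one; the gap grows with the feature count.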

This PR is an important step towards improving Brain-Score scoring success and allocating compute resources appropriately and cost-effectively.


mike-ferguson added 15 commits April 15, 2026 09:32

add initial suite of memory check

fbb9f4f

add better logging to profile suite

8b01039

add recommendations

5e240a5

more logging

88fdbec

moved to empirical fixed cost vs. estimate

8e55283

add more changes

45e87c6

update docs, make _init_ work

59fa312

add cahce clear hint

2830075

add better tests

442030e

add json

21593a3

final json

b7f44d8

add validation script

6112fe7

add benchmark types

0546240

update validation script

f1b0dff

change RDM calculation

d2a3f47

mike-ferguson requested a review from KartikP May 7, 2026 19:35

changes to memory.py

313b0f3

KartikP requested changes May 13, 2026


KartikP and others added 11 commits May 13, 2026 09:08

Merge branch 'master' into preflight_memory

1679d63

Merge branch 'master' into preflight_memory

1ffbb3e

add JSON sentinel, ABC for b/e

31b69ee

add rsa benchmark override

766f808

mj2015 test fix

5431dca

fix assertion issue

ccd8654

fix print issues

ab481ed

other QOL fixes

65abdbe

Merge branch 'preflight_memory' of https://github.com/brain-score/vision into preflight_memory

e438734

Merge branch 'master' into preflight_memory

a09f869

Merge branch 'master' into preflight_memory

e2b277e

mike-ferguson closed this May 27, 2026

mike-ferguson reopened this May 27, 2026

mike-ferguson wants to merge 27 commits into master from preflight_memory
