feat: record per-structure peak device memory by lwalew · Pull Request #134 · instadeepai/mlipaudit

lwalew · 2026-05-22T09:49:26Z

Summary

Records peak_bytes_in_use from jax.devices()[0].memory_stats() after each structure's simulation in the scaling benchmark.
New optional field ScalingStructureResult.peak_memory_bytes (NonNegativeInt | None) — backward-compatible with existing stored results (defaults to None when the field is absent in older JSON rows).
New "Peak device memory vs system size" chart on the scaling UI page, alongside the existing step-time chart. Falls back to a message when the active JAX backend does not expose memory_stats() (CPU runs).
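A minimal sketch of the read described above. It assumes a helper named `read_peak_bytes`, which is hypothetical and not necessarily how `scaling.py` structures it. The helper returns `None` whenever the device reports no statistics. That behavior keeps the field optional and lets the UI fall back to a message. Treating a raising backend the same as one that returns no stats is an extra defensive assumption.

```python
from typing import Any, Optional


def read_peak_bytes(device: Any = None) -> Optional[int]:
    """Return the device's peak_bytes_in_use, or None if it is unavailable.

    Hypothetical helper; `device` defaults to jax.devices()[0] as in the PR text.
    """
    if device is None:
        import jax  # imported lazily so the helper can be exercised with a stub device

        device = jax.devices()[0]
    try:
        stats = device.memory_stats()
    except Exception:
        # Assumption: treat a backend that raises like one without stats.
        return None
    if not stats:
        # Backends without memory statistics (CPU runs, per the PR) give no dict here.
        return None
    peak = stats.get("peak_bytes_in_use")
    return int(peak) if peak is not None else None
```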

Why

ScalingBenchmark currently records timing only. For models that OOM on the larger systems in the dataset (e.g. 1vsq, 1a7m, 1ab7 at ~6700 / 2800 / 1400 atoms), we have no recorded signal of where on the size axis the GPU pressure starts climbing — only that the simulation failed. Adding peak_memory_bytes gives us a "high-water mark vs num_atoms" curve per model, which is the natural complement to the existing "step time vs num_atoms" plot.

Semantics of the value

peak_bytes_in_use is monotonic since process start: it is not reset between structures. The reading is captured in a finally: block so it lands on both the success path and the failure path. Because the structures run in size order, the value plotted against num_atoms traces the cumulative high-water mark. If $p_j$ is the true peak of structure $j$, the value recorded after structure $i$ is $P_i = \max_{j \le i} p_j \ge p_i$. This makes it an upper bound on the per-system peak for structure $i$. Documented in the docstrings.
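A minimal sketch of the capture pattern. It assumes the simulation call and the memory lookup can be passed in as callables. `run_with_peak_capture`, `simulate` and `read_peak` are hypothetical names, not the benchmark's actual run_model loop. The toy counter mimics a peak that is never reset. As a result, `s2` is recorded at 300 even though its own (made-up) peak is 200, which shows the upper-bound behavior.

```python
from typing import Callable, Optional


def run_with_peak_capture(
    structures: list[str],
    simulate: Callable[[str], None],
    read_peak: Callable[[], Optional[int]],
) -> list[dict]:
    """Simulate each structure and record a peak-memory reading on every path."""
    results = []
    for name in structures:
        record = {"structure": name, "failed": False, "peak_memory_bytes": None}
        try:
            simulate(name)
        except Exception:
            # e.g. an out-of-memory error on one of the larger systems
            record["failed"] = True
        finally:
            # Runs on both the success and the failure path.
            record["peak_memory_bytes"] = read_peak()
        results.append(record)
    return results


# Toy emulation of a process-wide peak counter that is never reset.
# Structures are listed in order of increasing size; the true peaks are made up.
true_peaks = {"s1": 300, "s2": 200, "s3": 900}
high_water = 0


def fake_simulate(name: str) -> None:
    global high_water
    high_water = max(high_water, true_peaks[name])
    if name == "s3":
        raise MemoryError("simulated OOM")


out = run_with_peak_capture(["s1", "s2", "s3"], fake_simulate, lambda: high_water)
print([r["peak_memory_bytes"] for r in out])  # [300, 300, 900]
```

The failing structure `s3` still gets a reading because the lookup sits in `finally`. Only the exception itself is swallowed by the `except` branch.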

Test plan

pytest tests/scaling/ — both existing tests updated + passing.
pre-commit run --files … — ruff / ruff-format / mypy / conventional-commit all green.
Validate end-to-end on a GPU run (queued in the internal driver — will report numbers in a follow-up comment).
Sanity-check the new chart renders with mixed None/numeric data when one model was run on CPU and another on GPU.
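For the mixed None/numeric case, a minimal sketch of how chart rows could be prepared. It assumes results are plain dicts with `num_atoms` and `peak_memory_bytes` keys. `memory_chart_rows` is a hypothetical name, not the actual UI code. An empty result is the case where the fallback message would apply.

```python
def memory_chart_rows(results_by_model: dict[str, list[dict]]) -> list[dict]:
    """Flatten per-model scaling results into rows for a peak-memory chart.

    Entries whose peak_memory_bytes is None (e.g. CPU runs without
    memory_stats()) are skipped, so mixed None/numeric input is safe.
    """
    rows = []
    for model, results in results_by_model.items():
        for result in results:
            peak = result.get("peak_memory_bytes")
            if peak is None:
                continue
            rows.append(
                {
                    "model": model,
                    "num_atoms": result["num_atoms"],
                    "peak_memory_gib": peak / 2**30,  # bytes -> GiB for readability
                }
            )
    # Sort so each model's points line up along the size axis.
    rows.sort(key=lambda row: (row["model"], row["num_atoms"]))
    return rows


# Hypothetical input: model_a ran on GPU, model_b on CPU.
rows = memory_chart_rows(
    {
        "model_a": [
            {"num_atoms": 2800, "peak_memory_bytes": 3 * 2**30},
            {"num_atoms": 1400, "peak_memory_bytes": 2 * 2**30},
        ],
        "model_b": [{"num_atoms": 1400, "peak_memory_bytes": None}],
    }
)
if not rows:
    print("Peak device memory is not available for the selected models.")
```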

Notes

Drafted against develop; the in-flight feat/pass-charges-to-models work touches the same run_model loop but on different lines — should rebase cleanly when that lands.
For internal consumers (mlipaudit-internal leaderboard DB): experiment.result is stored as JSON, and Pydantic validates older rows with the new optional field defaulting to None. No migration needed.

🤖 Generated with Claude Code

CLAassistant · 2026-05-22T12:40:30Z

All committers have signed the CLA.

github-actions · 2026-05-22T12:43:22Z

Coverage Report

File	Stmts	Miss	Cover	Missing
src/mlipaudit
__init__.py	2	0	100%
benchmark.py	77	7	90%	146, 165, 231, 237–238, 248, 262
exceptions.py	2	0	100%
io.py	100	25	75%	117, 121, 146–150, 152, 154, 171–172, 174–175, 192–193, 195, 213–221
io_helpers.py	52	0	100%
run_mode.py	5	0	100%
scoring.py	29	7	75%	101–102, 104, 107, 111–112, 114
src/mlipaudit/benchmarks
__init__.py	29	0	100%
src/mlipaudit/benchmarks/bond_length_distribution
bond_length_distribution.py	95	3	96%	208, 211, 276
src/mlipaudit/benchmarks/conformer_selection
conformer_selection.py	132	5	96%	249, 252, 295, 300, 340
src/mlipaudit/benchmarks/dihedral_scan
dihedral_scan.py	116	5	95%	222, 225, 259, 262, 304
src/mlipaudit/benchmarks/folding_stability
folding_stability.py	106	4	96%	212, 270, 275, 339
helpers.py	35	0	100%
src/mlipaudit/benchmarks/noncovalent_interactions
noncovalent_interactions.py	168	10	94%	245, 248, 272, 283, 407, 424, 441, 525–527
src/mlipaudit/benchmarks/nudged_elastic_band
engine.py	126	94	25%	62, 81–82, 84–89, 91, 93, 95–96, 98–99, 101–103, 112–113, 115–116, 120–121, 123, 130, 132, 134–136, 138–139, 141, 143–145, 147–151, 153, 159–163, 165–167, 169–171, 173–175, 177–180, 182, 185, 189–190, 192, 194–195, 197, 201–202, 205–210, 212–213, 229, 234, 236–238, 240, 242–243, 246–247, 253, 256–257, 261–262, 267
nudged_elastic_band.py	121	6	95%	290–291, 298, 310, 316, 323
src/mlipaudit/benchmarks/reactivity
reactivity.py	117	3	97%	239–240, 272
src/mlipaudit/benchmarks/reference_geometry_stability
reference_geometry_stability.py	122	3	97%	233, 305, 328
src/mlipaudit/benchmarks/ring_planarity
ring_planarity.py	99	6	93%	229, 232, 264, 267–268, 287
src/mlipaudit/benchmarks/sampling
helpers.py	64	4	93%	46, 67, 97, 129
sampling.py	225	10	95%	303, 344, 385, 388–390, 440, 707, 825–826
src/mlipaudit/benchmarks/scaling
scaling.py	118	44	62%	67–74, 166–168, 176–177, 179–182, 192–194, 231–237, 240–242, 248–249, 251–252, 254–255, 258–259, 261, 263, 286, 296, 316, 329
src/mlipaudit/benchmarks/solvent_radial_distribution
solvent_radial_distribution.py	101	8	92%	189–190, 192, 231, 236–237, 302, 325
src/mlipaudit/benchmarks/stability
stability.py	150	18	88%	166, 169, 201–202, 205–206, 208, 210, 213, 215, 431–432, 442, 478, 487, 522, 537, 548
src/mlipaudit/benchmarks/tautomers
tautomers.py	92	5	94%	182, 208, 211–212, 231
src/mlipaudit/benchmarks/water_radial_distribution
water_radial_distribution.py	94	2	97%	186, 268
src/mlipaudit/ui
__init__.py	16	0	100%
bond_length_distribution.py	58	11	81%	89–90, 95–96, 141–142, 147, 150, 176, 188, 192
conformer_selection.py	100	7	93%	133–134, 139–140, 195, 320, 324
dihedral_scan.py	115	12	89%	141–142, 147–148, 259–260, 302, 313, 353, 355, 365, 369
folding_stability.py	96	7	92%	44, 226–227, 232–233, 406, 410
leaderboard.py	80	10	87%	67, 74, 78, 85, 287–289, 291, 293, 303
noncovalent_interactions.py	152	9	94%	205–206, 211–212, 392, 397, 432, 454, 458
nudged_elastic_band.py	55	39	29%	42–46, 50–51, 53, 66, 68, 75, 82, 89–90, 93, 97–99, 101, 103–105, 107–109, 111–113, 115, 117, 119, 121–122, 124, 135, 137, 149, 159, 163
page_wrapper.py	14	2	85%	36, 46
reactivity.py	59	6	89%	96–97, 102–103, 173, 177
reference_geometry_stability.py	66	9	86%	62, 131–132, 137–138, 199, 226, 236, 240
ring_planarity.py	56	9	83%	71–72, 77–78, 123–124, 160, 170, 174
sampling.py	87	9	89%	85, 129–130, 135–136, 166, 234, 265, 269
scaling.py	65	10	84%	124, 138–140, 175–176, 187–188, 210, 214
solvent_radial_distribution.py	83	20	75%	117–118, 123–124, 150, 156, 162–166, 168–169, 175, 178, 190, 197, 199, 211, 215
stability.py	60	10	83%	106–107, 112–113, 134–135, 138, 148, 177, 181
tautomers.py	62	6	90%	82–83, 88–89, 194, 198
utils.py	106	19	82%	69, 73–74, 130, 192–195, 201, 251, 269–270, 273–274, 277, 287–289, 291
water_radial_distribution.py	98	6	93%	132–133, 138–139, 279, 283
src/mlipaudit/utils
__init__.py	4	0	100%
inference.py	27	7	74%	54–56, 65–66, 69, 72
simulation.py	52	4	92%	133, 171–173
stability.py	31	5	83%	61, 68, 92, 99, 103
trajectory_helpers.py	34	0	100%
unallowed_elements.py	10	0	100%
TOTAL	3963	486	87%

Tests	Skipped	Failures	Errors	Time
128	0 💤	0 ❌	0 🔥	11.811s ⏱️

Add a peak_memory_bytes field to ScalingStructureResult populated from jax.devices()[0].memory_stats()["peak_bytes_in_use"], captured after each structure's simulation (including the failure path). Surfaced in the UI as a "Peak device memory vs system size" chart alongside the existing step-time chart, with a fallback message when the active JAX backend does not expose memory stats.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The UI surface for the new peak_memory_bytes field can be added later (or in a separate PR) once we have a better sense of what we want to plot. The data field on ScalingStructureResult is the only piece that needs to land first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This reverts commit 53b6791.

lwalew changed the title ~~feat(scaling): record per-structure peak device memory~~ feat: record per-structure peak device memory May 22, 2026

lwalew force-pushed the feat/scaling-peak-memory branch from 3acf901 to d5d84d4 May 22, 2026 12:40

lwalew force-pushed the feat/scaling-peak-memory branch from d5d84d4 to 66b89b6 May 22, 2026 12:42

lwehrhan and others added 4 commits May 22, 2026 16:23

feat: all neutral protein systems for scaling benchmark

a90112b

Revert "refactor(scaling): drop UI chart, keep data-layer only"

13cc84e

This reverts commit 53b6791.

lwalew force-pushed the feat/scaling-peak-memory branch from 66b89b6 to 13cc84e May 22, 2026 14:40

lwalew wants to merge 4 commits into develop from feat/scaling-peak-memory

3 participants