feat: record per-structure peak device memory#134

Draft
lwalew wants to merge 4 commits into develop from feat/scaling-peak-memory

@lwalew lwalew commented May 22, 2026

Collaborator

Summary

  • Records peak_bytes_in_use from jax.devices()[0].memory_stats() after each structure's simulation in the scaling benchmark.
  • New optional field ScalingStructureResult.peak_memory_bytes (NonNegativeInt | None) — backward-compatible with existing stored results (defaults to None when the field is absent in older JSON rows).
  • New "Peak device memory vs system size" chart on the scaling UI page, alongside the existing step-time chart. Falls back to a message when the active JAX backend does not expose memory_stats() (CPU runs).
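A minimal sketch of the reading described above, assuming `memory_stats()` returns a dict containing `peak_bytes_in_use` on GPU backends and `None` on backends without memory stats (as the summary says for CPU runs). The helper name `read_peak_memory_bytes` and the `FakeDevice` stand-in are hypothetical and not taken from the PR; in real use the argument would be `jax.devices()[0]`.

```python
from typing import Any, Optional


def read_peak_memory_bytes(device: Any) -> Optional[int]:
    """Return the device's peak_bytes_in_use, or None if it is unavailable.

    `device` is anything exposing `memory_stats()`, e.g. `jax.devices()[0]`.
    """
    stats = device.memory_stats()  # assumed to be None when stats are unsupported
    if not stats:
        return None
    peak = stats.get("peak_bytes_in_use")
    return None if peak is None else int(peak)


class FakeDevice:
    """Hypothetical stand-in for a JAX device, used only to exercise the helper."""

    def __init__(self, stats: Optional[dict]) -> None:
        self._stats = stats

    def memory_stats(self) -> Optional[dict]:
        return self._stats


gpu_like = FakeDevice({"peak_bytes_in_use": 3_221_225_472, "bytes_in_use": 1024})
cpu_like = FakeDevice(None)

print(read_peak_memory_bytes(gpu_like))
print(read_peak_memory_bytes(cpu_like))
```

Returning `None` rather than `0` keeps "no data" distinguishable from "no memory used", which matches the optional `peak_memory_bytes` field.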

Why

ScalingBenchmark currently records timing only. For models that OOM on the larger systems in the dataset (e.g. 1vsq, 1a7m, 1ab7 at ~6700 / 2800 / 1400 atoms), we have no recorded signal of where on the size axis the GPU pressure starts climbing — only that the simulation failed. Adding peak_memory_bytes gives us a "high-water mark vs num_atoms" curve per model, which is the natural complement to the existing "step time vs num_atoms" plot.

Semantics of the value

peak_bytes_in_use is a high-water mark since process start, so it never decreases within a run. The reading is captured in a finally: block, so it is recorded on both the success path and the failure path. Because the structure list is sorted by size, the value plotted against num_atoms traces the cumulative high-water mark: the value at structure $i$ covers everything the process has run up to and including structure $i$, and is therefore an upper bound on the per-system peak for structure $i$. This is documented in the docstrings.
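The capture pattern can be sketched as follows. This is an illustration under stated assumptions, not the PR's `run_model` loop: `FakeDevice`, `fake_simulate`, `run_all`, the 1,000 bytes-per-atom cost and the 5,000-atom failure threshold are all hypothetical, and the atom counts are the approximate figures quoted in the "Why" section.

```python
from typing import Callable, Dict, List, Tuple


class FakeDevice:
    """Hypothetical device whose peak counter never decreases, like a high-water mark."""

    def __init__(self) -> None:
        self._peak = 0

    def allocate(self, n_bytes: int) -> None:
        self._peak = max(self._peak, n_bytes)

    def memory_stats(self) -> Dict[str, int]:
        return {"peak_bytes_in_use": self._peak}


def fake_simulate(device: FakeDevice, num_atoms: int) -> None:
    """Pretend simulation: memory grows with size, large systems fail."""
    device.allocate(1_000 * num_atoms)  # made-up cost model
    if num_atoms > 5_000:  # made-up failure threshold
        raise MemoryError("simulated OOM")


def run_all(
    device: FakeDevice,
    structures: List[Tuple[str, int]],
    simulate: Callable[[FakeDevice, int], None],
) -> List[dict]:
    results = []
    # Run smallest first so the recorded curve is a cumulative high-water mark.
    for name, num_atoms in sorted(structures, key=lambda s: s[1]):
        failed = False
        try:
            simulate(device, num_atoms)
        except MemoryError:
            failed = True
        finally:
            # Runs on both the success path and the failure path.
            stats = device.memory_stats()
            peak = stats.get("peak_bytes_in_use") if stats else None
            results.append(
                {"name": name, "num_atoms": num_atoms,
                 "failed": failed, "peak_memory_bytes": peak}
            )
    return results


results = run_all(
    FakeDevice(),
    [("1vsq", 6700), ("1ab7", 1400), ("1a7m", 2800)],
    fake_simulate,
)
for row in results:
    print(row)
```

Because the counter is never reset, each recorded value also includes the peaks of all earlier structures, which is why it is only an upper bound on the per-structure peak.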

Test plan

  • pytest tests/scaling/ — both existing tests updated + passing.
  • pre-commit run --files … — ruff / ruff-format / mypy / conventional-commit all green.
  • Validate end-to-end on a GPU run (queued in the internal driver — will report numbers in a follow-up comment).
  • Sanity-check the new chart renders with mixed None/numeric data when one model was run on CPU and another on GPU.

Notes

  • Drafted against develop; the in-flight feat/pass-charges-to-models work touches the same run_model loop but on different lines — should rebase cleanly when that lands.
  • For internal consumers (mlipaudit-internal leaderboard DB): experiment.result is stored as JSON, and Pydantic validates older rows with the new optional field defaulting to None. No migration needed.
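The backward-compatibility claim can be illustrated with a small Pydantic model. This is a sketch only: `ScalingStructureResultSketch` is a hypothetical stand-in, and `structure_name` and `num_atoms` are assumed fields, since the PR text names only `peak_memory_bytes` and its type.

```python
import json
from typing import Optional

from pydantic import BaseModel, NonNegativeInt


class ScalingStructureResultSketch(BaseModel):
    """Hypothetical stand-in for ScalingStructureResult."""

    structure_name: str  # assumed field, for illustration only
    num_atoms: int  # assumed field, for illustration only
    # The new optional field: absent in older rows, so it defaults to None.
    peak_memory_bytes: Optional[NonNegativeInt] = None


# A row stored before the field existed, and one stored after.
old_row = json.loads('{"structure_name": "1ab7", "num_atoms": 1400}')
new_row = json.loads(
    '{"structure_name": "1ab7", "num_atoms": 1400, "peak_memory_bytes": 123456}'
)

old_result = ScalingStructureResultSketch(**old_row)
new_result = ScalingStructureResultSketch(**new_row)

print(old_result.peak_memory_bytes)
print(new_result.peak_memory_bytes)
```

Older rows validate without changes because the missing key falls back to the default, while `NonNegativeInt` still rejects negative readings in new rows.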

🤖 Generated with Claude Code

@lwalew lwalew changed the title from "feat(scaling): record per-structure peak device memory" to "feat: record per-structure peak device memory" May 22, 2026
@lwalew lwalew force-pushed the feat/scaling-peak-memory branch from 3acf901 to d5d84d4 May 22, 2026 12:40

CLAassistant commented May 22, 2026

CLA assistant check
All committers have signed the CLA.

@lwalew lwalew force-pushed the feat/scaling-peak-memory branch from d5d84d4 to 66b89b6 May 22, 2026 12:42

github-actions Bot commented May 22, 2026

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/mlipaudit/__init__.py | 2 | 0 | 100% | |
| src/mlipaudit/benchmark.py | 77 | 7 | 90% | 146, 165, 231, 237–238, 248, 262 |
| src/mlipaudit/exceptions.py | 2 | 0 | 100% | |
| src/mlipaudit/io.py | 100 | 25 | 75% | 117, 121, 146–150, 152, 154, 171–172, 174–175, 192–193, 195, 213–221 |
| src/mlipaudit/io_helpers.py | 52 | 0 | 100% | |
| src/mlipaudit/run_mode.py | 5 | 0 | 100% | |
| src/mlipaudit/scoring.py | 29 | 7 | 75% | 101–102, 104, 107, 111–112, 114 |
| src/mlipaudit/benchmarks/__init__.py | 29 | 0 | 100% | |
| src/mlipaudit/benchmarks/bond_length_distribution/bond_length_distribution.py | 95 | 3 | 96% | 208, 211, 276 |
| src/mlipaudit/benchmarks/conformer_selection/conformer_selection.py | 132 | 5 | 96% | 249, 252, 295, 300, 340 |
| src/mlipaudit/benchmarks/dihedral_scan/dihedral_scan.py | 116 | 5 | 95% | 222, 225, 259, 262, 304 |
| src/mlipaudit/benchmarks/folding_stability/folding_stability.py | 106 | 4 | 96% | 212, 270, 275, 339 |
| src/mlipaudit/benchmarks/folding_stability/helpers.py | 35 | 0 | 100% | |
| src/mlipaudit/benchmarks/noncovalent_interactions/noncovalent_interactions.py | 168 | 10 | 94% | 245, 248, 272, 283, 407, 424, 441, 525–527 |
| src/mlipaudit/benchmarks/nudged_elastic_band/engine.py | 126 | 94 | 25% | 62, 81–82, 84–89, 91, 93, 95–96, 98–99, 101–103, 112–113, 115–116, 120–121, 123, 130, 132, 134–136, 138–139, 141, 143–145, 147–151, 153, 159–163, 165–167, 169–171, 173–175, 177–180, 182, 185, 189–190, 192, 194–195, 197, 201–202, 205–210, 212–213, 229, 234, 236–238, 240, 242–243, 246–247, 253, 256–257, 261–262, 267 |
| src/mlipaudit/benchmarks/nudged_elastic_band/nudged_elastic_band.py | 121 | 6 | 95% | 290–291, 298, 310, 316, 323 |
| src/mlipaudit/benchmarks/reactivity/reactivity.py | 117 | 3 | 97% | 239–240, 272 |
| src/mlipaudit/benchmarks/reference_geometry_stability/reference_geometry_stability.py | 122 | 3 | 97% | 233, 305, 328 |
| src/mlipaudit/benchmarks/ring_planarity/ring_planarity.py | 99 | 6 | 93% | 229, 232, 264, 267–268, 287 |
| src/mlipaudit/benchmarks/sampling/helpers.py | 64 | 4 | 93% | 46, 67, 97, 129 |
| src/mlipaudit/benchmarks/sampling/sampling.py | 225 | 10 | 95% | 303, 344, 385, 388–390, 440, 707, 825–826 |
| src/mlipaudit/benchmarks/scaling/scaling.py | 118 | 44 | 62% | 67–74, 166–168, 176–177, 179–182, 192–194, 231–237, 240–242, 248–249, 251–252, 254–255, 258–259, 261, 263, 286, 296, 316, 329 |
| src/mlipaudit/benchmarks/solvent_radial_distribution/solvent_radial_distribution.py | 101 | 8 | 92% | 189–190, 192, 231, 236–237, 302, 325 |
| src/mlipaudit/benchmarks/stability/stability.py | 150 | 18 | 88% | 166, 169, 201–202, 205–206, 208, 210, 213, 215, 431–432, 442, 478, 487, 522, 537, 548 |
| src/mlipaudit/benchmarks/tautomers/tautomers.py | 92 | 5 | 94% | 182, 208, 211–212, 231 |
| src/mlipaudit/benchmarks/water_radial_distribution/water_radial_distribution.py | 94 | 2 | 97% | 186, 268 |
| src/mlipaudit/ui/__init__.py | 16 | 0 | 100% | |
| src/mlipaudit/ui/bond_length_distribution.py | 58 | 11 | 81% | 89–90, 95–96, 141–142, 147, 150, 176, 188, 192 |
| src/mlipaudit/ui/conformer_selection.py | 100 | 7 | 93% | 133–134, 139–140, 195, 320, 324 |
| src/mlipaudit/ui/dihedral_scan.py | 115 | 12 | 89% | 141–142, 147–148, 259–260, 302, 313, 353, 355, 365, 369 |
| src/mlipaudit/ui/folding_stability.py | 96 | 7 | 92% | 44, 226–227, 232–233, 406, 410 |
| src/mlipaudit/ui/leaderboard.py | 80 | 10 | 87% | 67, 74, 78, 85, 287–289, 291, 293, 303 |
| src/mlipaudit/ui/noncovalent_interactions.py | 152 | 9 | 94% | 205–206, 211–212, 392, 397, 432, 454, 458 |
| src/mlipaudit/ui/nudged_elastic_band.py | 55 | 39 | 29% | 42–46, 50–51, 53, 66, 68, 75, 82, 89–90, 93, 97–99, 101, 103–105, 107–109, 111–113, 115, 117, 119, 121–122, 124, 135, 137, 149, 159, 163 |
| src/mlipaudit/ui/page_wrapper.py | 14 | 2 | 85% | 36, 46 |
| src/mlipaudit/ui/reactivity.py | 59 | 6 | 89% | 96–97, 102–103, 173, 177 |
| src/mlipaudit/ui/reference_geometry_stability.py | 66 | 9 | 86% | 62, 131–132, 137–138, 199, 226, 236, 240 |
| src/mlipaudit/ui/ring_planarity.py | 56 | 9 | 83% | 71–72, 77–78, 123–124, 160, 170, 174 |
| src/mlipaudit/ui/sampling.py | 87 | 9 | 89% | 85, 129–130, 135–136, 166, 234, 265, 269 |
| src/mlipaudit/ui/scaling.py | 65 | 10 | 84% | 124, 138–140, 175–176, 187–188, 210, 214 |
| src/mlipaudit/ui/solvent_radial_distribution.py | 83 | 20 | 75% | 117–118, 123–124, 150, 156, 162–166, 168–169, 175, 178, 190, 197, 199, 211, 215 |
| src/mlipaudit/ui/stability.py | 60 | 10 | 83% | 106–107, 112–113, 134–135, 138, 148, 177, 181 |
| src/mlipaudit/ui/tautomers.py | 62 | 6 | 90% | 82–83, 88–89, 194, 198 |
| src/mlipaudit/ui/utils.py | 106 | 19 | 82% | 69, 73–74, 130, 192–195, 201, 251, 269–270, 273–274, 277, 287–289, 291 |
| src/mlipaudit/ui/water_radial_distribution.py | 98 | 6 | 93% | 132–133, 138–139, 279, 283 |
| src/mlipaudit/utils/__init__.py | 4 | 0 | 100% | |
| src/mlipaudit/utils/inference.py | 27 | 7 | 74% | 54–56, 65–66, 69, 72 |
| src/mlipaudit/utils/simulation.py | 52 | 4 | 92% | 133, 171–173 |
| src/mlipaudit/utils/stability.py | 31 | 5 | 83% | 61, 68, 92, 99, 103 |
| src/mlipaudit/utils/trajectory_helpers.py | 34 | 0 | 100% | |
| src/mlipaudit/utils/unallowed_elements.py | 10 | 0 | 100% | |
| TOTAL | 3963 | 486 | 87% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 128 | 0 💤 | 0 ❌ | 0 🔥 | 11.811s ⏱️ |

lwehrhan and others added 4 commits May 22, 2026 16:23
Add a peak_memory_bytes field to ScalingStructureResult populated from
jax.devices()[0].memory_stats()["peak_bytes_in_use"], captured after each
structure's simulation (including the failure path). Surfaced in the UI
as a "Peak device memory vs system size" chart alongside the existing
step-time chart, with a fallback message when the active JAX backend
does not expose memory stats.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The UI surface for the new peak_memory_bytes field can be added later
(or in a separate PR) once we have a better sense of what we want to
plot. The data field on ScalingStructureResult is the only piece that
needs to land first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lwalew lwalew force-pushed the feat/scaling-peak-memory branch from 66b89b6 to 13cc84e May 22, 2026 14:40