Skip to content

chore: switch Dockerfile to slim Python + uv lockfile#719

Open
cailmdaley wants to merge 4 commits intodevelopfrom
chore/uv-lockfile
Open

chore: switch Dockerfile to slim Python + uv lockfile#719
cailmdaley wants to merge 4 commits intodevelopfrom
chore/uv-lockfile

Conversation

@cailmdaley
Copy link
Copy Markdown
Contributor

Summary

Replaces the skaha/astroml base (which brings conda → double-installs every Python dep, ~5–10 GB image) with python:3.12-slim-bookworm + uv driven by a committed lockfile.

Before After
Base images.canfar.net/skaha/astroml:latest (conda Python) python:3.12-slim-bookworm
Deps hand-pinned in Dockerfile, drifts from pyproject.toml pyproject.toml is SSOT (what), uv.lock is the pinned manifest (which versions)
WeightWatcher built from source with patched headers Debian's weightwatcher package
CI builds image, runs shapepipe_run -c … also invokes source-extractor --version, weightwatcher --version, psfex --version

Why

Two problems with the v2.0 image:

  1. Conda + pip double install: skaha/astroml ships conda; we then pip install everything on top. Two copies of every scientific package, multi-GB bloat, slow builds.
  2. Runtime regressions slip through CI: the original failure that prompted this PR (a tool that builds inside the image but can't actually run on canfar) wasn't caught because the deploy workflow only invoked shapepipe_run.

Lockfile workflow

# regenerate after editing pyproject.toml dependencies
uv lock

# add a new dep
uv add 'somepackage>=1.2'    # updates pyproject and uv.lock

# bump pinned versions (deliberate)
uv lock --upgrade

uv sync --frozen in the Dockerfile fails the build if pyproject.toml and uv.lock are out of sync — so a stale lockfile cannot ship.

Test plan

  • CI image build succeeds on this branch
  • Test — binaries step passes (sextractor, weightwatcher, psfex)
  • Test — shapepipe entry point passes
  • Pull built image locally on canfar and run a real ShapePipe job (Cail to do, post-CI green)

Out of scope / follow-ups

  • Optional: separate Dockerfile.canfar building on skaha/astroml if there's a deployment reason to keep that base image around (Martin mentioned this could be cleaner — currently conjectural).
  • Dockerfile.jupyter is unchanged; it inherits from this base.

🤖 Generated with Claude Code

The v2.0 merge moved the base image to skaha/astroml, which ships conda.
That double-installs every Python dep (conda + pip) and bloats the image
to 5–10 GB. Reverting to python:3.12-slim-bookworm and using uv with a
lockfile gives:

- pyproject.toml as the single source of truth for *what* shapepipe needs
- uv.lock pinning *which versions* — upstream changes only land when we
  regenerate the lockfile deliberately
- a lean apt layer using Debian's astromatic packages (psfex,
  source-extractor, weightwatcher) instead of building weightwatcher
  from source
- ~10× faster rebuilds and a much smaller image

Also adds a CI smoke test that invokes source-extractor, weightwatcher,
and psfex inside the built image. The original failure mode that
prompted this work (a binary that builds fine but can't run on canfar)
will now surface at PR time, not deployment time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cailmdaley cailmdaley marked this pull request as ready for review April 27, 2026 09:41
The Dockerfile now carries an inline note about why we use Debian's
weightwatcher package instead of the source build with sed-patched
headers — short version: GCC 10 (Apr 2020) flipped its default to
-fno-common, breaking WeightWatcher 1.12's (2014) use of common-symbol
globals in headers. The old Dockerfile patched them inline; Debian's
weightwatcher 1.12+dfsg-3 patches the same way upstream of the build.

Pyproject minimums bumped to current major versions where the major
itself moved (astropy 6→7, numpy 1→2, pandas 2→3) so the abstract
constraint signals our actual target. Minor-version drift stays in
uv.lock. ngmix line carries an explicit "do not modernize" note since
it's pinned to Axel's stable_version branch until upstream absorbs the
fixes (tracked in the ngmix-update fiber).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cailmdaley
Copy link
Copy Markdown
Contributor Author

@martinkilbinger ready for your eyes when you have a moment. Two things worth flagging:

Image works end-to-end on the CI runner. Build is green in 3m31s (uv installs 238 packages in 322ms — everything's a prebuilt wheel). The added Test — binaries step runs source-extractor --version, weightwatcher --version, psfex --version inside the built image and they all answer. shapepipe_run -c /app/example/config.ini also boots cleanly. Since that's the same failure mode you hit on canfar (a binary that the image carries but couldn't run), this should work on canfar too — but it'd be good to pull the image and try a real job to confirm before we merge.

Modernized package versions in the same PR (commit b62e9dbb). The lockfile now resolves to numpy 2.4, astropy 7.2, pandas 3.0, galsim 2.8, mpi4py 4.1, etc. (everything at current latest). Pyproject minimums bumped to the new major versions. The only line that's still pinned the old way is ngmix — that stays on Axel's stable_version branch until we do the upstream-ngmix migration. Worth knowing in case you hit a numpy-2 or pandas-3 compatibility surprise in real jobs; rolling back is just uv lock with a constraint.

(Disclosure: PR drafted by Claude in collaboration with me; I haven't run a full pipeline through the new image yet — that's the canfar test I want to do.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cailmdaley
Copy link
Copy Markdown
Contributor Author

@martinkilbinger one more thing worth flagging since it explains a Dockerfile shape change you'll notice — why we dropped the WeightWatcher source build:

The previous Dockerfile (v1.x) built WeightWatcher from source with six `sed` patches to `prefs.h`, `globals.h`, `fitscat.h`, `prefs.c`, `main.c`, and `fitscat.c`. The new Dockerfile just `apt install`s `weightwatcher` instead.

The patches were converting tentative definitions like `prefstruct prefs;` in headers into proper `extern` declarations + a single definition in the corresponding `.c` file. They were a workaround for GCC 10 (April 2020) flipping its default to `-fno-common`, which broke WeightWatcher 1.12's (2014) use of common-symbol globals — every translation unit started getting its own definition, the linker complained.

Debian's `weightwatcher 1.12+dfsg-3` package on bookworm carries an equivalent patch upstream of the build (the `+dfsg-3` suffix indicates Debian-applied patches). So `apt install` gives us the same WeightWatcher 1.12 binary, just with Debian carrying the GCC-compatibility fix instead of us doing it inline.

Tradeoff for completeness:

Source build (old) apt (new)
Version control We pin `WW_VERSION=1.12` Tied to bookworm
Patch maintenance Ours, in plain sight Debian's, transparent
Build time ~30s + autotools deps ~1s
Risk if GCC changes again We patch Debian handles it

The apt route is probably the better default — Debian has stronger Astromatic build infrastructure than we do. If we ever needed a version bookworm doesn't ship, we'd swap back to source-build.

@martinkilbinger
Copy link
Copy Markdown
Contributor

When running the tests from within the docker of this branch, I get errors:
For uv sync --extra test:
error: failed to remove file /app/.venv/lib/python3.12/site-packages/appdirs-1.4.4.dist-info/INSTALLER: Read-only file system (os error 30)
And
uv run pytest:
error: Failed to spawn: pytest Caused by: No such file or directory (os error 2)

And the slim bookworm image does not have some useful programs, e.g. vi, which should be added.

@martinkilbinger
Copy link
Copy Markdown
Contributor

When running the tests from within the docker of this branch, I get errors:
For uv sync --extra test:
error: failed to remove file /app/.venv/lib/python3.12/site-packages/appdirs-1.4.4.dist-info/INSTALLER: Read-only file system (os error 30)
And
uv run pytest:
error: Failed to spawn: pytest Caused by: No such file or directory (os error 2)

And the slim bookworm image does not have some useful programs, e.g. vi, which should be added.

And cloning the branch and using that version, I also get an error message for `uv run pytest:
WARNING: Failed to generate report: No data to report.

Rewrites the slim-bookworm Dockerfile as a two-target multi-stage build:

- runtime — minimal image for canfar batch jobs and downstream FROM
  clauses. Ships astromatic binaries + jupyter + fitsio extras only.
- dev — runtime plus everyday CLI tools (vim, less, tmux, htop, ripgrep,
  fd, jq, bat, git-lfs, …) and *all* extras pre-installed via the `dev`
  meta-extra (test, lint, doc, release, jupyter, fitsio). Default target
  when `--target` is omitted; published as `:<branch>` (no suffix).

Both stages share a `base` stage carrying system deps + uv + the lockfile
copy, so the heavy apt + wheel resolution is cached once.

Addresses Martin's PR feedback on #719:

- `uv run pytest: No such file or directory` — pytest now ships in the
  dev image (was previously only in the never-built `--extra test`).
- `uv sync --extra X: Read-only file system` — the dev image has every
  extra pre-baked, so live `uv sync` is no longer needed at all. As a
  belt-and-braces also `chmod -R go+rwX /app` so non-root users on
  canfar/skaha can still mutate the venv when they want to.
- "no vim in the slim image" — `vim`, `less`, `tmux`, `htop`, plus a
  curated subset of cailmdaley/containers (rg/fd/jq/bat/…) now ship in
  the dev target.

deploy-image.yml builds, smoke-tests, and pushes both targets:
- runtime: source-extractor / weightwatcher / psfex / shapepipe_run all
  invoked inside the built image.
- dev: vim, ripgrep, pytest verified runnable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@martinkilbinger
Copy link
Copy Markdown
Contributor

Some of the issues are fixed, but I think /app is still a read-only file system and I cannot run the pytests.

I use apptainer pull shapepipe_uv.sif docker://ghcr.io/cosmostat/shapepipe:chore-uv-lockfile to get the docker image, then apptainer shell shapepipe_uv.sif and cd /app; uv run pytest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants