Add S3-native ingest_results() for MinIO result recovery by GondekNP · Pull Request #32 · SchmidtDSE/joshpy

GondekNP · 2026-04-15T23:07:55Z

(Claude summary)

Summary

Adds ingest_results(cli, registry, label) — recovers simulation results from MinIO into the RunRegistry by label. DuckDB reads CSVs directly from S3 via httpfs (no local download needed). Missing replicates (e.g. from OOM) are skipped gracefully.
Adds configure_s3() — reusable DuckDB httpfs + S3 credential setup, the foundation for all future S3 access patterns (serverless aggregators, WASM, multi-machine)
Adds CellDataLoader.load_csv() S3 URL support — accepts s3:// strings alongside local Path objects
Adds SweepManager.ingest() convenience wrapper
Adds StageFromMinioConfig + JoshCLI.stage_from_minio() for download=True fallback
Fixes pre-existing test regression from 3e487fe (data file extension logic changed without test update)

Part of #31 (PR 1). Access model: S3 CSVs are source of truth, local .duckdb is a materialized cache any machine can rebuild.

User-facing usage

from joshpy.sweep import ingest_results

# Reads CSVs directly from S3 into DuckDB — no download
rows = ingest_results(cli, registry, "my-label")

# Or download locally first
rows = ingest_results(cli, registry, "my-label", download=True)

# Via SweepManager
manager.ingest()

Pixi task example (for josh-models):

recover = { cmd = "python scripts/recover.py", env = { JOSH_LABEL = "{{ LABEL }}" }, args = [{ arg = "LABEL" }] }

Test plan

867 tests pass (856 existing + 11 new), 0 failures
TestStageFromMinioConfig — frozen dataclass, optional defaults
TestStageFromMinio — subprocess arg building, minio flags only when non-None
TestIngestResults — local file protocol, minio S3 read, missing replicate skip, josh_content fallback, missing creds error, unknown label error
TestConfigureS3 — INSTALL httpfs + CREATE SECRET SQL generation
Pre-existing test_run_with_data_files regression fixed
Manual e2e test against real MinIO (blocked on dev JAR update)

🤖 Generated with Claude Code

GondekNP · 2026-04-15T23:10:31Z

+        >>> # Download locally first, then load
+        >>> rows = ingest_results(cli, registry, "my-label", download=True)
+    """
+    import os


move imports out of methods - these are base utils, it's not like we want a lazy import here

GondekNP · 2026-04-15T23:11:09Z

+    import os
+    import tempfile
+
+    # 1. Resolve label/hash to run metadata


Break these steps into more readable methods

GondekNP · 2026-04-15T23:15:06Z

+                )
+
+        # 5. Resolve run_id
+        run_id = registry._resolve_run_id_for_hash(run_hash)


I'm now remembering we have a run_id var indexing the actual runs in RunRegistry, but josh refers to the same run_id as the batch identififier for polling from the remote target - are we handling this? Seems smelly, probably something we should just change in josh (batch_id) to disambiguate

GondekNP · 2026-04-15T23:18:25Z

+                    run_id=run_id,
+                    run_hash=run_hash,
+                    entity_type=export_type,
+                )


I'm concerned with the case where attribute the wrong csvs to a given hash on this query - is that possible? I suppose if we pull our label from RunRegistry and the hash is included in the csv outputs, we are probably relatively safe by convention. But I wonder if any enforced run_hash in path business might allow us to guarantee no mismatch? Not sure it's worth it....

- Move os, tempfile, StageFromMinioConfig, configure_s3 imports to module level instead of inside methods - Extract monolithic ingest_results() into focused helpers: _resolve_ingest_metadata(), _get_josh_source(), _configure_minio_access(), _load_ingest_replicates() - Fix test mock patch path to match new top-level import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Enable recovering simulation results from MinIO into the RunRegistry by label. DuckDB reads CSVs directly from S3 via httpfs — no local download needed. Also provides a download=True fallback via stageFromMinio. - configure_s3(): reusable DuckDB httpfs + S3 credential setup - CellDataLoader.load_csv(): accepts s3:// URLs alongside local Paths - ingest_results(): label lookup → export path discovery → S3 read → load - SweepManager.ingest(): convenience wrapper - StageFromMinioConfig + JoshCLI.stage_from_minio(): download fallback - Fix pre-existing test regression from 3e487fe (data file extension) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Move os, tempfile, StageFromMinioConfig, configure_s3 imports to module level instead of inside methods - Extract monolithic ingest_results() into focused helpers: _resolve_ingest_metadata(), _get_josh_source(), _configure_minio_access(), _load_ingest_replicates() - Fix test mock patch path to match new top-level import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- TestStageFromMinio: mock JarManager.get_jar so tests don't require a local JAR file (they already mock subprocess.run) - TestDiffCLI.test_main_view: mock _launch_ide so test doesn't require VS Code's `code` CLI in PATH Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Escalating integration tests against a real MinIO service container: - Level 1: DuckDB httpfs write/read to MinIO - Level 2: Josh JAR simulation exports to MinIO, Python reads back - Level 3: CellDataLoader.load_csv from s3:// URLs - Level 4: End-to-end ingest_results() by label from MinIO - Level 5: Partial/interrupted sweep recovery (missing replicates) - Edge cases: bad credentials, missing bucket, namespace isolation Infrastructure: - tests/conftest.py with shared fixtures (minio_conn, minio_registry, seed_csv, etc.) - tests/fixtures/minio_export.josh minimal test simulation - pytest 'integration' marker registered in pyproject.toml - pixi tasks: 'test' excludes integration, 'test-integration' runs only integration - CI workflow with unit-tests + integration-tests jobs (MinIO service container) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes zizmor alerts: - Pin bitnamilegacy/minio to SHA digest (unpinned image reference) - Add persist-credentials: false to checkout (credential persistence) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

GondekNP · 2026-04-17T18:25:36Z

Superseded by #34 (feat/batch-run), which includes all commits from this PR plus PR2 (target profiles + devcontainer K8s tooling). Consolidating all batch remote PRs into a single branch per #31.

GondekNP commented Apr 15, 2026

View reviewed changes

GondekNP and others added 4 commits April 16, 2026 18:39

GondekNP force-pushed the feat/ingest-results branch from af4e3ba to ecbec04 Compare April 16, 2026 18:39

github-advanced-security AI found potential problems Apr 16, 2026

View reviewed changes

Comment thread .github/workflows/test.yml Fixed

Comment thread .github/workflows/test.yml Fixed

Pin MinIO image digest and add persist-credentials: false

c68f96e

Fixes zizmor alerts: - Pin bitnamilegacy/minio to SHA digest (unpinned image reference) - Add persist-credentials: false to checkout (credential persistence) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

GondekNP closed this Apr 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add S3-native ingest_results() for MinIO result recovery#32

Add S3-native ingest_results() for MinIO result recovery#32
GondekNP wants to merge 5 commits intomainfrom
feat/ingest-results

GondekNP commented Apr 15, 2026 •

edited

Loading

Uh oh!

GondekNP Apr 15, 2026

Uh oh!

GondekNP Apr 15, 2026

Uh oh!

GondekNP Apr 15, 2026

Uh oh!

GondekNP Apr 15, 2026

Uh oh!

Uh oh!

Uh oh!

GondekNP commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

GondekNP commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

User-facing usage

Test plan

Uh oh!

GondekNP Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

GondekNP Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

GondekNP Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

GondekNP Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

GondekNP commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

GondekNP commented Apr 15, 2026 •

edited

Loading