Fix xdist worker race condition on scanpy dataset cache #83

Merged
Marius1311 merged 2 commits into main from claude/inspiring-dirac-c6v97q
Jun 9, 2026
@Marius1311

Summary

Fixed intermittent HDF5 read failures when running tests under pytest -n auto by isolating each xdist worker's scanpy dataset cache to its own directory. Previously, multiple workers would race on the shared pbmc3k_raw.h5ad cache file, causing one worker to read while another was still downloading/writing.

Behavior Or Invariants Changed

  • Test suite now creates worker-specific cache directories under tests/data/scanpy_cache/{worker_id}/
  • Each pytest-xdist worker uses its own isolated scanpy dataset cache, eliminating file contention
  • Added tests/data/scanpy_cache/ to .gitignore

Tests Run

Existing test suite passes. The fix is validated by running tests under parallel execution (pytest -n auto), which previously surfaced the race condition intermittently.

Reviewer Focus

  • pytest_configure() hook in conftest.py: Ensures cache isolation is set up before any tests run
  • Environment variable check for PYTEST_XDIST_WORKER: Only applies isolation when running under xdist
  • Directory creation with parents=True, exist_ok=True: Safe for both single-worker and multi-worker scenarios

Context

The adata_pbmc3k fixture calls sc.datasets.pbmc3k(), which downloads the dataset into a shared cache directory as pbmc3k_raw.h5ad. Under parallel test execution, one worker can read this file while another is still downloading or writing it, which intermittently raises an HDF5 "filter returned failure during read" OSError during fixture setup. This is a common issue with pytest-xdist when fixtures depend on shared external resources.

The solution leverages pytest's pytest_configure hook to detect xdist workers (via the PYTEST_XDIST_WORKER environment variable) and redirect each worker's scanpy cache before any tests execute.
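The approach described above can be sketched roughly as follows. This is a minimal illustration, not the merged diff: the helper name `worker_cache_dir` and the assumption that conftest.py sits next to a `data/` directory are hypothetical, while `PYTEST_XDIST_WORKER` and scanpy's `settings.datasetdir` are the pieces named in this PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def worker_cache_dir(base: Path, environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return a per-worker cache directory, or None when not running under xdist.

    Hypothetical helper for illustration; the merged code may be structured differently.
    """
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0", "gw1") in each worker process.
    worker_id = environ.get("PYTEST_XDIST_WORKER")
    if worker_id is None:
        return None
    cache_dir = base / "scanpy_cache" / worker_id
    # parents=True / exist_ok=True makes repeated runs and fresh checkouts both safe.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


def pytest_configure(config):
    """Redirect scanpy's dataset cache before any test or fixture runs."""
    # Assumption: conftest.py lives in tests/, so this resolves to tests/data/.
    cache_dir = worker_cache_dir(Path(__file__).parent / "data")
    if cache_dir is not None:
        import scanpy as sc  # imported lazily so the helper stays scanpy-free

        sc.settings.datasetdir = cache_dir
```

Without xdist the helper returns None and scanpy's default dataset directory is left untouched, which matches the "only applies isolation when running under xdist" behavior listed under Reviewer Focus. The trade-off is that each worker downloads its own copy of the dataset on a cold cache.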

Open Questions Or Follow-Ups

None. This is a straightforward isolation fix with no behavioral changes to the actual test logic.

https://claude.ai/code/session_018GKskG6NPe5KeUyhSWfLLn

claude added 2 commits June 9, 2026 17:06
Under pytest -n auto, multiple xdist workers invoked sc.datasets.pbmc3k()
concurrently, racing on the single shared cache file pbmc3k_raw.h5ad. One
worker reading while another wrote/downloaded intermittently produced an HDF5
"filter returned failure during read" OSError during adata_pbmc3k setup.

Give each worker its own scanpy datasetdir keyed by PYTEST_XDIST_WORKER via a
pytest_configure hook, eliminating the shared-file contention. The fixture
stays function-scoped, so downstream mutation behavior is unchanged.
@codecov

codecov Bot commented Jun 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.37%. Comparing base (6200344) to head (440fc8a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #83   +/-   ##
=======================================
  Coverage   86.37%   86.37%           
=======================================
  Files          13       13           
  Lines        1387     1387           
=======================================
  Hits         1198     1198           
  Misses        189      189           

@Marius1311 Marius1311 merged commit 7d1565f into main Jun 9, 2026
8 of 9 checks passed

2 participants