Releases: labomics/midas

v0.3.0

10 May 03:55
95651c0

v0.3.0 (2026-05-09)

Major refresh of the user-facing API around a single MuData object. The new entry points (setup_mudata, MIDAS(mdata), get_latent_representation, get_imputed_values, save / load) compose directly with mdata.obsm, sc.pp.neighbors(use_rep=...), and the rest of the standard single-cell stack. A new plotting namespace scmidas.pl and a data-prep tutorial round out the package for users coming straight from raw 10x output.

  • 🚀 New — MIDAS entry points centred on MuData
    • MIDAS.setup_mudata(mdata, batch_key=...) — register a MuData (writes config to mdata.uns['_scmidas']).
    • MIDAS(mdata, ...) — construct directly from a registered MuData; instance state instead of class-level state (fixes a multi-instance interference bug).
    • model.get_latent_representation(kind='c'|'u'|'joint') — returns the joint latent aligned to mdata.obs_names. Drop straight into mdata.obsm['X_midas'].
    • model.get_imputed_values(modality='rna') — returns imputed counts aligned to mdata.obs_names.
    • model.save(dir) / MIDAS.load(dir, mdata) — symmetric save/load (writes model.pt + setup.json).
    • MIDAS(mdata) now defaults to transform={'atac': 'binarize'} whenever 'atac' is among the modalities (override by passing your own transform dict).
  • 🚀 New — scmidas.pl plotting namespace
    • scmidas.pl.umap(mdata, basis='X_midas', color=[...]) — one-line UMAP that works around the current scanpy + MuData plotting limitations via a thin AnnData wrapper.
    • scmidas.pl.modality_grid(model, mdata, label_key=...) — collapses the per-modality vs per-batch grid (~22 lines in the previous demos) into one call. Modality columns are ordered ATAC, RNA, ADT, Joint when present.
  • 🚀 New — scmidas.datasets.from_dir
    • Loads the directory-format datasets (mat/<m>.mtx, mask/<m>.csv, feat/feat_dims.toml) into a MuData, including masks, labels, and ATAC chunk dims.
  • 📚 New tutorial — Preparing your data
    • docs/source/tutorials/basics/preparing_your_data.ipynb walks from a public 10x Genomics 5k PBMC CITE-seq sample through QC, HVG selection, MuData wrap, MIDAS integration, Leiden clustering, and a synthetic mosaic example.
  • 📚 Docs cleanup
    • inputs.rst + outputs.rst merged into data_layout.rst — a single page describing the MuData input/output contract. The directory format is moved to an "advanced" section.
    • All three demos (demo1, demo2, demo3) rewritten to use the new API: from_dir → setup_mudata → MIDAS(mdata) → get_latent_representation. The 22-line per-modality grid block became scmidas.pl.modality_grid(model, mdata). Each demo gained a 6.4 "After integration" section (Leiden + UMAP).
    • README adds a "Bring your own data" section linking the new tutorial and the data-layout reference.
  • 🛠 Backwards compatibility
    • MIDAS.configure_data_from_mdata and MIDAS.configure_data_from_dir still work — they emit a DeprecationWarning and will be removed in 0.4.0.
    • save_checkpoint / load_checkpoint still work; new code should use save / load.
  • 🐛 Fixes
    • predict(joint_latent=False) no longer raises KeyError: 'z_c'.
    • Multiple MIDAS() instances in one process now have independent state (was previously class-level — a second instance would clobber the first).
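
The entry points listed above can be chained once a model has been trained. The following is a minimal sketch, not the package's own code: it assumes `model` is an already-trained MIDAS instance built from a registered MuData, uses only the method names and keyword arguments quoted in these notes, and the helper name `collect_midas_outputs` is hypothetical. The return types of the two getters are not stated in the notes, so the sketch simply passes them through.

```python
def collect_midas_outputs(model, mdata, save_dir=None):
    """Pull results out of a trained model and attach the latent to mdata.

    Assumption: `model` was created with MIDAS(mdata) after
    MIDAS.setup_mudata(mdata, batch_key=...) and has been trained.
    This helper is illustrative and not part of scmidas.
    """
    # Joint latent, aligned to mdata.obs_names according to the release notes.
    latent = model.get_latent_representation(kind="joint")
    mdata.obsm["X_midas"] = latent

    # Imputed RNA counts, also aligned to mdata.obs_names.
    imputed_rna = model.get_imputed_values(modality="rna")

    if save_dir is not None:
        # Per the notes, this writes model.pt and setup.json into the directory.
        model.save(save_dir)

    return latent, imputed_rna
```

With the latent stored under `X_midas`, downstream steps such as `sc.pp.neighbors(use_rep=...)` or `scmidas.pl.umap(mdata, basis='X_midas', ...)` can refer to it by that key.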

v0.2.0

03 May 06:44
37a5688

v0.2.0 (2026-05-03)

  • 🚀 New — scmidas.integrate(mdata) one-line entry point
    • A thin top-level wrapper around MIDAS.configure_data_from_mdata
      + train() with toy-tuned defaults (batch_size=128,
      max_epochs=65, lr=3e-4) so that the bundled quickstart
      dataset converges in roughly one minute on a single mid-range
      GPU. The full MIDAS class API is unchanged for users who
      need control.
    • ⚠️ The defaults are tuned for the toy quickstart only. For
      real datasets, override max_epochs (1000-2000) and consider
      batch_size=256.
  • 🚀 New — bundled quickstart dataset
    • scmidas.datasets.quickstart() returns a 1600-cell PBMC RNA+ADT
      mosaic MuData (4 batches, full mosaic structure: one RNA-only,
      one ADT-only, two paired). 500 RNA HVGs + 224 ADT features,
      2.66 MB shipped inside the wheel.
    • Source: hand-tuned subset of wnn_mosaic_8batch_mtx. Build
      script: scripts/build_quickstart_demo.py.
  • 📚 Documentation
    • New examples/quickstart.ipynb — pre-rendered notebook that
      users can open in Colab via the new badge in the README, no
      local install required.
    • README quickstart rewritten: replaces the previous ... API
      sketch with a runnable five-line snippet using
      scmidas.datasets.quickstart() + scmidas.integrate(),
      followed by the rendered UMAP image.
  • ⚙️ Packaging
    • pyproject.toml ships data/*.h5mu as package data so the
      quickstart dataset travels with the wheel.
    • Module-level logging.basicConfig(level=INFO) removed from
      five files (config, data, model, nn, utils); each
      now does the canonical logger = logging.getLogger(__name__)
      instead. Demo notebooks call logging.basicConfig themselves
      so visible output is unchanged. Libraries should not call
      basicConfig: doing so at import time configures the root logger
      first, so the application's own later basicConfig call is
      silently ignored.
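
The logging change in the packaging notes follows a common library pattern, sketched below with the standard library only. The logger name `mypkg.model` and both function names are placeholders standing in for `logging.getLogger(__name__)` inside a real module; nothing here is taken from the scmidas source.

```python
import logging

# Library module: only obtain a named logger. No handlers, no levels,
# and no basicConfig call at import time.
logger = logging.getLogger("mypkg.model")  # stands in for getLogger(__name__)


def run_step():
    """Library code logs through its module logger."""
    logger.info("step finished")


def configure_app_logging():
    """Application or notebook code decides how logs are shown."""
    logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")


if __name__ == "__main__":
    configure_app_logging()
    run_step()
```

Keeping configuration on the application side means a notebook can choose its own level and format, while the library's messages still flow through the normal logger hierarchy.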

Version 0.1.x

v0.1.19

03 May 01:55
2f8b024

v0.1.19 (2026-05-03)

  • 📦 Packaging — narrow torch upper bound to <2.11
    • torch 2.11 dropped Volta (V100, CC 7.0) and Pascal (P100, GTX
      10xx, CC 6.x) from its default cu128 / cu129 wheels (to
      ship cuDNN 9.15.1, which is incompatible with those archs). On
      those GPUs pip install scmidas==0.1.18 would silently install
      a torch that fails at the first CUDA op with
      no kernel image is available for execution on the device.
    • The pin now reads torch>=2.5,<2.11 (with matching
      torchvision<0.26 / torchaudio<2.11). Users on
      Ampere/Hopper/Ada/Blackwell GPUs can manually upgrade past the
      cap; users on Volta/Pascal stay on a working default install.
    • No source-code change — same scmidas as 0.1.18.
  • ✨ Enhancements
    • import scmidas now runs a one-time GPU self-check: if the
      local torch wheel has no kernels for the local GPU, scmidas
      emits a UserWarning with actionable guidance (downgrade torch
      or use the cu126 wheel) instead of the user later seeing a raw
      no kernel image is available error from somewhere deep in
      their training loop. The check no-ops on CPU-only environments
      and on working GPU setups.
  • ⚙️ CI
    • Test matrix gained a torch 2.10 job (the new upper bound) and
      dropped the previous experimental torch latest job. Lower
      bound remains torch 2.5.1 across Python 3.10 / 3.11 / 3.12.

v0.1.18

02 May 18:41
c6d28b5

v0.1.18 (2026-05-02)

  • 🐛 Bug Fixes (DDP + mosaic data)
    • Default sampler_type='auto' now picks the DDP sampler when a
      process group is initialized. Previously 'auto' silently fell
      back to MultiBatchSampler (a rank-agnostic sampler), so DDP
      runs computed each batch on every rank in parallel — correct
      but with no throughput gain over single-GPU. Users who already
      passed sampler_type='ddp' explicitly are unaffected.
    • MyDistributedSampler now derives its shuffle order from a
      seeded random.Random instance (cross-rank-consistent for the
      dataset visit order, rank-specific for the within-dataset
      shuffle), and properly initialises the base
      DistributedSampler. Previously it used the global Python
      random module, so each DDP rank sampled a different sub-batch
      at the same step. With non-uniform per-sub-batch modality
      combinations (mosaic data), this produced different encoder
      graphs per rank and caused NCCL all-reduce to hang under
      find_unused_parameters=False (Lightning default), eventually
      triggering a watchdog timeout.
    • Heads-up — DDP reproducibility: the DDP sampling order has
      changed as a side-effect of the fix. Existing seeded DDP runs
      will produce different numerics; checkpoints from prior
      versions still load and continue training, but the post-fix
      sampling sequence is not bit-equivalent to the pre-fix one.
      Single-GPU users (using MultiBatchSampler) are unaffected.
  • 🐛 Bug Fixes (API hardening)
    • MIDAS.configure_optimizers no longer raises AttributeError
      when entered through the simpler configure_data path
      (load_optimizer_state was only set by
      configure_data_from_dir / configure_data_from_mdata /
      load_checkpoint).
    • MIDAS.configure_data default batch_names now use f-string
      formatting (f'batch_{i}') instead of the literal string
      'batch_%d' repeated len(datalist) times.
    • Bad ATAC configuration in configure_data now raises
      ValueError instead of calling exit() (which killed the
      Jupyter kernel without a traceback).
    • download_file now accepts both str and pathlib.Path for
      dest_path. The signature was annotated str but the body
      called .name.
    • Encoder.forward no longer mutates the caller's batch dict.
      The mask multiply is now out-of-place; the previous in-place
      data[m] *= mask corrupted upstream tensors for any modality
      without a trsf_before_enc_* transform. Mathematically
      equivalent (the mask is a 0/1 modality-presence indicator, and
      calc_recon_loss already multiplies the loss by the same
      mask), but makes the encoder safe to re-call on the same
      batch (e.g. predict's mod_latent / translate paths).
    • VAE.forward no longer wraps the PoE call in a bare
      try/except that swallowed real errors with a malformed
      logging.debug call.
  • ✅ Tests
    • Added tests/test_invariants.py pinning down the bugs above
      plus the DDP sampler determinism fix (cross-rank disjoint
      indices, set_epoch actually changes ordering).
  • 📚 Documentation
    • Each basics demo now exposes a single # === GPU configuration ===
      block (GPUS + STRATEGY) at the top so switching from
      single-GPU to multi-GPU only requires editing two values.
    • Removed the redundant standalone advanced/multi_gpu.rst
      tutorial — its contents now live inline in the basics demos
      where the failure modes would actually be encountered.
    • README: removed the duplicated MuData section (the from_mdata
      path is one link away in the docs), corrected the Quick
      Example comment about input format, and fixed the License
      badge link.
  • ⚙️ Packaging
    • Version is now single-sourced from pyproject.toml;
      scmidas.__version__ and the Sphinx release both read it via
      importlib.metadata.version("scmidas") instead of duplicating
      the literal in three files.
    • Relaxed the torch pin from >=2.5,<2.6 to >=2.5,<3 (and the
      matching torchvision / torchaudio companions). The previous
      <2.6 cap was a workaround for a suspected Lightning-DDP
      incompatibility; torch 2.8 has now been verified end-to-end in
      the mosaic DDP path (1000-epoch run with UMAP and numerics
      consistent with the single-GPU baseline), so users on torch 2.6
      / 2.7 / 2.8 no longer have to manually override the pin.
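
The DDP sampler fix described above rests on two random streams: one shared by all ranks for the dataset visit order, and one per rank for the shuffle inside each dataset. The sketch below illustrates that idea with plain `random.Random` instances. It is not the MyDistributedSampler code; the function names, the strided sharding, and the way seeds are combined are all assumptions made for illustration.

```python
import random


def _rng(*parts):
    """Build a reproducible RNG from integer parts (seed, epoch, rank, ...)."""
    key = 0
    for part in parts:
        key = key * 1_000_003 + int(part) + 1
    return random.Random(key)


def epoch_plan(dataset_sizes, seed, epoch, rank, world_size):
    """Return [(dataset_id, local_indices), ...] for one rank and one epoch."""
    # Shared stream: identical on every rank, so all ranks visit the
    # datasets (and hence the modality combinations) in the same order.
    shared = _rng(seed, epoch)
    order = list(range(len(dataset_sizes)))
    shared.shuffle(order)

    # Rank-specific stream: shuffles only this rank's shard of each dataset.
    local = _rng(seed, epoch, rank)
    plan = []
    for dataset_id in order:
        # Strided sharding gives every rank a disjoint slice of the dataset.
        shard = list(range(rank, dataset_sizes[dataset_id], world_size))
        local.shuffle(shard)
        plan.append((dataset_id, shard))
    return plan
```

Because no call touches the global `random` module, every rank processes the same dataset at the same step, which is the property the notes identify as necessary to keep the all-reduce from hanging on mosaic data. Including the epoch in the key plays the role of `set_epoch`: a new epoch yields a new ordering while staying reproducible.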