
Auto-fetch mmseqs2, document bare-germline annotation, add arda skill #1

Merged: mikessh merged 3 commits into main from feat/mmseqs-autofetch-skills on Jun 8, 2026


mikessh commented Jun 8, 2026


mmseqs2 auto-install (zero user bother):

  • arda._mmseqs_fetch: downloads a static MMseqs2 binary into bin/ (packaged, so it also works for wheel installs) and sends a browser User-Agent, which avoids GitHub 504s.
  • mmseqs.mmseqs_binary() lazily auto-fetches when no binary is found, searching $ARDA_MMSEQS -> bin/ -> PATH before fetching. Opt out with $ARDA_NO_AUTO_FETCH; override the asset with $ARDA_MMSEQS_ASSET. setup.sh --no-conda fetches eagerly.
  • scripts/fetch_mmseqs.py CLI wrapper; tests/unit/test_mmseqs_fetch.py.
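The lookup order above can be sketched roughly as follows. This is a minimal illustration, not arda's actual `mmseqs_binary()`: the function name `find_mmseqs`, the injected `fetch` callback, the `bin/mmseqs` filename, and the rule that any non-empty `$ARDA_NO_AUTO_FETCH` disables fetching are all assumptions.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def find_mmseqs(bin_dir: Path, fetch: Callable[[Path], Path]) -> Path:
    """Hypothetical sketch of the discovery order:
    $ARDA_MMSEQS -> bin/ -> PATH -> auto-fetch (unless opted out)."""
    # 1. An explicit override always wins (assumed: returned as-is, not validated).
    explicit = os.environ.get("ARDA_MMSEQS")
    if explicit:
        return Path(explicit)

    # 2. A binary previously fetched into the local bin/ directory.
    local = bin_dir / "mmseqs"
    if local.is_file() and os.access(local, os.X_OK):
        return local

    # 3. Anything already on PATH, e.g. a conda-installed mmseqs.
    on_path = shutil.which("mmseqs")
    if on_path:
        return Path(on_path)

    # 4. Lazy download, unless the user opted out (assumed: any non-empty value).
    if os.environ.get("ARDA_NO_AUTO_FETCH"):
        raise FileNotFoundError(
            "mmseqs not found and auto-fetch is disabled by $ARDA_NO_AUTO_FETCH"
        )
    return fetch(bin_dir)
```

Injecting `fetch` keeps the lookup testable offline: a test can pass a stub instead of hitting GitHub, which mirrors the PR's note that the unit tests cover discovery and opt-out without network access.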

Bare germline segments: README recipe, docs note, and regression test (tests/synthetic/test_germline_segments.py). A bare V yields FR1-3, a bare J yields FR4 (no coverage filter). This is how mirpy bakes per-allele FR/CDR subsequences into its gene library.
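To make the V/J invariant concrete, here is a hypothetical region-projection rule over the standard IMGT unique-numbering boundaries. It is not arda's implementation: the function name `regions_for_segment`, the "any overlap" rule, and the choice to require both CDR3 anchors (2nd-CYS at 104 from V, J-TRP/J-PHE at 118 from J) are assumptions picked only to reproduce the behaviour the regression test describes.

```python
from typing import Dict, List, Tuple

# IMGT unique numbering boundaries for V-domain regions (inclusive).
IMGT_REGIONS: Dict[str, Tuple[int, int]] = {
    "FR1": (1, 26),
    "CDR1": (27, 38),
    "FR2": (39, 55),
    "CDR2": (56, 65),
    "FR3": (66, 104),
    "CDR3": (105, 117),
    "FR4": (118, 128),
}


def regions_for_segment(start: int, end: int) -> List[str]:
    """Hypothetical: regions emitted for a bare segment covering IMGT
    positions start..end (inclusive)."""
    has_cys104 = start <= 104 <= end  # V-side CDR3 anchor
    has_trp118 = start <= 118 <= end  # J-side CDR3 anchor
    emitted = []
    for name, (lo, hi) in IMGT_REGIONS.items():
        if name == "CDR3":
            # CDR3 is delimited by both anchors, so a bare V or bare J cannot define it.
            if has_cys104 and has_trp118:
                emitted.append(name)
        elif start <= hi and lo <= end:
            # Any overlap counts: no minimum-coverage threshold.
            emitted.append(name)
    return emitted


# A V germline typically runs a few positions past Cys104; a J starts inside CDR3.
print(regions_for_segment(1, 108))    # bare V
print(regions_for_segment(112, 128))  # bare J
```

Under these assumptions the bare V prints `['FR1', 'CDR1', 'FR2', 'CDR2', 'FR3']` and the bare J prints `['FR4']`, matching the "V yields FR1-3, J yields FR4" statement.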

Docs: installation.rst MMseqs2-without-conda section.

Skill: new skills/arda/ (lean SKILL.md router + references for annotation, region/germline segments, mmseqs install, reference build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 08:56

Copilot AI left a comment


Pull request overview

This PR makes mmseqs2 setup hands-off by adding a packaged auto-fetch mechanism for a static mmseqs binary, documents and locks in the “bare germline segment” annotation semantics (V-only → FR1, CDR1, FR2, CDR2, FR3; J-only → FR4), and introduces an arda agent skill with reference docs.

Changes:

  • Add arda._mmseqs_fetch + lazy auto-fetch fallback in arda.mmseqs.mmseqs_binary() (with opt-out/env overrides) and an eager-fetch CLI wrapper.
  • Add regression tests for mmseqs auto-fetch behavior (offline) and for bare V/J germline segment annotation invariants.
  • Expand user-facing docs (README + installation docs) and add skills/arda/ documentation bundle.
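One way the installer half of `arda._mmseqs_fetch` could place a downloaded static build into `bin/` using only the standard library is sketched below. The archive layout `mmseqs/bin/mmseqs` is assumed to match the MMseqs2 release tarballs, and `install_from_tarball` is a hypothetical name, not the module's real API.

```python
import io
import os
import stat
import tarfile
from pathlib import Path


def install_from_tarball(data: bytes, bin_dir: Path) -> Path:
    """Hypothetical: extract the mmseqs executable from a release tarball
    (assumed layout: mmseqs/bin/mmseqs) into bin_dir and mark it executable."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    target = bin_dir / "mmseqs"
    tmp = target.with_suffix(".part")
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        # Raises KeyError if the assumed member path is absent.
        member = tar.getmember("mmseqs/bin/mmseqs")
        src = tar.extractfile(member)
        if src is None:
            raise RuntimeError("mmseqs/bin/mmseqs is not a regular file in the archive")
        # Write to a temporary name first so a half-written file never looks installed.
        with src, open(tmp, "wb") as out:
            out.write(src.read())
    tmp.chmod(tmp.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    os.replace(tmp, target)  # atomic rename on the same filesystem
    return target
```

Reading a single named member instead of calling `extractall` sidesteps path-traversal concerns with untrusted archives, and the final `os.replace` means a concurrent lookup sees either no binary or a complete one.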

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/arda/mmseqs.py | Adds lazy auto-fetch fallback and updates binary discovery docs/error message. |
| src/arda/_mmseqs_fetch.py | New stdlib-only downloader/installer for a static MMseqs2 release asset. |
| scripts/fetch_mmseqs.py | New CLI wrapper to fetch mmseqs eagerly into bin/. |
| setup.sh | Eagerly fetches mmseqs in --no-conda installs when not already on PATH. |
| tests/unit/test_mmseqs_fetch.py | Unit tests for asset selection and binary discovery/opt-out behavior. |
| tests/synthetic/test_germline_segments.py | Synthetic regression tests for bare V-only and J-only region emission. |
| README.md | Documents how to annotate bare germline V/J segments. |
| docs/installation.rst | Documents mmseqs auto-fetch controls for non-conda installs. |
| skills/arda/SKILL.md | Adds an arda skill router/guide covering annotation semantics and mmseqs setup. |
| skills/arda/references/annotation.md | Skill reference: runtime API, parameters, and AIRR schema. |
| skills/arda/references/region-segments.md | Skill reference: region projection, bare germline semantics, junction/CDR3. |
| skills/arda/references/install-mmseqs.md | Skill reference: discovery order, auto-fetch, env vars, version mismatch. |
| skills/arda/references/reference-build.md | Skill reference: offline reference-build pipeline and outputs. |
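The asset selection covered by tests/unit/test_mmseqs_fetch.py could look roughly like the sketch below. The asset names are assumed from recent MMseqs2 GitHub releases and should be checked against the release page; `pick_asset`, `build_request`, `linux_cpu_flags`, the AVX2-or-SSE4.1 fallback, and the exact User-Agent string are hypothetical.

```python
import os
import platform
from typing import Optional
from urllib.request import Request

# Assumed download location for the latest MMseqs2 release assets.
RELEASE_BASE = "https://github.com/soedinglab/MMseqs2/releases/latest/download/"


def pick_asset(system: str, machine: str, cpu_flags: str = "",
               override: Optional[str] = None) -> str:
    """Hypothetical: choose a static MMseqs2 asset name for this platform."""
    if override:  # e.g. the value of $ARDA_MMSEQS_ASSET
        return override
    system, machine = system.lower(), machine.lower()
    if system == "darwin":
        return "mmseqs-osx-universal.tar.gz"
    if system == "linux":
        if machine in ("aarch64", "arm64"):
            return "mmseqs-linux-arm64.tar.gz"
        if machine in ("x86_64", "amd64"):
            # Prefer the AVX2 build; fall back to SSE4.1 on older CPUs.
            if "avx2" in cpu_flags.split():
                return "mmseqs-linux-avx2.tar.gz"
            return "mmseqs-linux-sse41.tar.gz"
    raise RuntimeError(f"no static MMseqs2 build known for {system}/{machine}")


def linux_cpu_flags() -> str:
    """Best-effort read of CPU feature flags; empty string if unavailable."""
    try:
        with open("/proc/cpuinfo") as fh:
            for line in fh:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


def build_request(asset: str) -> Request:
    # Browser-like User-Agent, per the PR note that it avoids GitHub 504s.
    return Request(RELEASE_BASE + asset, headers={"User-Agent": "Mozilla/5.0"})


if __name__ == "__main__":
    asset = pick_asset(platform.system(), platform.machine(), linux_cpu_flags(),
                       override=os.environ.get("ARDA_MMSEQS_ASSET"))
    print(build_request(asset).full_url)
```

Keeping the platform probe out of `pick_asset` lets the selection logic be tested with plain strings, without mocking `platform` or the filesystem.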


Comment thread on src/arda/_mmseqs_fetch.py (outdated)
Comment thread on src/arda/mmseqs.py (outdated)
mikessh and others added 2 commits June 8, 2026 12:02
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
mikessh merged commit 5ce8866 into main on Jun 8, 2026

2 participants