Auto-fetch mmseqs2, document bare-germline annotation, add arda skill #1
Merged
mmseqs2 auto-install (zero user bother):
- arda._mmseqs_fetch: download a static MMseqs2 binary into bin/ (packaged so it works for wheel installs too); browser User-Agent avoids GitHub 504s.
- mmseqs.mmseqs_binary() lazily auto-fetches when nothing is found ($ARDA_MMSEQS -> bin/ -> PATH -> fetch). Opt-out $ARDA_NO_AUTO_FETCH; asset override $ARDA_MMSEQS_ASSET. setup.sh --no-conda fetches eagerly.
- scripts/fetch_mmseqs.py CLI wrapper; tests/unit/test_mmseqs_fetch.py.

Bare germline segments: README recipe + docs note + regression test (tests/synthetic/test_germline_segments.py); a bare V yields FR1-3, a bare J yields FR4 (no coverage filter). This is how mirpy bakes per-allele FR/CDR subsequences into its gene library.

Docs: installation.rst MMseqs2-without-conda section.

Skill: new skills/arda/ (lean SKILL.md router + references for annotation, region/germline segments, mmseqs install, reference build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
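The download step in the commit message (a static binary fetched into bin/, with a browser User-Agent) could be sketched roughly as below. This is a stdlib-only illustration, not the code in `arda._mmseqs_fetch`: the names `BROWSER_UA`, `build_request`, and `download` are hypothetical, and the User-Agent string is just an example value.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical example value; any browser-like User-Agent string would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def download(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream `url` to `dest`, writing to a temporary .part file first."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        with open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
    # Rename only after the full body has been written, so an interrupted
    # download never leaves a truncated file at the final path.
    tmp.replace(dest)
    return dest
```

Writing to a `.part` file and renaming at the end is a design choice of this sketch; the PR does not say whether the real downloader does the same.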
Pull request overview
This PR makes MMseqs2 setup hands-off by adding a packaged auto-fetch mechanism for a static `mmseqs` binary, documents and locks in “bare germline segment” annotation semantics (V-only → FR1–FR3 with CDR1–CDR2, J-only → FR4), and introduces an `arda` agent skill with reference docs.
Changes:
- Add `arda._mmseqs_fetch` + lazy auto-fetch fallback in `arda.mmseqs.mmseqs_binary()` (with opt-out/env overrides) and an eager-fetch CLI wrapper.
- Add regression tests for mmseqs auto-fetch behavior (offline) and for bare V/J germline segment annotation invariants.
- Expand user-facing docs (README + installation docs) and add `skills/arda/` documentation bundle.
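For readers who want the discovery order spelled out, here is a minimal sketch of the chain `$ARDA_MMSEQS -> bin/ -> PATH -> fetch` with the `$ARDA_NO_AUTO_FETCH` opt-out. It is not the body of `arda.mmseqs.mmseqs_binary()`: the function name `find_mmseqs`, its parameters, the injected `fetch` callable, and the rule that any non-empty opt-out value disables fetching are all assumptions made for illustration. It also assumes a POSIX-style binary name (`mmseqs`).

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional


def find_mmseqs(
    bin_dir: Path,
    fetch: Callable[[Path], Path],
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Resolve an mmseqs executable: $ARDA_MMSEQS -> bin/ -> PATH -> fetch."""
    env = os.environ if env is None else env

    # 1. An explicit override wins over everything else.
    override = env.get("ARDA_MMSEQS")
    if override:
        return Path(override)

    # 2. A binary already present in the local bin/ directory.
    local = bin_dir / "mmseqs"
    if local.is_file():
        return local

    # 3. Whatever is reachable through PATH.
    on_path = shutil.which("mmseqs", path=env.get("PATH", ""))
    if on_path:
        return Path(on_path)

    # 4. Last resort: fetch, unless the user opted out (assumed: any
    #    non-empty value of ARDA_NO_AUTO_FETCH counts as an opt-out).
    if env.get("ARDA_NO_AUTO_FETCH"):
        raise FileNotFoundError(
            "mmseqs not found and auto-fetch is disabled via ARDA_NO_AUTO_FETCH"
        )
    return fetch(bin_dir)
```

Passing `env` and `fetch` as arguments keeps the sketch testable offline, which matches the spirit of the offline unit tests mentioned above.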
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/arda/mmseqs.py` | Adds lazy auto-fetch fallback and updates binary discovery docs/error message. |
| `src/arda/_mmseqs_fetch.py` | New stdlib-only downloader/installer for a static MMseqs2 release asset. |
| `scripts/fetch_mmseqs.py` | New CLI wrapper to fetch mmseqs eagerly into bin/. |
| `setup.sh` | Eagerly fetches mmseqs in --no-conda installs when not already on PATH. |
| `tests/unit/test_mmseqs_fetch.py` | Unit tests for asset selection and binary discovery/opt-out behavior. |
| `tests/synthetic/test_germline_segments.py` | Synthetic regression tests for bare V-only and J-only region emission. |
| `README.md` | Documents how to annotate bare germline V/J segments. |
| `docs/installation.rst` | Documents mmseqs auto-fetch controls for non-conda installs. |
| `skills/arda/SKILL.md` | Adds an arda skill router/guide covering annotation semantics and mmseqs setup. |
| `skills/arda/references/annotation.md` | Skill reference: runtime API, parameters, and AIRR schema. |
| `skills/arda/references/region-segments.md` | Skill reference: region projection + bare germline semantics + junction/CDR3. |
| `skills/arda/references/install-mmseqs.md` | Skill reference: discovery order, auto-fetch, env vars, version mismatch. |
| `skills/arda/references/reference-build.md` | Skill reference: offline reference-build pipeline and outputs. |
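The "asset selection" covered by the unit tests, together with the `$ARDA_MMSEQS_ASSET` override, might be sketched as follows. The function `pick_asset` is hypothetical, and the asset file names are assumptions modeled on MMseqs2's release naming; check them against the actual release page before relying on them. The default of AVX2 on x86-64 Linux is also an assumption, since older CPUs may need a different build.

```python
import os
from typing import Mapping, Optional


def pick_asset(
    system: str,
    machine: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a release asset name for the given platform.

    `system` and `machine` are expected in the form returned by
    platform.system() and platform.machine().
    """
    env = os.environ if env is None else env

    # An explicit override bypasses platform detection entirely.
    override = env.get("ARDA_MMSEQS_ASSET")
    if override:
        return override

    system = system.lower()
    machine = machine.lower()

    # Asset names below are assumed, not taken from the PR.
    if system == "darwin":
        return "mmseqs-osx-universal.tar.gz"
    if system == "linux":
        if machine in ("aarch64", "arm64"):
            return "mmseqs-linux-arm64.tar.gz"
        if machine in ("x86_64", "amd64"):
            return "mmseqs-linux-avx2.tar.gz"

    raise RuntimeError(
        f"no known static mmseqs asset for {system}/{machine}; "
        "set ARDA_MMSEQS_ASSET to choose one explicitly"
    )
```

A caller would typically pass `platform.system()` and `platform.machine()`; taking them as arguments lets the selection logic be tested for every platform from a single machine.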
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>