
Add bioproject2srr and srr2sam Galaxy tools for NCBI metadata retrieval#1

Open
crashfrog wants to merge 2 commits into CFSAN-Biostatistics:main from crashfrog:main


@crashfrog (Member)

Summary

  • Add bioproject2srr tool: retrieves SRR accessions and BioSample metadata from NCBI BioProjects
  • Add srr2sam tool: performs the reverse operation, retrieving BioSample metadata for a list of SRR accessions
  • Both tools use the NCBI Entrez E-utilities API, with rate limiting and optional NCBI API key support
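The rate limiting and optional API key handling can be sketched as follows. This is a minimal illustration, not the tools' actual code: `RateLimiter`, `make_limiter`, and `build_url` are hypothetical names. The only NCBI details assumed are the public E-utilities base URL, the `api_key` query parameter, and NCBI's published limits of 3 requests per second without a key and 10 with one.

```python
import time
import urllib.parse
from typing import Dict, Optional

# Public base URL for NCBI's Entrez E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


class RateLimiter:
    """Enforce a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, requests_per_second: float) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._last = float("-inf")  # no request made yet

    def wait(self) -> None:
        # Sleep only as long as needed to stay under the configured rate.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


def make_limiter(api_key: Optional[str]) -> RateLimiter:
    # NCBI allows 3 requests/second without an API key and 10 with one.
    return RateLimiter(10 if api_key else 3)


def build_url(utility: str, params: Dict[str, str], api_key: Optional[str] = None) -> str:
    """Build a URL for an E-utility such as esearch, elink or efetch."""
    query = dict(params)
    if api_key:
        query["api_key"] = api_key  # attach the key only when one is supplied
    return f"{EUTILS_BASE}{utility}.fcgi?{urllib.parse.urlencode(query)}"
```

Calling `limiter.wait()` immediately before each HTTP request keeps a single client under the limit that matches whether a key was provided.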

bioproject2srr

Recursively follows links from BioProject → subprojects → BioSamples → SRA runs.
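A sketch of the recursive walk, assuming a hypothetical `link(source_db, target_db, uid)` callable that wraps an Entrez `elink` request and returns linked UIDs. The visited set protects against a BioProject link pointing back to a parent project (an assumption about the link graph, not something this PR states), and the collected run identifiers stand in for whatever the real tool resolves to SRR accessions.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical link lookup: (source_db, target_db, uid) -> linked uids.
LinkFn = Callable[[str, str, str], Iterable[str]]


def collect_runs(project_uid: str, link: LinkFn) -> List[str]:
    """Walk BioProject -> subprojects -> BioSamples -> SRA runs, depth first."""
    runs: List[str] = []
    seen_projects: Set[str] = set()
    seen_runs: Set[str] = set()

    def visit(uid: str) -> None:
        if uid in seen_projects:  # stop on cycles between projects
            return
        seen_projects.add(uid)
        # Recurse into linked (sub)projects first.
        for child in link("bioproject", "bioproject", uid):
            visit(child)
        # Then follow this project's BioSamples to their SRA runs.
        for sample in link("bioproject", "biosample", uid):
            for run in link("biosample", "sra", sample):
                if run not in seen_runs:  # keep first occurrence only
                    seen_runs.add(run)
                    runs.append(run)

    visit(project_uid)
    return runs
```

Injecting `link` keeps the traversal testable with an in-memory graph instead of live NCBI requests.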

Outputs:

  • accessions.txt: deduplicated list of SRR accessions
  • metadata.tsv: full metadata table with BioSample attributes joined to SRA run information
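How the two outputs relate can be illustrated with a small sketch. The field names `run` and `biosample` and both helpers are hypothetical, since the real metadata.tsv schema is not described here. Deduplication keeps the first occurrence of each accession, and each run row is extended with its BioSample's attributes without overwriting run fields.

```python
from typing import Dict, List


def dedupe(accessions: List[str]) -> List[str]:
    """Drop repeated accessions, keeping the first occurrence in order."""
    return list(dict.fromkeys(accessions))


def join_metadata(
    runs: List[Dict[str, str]], samples: Dict[str, Dict[str, str]]
) -> List[Dict[str, str]]:
    """Extend each run row with the attributes of its BioSample."""
    joined = []
    for run in runs:
        row = dict(run)
        # An unknown BioSample contributes no attributes instead of raising.
        for name, value in samples.get(run.get("biosample", ""), {}).items():
            row.setdefault(name, value)  # never overwrite run fields
        joined.append(row)
    return joined
```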

srr2sam

Traverses from SRR → SRA → BioSample to retrieve metadata.
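The final hop of that traversal, turning fetched BioSample records into name/value pairs, could look like the sketch below. It assumes BioSample XML in which each `<BioSample accession="...">` element contains `<Attribute attribute_name="...">` children; the function name is hypothetical, and the SRR → SRA → BioSample lookups that precede it are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_biosample_attributes(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each BioSample accession to its {attribute_name: value} pairs."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Dict[str, str]] = {}
    # iter() also matches the root, so a single <BioSample> document works too.
    for sample in root.iter("BioSample"):
        attrs: Dict[str, str] = {}
        for attr in sample.iter("Attribute"):
            name = attr.get("attribute_name")
            if name:  # skip attributes without a usable name
                attrs[name] = (attr.text or "").strip()
        result[sample.get("accession", "")] = attrs
    return result
```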

Output:

  • output.tsv: a tabular file with the SRR accession as the first column, followed by all BioSample metadata fields
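One way to produce that table shape, with the SRR accession first and the union of BioSample fields after it, is sketched below. The `srr` header name, the function name, and the convention of passing `None` for an accession whose BioSample could not be resolved (for example, an unreleased sample) are assumptions for illustration; missing fields are written as empty cells.

```python
import csv
import io
from typing import Dict, List, Optional


def write_srr_table(results: Dict[str, Optional[Dict[str, str]]]) -> str:
    """Render {srr: attributes or None} as TSV with the SRR accession first."""
    # Union of field names in first-seen order (dicts preserve insertion order).
    fields: List[str] = []
    for attrs in results.values():
        for name in attrs or {}:
            if name not in fields:
                fields.append(name)

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["srr", *fields])  # "srr" header name is an assumption
    for srr, attrs in results.items():
        attrs = attrs or {}  # unresolved accession: keep the row, blank fields
        writer.writerow([srr, *(attrs.get(f, "") for f in fields)])
    return buf.getvalue()
```

Keeping rows for unresolved accessions preserves a one-to-one correspondence between the input list and the output table.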

Testing

  • Unit tests via pytest for both tools
  • Galaxy functional tests with test data included
  • Handles edge cases: unreleased samples, duplicates, missing data

Documentation

Includes CLAUDE.md with architecture details, testing commands, and API usage patterns.

🤖 Generated with Claude Code

crashfrog and others added 2 commits May 21, 2026 18:17
…Projects

Retrieves SRR accessions and BioSample metadata from NCBI BioProjects using the
Entrez E-utilities API. Recursively follows links to subprojects and BioSamples.

Outputs:
- accessions.txt: deduplicated list of SRR accessions
- metadata.tsv: full metadata table joining BioSample attributes with SRA runs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds a complementary tool that retrieves BioSample metadata for a list of SRR
accessions. Performs the reverse operation of bioproject2srr by traversing NCBI
Entrez from SRA to BioSample.

Output: a tabular file with the SRR accession as the first column, followed by
all BioSample metadata fields.

Also adds CLAUDE.md documentation covering both tools' architecture, testing,
and constraints.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>