Assemble modelCIF - DUMMY implementation without touching workflow wiring #617

Draft
keiran-rowell-unsw wants to merge 48 commits into nf-core:dev from Australian-Structural-Biology-Computing:assemble_modelcif

Conversation

@keiran-rowell-unsw

@keiran-rowell-unsw keiran-rowell-unsw commented May 20, 2026

Contributor

Partially implements an ASSEMBLE_MODELCIF{} process as sketched out in this project board.

  • DUMMY files used to populate ModelCIF fields
    • This PR intentionally avoids touching upstream workflow wiring: the plan is to merge it first as a test suite, then carefully wire in values from the pipeline
    • DUMMY_METRIC.tsv now uses 'realistic' values for 20 residues, and is synced with docs/output.md
  • Comprehensive nf-test suite:
    • Uses the RCSB CifCheck program to validate against the ModelArchive dictionary
    • Rejects completely invalid .mmcif input; distinguishes .mmcif from .pdb; and uses a real ma-.cif deposition to validate fields
    • assemble_modelcif_X has ext.args config files to test creation of .bcif, and of PAE either embedded or linked
  • msa_tool used to specify .protocol.CoevolutionMSAStep() since the steps in-container aren't always inspectable
  • DUMMY_SOFTWARE_DETAILS used to handle minimal protocol ingest into .mmcif classes for now
  • populate_modelcif.py has a variety of _helper() local functions leading to a build_modelcif() function populated by argparse

DATABASES: databases will not be handled in .data.Datagroup.ReferenceDatabase in this PR. populate_modelcif.py is getting quite long already. Plus, it's a separate concept that can tie into the work done for the reference dataset at NCI.

#575 might make this database handling easier, if considered valuable

DRAFT: still in draft as I'm LLM'ing and doc'ing through features and will go back for deeper inspection of the .mmcif spec when ready to review

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • Make sure your code lints (nf-core pipelines lint).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • CHANGELOG.md is updated.

@keiran-rowell-unsw keiran-rowell-unsw self-assigned this May 20, 2026
@keiran-rowell-unsw keiran-rowell-unsw added this to the 3.0.0 milestone May 20, 2026
@keiran-rowell-unsw keiran-rowell-unsw changed the title Assemble modelcif - DUMMY implementation without touching workflow wiring Assemble modelCIF - DUMMY implementation without touching workflow wiring May 20, 2026
@jscgh

jscgh commented May 22, 2026

Contributor

Tom and I were thinking that, since they're used for tests, the DUMMY files in assets/ might be best uploaded to the testdata repo (https://github.com/nf-core/test-datasets/tree/proteinfold/testdata) with the other test files.

@keiran-rowell-unsw

keiran-rowell-unsw commented May 22, 2026

Contributor Author

> Tom and I were thinking that, since they're used for tests, the DUMMY files in assets/ might be best uploaded to the testdata repo (https://github.com/nf-core/test-datasets/tree/proteinfold/testdata) with the other test files.

Hadn't thought of that, makes a lot of sense! I put them in assets/ for a faster dev loop, sounds good when it's ready!

@keiran-rowell-unsw keiran-rowell-unsw marked this pull request as draft May 22, 2026 03:19