RMSD filtering module #1550

Draft
AnnaKravchenko wants to merge 2 commits into main from
rmds-filt-mod


AnnaKravchenko commented May 1, 2026

You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines.

Checklist

  • Tests added for the new code
  • Documentation added for the code changes
  • Modifications / enhancements are reflected on the haddock3 user-manual
  • CHANGELOG.md is updated to incorporate new changes
  • Does not break licensing
  • Does not add any dependencies, if it does please add a thorough explanation

Summary of the Pull Request

Added a new module. It is based on caprieval, though I moved align_func to before the jobs are created.
Also, as discussed, moved find_ff and handle_input_reference from caprieval/__init__.py to libpdb.py and libstructure.py, respectively, and added a check for an empty reference file to handle_input_reference.

Related Issue

#1525

Additional Info

AnnaKravchenko changed the title from "RMDS filtering module" to "RMSD filtering module" on May 1, 2026
Comment on lines +91 to +95
    self.finish_with_error(
        "Models have been clustered!"
        "[rmsdfilter] cannot be performed after clustering - "
        "filtering individual models after clustering would leave "
        "remaining models with stale and inconsistent cluster assignments."
Contributor

I would simply overwrite the attribute and go on

Contributor Author

Why not? Then I guess the models have to be renamed to something like model_filter_1.pdb

Contributor

Not necessary, e.g.:

  • seletop does not generate new names
  • filter does not generate new names
  • scoring modules that do not change the coordinates do not generate new names, but rather pass the inputs to their outputs, and simply update the score attribute
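The pass-through behaviour described above can be sketched roughly as follows. This is only an illustration, not the haddock3 implementation: the `Model` record, its `clt_id` and `clt_rank` fields, the `filter_by_rmsd` helper, and the "keep models at or below the cutoff" direction are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    """Hypothetical stand-in for a workflow model record."""

    file_name: str
    rmsd: float
    clt_id: Optional[int] = None    # assumed name for the cluster attribute
    clt_rank: Optional[int] = None  # assumed name for the cluster rank


def filter_by_rmsd(models: List[Model], cutoff: float) -> List[Model]:
    """Keep models with rmsd <= cutoff (assumed direction), without renaming."""
    kept = [m for m in models if m.rmsd <= cutoff]
    for m in kept:
        # Overwrite the now-stale cluster information instead of failing.
        m.clt_id = None
        m.clt_rank = None
    return kept


models = [
    Model("model_1.pdb", rmsd=1.2, clt_id=1, clt_rank=1),
    Model("model_2.pdb", rmsd=7.5, clt_id=1, clt_rank=1),
    Model("model_3.pdb", rmsd=2.9, clt_id=2, clt_rank=2),
]
kept = filter_by_rmsd(models, cutoff=3.0)
```

The surviving models keep their original file names; only the cluster attributes are reset.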

Comment on lines +544 to +545
    if reference.stat().st_size == 0:
        raise ValueError(f"Reference file is empty: {reference}")
Contributor

Yes, this checks whether the file is empty, but not whether it contains coordinates. It could consist only of REMARK lines, or not even be a PDB file.
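One way to test for coordinates rather than file size is to look for at least one ATOM or HETATM record. The sketch below is an assumption about how such a check could look; `lines_have_coordinates` and `check_reference` are hypothetical names, not functions from the PR or from haddock3.

```python
from pathlib import Path
from typing import Iterable


def lines_have_coordinates(lines: Iterable[str]) -> bool:
    """Return True if any line is a PDB ATOM or HETATM record."""
    return any(line.startswith(("ATOM", "HETATM")) for line in lines)


def check_reference(reference: Path) -> None:
    """Raise ValueError if the reference file holds no coordinate records."""
    # errors="replace" lets non-text input fail the check instead of crashing.
    with open(reference, "r", errors="replace") as handle:
        if not lines_have_coordinates(handle):
            raise ValueError(f"No coordinates found in reference: {reference}")
```

A file made only of REMARK lines, an empty file, and an arbitrary non-PDB text file would all be rejected by this check.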

Contributor Author

Just checked that case: libalign already handles it with the error message "Please check the input file and queried filterings."

But now that I look at the code, maybe it's better to remove this check entirely and instead, in libpdb.py > handle_input_reference > line 570

    for line in wc_return.split("\n"):
        if "No. models" in line:
            nb_models = int(line.strip().split()[-1])
            break

add lines to read the number from the "No. atoms" line and, if it is 0, exit with a message like "No atoms found in the reference file, please check". What do you think?
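A rough sketch of that proposal, assuming `wc_return` is a text summary with one "label: value" entry per line. The helper names are hypothetical, and the parser takes the first integer after the label because the exact layout of the "No. atoms" line is not shown in this thread.

```python
import re
from typing import Optional


def read_wc_count(wc_return: str, label: str) -> Optional[int]:
    """Return the first integer that follows `label`, or None if absent."""
    for line in wc_return.split("\n"):
        if label in line:
            match = re.search(r"\d+", line.split(label, 1)[1])
            if match:
                return int(match.group())
    return None


def check_reference_has_atoms(wc_return: str) -> int:
    """Return the atom count, raising if it is missing or zero."""
    nb_atoms = read_wc_count(wc_return, "No. atoms")
    if not nb_atoms:
        raise ValueError("No atoms found in the reference file, please check")
    return nb_atoms
```

The same `read_wc_count` helper could also replace the existing "No. models" loop, which would keep both lookups in one place.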


VGPReys commented May 2, 2026

Do you see an option for the user to choose between "global rmsd", "l-rmsd", "il-rmsd", "fnat", or "dockq" too?
You could even directly use the CAPRI class to perform all these computations?

@AnnaKravchenko
Contributor Author

Do you see an option for the user to choose between "global rmsd", "l-rmsd", "il-rmsd", "fnat", or "dockq" too? You could even directly use the CAPRI class to perform all these computations?

Can you think of how those metrics would be useful for filtering? For RMSD the idea is to remove models from the workflow, since there is no other way to do that without manual tweaking. But I don't see how filtering by fnat or dockq would be useful, so I'm not sure it's worth the time.

Maybe let’s discuss this Wednesday? Will be easier in person
