Skip to content

Dev/refactor clustering#71

Open
jorenretel wants to merge 7 commits intors-station:mainfrom
jorenretel:dev/refactor_clustering
Open

Dev/refactor clustering#71
jorenretel wants to merge 7 commits intors-station:mainfrom
jorenretel:dev/refactor_clustering

Conversation

@jorenretel
Copy link

Hi,

thanks for this great package! I have a PR here that modifies scripts/run_msacluster.py and scripts/run_msascore.py. The changes that I propose here don't actually change the way you call these command line scripts. I left that exactly the same, so it should not break anyones code using these scripts.

What I did:

Adding python function for use besides the command line.
Create a "normal" python function with a signature allowing this procedure to be called easily from another python process without having to you over the command line. The main function now just parses the arguments and internally calls that function. As said, the CLI did not change.

Change to pathlib in clustering scripts.
Replace some string editing and os.path manipulations to pathlib. The main reason for this is that I got tripped up a bit when I was trying to follow this example: https://rocket-9.gitbook.io/rocket-docs/latent-state-of-a-human-serpin-with-msa-subsampling. The scoring script kept on not finding the clusters we just created and then I realised that the slash behind "-i serpin_clusters/" was really important here:

rk.score \
  . \                      # Working Path
  1lj5 \                   # File_id
  -i serpin_clusters/      # Prefix for msas to use, Working Path will prepend 
  -o serpin_scores/        # Name of output directory, Working Path will prepend
  --init_recycling 20      # Initial recrycling, set to be 1 for fast process
  --score_fullmsa          # Flag on to also score the full msa, assume full msa is in Working Path/alignments/      
  --datamode xray          # Specify whether this is a [cryoem/xray] dataset

The reason was that the paths are string formatted here: https://github.com/rs-station/ROCKET/blob/de3041a08265a7e4933960aa969974e3453f146c/rocket/scripts/run_msascore.py#L298C1-L299C1 . Moving away from string editing prevents these finicky behavior with slashes.

If you want I can also move the new underlying functions inside of new modules directly underneath rocket and import them in the scripts.

have a nice day,

Joren

@minhuanli
Copy link
Collaborator

Hi Joren,

Thanks for the PR! Wrapping the functions for Python interaction is a good idea! And catching the pathlib issue is much appreciated, we were aware of that problem but hadn’t had the opportunity to address it.

As you suggested, it would be better to move the functions run_msa_cluster and run_msa_score into two new modules directly under the rocket named msa_cluster.py and msa_score.py. In rocket/__init__.py, add the following lines:

from rocket.msa_cluster import run_msa_cluster  
from rocket.msa_score import run_msa_score  

Then update the scripts to import these functions directly from rocket.

Let me know what you think. If it makes sense and you have the capacity to make the change, we’ll review, approve and merge to main.

Thanks for your contribution!

Minhuan

@jorenretel jorenretel marked this pull request as draft February 6, 2026 10:14
@jorenretel jorenretel marked this pull request as ready for review February 6, 2026 10:24
@minhuanli minhuanli self-requested a review February 11, 2026 17:39
@minhuanli
Copy link
Collaborator

Thank you! The PR looks good to me.

There are a few leftover code quality issues from our previous implementation—such as using a file open context manager and replacing the Biopython with Biotite if possible—that we’ll address in a future update, as they’re outside the scope of this PR.

I’ve approved it and will merge by the end of today if no other issues are found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants