…ents including third-party tools
…ut' Jupyter notebook. For the Jupyter notebook containing the development of these functions, see the 'notebooks' directory in the root of the pyrewton repository
…obnobMancer/pyrewton into parsing_prediction_tool_output
Codecov Report
@@             Coverage Diff             @@
##           master      #45       +/-   ##
===========================================
- Coverage   93.67%   76.60%   -17.07%
===========================================
  Files          20       20
  Lines         759      932      +173
===========================================
+ Hits          711      714        +3
- Misses         48      218      +170
…e function from main()
…to each Query class instance
…puts to a function separate from main()
…o get_fasta_paths out of main()
…, statistical evaluation and writing of summary reports from main(), after which the program closes
…voking each prediction tool is invoked by its own function
…put files to separate functions
Add quality checking to the retrieval of domain ranges. Improve the retrieval of EC numbers by checking whether multiple EC numbers are given and how they are given, collecting all EC numbers and separating them with ', '. Factor out the many additional tasks into separate functions
…aces after separators in function calls
Add quality checking: check that EC numbers are formatted correctly, and standardise EC numbers so that missing digits are represented by '-'. Standardise domain ranges so that the start and end are separated by '..'. Add checking of CAZy family and subfamily names. Log any irregularities
…to separate scripts
…, add logging of irregularities
…if can't open output files
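The EC number handling described in the commits above (padding missing digits with '-', collecting multiple EC numbers and joining them with ', ', and logging irregular entries) could look roughly like the following. This is a minimal sketch, not pyrewton's code: the function names `standardise_ec` and `collect_ec_numbers`, and the separators split on (commas, semicolons, pipes and whitespace), are hypothetical assumptions.

```python
import logging
import re

logger = logging.getLogger(__name__)


def standardise_ec(raw):
    """Pad an EC number to four fields, using '-' for missing digits.

    Hypothetical helper: returns None and logs a warning if `raw` is not
    a plausible EC number.
    """
    text = raw.strip()
    if not text:
        logger.warning("Empty EC number")
        return None
    # Treat empty fields (e.g. from a trailing '.') as missing digits
    fields = ["-" if f == "" else f for f in text.split(".")]
    if len(fields) > 4 or not all(f.isdecimal() or f == "-" for f in fields):
        logger.warning("Irregular EC number: %r", raw)
        return None
    # Fill any missing trailing fields with '-'
    fields += ["-"] * (4 - len(fields))
    return ".".join(fields)


def collect_ec_numbers(raw_field):
    """Split a field that may hold several EC numbers; join them with ', '.

    The separator set below is an assumption about the tool output.
    """
    ecs = []
    for part in re.split(r"[,;|\s]+", raw_field.strip()):
        if not part:
            continue
        ec = standardise_ec(part)
        # Skip invalid entries and duplicates, keeping first-seen order
        if ec is not None and ec not in ecs:
            ecs.append(ec)
    return ", ".join(ecs)
```

For example, `collect_ec_numbers("3.2.1.4|3.2.1")` gives `"3.2.1.4, 3.2.1.-"`. Returning None for an irregular entry, rather than raising, lets parsing continue while the warning records the problem.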
Notes:
Add the following new functions to pyrewton.cazymes.prediction.parse for standardising and parsing the output from the prediction tools:
A dataframe of the output is created for each of dbCAN (containing the consensus result, defined as all CAZy families predicted for a query protein sequence by at least 2 tools), HMMER, Hotpep, DIAMOND, CUPP and eCAMI.
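To illustrate the consensus rule, the sketch below counts how many tools predict each CAZy family for a single query protein and keeps those predicted by at least two. It assumes, hypothetically, that the per-tool predictions for one protein have already been collected into a dict mapping tool name to a set of family names; `consensus_families` is an illustrative name, not a pyrewton function.

```python
from collections import Counter


def consensus_families(predictions, min_tools=2):
    """Return the CAZy families predicted by at least `min_tools` tools.

    `predictions` maps a tool name to the families it predicted for one
    query protein (assumed input shape).
    """
    counts = Counter()
    for families in predictions.values():
        # set() ensures a tool is counted at most once per family
        counts.update(set(families))
    return sorted(fam for fam, n in counts.items() if n >= min_tools)
```

Usage, with made-up predictions for one protein:

```python
consensus_families({
    "HMMER": {"GH5", "CBM1"},
    "Hotpep": {"GH5"},
    "DIAMOND": {"GH5", "CBM1", "GH6"},
})
# ['CBM1', 'GH5']
```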
For each prediction tool the following data is retrieved:
dbCAN: CAZy family, CAZy subfamily (can predict multiple domains per protein)
HMMER: CAZy family, CAZy subfamily (can predict multiple domains per protein), domain ranges (the first and last amino acid positions of each domain)
Hotpep: CAZy family, CAZy subfamily (can predict multiple domains per protein)
DIAMOND: CAZy family, CAZy subfamily (can predict multiple domains per protein)
CUPP: CAZy family, CAZy subfamily, predicted EC number and domain range
eCAMI: CAZy family, CAZy subfamily (can predict multiple domains per protein) and EC number; the best result is listed under the CAZy family and subfamily headings, and any additional domains under "additional_domains"
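As a rough illustration of how a multi-domain prediction could be turned into standardised rows (family and subfamily split out, names checked, domain ranges rewritten with '..', irregularities logged), consider the sketch below. The input format `"GH5_4(20-300)+CBM1(350-380)"`, the list of CAZy class prefixes, the column names and the helper names `split_family` and `parse_domains` are all hypothetical assumptions rather than pyrewton's actual implementation.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed CAZy class prefixes, a family number and an optional subfamily
FAMILY_PATTERN = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)(?:_(\d+))?$")
# Assumed domain format: NAME(start-end), e.g. GH5_4(20-300)
DOMAIN_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z]+\d+(?:_\d+)?)\((?P<start>\d+)-(?P<end>\d+)\)$"
)


def split_family(name):
    """Split e.g. 'GH5_4' into ('GH5', 'GH5_4'); 'GH5' gives ('GH5', None)."""
    match = FAMILY_PATTERN.match(name)
    if match is None:
        logger.warning("Irregular CAZy family name: %r", name)
        return None, None
    cazy_class, number, sub = match.groups()
    family = f"{cazy_class}{number}"
    subfamily = f"{family}_{sub}" if sub else None
    return family, subfamily


def parse_domains(protein_id, field):
    """Turn a '+'-separated domain string into one row (dict) per domain."""
    rows = []
    for domain in field.split("+"):
        match = DOMAIN_PATTERN.match(domain.strip())
        if match is None:
            logger.warning("Irregular domain for %s: %r", protein_id, domain)
            continue
        family, subfamily = split_family(match["name"])
        if family is None:
            continue
        start, end = int(match["start"]), int(match["end"])
        if start > end:
            logger.warning("Start after end for %s: %r", protein_id, domain)
            continue
        rows.append({
            "protein": protein_id,
            "cazy_family": family,
            "cazy_subfamily": subfamily,
            # Standardised range format: start..end
            "domain_range": f"{start}..{end}",
        })
    return rows
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` to build a per-tool dataframe. Skipping and logging malformed domains, instead of failing, keeps one bad entry from discarding the rest of a protein's predictions.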