Parsing prediction tool output by HobnobMancer · Pull Request #45 · HobnobMancer/pyrewton

HobnobMancer · 2020-11-09T11:21:57Z

Notes:

Unit tests still need to be added.
Development of these functions is recorded in the Jupyter notebook 'parsing_prediction_tools_output', which is stored in HTML and .ipynb format in the directory ./notebooks/planning_parsing_prediction_tool_output. That directory also contains the tool output files that were used to develop the new functions.
No calls to these functions have been added to the main script (predict_cazymes.py) yet.

Adds the following new functions to pyrewton.cazymes.prediction.parse for standardising/parsing the output from the prediction tools:

parse_dbcan_output()
add_hotpep_ec_predictions()
parse_hmmer_output()
parse_hotpep_output()
parse_diamond_output()
get_dbcan_consensus()
parse_cupp_output()
parse_ecami_output()

A separate dataframe of the output is created for each of dbCAN (containing the consensus result, defined as all CAZy families that at least 2 tools predict for a query protein sequence), HMMER, Hotpep, DIAMOND, CUPP and eCAMI.
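The consensus rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `consensus_families`, the `min_tools` parameter and the input shape (a dict mapping tool name to the set of families predicted for one protein) are all assumptions made for the example.

```python
from collections import Counter


def consensus_families(predictions, min_tools=2):
    """Return the CAZy families predicted by at least `min_tools` tools.

    `predictions` maps a tool name to the families that tool predicted
    for a single query protein (hypothetical input shape).
    """
    counts = Counter()
    for families in predictions.values():
        # Convert to a set so a tool reporting a family twice counts once
        counts.update(set(families))
    return sorted(fam for fam, n in counts.items() if n >= min_tools)


# Illustrative values only, not taken from real tool output
example = {
    "HMMER": {"GH5", "CBM1"},
    "Hotpep": {"GH5"},
    "DIAMOND": {"GH5", "CBM1", "GT2"},
}
print(consensus_families(example))  # ['CBM1', 'GH5']
```

GT2 is dropped because only one tool predicted it, while GH5 and CBM1 are each supported by at least two tools.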

For each prediction tool the following data is retrieved:
dbCAN: CAZy family, CAZy subfamily (can predict multiple domains per protein)
HMMER: CAZy family, CAZy subfamily (can predict multiple domains per protein), domain ranges (the start and end amino acids of each domain)
Hotpep: CAZy family, CAZy subfamily (can predict multiple domains per protein)
DIAMOND: CAZy family, CAZy subfamily (can predict multiple domains per protein)
CUPP: CAZy family, CAZy subfamily, predicted EC number and domain range
eCAMI: CAZy family, CAZy subfamily (can predict multiple domains per protein), EC number. The best result is listed under the CAZy family and subfamily headings, and additional domains are listed under "additional_domains".
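One way to picture a single standardised row is a small record type holding the fields listed above. The sketch below is hypothetical: `DomainPrediction`, `make_prediction` and the field names are invented for illustration and are not the column names used in pyrewton. It relies only on the CAZy naming convention that a subfamily is written as the family name followed by an underscore and a number (for example GH5_1).

```python
import re
from dataclasses import dataclass
from typing import Optional

# CAZy class prefixes followed by a family number and optional subfamily number
CAZY_NAME = re.compile(
    r"^(?P<family>(?:GH|GT|PL|CE|AA|CBM)\d+)(?:_(?P<sub>\d+))?$"
)


@dataclass
class DomainPrediction:
    """One predicted domain for one protein (hypothetical schema)."""
    protein: str
    tool: str
    cazy_family: str
    cazy_subfamily: Optional[str] = None
    ec_number: Optional[str] = None
    domain_range: Optional[str] = None


def make_prediction(protein, tool, name, ec_number=None, domain_range=None):
    """Split a name such as 'GH5_1' into family 'GH5' and subfamily 'GH5_1'."""
    match = CAZY_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"Unrecognised CAZy (sub)family name: {name!r}")
    family = match.group("family")
    sub = match.group("sub")
    subfamily = f"{family}_{sub}" if sub else None
    return DomainPrediction(protein, tool, family, subfamily,
                            ec_number, domain_range)


row = make_prediction("protein_1", "HMMER", "GH5_1", domain_range="25..300")
print(row.cazy_family, row.cazy_subfamily)  # GH5 GH5_1
```

A list of such records could then be turned into a dataframe, for instance by passing each record through `dataclasses.asdict` first.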

codecov · 2020-11-09T11:24:27Z

Codecov Report

Merging #45 (70891eb) into master (d2e7066) will decrease coverage by 17.06%.
The diff coverage is 1.73%.

@@             Coverage Diff             @@
##           master      #45       +/-   ##
===========================================
- Coverage   93.67%   76.60%   -17.07%     
===========================================
  Files          20       20               
  Lines         759      932      +173     
===========================================
+ Hits          711      714        +3     
- Misses         48      218      +170

Add quality checking to the retrieval of domain ranges. Improve the retrieval of EC numbers by checking whether multiple are given and how they are given, then collecting all EC numbers and separating them by ', '. Factorise out the many additional tasks to separate functions.
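Collecting several EC numbers from one field and joining them with ', ' could look roughly like the sketch below. The function name `collect_ec_numbers` and the example raw string are assumptions; the exact layout of the tools' EC fields is not shown in this PR, so the regular expression simply picks out anything shaped like a four-field EC number.

```python
import re

# Four dot-separated fields; later fields may be digits or '-'.
# The lookbehind avoids starting a match in the middle of a longer number.
EC_PATTERN = re.compile(r"(?<![\d.])\d+\.[\d-]+\.[\d-]+\.[\d-]+")


def collect_ec_numbers(raw):
    """Return all EC numbers found in `raw`, de-duplicated, joined by ', '."""
    found = EC_PATTERN.findall(raw)
    # dict.fromkeys keeps first-seen order while dropping repeats
    unique = list(dict.fromkeys(found))
    return ", ".join(unique)


# Hypothetical raw field containing two EC numbers with trailing scores
print(collect_ec_numbers("3.2.1.4:12|3.2.1.91:3"))  # 3.2.1.4, 3.2.1.91
```

An empty string is returned when no EC number is found, which leaves the caller to decide how a missing prediction is represented.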

Add quality checking: check that EC numbers are formatted correctly, and standardise EC numbers so that missing digits are represented by '-'. Standardise the domain range so that ranges are separated by '..'. Add checking of CAZy family and subfamily names. Log any irregularities.
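The two standardisation steps named in this commit message can be sketched as below. This is an illustration under assumptions rather than the PR's implementation: the helper names, the accepted input separators ('-', ':' and '..') and the choice to return None for irregular values are all hypothetical.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Accept 'start-end', 'start:end' or 'start..end' (assumed input variants)
RANGE_PATTERN = re.compile(r"^\s*(\d+)\s*(?:\.\.|-|:)\s*(\d+)\s*$")


def standardise_ec(raw):
    """Pad an EC number to four fields, using '-' for missing digits."""
    text = raw.strip()
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(" :")
    # Any field that is not purely digits (empty, '*', '-') becomes '-'
    fields = [f if f.isdigit() else "-" for f in text.split(".")]
    if not text or len(fields) > 4 or fields[0] == "-":
        logger.warning("Irregular EC number %r, not added", raw)
        return None
    return ".".join(fields + ["-"] * (4 - len(fields)))


def standardise_range(raw):
    """Rewrite a domain range as 'start..end', or return None if irregular."""
    match = RANGE_PATTERN.match(raw)
    if match is None or int(match.group(1)) > int(match.group(2)):
        logger.warning("Irregular domain range %r, not added", raw)
        return None
    return f"{match.group(1)}..{match.group(2)}"


print(standardise_ec("3.2.1"))      # 3.2.1.-
print(standardise_range("25-300"))  # 25..300
```

Logging a warning and returning None keeps parsing going when one value is malformed, at the cost of leaving that field empty in the final dataframe.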

HobnobMancer and others added 5 commits November 3, 2020 13:16

start to write out shell script for install pyrewton and all requirem…

d4e5954

…ents including third part tools

add notebook for planning parsing of prediction tools output

1e98a1d

Delete finding_a_consensus_result.ipynb

c062839

add functions to parse tool output from 'parsing_prediction_tool_outp…

077b9e4

…ut' jupyter notebook For the Jupyter notebook that contains all the development of these functions, see the directory 'notebooks' within the root of the pyrewton repository

Merge branch 'parsing_prediction_tool_output' of https://github.com/H…

f28c381

…obnobMancer/pyrewton into parsing_prediction_tool_output

HobnobMancer added the enhancement (New feature or request) and update/expand feature labels Nov 9, 2020

HobnobMancer requested a review from widdowquinn November 9, 2020 11:21

remove print statements, reorder functions

73e4afd

HobnobMancer added 20 commits November 10, 2020 10:26

factorise building prediction queries and invoking tools to a separat…

70891eb

…e function from main()

remove white space

54f80f7

add dict instance to data class to store output file paths as they ar…

332393d

…e created

add creation, retrieval and adding of path to the tools output files …

674b03a

…to each Query class instance

build dictionary containing path to tools output files

f59c92e

add tqdm progress bars

32d449b

add tqdm progress bars

3b3a621

fix typo and line length issues

c319aab

factorise out coordinating the standardising the prediction tools out…

d9186c8

…puts to a function separate from main()

reorder functions so follows order of data processing and move call t…

cdaecd7

…o get_fasta_paths out of main()

make single call to function that will coordinate standarising output…

7088a86

…, statistical evaluationa and writing summary reports from main() and then program closes

update variable names so they are consistent and clearer as to what t…

482c611

…hey relate

factorise out preparing args for and invoking prediction tools, so in…

0b817c4

…voking each prediction tool is invoked by its own function

update retrieval of output from invoke_prediction_tools()

58af3e5

move coordiantion of standardising output and retrieving paths to out…

e97bb2c

…put files to separate functions

update parameters in function call

7e81008

add writing out of dataframes to disk

cbb0767

import, correct syntax error of logger, and remove white space

47fd212

add missing params to function call and remove whitespace, and add sp…

7702516

…aces after separators in function calls

HobnobMancer and others added 30 commits November 25, 2020 09:35

shorten long lines and remove white space

323eb4b

shorten long line

9d5d34c

add logging and checking if cannot find ecami or cupp output file

30c4e13

move functions for parsing CUPP output to a separate script

5fbb6b2

move functions for parsing eCAMI output to a separate script

7941f05

remove redundant functions copies from file as these have been moved …

2540973

…to separate scripts

add logging if EC number is in non-standard format when parsing CUPP

04f1b01

add quality checking for retrieving ec numbers and cazy (sub)families…

9235fc9

…, add logging of irregularities

add missing ')' and remove white space

3066ea3

remove comments adding reminders to add logging

30dfd8c

add detail of consequences of catching error to logger messages

fc6f533

add details of consequences of error catching and add error catching …

c457bcb

…if can't open output files

update function calls for parsing prediction tool outputs

132011c

add check if dataframe is None to raise error and pass

40e3ff9

Delete process_cazyme_predictions.py

48d9a23

Delete cazyme_class_evaluation.py

7fba26c

Delete binary_cazyme_evaluation.py

e8b831c

Delete __init__.py

ad1caed

removed unneeded empty dictionary

87919e0

correct moving up dir from ../ to ..

476ccac

update capturing taxid to not be NCBI specific

c6b545c

add function call () to end of iterdir

8c385f1

fix catching and logging when no output files are found

b7eda97

remove print statements

81ac74a

correct changing dir using ../ to ..

0f57f94

correct name of CUPP prediction python file

0b64756

add print statements to track progress

3eff8e6

save local changes, fixing merge conflict

ad9f0f9

write CTP output to terminal and log file

af6d76b

HobnobMancer wants to merge 122 commits into master from parsing_prediction_tool_output
