122 commits
d4e5954
start to write out shell script for install pyrewton and all requirem…
HobnobMancer Nov 3, 2020
1e98a1d
add notebook for planning parsing of prediction tools output
HobnobMancer Nov 9, 2020
c062839
Delete finding_a_consensus_result.ipynb
HobnobMancer Nov 9, 2020
077b9e4
add functions to parse tool output from 'parsing_prediction_tool_outp…
HobnobMancer Nov 9, 2020
f28c381
Merge branch 'parsing_prediction_tool_output' of https://github.com/H…
HobnobMancer Nov 9, 2020
73e4afd
remove print statements, reorder functions
HobnobMancer Nov 9, 2020
70891eb
factorise building prediction queries and invoking tools to a separat…
HobnobMancer Nov 10, 2020
54f80f7
remove white space
HobnobMancer Nov 10, 2020
332393d
add dict instance to data class to store output file paths as they ar…
HobnobMancer Nov 10, 2020
674b03a
add creation, retrieval and adding of path to the tools output files …
HobnobMancer Nov 10, 2020
f59c92e
build dictionary containing path to tools output files
HobnobMancer Nov 10, 2020
32d449b
add tqdm progress bars
HobnobMancer Nov 10, 2020
3b3a621
add tqdm progress bars
HobnobMancer Nov 10, 2020
c319aab
fix typo and line length issues
HobnobMancer Nov 10, 2020
d9186c8
factorise out coordinating the standardising the prediction tools out…
HobnobMancer Nov 10, 2020
cdaecd7
reorder functions so follows order of data processing and move call t…
HobnobMancer Nov 13, 2020
7088a86
make single call to function that will coordinate standarising output…
HobnobMancer Nov 13, 2020
482c611
update variable names so they are consistent and clearer as to what t…
HobnobMancer Nov 13, 2020
0b817c4
factorise out preparing args for and invoking prediction tools, so in…
HobnobMancer Nov 13, 2020
58af3e5
updae retrieval of output from invoke_prediction_tools()
HobnobMancer Nov 13, 2020
e97bb2c
move coordiantion of standardising output and retrieving paths to out…
HobnobMancer Nov 13, 2020
7e81008
update parameters in functin call
HobnobMancer Nov 13, 2020
cbb0767
add writing out of dataframes to disk
HobnobMancer Nov 16, 2020
9895bdc
add quality checking to parsing of cupp output
HobnobMancer Nov 16, 2020
47fd212
import, correct syntax error of logger, and remove white space
HobnobMancer Nov 16, 2020
7702516
add missing params to function call and remove whitespace, and add sp…
HobnobMancer Nov 16, 2020
838f2e9
add quality checking for parsing ecami
HobnobMancer Nov 17, 2020
830474b
shorten long lines and remove white space
HobnobMancer Nov 17, 2020
04e4d23
shorten long line
HobnobMancer Nov 17, 2020
d21b864
add logging and checking if cannot find ecami or cupp output file
HobnobMancer Nov 17, 2020
89bb29b
move all dbCAN parsing functions to a separate script
HobnobMancer Nov 17, 2020
4bffa9e
move functions for parsing CUPP output to a separate script
HobnobMancer Nov 17, 2020
62bc443
move functions for parsing eCAMI output to a separate script
HobnobMancer Nov 17, 2020
ee60358
remove redundant functions copies from file as these have been moved …
HobnobMancer Nov 17, 2020
44bd2f8
add logging if EC number is in non-standard format when parsing CUPP
HobnobMancer Nov 17, 2020
dcc8872
add quality checking for retrieving ec numbers and cazy (sub)families…
HobnobMancer Nov 17, 2020
3a43e1d
add missing ')' and remove white space
HobnobMancer Nov 17, 2020
ab452de
remove comments adding reminders to add logging
HobnobMancer Nov 17, 2020
6f7af3b
add detail of consequences of catching error to logger messages
HobnobMancer Nov 17, 2020
cc872ae
add details of consequences of error catching and add error catching …
HobnobMancer Nov 17, 2020
7532011
update function calls for parsing prediction tool outputs
HobnobMancer Nov 17, 2020
3877d17
add check if dataframe is None to raise error and pass
HobnobMancer Nov 17, 2020
7dbeb63
Delete process_cazyme_predictions.py
HobnobMancer Nov 17, 2020
5732985
Delete cazyme_class_evaluation.py
HobnobMancer Nov 17, 2020
e63e2f8
Delete binary_cazyme_evaluation.py
HobnobMancer Nov 17, 2020
c2785e9
Delete __init__.py
HobnobMancer Nov 17, 2020
542f617
add get_uniprot_proteins configuration files to package data
HobnobMancer Nov 23, 2020
83a242a
make shell script executable
HobnobMancer Nov 23, 2020
a719e36
remove installation of requirements and pyrewton from shell script
HobnobMancer Nov 23, 2020
a49ec43
update installation instructions
HobnobMancer Nov 23, 2020
dbb5d63
add all requirements to setup.py
HobnobMancer Nov 23, 2020
c0ffe81
add dbcan to requirements in setup.py
HobnobMancer Nov 23, 2020
8c025c9
remove installing dbcan from shell script
HobnobMancer Nov 23, 2020
8d72c12
fixing changing directory to ensure installation is correct for ecami…
HobnobMancer Nov 23, 2020
386b26e
remove print statements
HobnobMancer Nov 23, 2020
29dafa2
update requirements.txt
HobnobMancer Nov 23, 2020
885085d
Merge branch 'create_installation_script' of https://github.com/Hobno…
HobnobMancer Nov 23, 2020
e66af57
add version requirements
HobnobMancer Nov 23, 2020
7eaab4c
update python and pip in circleCI config
HobnobMancer Nov 23, 2020
1db2ca2
retrieve asbpath to dir containing setup.py, remove reliance on relat…
Nov 24, 2020
778bd6a
remove movement through relative paths
Nov 24, 2020
e27d591
fix typo in requirements
HobnobMancer Nov 24, 2020
57ff515
direct eCAMI cloning to fix bug of not renaming eCAMI to ecami
Nov 24, 2020
1dd2bf7
Merge branch 'create_installation_script' of https://github.com/Hobno…
Nov 24, 2020
accf115
update instructions on how to install cupp, ecami and dbcan
HobnobMancer Nov 24, 2020
e9cf888
correct command call in README for installing CPTs
HobnobMancer Nov 24, 2020
3124dcb
Merge pull request #46 from HobnobMancer/create_installation_script
HobnobMancer Nov 24, 2020
ce23a8c
add notebook for planning parsing of prediction tools output
HobnobMancer Nov 9, 2020
1c9afe4
add functions to parse tool output from 'parsing_prediction_tool_outp…
HobnobMancer Nov 9, 2020
e410c12
Delete finding_a_consensus_result.ipynb
HobnobMancer Nov 9, 2020
9f77937
remove print statements, reorder functions
HobnobMancer Nov 9, 2020
e3b8aef
factorise building prediction queries and invoking tools to a separat…
HobnobMancer Nov 10, 2020
328e487
remove white space
HobnobMancer Nov 10, 2020
7b4e0e6
add dict instance to data class to store output file paths as they ar…
HobnobMancer Nov 10, 2020
36bd12b
add creation, retrieval and adding of path to the tools output files …
HobnobMancer Nov 10, 2020
d5a9e97
build dictionary containing path to tools output files
HobnobMancer Nov 10, 2020
a0429fd
add tqdm progress bars
HobnobMancer Nov 10, 2020
d5263fe
add tqdm progress bars
HobnobMancer Nov 10, 2020
d2f2e44
fix typo and line length issues
HobnobMancer Nov 10, 2020
10cdb2c
factorise out coordinating the standardising the prediction tools out…
HobnobMancer Nov 10, 2020
5824bd9
reorder functions so follows order of data processing and move call t…
HobnobMancer Nov 13, 2020
a9dfb91
make single call to function that will coordinate standarising output…
HobnobMancer Nov 13, 2020
8e9e5d2
update variable names so they are consistent and clearer as to what t…
HobnobMancer Nov 13, 2020
7339c4e
factorise out preparing args for and invoking prediction tools, so in…
HobnobMancer Nov 13, 2020
1b1c684
updae retrieval of output from invoke_prediction_tools()
HobnobMancer Nov 13, 2020
c0f97a8
move coordiantion of standardising output and retrieving paths to out…
HobnobMancer Nov 13, 2020
cb4d155
update parameters in functin call
HobnobMancer Nov 13, 2020
b703c7f
add writing out of dataframes to disk
HobnobMancer Nov 16, 2020
0187eb0
add quality checking to parsing of cupp output
HobnobMancer Nov 16, 2020
9764193
import, correct syntax error of logger, and remove white space
HobnobMancer Nov 16, 2020
f02b3d3
add missing params to function call and remove whitespace, and add sp…
HobnobMancer Nov 16, 2020
554b2c7
add quality checking for parsing ecami
HobnobMancer Nov 17, 2020
323eb4b
shorten long lines and remove white space
HobnobMancer Nov 17, 2020
9d5d34c
shorten long line
HobnobMancer Nov 17, 2020
30c4e13
add logging and checking if cannot find ecami or cupp output file
HobnobMancer Nov 17, 2020
66c8ca6
move all dbCAN parsing functions to a separate script
HobnobMancer Nov 17, 2020
5fbb6b2
move functions for parsing CUPP output to a separate script
HobnobMancer Nov 17, 2020
7941f05
move functions for parsing eCAMI output to a separate script
HobnobMancer Nov 17, 2020
2540973
remove redundant functions copies from file as these have been moved …
HobnobMancer Nov 17, 2020
04f1b01
add logging if EC number is in non-standard format when parsing CUPP
HobnobMancer Nov 17, 2020
9235fc9
add quality checking for retrieving ec numbers and cazy (sub)families…
HobnobMancer Nov 17, 2020
3066ea3
add missing ')' and remove white space
HobnobMancer Nov 17, 2020
30dfd8c
remove comments adding reminders to add logging
HobnobMancer Nov 17, 2020
fc6f533
add detail of consequences of catching error to logger messages
HobnobMancer Nov 17, 2020
c457bcb
add details of consequences of error catching and add error catching …
HobnobMancer Nov 17, 2020
132011c
update function calls for parsing prediction tool outputs
HobnobMancer Nov 17, 2020
40e3ff9
add check if dataframe is None to raise error and pass
HobnobMancer Nov 17, 2020
48d9a23
Delete process_cazyme_predictions.py
HobnobMancer Nov 17, 2020
7fba26c
Delete cazyme_class_evaluation.py
HobnobMancer Nov 17, 2020
e8b831c
Delete binary_cazyme_evaluation.py
HobnobMancer Nov 17, 2020
ad1caed
Delete __init__.py
HobnobMancer Nov 17, 2020
87919e0
removed uneeded empty dicyionary
Nov 24, 2020
476ccac
correct moving up dir from ../ to ..
Nov 25, 2020
c6b545c
update capturing taxid to not be NCBI specific
Nov 25, 2020
8c385f1
add function call () to end of iterdir
Nov 25, 2020
b7eda97
fix catching and logging when no output files are found
Nov 25, 2020
81ac74a
remove print statements
Nov 25, 2020
0f57f94
correct changing dir using ../ to ..
Nov 25, 2020
0b64756
correct name of CUPP prediction python file
Nov 25, 2020
3eff8e6
add print statemetns to track progress
Nov 26, 2020
ad9f0f9
save local changes, fixing merge conflict
Nov 26, 2020
af6d76b
write CTP output to terminal and log file
Nov 27, 2020
6 changes: 3 additions & 3 deletions .circleci/config.yml
@@ -6,7 +6,7 @@ orbs:
jobs:
build-and-test:
docker:
- image: circleci/python:3.7.5
- image: circleci/python:3.8.5

working_directory: ~/phd_eastbio

@@ -23,7 +23,7 @@ jobs:
command: |
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
pip3 install -r requirements.txt

- save_cache:
paths:
@@ -34,7 +34,7 @@ jobs:
name: install package
command: |
. venv/bin/activate
pip install -e .
pip3 install -e .

- run:
name: run tests
22 changes: 9 additions & 13 deletions README.md
@@ -44,28 +44,24 @@ Development plans are stored within the [Wiki](https://github.com/HobnobMancer/p

## Installation

1. Navigate to the directory you wish to store pyrewton in, then clone this repository.
`git clone https://github.com/HobnobMancer/pyrewton.git`
The easiest method is to use pip to install `pyrewton` and all requirements.

1. Create a virtual environment with dependencies, then activate the environment.
1. Create a virtual environment with dependencies, then activate the environment - _where venv_name is a chosen name for the virtual environment_
`conda create -n <venv_name> python=3.8 diamond hmmer prodigal -c conda-forge -c bioconda`
`conda activate <venv_name>`

2. Install all requirements from requirements.txt file. The requirements.txt file is stored in the root of this repository.
`pip3 install -r <path to requirements.txt file>`
2. Clone the repository
`git clone https://github.com/HobnobMancer/pyrewton.git`

3. Install pyrewton.
3. Install pyrewton
`pip3 install -e <path to directory containing setup.py file>`
Do not forget to use the **-e** option when installing with pip3, otherwise a ModuleNotFound error will be raised each time pyrewton is invoked. Pass the path to the **directory** containing the setup.py file, not the path to the setup.py file itself; if you are currently in the root directory of the repository where the file is located, simply use '.' to indicate the current working directory.

4. Install third party tools.
Pyrewton invokes 3 third party tools: dbCAN, CUPP and eCAMI.

To install dbCAN follow the instructions within their [GitHub repository](https://github.com/linnabrown/run_dbcan), **BUT ignore** steps 1 and 2 of their installation guide, because the necessary virtual environment was already created in the second step of this installation and it meets all requirements of dbCAN. Install dbCAN within the **'pyrewton/cazymes/prediction/tools/dbcan'** directory within the repository, otherwise pyrewton will not be able to find the tool.

To install eCAMI follow the instructions within their [GitHub repository](https://github.com/yinlabniu/eCAMI). eCAMI must be installed within the directory pyrewton/cazymes/prediction/tools/ecami. Following the method from the eCAMI repository will write eCAMI to 'pyrewton/cazymes/prediction/tools/ecami/**eCAMI**'. To avoid this, perform the installation within 'pyrewton/cazymes/prediction/tools' and rename 'eCAMI' to 'ecami', so that eCAMI is installed in **'pyrewton/cazymes/prediction/tools/ecami'**.
4. Install the third party CAZyme prediction tools
The easiest way to do this, and to ensure they are installed into the correct directories, is to use:
`python3 <path to pyrewton setup.py> cpt -p .`

To install CUPP download the CUPP files from the [DTU Bioengineering server](https://www.bioengineering.dtu.dk/english/ResearchNy/Research-Sections/Section-for-Protein-Chemistry-and-Enzyme-Technology/Enzyme-Technology/CUPP), and store the files in **'pyrewton/cazymes/prediction/tools/cupp'**. It is not necessary to download all the files because the .tar and .tar.gz archives each contain all the files; therefore, download either the .tar _or_ .tar.gz archive and unpack it, or download all the files located within 'CUPP_v1.0.14'.
For alternative methods of installation see the full documentation at [Read the Docs](https://phd-project-scripts.readthedocs.io/en/latest/).

<p>&nbsp;</p>

37 changes: 37 additions & 0 deletions installation.sh
@@ -0,0 +1,37 @@
#!/bin/bash

# :args $1: absolute path to pyrewton prediction/tools dir

# Installing the CAZyme prediction tools dbCAN, eCAMI and CUPP
# Requirements for these tools are installed via requirements.txt

# Quote the path so directories containing spaces work, and stop if it is invalid
cd "$1" || exit 1

# install dbCAN
test -d dbcan || mkdir dbcan
cd dbcan

test -d db || mkdir db
cd db \
&& wget http://bcb.unl.edu/dbCAN2/download/CAZyDB.07312019.fa.nr && diamond makedb --in CAZyDB.07312019.fa.nr -d CAZy \
&& wget http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-HMMdb-V8.txt && mv dbCAN-HMMdb-V8.txt dbCAN.txt && hmmpress dbCAN.txt \
&& wget http://bcb.unl.edu/dbCAN2/download/Databases/tcdb.fa && diamond makedb --in tcdb.fa -d tcdb \
&& wget http://bcb.unl.edu/dbCAN2/download/Databases/tf-1.hmm && hmmpress tf-1.hmm \
&& wget http://bcb.unl.edu/dbCAN2/download/Databases/tf-2.hmm && hmmpress tf-2.hmm \
&& wget http://bcb.unl.edu/dbCAN2/download/Databases/stp.hmm && hmmpress stp.hmm \
&& cd ../ && wget http://bcb.unl.edu/dbCAN2/download/Samples/EscheriaColiK12MG1655.fna \
&& wget http://bcb.unl.edu/dbCAN2/download/Samples/EscheriaColiK12MG1655.faa \
&& wget http://bcb.unl.edu/dbCAN2/download/Samples/EscheriaColiK12MG1655.gff
# To check the installation of dbCAN has worked, navigate to the dbCAN directory and run:
# run_dbcan.py EscheriaColiK12MG1655.fna prok --out_dir output_EscheriaColiK12MG1655

# download eCAMI
cd "$1" || exit 1
git clone https://github.com/zhanglabNKU/eCAMI.git ecami

# download CUPP
curl -o CUPP_v1.0.14.tar.gz "https://files.dtu.dk/fss/public/link/public/stream/read/CUPP_v1.0.14.tar.gz?linkToken=hLin6ni4p-SWuKfp&itemName=CUPP_program"
tar -xzf CUPP_v1.0.14.tar.gz
mv CUPP_v1.0.14 cupp
# delete old file
rm CUPP_v1.0.14.tar.gz
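
Going by the script above, a successful run should leave `dbcan`, `dbcan/db`, `ecami` and `cupp` under the tools directory passed as `$1`. A minimal Python sketch of a post-install check follows; `missing_tool_dirs` and `EXPECTED_DIRS` are hypothetical names used for illustration and are not part of pyrewton.

```python
from pathlib import Path

# Sub-directories that installation.sh is expected to create under the
# tools directory. The names mirror the script; this constant and the
# helper below are hypothetical, not part of pyrewton.
EXPECTED_DIRS = ("dbcan", "dbcan/db", "ecami", "cupp")


def missing_tool_dirs(tools_dir):
    """Return the expected sub-directories that are absent from tools_dir."""
    root = Path(tools_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_tool_dirs` with the same path that was given to `installation.sh` returns an empty list when every directory is present. It only checks that the directories exist, not that the downloaded databases are complete.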
1,170 changes: 1,170 additions & 0 deletions notebooks/planning_parsing_prediction_tool_output/143535_output.txt

Large diffs are not rendered by default.