InfereClaDR #68

kostyat · 2018-07-25T03:11:42Z

This code now runs front to back for me on my yeast data set. It creates a version of the Inferelator that first fits optimal RNA half-lives for each gene and condition cluster (the clusters currently have to be supplied as input) and then uses the vector of predicted taus to predict a network of interactions for each condition cluster.

This is the first version that actually works, so feel free to look over the changes and even try running it yourself if you want. But there are still many issues that I need to address before it can be merged into master:

Because we use kvs.view to share mi_clr info from the rank==0 worker with the others, kvs.put keeps adding more and more mi_clr matrices to each worker. Because I have a few nested loops, I would never be able to run 17 taus, 20 random seeds, 50 bootstraps (although the number of bootstraps does not affect this), and 4 condition clusters, because the mi_clr variables accumulate. I think this is similar to the issue that one of Nick's pull requests tries to address, but I am confused about that one.
I need to add a last step where I merge the final predicted networks from all four clusters (with already optimized taus), and put all of the outputs in one folder (currently there is one folder for the output of the optimization step, and one folder for each condition cluster).
I need to make sure that I get the same results I got using R (Tchourine et al 2018). Currently the results look different, and I think this might be because the leave-out set of the prior is handled incorrectly, or something like that.
I need to add unit tests for the new code (or maybe someone will later).
I need to figure out a way to make it easy to download the expression data for each condition cluster, since currently the expression files are too big for the yeast dataset. But that information is also contained in the meta data so one can just subset the full expression dataset using the conditions in each condition-cluster-specific meta data file.
Eventually the goal is to merge the last step with Dayanne's code. This should only require inheriting from the InfereCLaDR class and rewriting one function (run()).
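
For the expression-data item above, subsetting the full expression matrix by the conditions listed in a cluster-specific meta data file could look roughly like the sketch below. It assumes the expression data is a genes x conditions table and that the meta data has a column of condition names; the column name `condName`, the helper `subset_expression`, and the toy data are illustrative assumptions, not code from this PR.

```python
import pandas as pd


def subset_expression(expression, meta, cond_col="condName"):
    """Keep only the expression columns (conditions) named in `meta`.

    `expression` is assumed to be genes x conditions; `cond_col` is an
    assumed name for the meta data column holding condition names.
    """
    conditions = meta[cond_col].astype(str)
    # Fail loudly if the meta data names a condition we do not have.
    missing = conditions[~conditions.isin(expression.columns)]
    if len(missing) > 0:
        raise KeyError(f"conditions missing from expression data: {list(missing)}")
    return expression.loc[:, conditions.tolist()]


# Toy stand-ins for the full expression file and one cluster's meta data.
expression = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=["YAL001C", "YAL002W"],
    columns=["c1", "c2", "c3", "c4"],
)
cluster_meta = pd.DataFrame({"condName": ["c2", "c4"]})

cluster_expression = subset_expression(expression, cluster_meta)
```

With something like this, only the full expression file and the small per-cluster meta data files would need to be distributed.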

Note that I'm using one more module that has not been used in the Inferelator before: xarray. So that probably needs to be added to the file that downloads all the necessary modules (which would be necessary for Travis to pass, for example)

…en GS is split into prior and validation set, because in such a case, good predictions that are artificially removed from the validation set should be put at the end of the list of confidence scores. Although maybe the better way to do this would not be to copy/paste and modify that function, but to add an extra small function in the original function, and only change that small function.

…as well as the gene names for the 5 gene clusters for yeast. note that also expression data cluster files are required, but they probably will not fit on the GitHub server so they would have to be downloaded separately

… (previously it was only by condition cluster); started writing code that uses the xarray module to make a 4D DataArray of AUPRs as a function of condition and gene cluster, random seed, and tau. Next step is plotting AUPR as a function of tau for each bicluster and seed, and recording the optimal (median across seeds of max across taus) tau for each bicluster.
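
A minimal sketch of the 4D AUPR array and the optimal-tau summary described here, using xarray. The dimension names, the toy tau grid, and the random AUPR values are assumptions for illustration. "Median across seeds of max across taus" is read here as: for each bicluster and seed, take the tau at which AUPR peaks, then take the median of those taus across seeds. That reading may differ from the actual implementation in this PR.

```python
import numpy as np
import xarray as xr

# Hypothetical grid; the real run uses more taus and seeds.
taus = np.array([0.0, 10.0, 20.0, 40.0])
seeds = np.arange(3)
cond_clusters = ["cond_1", "cond_2"]
gene_clusters = ["gene_1", "gene_2"]

# Random numbers stand in for AUPRs computed by the workflow.
rng = np.random.default_rng(0)
aupr = xr.DataArray(
    rng.random((len(cond_clusters), len(gene_clusters), len(seeds), len(taus))),
    dims=("cond_cluster", "gene_cluster", "seed", "tau"),
    coords={
        "cond_cluster": cond_clusters,
        "gene_cluster": gene_clusters,
        "seed": seeds,
        "tau": taus,
    },
    name="aupr",
)


def optimal_tau(aupr):
    """Median across seeds of the tau that maximizes AUPR, per bicluster."""
    # idxmax returns the tau coordinate (not the AUPR value) at the maximum.
    best_tau_per_seed = aupr.idxmax(dim="tau")
    return best_tau_per_seed.median(dim="seed")


opt = optimal_tau(aupr)  # dims: cond_cluster x gene_cluster
```

Keeping the dimensions named makes the later plotting step (AUPR as a function of tau for each bicluster and seed) a matter of selecting along `cond_cluster` and `gene_cluster`.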

…_TFA_Workflow(). Instead, I initiate different instances of BBSR_TFA_Workflow(), and one of them is modified to be with the GS_split. Also added PythonDRDriver_with_tau_vector that inherits from PythonDRDriver but allows tau to be a vector. Also now there is a run() function that first optimizes taus and then uses those predicted taus for a full run on each condition cluster


kostyat · 2018-09-06T04:34:24Z

I finally finished running each cluster on the NYU HPC, but because of the KVS memory accumulation bug, I had to run each condition cluster separately, and for each of them I was only able to go up to around 10 prior resamples (ideally it would be 20), 15 different taus (ideally it would be 17), and 30 expression data bootstraps (ideally it would be 50). But the good news is that the bicluster-specific AUPR-vs-half-life curves, as well as the optimal taus for each bicluster, closely match the results I got in my InfereCLaDR paper using R. This means that the code was implemented correctly in principle and that the original predictions were robust. So now the main goal should be eliminating the memory accumulation bug and hopefully adding some unit tests.

kostyat added 18 commits June 11, 2018 20:27

added a workflow for InfereCLaDR. still unfinished

6b4e991

added an option to only run for genes that have regulators in the Gol…

141cc41

…d Standard. This significantly reduces run time for yeast and gives the same results in terms of calculated AUPR (which is how I estimate RNA half-lives).

broke calculate_precision_recall up into smaller functions, so that I…

30817d6

… only need to replace the filtering step (creating filtered gold standard and confidences) the Gold Standard split class

added plotting of RNA half-life distributions for every bicluster and…

400f086

… every prior resample, and predicting the optimal half-life for each bicluster by taking the median across prior resamples

removed extra kvs files; kvs still doesn't work for more than 1 task …

b99413c

…and 1 node

removed redundancy with creating an extra class in yeast_inferecladr_…

1997c11

…workflow, which added the prior-splitting class, which is already added in the inferecladr class in the first place

made the calculation of interp_res into a separate function so that i…

d944f52

…t can be modified by other versions; i.e. for the InfereCLaDR to have tau as a vector instead of a number

forgot to add 'self' before calling the new function compute_response…

ff386ae

…_variable

forgot to remove 'self' in the argument when calling the new function…

8fb8849

… compute_response_variable

this file has the mappings of every gene (not just the ones in the go…

4e14124

…ld standard) to the gene clusters

moved the drivers (mi_R, bbsr_python, and design_response_translation…

8c6fe40

…) from inside run() into general variables of class BBSR_TFA_Workflow, so that they can be modified in child classes (in particular, with the different calculation of tau)

removed the extra class that inherited bbsr_tfa and gs_split; now the…

220e0ee

… process initiates with workflow.run()

added a loop over bootstraps that used kvs.get() to get rid of MI mat…

f1dcc18

…rices that accumulated at the end of every iteration of the loop in optimize_taus(). also removed an empty dimension that caused if you run this for one condition cluster or one gene cluster

minor changes

8a00de0
