Skip to content

Reproducing EDS1 (14856) results #7

@cmatKhan

Description

@cmatKhan

Hi all,

I am interested in reproducing the chexmix "all events" results from the yeastepigenome.org ftp server. My goal is to get metrics from more of the data, which seems to mean lowering --minfold and --q to 0.0. I did try that initially, but ran into an error. However, when I removed those settings and used the command below, which is an amalgam of the settings from the methods section from the paper:

https://doi.org/10.1038/s41586-021-03314-8

and the example provided in this repo, the output is much different than the FTP output.

In the cmd below, the files sacCer3.info and sacCer3.back are downloaded from the links provided in this github repo README (not the files provided in the test example).

The file in --exlude is from here: https://github.com/CEGRcode/2021-Rossi_Nature/blob/master/02_References_and_Features_Files/ChExMix_Peak_Filter_List_190612.bed

And the filtered bam is from https://www.datacommons.psu.edu/download/eberly/pughlab/yeast-epigenome-project/14856_YEP.zip

java -Xmx10G \
    -jar chexmix.v0.52.public.jar \
    --geninfo sacCer3.info \
    --threads 16 \
    --expt $INPUT \
    --ctrl masterNoTag_20180928.bam \
    --format BAM \
    --back sacCer3.back \
    --noread2 \
    --scalewin 1000 \
    --minmodelupdateevents 50 \
    --fixedalpha 0 \
    --minmodelupdaterefs 25 \
    --lenientplus \
    --exclude ChExMix_Peak_Filter_List_190612.bed \
    --nomotifs \
    --out $OUTPUT > \
    $OUTPUT.out 2>&1

I launched this like so:

./cmd.sh 14856_YEP/14856_filtered.bam

and got the following output:

ChExMix version 0.52


WARNING: No exclude regions detected in regions file ChExMix_Peak_Filter_List_190612.bed. Attempting to load as BED file.
No genome sequence data was provided with --seq, so motif-finding and the motif prior are switched off.
Processing HitLoaders for:	DEFAULT	DEFAULT
Processing HitLoaders for:	experiment	rep1
Loading data from DEFAULT:DEFAULT:control	Loaded.
Loading data from experiment:rep1:signal	Loaded.
Calculating scaling factors for condition:	experiment	Complete.
Loaded all experiments:
 Condition experiment:	#Replicates:	1
	Replicate:	experiment:rep1
		Signal:	3319682.0	Control:	7703636.0	ScalingFactor:	0.428
	Pooled replicates for condition:	experiment
		Signal:	3319682.0	Control:7703636.0	ScalingFactor:0.428
Finding potential binding regions.
PotentialRegionFilter: condition genomic threshold for experiment with bin width 50.0 = 40.0
PotentialRegionFilter: replicate genomic threshold for experiment:rep1 with bin width 50.0 = 40.0
43 potential regions found. Total length: 5250.0
Initializing mixture model
Alpha experiment	Range=50	42.0
Global noise per base initialization for experiment = 0.2634

============================ Round 0 ============================
experiment:rep1	0	signal-noise ratio:	0.0003
Performing read distribution clustering
Alpha experiment	Range=50	42.0

============================ Round 1 ============================
Number of binding components is too few for distribution update ( 1 < 50 )
Alpha experiment	Range=50	42.0

============================ Round 2 ============================
Number of binding components is too few for distribution update ( 1 < 50 )
Alpha experiment	Range=50	42.0

============================ Round 3 ============================

============================ ML read assignment ============================
experiment:rep1	0	signal-noise ratio:	0.0003
ML read assignment finished.

============================= Post-processing ==============================
Binding event detection finished!
Binding events are printed to files in 14856_filtered_out beginning with: 14856_filtered_out
Events post-analysis
	Peak-peak distance histograms (same condition)
--win 250 --bins 250 --batch --nocolorbar --linemax 3 --linethick 1 --noborder --peaks /home/chase/software/chexmix/14856_filtered_out/intermediate-results/14856_filtered_out_experiment.subtype.aligned.events.txt --strand + --color blue --noborder --out experiment.events
Binding Parameters:	Window size: 250	Bins: 250
Batch running...
Single set mode...
Loading points
3 points loaded
Saving images with root name: /home/chase/software/chexmix/14856_filtered_out/images/14856_filtered_out_experiment.events_experiment_+
Finished
Binding Parameters:	Window size: 250	Bins: 250
Batch running...
Single set mode...
Loading points
3 points loaded
Saving images with root name: /home/chase/software/chexmix/14856_filtered_out/images/14856_filtered_out_experiment.events_experiment_-
Finished
Writing results report to: 14856_filtered_out/ChExMix_14856_filtered_out_results.html

However, this is what is in the all.events file:

### ChExMix output
#Condition	Name	Index	TotalSigCount	SigCtrlScaling	SignalFraction
#Condition	experiment	0	3319682.0	0.428	0.000
#Replicate	ParentCond	Name	Index	SigCount	CtrlCount	SigCtrlScaling	SignalFraction
#Replicate	experiment	experiment:rep1	0	3319682.0	7703636.0	0.4280	0.000
#
#Point	experiment_Sig	experiment_Ctrl	experiment_log2Fold	experiment_log2Q	ActiveConds
chr3:166522	59.3	35.2	1.976	-7.105	1
chr3:152022	99.0	82.1	1.494	-7.328	1
chr3:161215	78.1	61.6	1.568	-7.021	1
chr3:167533	63.1	53.9	1.451	-4.619	1
chr3:164127	73.1	70.1	1.285	-4.796	1
chr3:168004	64.5	56.2	1.422	-4.930	1
chr3:161939	44.3	37.0	1.482	-4.727	1
chr3:166329	42.8	36.7	1.449	-4.276	1
chr3:153384	43.8	38.9	1.395	-4.158	1
chr3:162253	53.4	56.0	1.157	-3.496	1
chr3:157251	52.0	54.1	1.168	-3.157	1
chr3:162405	45.2	48.7	1.118	-3.123	1
chr3:162983	134.8	172.2	0.871	-3.133	1
chr3:164284	42.3	43.1	1.197	-3.238	1
chr3:162623	50.3	64.5	0.865	-1.900	1

This is significantly different than what is available from the ftp site.

If you would be willing to help me reproduce the results from the ftp site -- assuming the filtered bam is the correct input -- I would greatly appreciate it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions