Hello,
I'm having an issue running this pipeline with a new genome index. The install works very well when aligning to human, but with a STAR index, genome, and GTF file for another species it fails after logging:
INFO:STPipeline: Starting creating dataset 2021-12-13 17:18:47.819617
This is what it writes to standard output:
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
/var/spool/slurmd/job21176315/slurm_script: line 57: 71249 Segmentation fault st_pipeline_run.py --output-folder $OUTPUT --ids $ID --ref-map $MAP --ref-annotation $ANN --expName $sample --htseq-no-ambiguous --verbose --log-file $OUTPUT/${sample}_log.txt --demultiplexing-kmer 5 --threads 20 --temp-folder $TMP_ST --no-clean-up --umi-start-position 16 --umi-end-position 26 --demultiplexing-overhang 0 --min-length-qual-trimming 20 $FW $RV
So far I've tried running the mapping and counting steps with STAR and HTSeq outside of the pipeline, and both run without error in that context.
Here is the complete log:
INFO:STPipeline:ST Pipeline 1.8.1
INFO:STPipeline:Output directory: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/output
INFO:STPipeline:Temporary directory: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/output/tmp
INFO:STPipeline:Dataset name: NTC
INFO:STPipeline:Forward(R1) input file: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/tmp/NTC_R2_processed.fastq
INFO:STPipeline:Reverse(R2) input file: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/tmp/NTC_R1_filtered.fastq.gz
INFO:STPipeline:Reference mapping STAR index folder: /gpfs/ysm/home/kma57/genome/RM_SIV/STAR
INFO:STPipeline:Reference annotation file: /gpfs/ysm/home/kma57/genome/RM_SIV/GCF_003339765.1_Mmul_10_genomic.gtf
INFO:STPipeline:CPU Nodes: 20
INFO:STPipeline:Ids(barcodes) file: /gpfs/ysm/home/kma57/genome/spatial_barcodes.txt
INFO:STPipeline:TaggD allowed mismatches: 2
INFO:STPipeline:TaggD kmer size: 5
INFO:STPipeline:TaggD overhang: 0
INFO:STPipeline:TaggD metric: Subglobal
INFO:STPipeline:Mapping reverse trimming: 0
INFO:STPipeline:Mapping inverse reverse trimming: 0
INFO:STPipeline:Mapping tool: STAR
INFO:STPipeline:Mapping minimum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:Mapping maximum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:STAR genome loading strategy NoSharedMemory
INFO:STPipeline:Annotation tool: HTSeq
INFO:STPipeline:Annotation mode: intersection-nonempty
INFO:STPipeline:Annotation strandness yes
INFO:STPipeline:UMIs start position: 16
INFO:STPipeline:UMIs end position: 26
INFO:STPipeline:UMIs allowed mismatches: 1
INFO:STPipeline:UMIs clustering algorithm: AdjacentBi
INFO:STPipeline:Allowing an offset of 250 when clustering UMIs by strand-start in a gene-spot
INFO:STPipeline:Allowing 6 low quality bases in an UMI
INFO:STPipeline:Discarding reads that after trimming are shorter than 20
INFO:STPipeline:Removing polyA sequences of a length of at least: 10
INFO:STPipeline:Removing polyT sequences of a length of at least: 10
INFO:STPipeline:Removing polyG sequences of a length of at least: 10
INFO:STPipeline:Removing polyC sequences of a length of at least: 10
INFO:STPipeline:Removing polyN sequences of a length of at least: 10
INFO:STPipeline:Allowing 0 mismatches when removing homopolymers
INFO:STPipeline:Remove reads whose AT content is 90%
INFO:STPipeline:Remove reads whose GC content is 90%
INFO:STPipeline:Starting the pipeline: 2021-12-13 16:36:29.608163
INFO:STPipeline:Start filtering raw reads 2021-12-13 16:36:29.627480
INFO:STPipeline:Trimming stats total reads (pair): 81470284
INFO:STPipeline:Trimming stats 4122973 reads have been dropped!
INFO:STPipeline:Trimming stats you just lost about 5.06% of your data
INFO:STPipeline:Trimming stats reads remaining: 77347311
INFO:STPipeline:Trimming stats dropped pairs due to incorrect UMI: 0
INFO:STPipeline:Trimming stats dropped pairs due to low quality UMI: 121432
INFO:STPipeline:Trimming stats dropped pairs due to high AT content: 2105513
INFO:STPipeline:Trimming stats dropped pairs due to high GC content: 39
INFO:STPipeline:Trimming stats dropped pairs due to presence of artifacts: 1778429
INFO:STPipeline:Trimming stats dropped pairs due to being too short: 117560
INFO:STPipeline:Starting genome alignment 2021-12-13 17:01:37.963875
INFO:STPipeline:Mapping stats:
INFO:STPipeline:Mapping stats are computed from all the pair reads present in the raw files
INFO:STPipeline: Uniquely mapped reads number | 663018
INFO:STPipeline: Uniquely mapped reads % | 0.86%
INFO:STPipeline: Number of reads mapped to multiple loci | 139153
INFO:STPipeline: % of reads mapped to multiple loci | 0.18%
INFO:STPipeline: % of reads unmapped: too short | 98.73%
INFO:STPipeline:Total mapped reads: 802171
INFO:STPipeline:Starting barcode demultiplexing 2021-12-13 17:16:42.503838
INFO:STPipeline:Demultiplexing Mapping stats:
INFO:STPipeline:# Total reads: 802171
INFO:STPipeline:# Total reads written: 718743
INFO:STPipeline:# Ambiguous matches: 10508 [1.309945136386132%]
INFO:STPipeline:# - Non-unique ambiguous matches: 23405
INFO:STPipeline:# Unmatched: 12272 [1.529848373975125%]
INFO:STPipeline:Starting annotation 2021-12-13 17:17:03.172980
INFO:STPipeline:Annotated reads: 480326
INFO:STPipeline:Starting creating dataset 2021-12-13 17:18:47.819617
Any suggestions are appreciated.