Segmentation fault during dataset creation step #126

@JackCollora

Hello,
I'm having an issue running this pipeline with a new genome index. This install works very well when aligning to human, but with a STAR index, genome, and GTF file for another species it fails right after logging:

INFO:STPipeline: Starting creating dataset 2021-12-13 17:18:47.819617

This is what it writes to standard output:

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
/var/spool/slurmd/job21176315/slurm_script: line 57: 71249 Segmentation fault st_pipeline_run.py --output-folder $OUTPUT --ids $ID --ref-map $MAP --ref-annotation $ANN --expName $sample --htseq-no-ambiguous --verbose --log-file $OUTPUT/${sample}_log.txt --demultiplexing-kmer 5 --threads 20 --temp-folder $TMP_ST --no-clean-up --umi-start-position 16 --umi-end-position 26 --demultiplexing-overhang 0 --min-length-qual-trimming 20 $FW $RV

So far I've tried running the mapping and counting steps with STAR and HTSeq outside of the pipeline, and both run without error in that context.

Here is the complete log:

INFO:STPipeline:ST Pipeline 1.8.1
INFO:STPipeline:Output directory: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/output
INFO:STPipeline:Temporary directory: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/output/tmp
INFO:STPipeline:Dataset name: NTC
INFO:STPipeline:Forward(R1) input file: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/tmp/NTC_R2_processed.fastq
INFO:STPipeline:Reverse(R2) input file: /gpfs/ysm/project/ya-chi_ho/kma57/sample_dir_000006867/Sample_NTC/tmp/NTC_R1_filtered.fastq.gz
INFO:STPipeline:Reference mapping STAR index folder: /gpfs/ysm/home/kma57/genome/RM_SIV/STAR
INFO:STPipeline:Reference annotation file: /gpfs/ysm/home/kma57/genome/RM_SIV/GCF_003339765.1_Mmul_10_genomic.gtf
INFO:STPipeline:CPU Nodes: 20
INFO:STPipeline:Ids(barcodes) file: /gpfs/ysm/home/kma57/genome/spatial_barcodes.txt
INFO:STPipeline:TaggD allowed mismatches: 2
INFO:STPipeline:TaggD kmer size: 5
INFO:STPipeline:TaggD overhang: 0
INFO:STPipeline:TaggD metric: Subglobal
INFO:STPipeline:Mapping reverse trimming: 0
INFO:STPipeline:Mapping inverse reverse trimming: 0
INFO:STPipeline:Mapping tool: STAR
INFO:STPipeline:Mapping minimum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:Mapping maximum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:STAR genome loading strategy NoSharedMemory
INFO:STPipeline:Annotation tool: HTSeq
INFO:STPipeline:Annotation mode: intersection-nonempty
INFO:STPipeline:Annotation strandness yes
INFO:STPipeline:UMIs start position: 16
INFO:STPipeline:UMIs end position: 26
INFO:STPipeline:UMIs allowed mismatches: 1
INFO:STPipeline:UMIs clustering algorithm: AdjacentBi
INFO:STPipeline:Allowing an offset of 250 when clustering UMIs by strand-start in a gene-spot
INFO:STPipeline:Allowing 6 low quality bases in an UMI
INFO:STPipeline:Discarding reads that after trimming are shorter than 20
INFO:STPipeline:Removing polyA sequences of a length of at least: 10
INFO:STPipeline:Removing polyT sequences of a length of at least: 10
INFO:STPipeline:Removing polyG sequences of a length of at least: 10
INFO:STPipeline:Removing polyC sequences of a length of at least: 10
INFO:STPipeline:Removing polyN sequences of a length of at least: 10
INFO:STPipeline:Allowing 0 mismatches when removing homopolymers
INFO:STPipeline:Remove reads whose AT content is 90%
INFO:STPipeline:Remove reads whose GC content is 90%
INFO:STPipeline:Starting the pipeline: 2021-12-13 16:36:29.608163
INFO:STPipeline:Start filtering raw reads 2021-12-13 16:36:29.627480
INFO:STPipeline:Trimming stats total reads (pair): 81470284
INFO:STPipeline:Trimming stats 4122973 reads have been dropped!
INFO:STPipeline:Trimming stats you just lost about 5.06% of your data
INFO:STPipeline:Trimming stats reads remaining: 77347311
INFO:STPipeline:Trimming stats dropped pairs due to incorrect UMI: 0
INFO:STPipeline:Trimming stats dropped pairs due to low quality UMI: 121432
INFO:STPipeline:Trimming stats dropped pairs due to high AT content: 2105513
INFO:STPipeline:Trimming stats dropped pairs due to high GC content: 39
INFO:STPipeline:Trimming stats dropped pairs due to presence of artifacts: 1778429
INFO:STPipeline:Trimming stats dropped pairs due to being too short: 117560
INFO:STPipeline:Starting genome alignment 2021-12-13 17:01:37.963875
INFO:STPipeline:Mapping stats:
INFO:STPipeline:Mapping stats are computed from all the pair reads present in the raw files
INFO:STPipeline: Uniquely mapped reads number | 663018
INFO:STPipeline: Uniquely mapped reads % | 0.86%
INFO:STPipeline: Number of reads mapped to multiple loci | 139153
INFO:STPipeline: % of reads mapped to multiple loci | 0.18%
INFO:STPipeline: % of reads unmapped: too short | 98.73%
INFO:STPipeline:Total mapped reads: 802171
INFO:STPipeline:Starting barcode demultiplexing 2021-12-13 17:16:42.503838
INFO:STPipeline:Demultiplexing Mapping stats:
INFO:STPipeline:# Total reads: 802171
INFO:STPipeline:# Total reads written: 718743
INFO:STPipeline:# Ambiguous matches: 10508 [1.309945136386132%]
INFO:STPipeline:# - Non-unique ambiguous matches: 23405
INFO:STPipeline:# Unmatched: 12272 [1.529848373975125%]
INFO:STPipeline:Starting annotation 2021-12-13 17:17:03.172980
INFO:STPipeline:Annotated reads: 480326
INFO:STPipeline:Starting creating dataset 2021-12-13 17:18:47.819617

Any suggestions are appreciated.
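
For the segmentation fault reported above, one possible next diagnostic step (not something tried so far) is to run the Python entry point under the standard-library `faulthandler`, which prints the Python-level traceback when the interpreter receives SIGSEGV. The sketch below is only an illustration: the helper name `run_with_faulthandler` is hypothetical, and the crash is triggered deliberately with `ctypes.string_at(0)` in a child interpreter rather than by the pipeline itself.

```python
import subprocess
import sys


def run_with_faulthandler(argv):
    """Run `python -X faulthandler <argv...>` and return (returncode, stderr).

    Hypothetical helper for illustration; it is not part of ST Pipeline.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *argv],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


# Demonstration only: crash a child interpreter on purpose by reading
# from address 0, so that faulthandler has a segfault to report.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"

returncode, stderr = run_with_faulthandler(["-c", CRASH_SNIPPET])

# With faulthandler enabled, stderr carries the fatal error message and
# the Python traceback of the thread that crashed.
print(stderr)
```

Assuming `st_pipeline_run.py` is an ordinary Python script on the `PATH`, the same idea would apply to the real job by passing its full path and the original arguments in place of `["-c", CRASH_SNIPPET]`, or by setting the environment variable `PYTHONFAULTHANDLER=1` in the Slurm script. The resulting traceback should indicate which call inside the dataset creation step was active when the crash happened.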
