docs/Understanding-the-output-of-SQANTI3-QC.md (4 changes: 2 additions, 2 deletions)
@@ -153,7 +153,7 @@ The output `_classification.txt` has the following fields:
22. `FL` or `FL.<sample>`: FL count associated with this isoform per sample if `--fl_count` is provided, otherwise NA.
23. `n_indels`: total number of indels based on alignment.
24. `n_indels_junc`: number of junctions in this isoform that have alignment indels near the junction site (indicating potentially unreliable junctions).
-25. `bite`: TRUE if contains at least one "bite" positive SJ.
+25. `bite`: TRUE if any junction in the isoform is "bite" positive (i.e., the novel intron reaches or extends past the nearest annotated splice sites on both ends, strictly past on at least one, and so overlaps the adjacent annotated exons). This is calculated from the `bite_junction` field in the junction output file: if any junction has `bite_junction == TRUE`, the isoform `bite` is TRUE. Isoforms with no junctions (mono-exonic) retain the default value of NA. See also the `bite_junction` field in the junction file glossary below.
26. `iso_exp`: short read expression for this isoform if `--expression` is provided, otherwise NA.
27. `gene_exp`: short read expression for the gene associated with this isoform (summing over all isoforms) if `--expression` is provided, otherwise NA.
28. `ratio_exp`: ratio of `iso_exp` to `gene_exp` if `--expression` is provided, otherwise NA.
@@ -197,7 +197,7 @@ The `_junctions.txt` file contains the following columns:
10. `end_site_category`: `known` if the junction end site is annotated. If on - strand, this is actually the acceptor site.
11. `diff_to_Ref_start_site`: distance to closest annotated junction start site. If on - strand, this is actually the donor site.
12. `diff_to_Ref_end_site`: distance to closest annotated junction end site. If on - strand, this is actually the acceptor site.
-13. `bite_junction`: Applies only to novel splice junctions. If the novel intron partially overlaps annotated exons the bite value is TRUE, otherwise it is FALSE.
+13. `bite_junction`: TRUE if the novel junction's intron extends past the annotated splice sites on both ends (i.e., the novel donor is at or upstream of the closest reference donor AND the novel acceptor is at or downstream of the closest reference acceptor, with at least one being strictly past the reference position). This indicates that the novel intron "bites into" the adjacent annotated exons. Calculated from `diff_to_Ref_start_site` ≤ 0 and `diff_to_Ref_end_site` ≤ 0 with at least one being strictly negative. Known junctions (where `junction_category` is `known`) always have `bite_junction = FALSE`.
14. `splice_site`: Splice motif.
15. `RTS_junction`: TRUE if the junction is predicted to be a template switching artifact.
16. `indel_near_junct`: TRUE if there is alignment indel error near the junction site, indicating potential junction incorrectness.
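The `bite_junction` definition above can be written out as a small predicate. This is a sketch built only from the column descriptions, not code taken from SQANTI3; it assumes both distances are numeric (not `NA`), reads the signs in plus-strand terms, and the example junctions are made up.

```python
def bite_junction(junction_category: str, diff_start: int, diff_end: int) -> str:
    """Documented bite_junction rule, written out from the column definitions.

    diff_start / diff_end play the role of diff_to_Ref_start_site /
    diff_to_Ref_end_site; both are assumed to be numbers (not "NA").
    """
    if junction_category == "known":
        # Known junctions are always FALSE
        return "FALSE"
    both_at_or_past = diff_start <= 0 and diff_end <= 0
    one_strictly_past = diff_start < 0 or diff_end < 0
    return "TRUE" if both_at_or_past and one_strictly_past else "FALSE"


# Hypothetical junctions: (category, diff_to_Ref_start_site, diff_to_Ref_end_site)
examples = [
    ("novel", -5, 0),   # donor 5 bp into the upstream exon, acceptor annotated
    ("novel", -5, -3),  # both ends cut into the flanking exons
    ("novel", -5, 4),   # acceptor falls short of the reference acceptor
    ("novel", 0, 0),    # both sites annotated, novel combination
    ("known", 0, 0),
]
for cat, s, e in examples:
    print(cat, s, e, bite_junction(cat, s, e))
```

Only the first two examples come out TRUE: a single positive distance, or two zero distances, is enough to rule out a bite.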
src/classification_steps.py (9 changes: 8 additions, 1 deletion)
@@ -143,8 +143,10 @@ def write_junction_info(trec, junctions_by_chr, accepted_canonical_sites, indelI
min_diff_s = min_diff_e = 0
else:
# Find the closest junction start site
+# min_diff_s = d - closest_donor: negative if query donor is upstream of (inside) the adjacent exon
min_diff_s = -find_closest_in_list(junctions_by_chr[trec.chrom]['donors'], d)
-# find the closest junction end site
+# Find the closest junction end site
+# min_diff_e = closest_acceptor - a: negative if query acceptor is downstream of (inside) the adjacent exon
min_diff_e = find_closest_in_list(junctions_by_chr[trec.chrom]['acceptors'], a)

else:
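The two new comments fix the sign convention for `min_diff_s` and `min_diff_e`. The sketch below reproduces that convention on toy coordinates. `find_closest_in_list` itself is not shown in this diff, so `closest_site` is a hypothetical stand-in built on `bisect`; its tie-breaking (the lower coordinate wins) is an assumption, as are the reference positions, and positions are treated as plus-strand genomic coordinates.

```python
from bisect import bisect_left


def closest_site(sorted_sites, pos):
    """Hypothetical stand-in for the reference lookup: nearest site to pos.

    sorted_sites must be non-empty and sorted; ties go to the lower coordinate.
    """
    i = bisect_left(sorted_sites, pos)
    # Only the neighbours on either side of the insertion point can be closest
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: (abs(s - pos), s))


def junction_diffs(donor, acceptor, ref_donors, ref_acceptors):
    # min_diff_s = d - closest_donor: negative when the query donor lies upstream
    # of the reference donor, i.e. inside the upstream annotated exon
    min_diff_s = donor - closest_site(ref_donors, donor)
    # min_diff_e = closest_acceptor - a: negative when the query acceptor lies
    # downstream of the reference acceptor, i.e. inside the downstream annotated exon
    min_diff_e = closest_site(ref_acceptors, acceptor) - acceptor
    return min_diff_s, min_diff_e


ref_donors = [1000, 2000]     # hypothetical annotated intron starts
ref_acceptors = [1500, 2500]  # hypothetical annotated intron ends
print(junction_diffs(990, 1510, ref_donors, ref_acceptors))   # (-10, -10): bites both exons
print(junction_diffs(1010, 1500, ref_donors, ref_acceptors))  # (10, 0): donor shifted into the intron
```

With this convention, "negative" always means the query intron is longer than the reference intron at that end, which is what lets the bite test use the same sign on both sides.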
@@ -184,6 +186,11 @@ def write_junction_info(trec, junctions_by_chr, accepted_canonical_sites, indelI
"end_site_category": "known" if min_diff_e==0 else "novel",
"diff_to_Ref_start_site": min_diff_s if min_diff_s==min_diff_s else "NA", # check if min_diff is actually nan
"diff_to_Ref_end_site": min_diff_e if min_diff_e==min_diff_e else "NA", # check if min_diff is actually nan
+# min_diff_s = d - closest_donor: negative means query donor is upstream of (inside) the reference exon
+# min_diff_e = closest_acceptor - a: negative means query acceptor is downstream of (inside) the reference exon
+# bite_junction is TRUE when the novel intron extends past the reference junction on both ends
+# (i.e., min_diff_s <= 0 AND min_diff_e <= 0, with at least one being strictly negative),
+# meaning the novel intron "bites into" the adjacent annotated exons.
"bite_junction": "TRUE" if ((min_diff_s<0 or min_diff_e<0) and not(min_diff_s>0 or min_diff_e>0)) else "FALSE",
"splice_site": splice_site,
"canonical": "canonical" if splice_site in accepted_canonical_sites else "non_canonical",
src/qc_computations.py (2 changes: 2 additions, 0 deletions)
@@ -141,6 +141,8 @@ def isoforms_junctions(isoforms_info, reader):
(r['canonical'] == 'non_canonical'):
isoforms_info[r['isoform']].canonical = r['canonical']

+# bite: isoform is TRUE if any junction has bite_junction == TRUE
+# Once set to TRUE it stays TRUE; if still 'NA' (first junction), set to whatever bite_junction is
if (isoforms_info[r['isoform']].bite == 'NA') or (r['bite_junction'] == 'TRUE'):
isoforms_info[r['isoform']].bite = r['bite_junction']
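The update in `isoforms_junctions` amounts to an "any" over an isoform's junctions, with the first junction replacing the `NA` default. A minimal standalone sketch follows, assuming rows shaped like the `_junctions.txt` records (dicts with `isoform` and `bite_junction` keys) and made-up isoform IDs; it uses a plain dict in place of `isoforms_info` and its `.bite` attribute.

```python
def aggregate_bite(junction_rows):
    """Isoform-level bite from junction-level bite_junction values (sketch).

    junction_rows: iterable of dicts with "isoform" and "bite_junction" keys,
    mirroring rows of the _junctions.txt file (assumed shape).
    Returns {isoform: "TRUE" | "FALSE"}; isoforms with no junction rows
    (mono-exonic) are absent, so callers fall back to "NA".
    """
    bite = {}
    for r in junction_rows:
        current = bite.get(r["isoform"], "NA")
        # Same condition as the diff: overwrite while still "NA" (first junction)
        # or whenever this junction bites; a TRUE is never overwritten by FALSE.
        if current == "NA" or r["bite_junction"] == "TRUE":
            bite[r["isoform"]] = r["bite_junction"]
    return bite


# Hypothetical isoform IDs and junction rows
rows = [
    {"isoform": "PB.1.1", "bite_junction": "FALSE"},
    {"isoform": "PB.1.1", "bite_junction": "TRUE"},
    {"isoform": "PB.1.1", "bite_junction": "FALSE"},
    {"isoform": "PB.2.1", "bite_junction": "FALSE"},
]
result = aggregate_bite(rows)
print(result)                      # {'PB.1.1': 'TRUE', 'PB.2.1': 'FALSE'}
print(result.get("PB.3.1", "NA"))  # NA (no junctions, e.g. mono-exonic)
```

Because a FALSE can only replace `NA`, the result is the same for any row order.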
