
Clarification on Total Sum Scaling (TSS) #7

@zhengxiaoUVic


Hello, Esteban, Xi, Suhana, Dr. Li, and boost-r users,

I am writing to ask about the implementation of the Total Sum Scaling (TSS) normalization method in `boost::normalize.st`.

After reviewing the source code, it appears that the scale factors computed in Lines 83–86 (`scaled_s_factors`) are never used in the default TSS procedure. Instead, the raw count matrix is divided by the library sizes alone (i.e., the row sums `N`), which matches the approach described by Sun et al. (2020; SPARK).

```r
gene_num   <- ncol(count)
sample_num <- nrow(count)

N <- rowSums(count)

if (scaling.method == "TSS")
{
  ##
  ## TSS(Total Sum Scaling)
  ##

  ### scale-factors
  raw_s_factors <- N
  scale_coeff <- exp((-1/nrow(count)) * sum(log(raw_s_factors)))
  scaled_s_factors <- scale_coeff * raw_s_factors

  ### normalized count matrix
  db.norm <- sweep(count, 1, N, FUN = "/")
  count_nor <- db.norm
}
```
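To make the difference concrete, here is a minimal sketch comparing the two possible TSS variants. It assumes `count` is a spots × genes matrix (rows are samples, as in the excerpt). The helper name `tss_variants` and the toy matrix are hypothetical, for illustration only, and this is not the package's code. Dividing by `N` (what the TSS branch does) turns each row into proportions summing to 1. Dividing by `scaled_s_factors` would instead leave every row summing to the geometric mean of the library sizes.

```r
# Hypothetical helper (not part of boost): compare the two TSS variants.
# `count` is assumed to be a samples (spots) x genes matrix of raw counts.
tss_variants <- function(count) {
  N <- rowSums(count)                      # library sizes (row sums)
  n <- nrow(count)

  # Same computation as in the excerpt: 1 / geometric mean of N
  scale_coeff <- exp((-1 / n) * sum(log(N)))
  scaled_s_factors <- scale_coeff * N      # N_i / geometric mean of N

  list(
    # What the TSS branch currently returns: each row divided by its library size
    by_library_size = sweep(count, 1, N, FUN = "/"),
    # Alternative: each row divided by its geometric-mean-scaled size factor
    by_scaled_factor = sweep(count, 1, scaled_s_factors, FUN = "/"),
    scaled_s_factors = scaled_s_factors
  )
}

# Toy example: three spots, two genes; library sizes are 4, 8, 16
count <- matrix(c(1, 3,
                  2, 6,
                  4, 12), nrow = 3, byrow = TRUE)
res <- tss_variants(count)
rowSums(res$by_library_size)    # 1 1 1
rowSums(res$by_scaled_factor)   # 8 8 8 (geometric mean of N, up to floating-point error)
prod(res$scaled_s_factors)      # 1 (up to floating-point error)
```

Either variant preserves the relative composition within each spot. They differ only in the common scale of the output, so the choice mainly matters for downstream models that assume counts on the original scale.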

I also noticed what appears to be an error in the TSS size-factor formula given in, for example, Jiang et al. (2022, p. 4649; BOOST-MI) and Li et al. (2021, p. 4131; BOOST-GP).

In those works, the size factor is defined as $s_i = \sum_{j=1}^p y_{ij} / \prod_{i=1}^n \sum_{j=1}^p y_{ij}$, where $i$ indexes the $n$ spots and $j$ indexes the $p$ genes. The numerator corresponds to the library size.

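It might be tempting to assume that $\prod_{i=1}^n s_i = 1$, but with this definition that does not hold. Writing $N_i = \sum_{j=1}^p y_{ij}$ for the library size of spot $i$, the definition gives $\prod_{i=1}^n s_i = \prod_{i=1}^n N_i / \left(\prod_{i=1}^n N_i\right)^n = \left(\prod_{i=1}^n N_i\right)^{1-n}$, which is generally not 1 when $n > 1$. The denominator is missing the $n$-th root: it should be the geometric mean of the library sizes, $\left(\prod_{i'=1}^n N_{i'}\right)^{1/n}$, as implemented in the source code (`scale_coeff` is the reciprocal of this geometric mean). With that correction, $\prod_{i=1}^n s_i = 1$.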
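A quick numerical check of this point, using hypothetical library sizes for $n = 3$ spots (illustrative numbers only). The published formula divides by the plain product of the library sizes. The corrected version divides by its $n$-th root, which is the same quantity `scale_coeff * N` produces in the source code.

```r
# Hypothetical library sizes N_i = sum_j y_ij for n = 3 spots
N <- c(4, 8, 16)
n <- length(N)

# Size factors as written in the papers: N_i / prod(N)
s_paper <- N / prod(N)

# Size factors with the geometric mean in the denominator,
# equivalent to scale_coeff * N in the source code
s_geo <- N / prod(N)^(1 / n)

prod(s_paper)  # (prod(N))^(1 - n) = 512^(-2), not 1
prod(s_geo)    # 1 (up to floating-point error)
```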
