Clarification on Total Sum Scaling (TSS) #7

Hello, Esteban, Xi, Suhana, Dr. Li, and boost-r users,

I am writing to inquire about the implementation of the Total Sum Scaling (TSS) transformation method in `boost::normalize.st`.

Upon reviewing the [source code](https://github.com/estfernan/boost/blob/main/R/normalize.R), it appears that the scale factors are computed (Lines 83–86) but never used in the default TSS procedure. Instead, the raw count matrix is normalized solely by the library sizes (i.e., row sums), which matches the approach described by Sun et al. ([2020](https://doi.org/10.1038/s41592-019-0701-7); SPARK).

```
  gene_num   <- ncol(count)
  sample_num <- nrow(count)

  N <- rowSums(count)

  if (scaling.method == "TSS")
  {
    ##
    ## TSS(Total Sum Scaling)
    ##

    ### scale-factors
    raw_s_factors <- N
    scale_coeff <- exp((-1/nrow(count)) * sum(log(raw_s_factors)))
    scaled_s_factors <- scale_coeff * raw_s_factors

    ### normalized count matrix
    db.norm <- sweep(count, 1, N, FUN = "/")
    count_nor <- db.norm
  }
```
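To make the observation concrete, here is a minimal sketch that reruns the quoted branch on a made-up 3 × 4 count matrix. The toy data and the names `tss_by_library_size` and `tss_by_scaled_factor` are my own assumptions for illustration; the second one is a hypothetical alternative that does not exist in the package.

```r
# Toy count matrix: 3 spots (rows) x 4 genes (columns); values are made up.
count <- matrix(c( 1, 2, 3, 4,
                  10, 0, 5, 5,
                   2, 2, 2, 2),
                nrow = 3, byrow = TRUE)

N <- rowSums(count)  # library sizes, one per spot

# Scale factors as computed in the quoted branch:
# each library size divided by the geometric mean of all library sizes.
scale_coeff      <- exp((-1 / nrow(count)) * sum(log(N)))
scaled_s_factors <- scale_coeff * N

# What the quoted branch actually returns: counts divided by raw library sizes.
tss_by_library_size <- sweep(count, 1, N, FUN = "/")

# Hypothetical alternative (NOT in the package): divide by the scaled factors.
tss_by_scaled_factor <- sweep(count, 1, scaled_s_factors, FUN = "/")

rowSums(tss_by_library_size)  # each row sums to 1
prod(scaled_s_factors)        # 1, up to floating-point error
```

In this sketch the two normalized matrices differ only by the constant `1 / scale_coeff`, i.e. the geometric mean of the library sizes, so the choice changes the overall scale of the output but not the relative abundances within a spot.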

I also came across what seems to be a mistake in the TSS method presented in, for example, Jiang et al. ([2022](https://doi.org/10.1002/sim.9530), p. 4649; BOOST-MI) and Li et al. ([2021](https://doi.org/10.1093/bioinformatics/btab455), p. 4131; BOOST-GP).

In those works, the size factor is defined as $s_i = \sum_{j=1}^p y_{ij} / \prod_{i=1}^n \sum_{j=1}^p y_{ij}$, where $i$ indexes the $n$ spots and $j$ indexes the $p$ genes. The numerator corresponds to the library size.

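It might be tempting to assume that $\prod_{i=1}^n s_i = 1$, but under this definition that does not hold. The denominator is missing the $n$-th root: it should be the geometric mean of the library sizes, $\left(\prod_{i=1}^n \sum_{j=1}^p y_{ij}\right)^{1/n}$, which is the quantity the source code uses when it computes the scale factors.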
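For reference, here is the short calculation behind this, writing $N_i = \sum_{j=1}^p y_{ij}$ for the library size of spot $i$ (the shorthand $N_i$ and the dummy index $k$ are mine, not from the papers). With the denominator as printed:

$$
\prod_{i=1}^n s_i \;=\; \prod_{i=1}^n \frac{N_i}{\prod_{k=1}^n N_k} \;=\; \frac{\prod_{i=1}^n N_i}{\left(\prod_{k=1}^n N_k\right)^{n}} \;=\; \left(\prod_{i=1}^n N_i\right)^{1-n},
$$

which equals 1 only when $n = 1$ or $\prod_{i=1}^n N_i = 1$. With the geometric mean in the denominator:

$$
s_i \;=\; \frac{N_i}{\left(\prod_{k=1}^n N_k\right)^{1/n}}
\quad\Longrightarrow\quad
\prod_{i=1}^n s_i \;=\; \frac{\prod_{i=1}^n N_i}{\prod_{k=1}^n N_k} \;=\; 1 .
$$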

