The goal of LDTM is to …
You can install the development version of LDTM from GitHub with:
# install.packages("devtools")
devtools::install_github("Goodgolden/LDTM")
This is a basic example which shows you how to solve a common problem:
library(LDTM)
#> Loading required package: tidyverse
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
#> ✔ tibble 3.1.6 ✔ dplyr 1.0.8
#> ✔ tidyr 1.2.0 ✔ stringr 1.4.0
#> ✔ readr 2.1.2 ✔ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> Welcome to my package
## basic example code
Microbial genomic sequences are clustered by sequence similarity into operational taxonomic units (OTUs).
OTUs partition sequences into discrete groups instead of traditional taxonomic units.
The most abundant sequence in an OTU is its representative sequence.
Representative sequences from all the OTUs are used to construct a phylogenetic tree among the OTUs.
Microbial community information == OTUs + counts + phylogenetic relationship + taxonomy
the effect of diet on gut microbiome composition
- gut microbiome data
- nutrient intake data
- identify a few gut microbiome associated nutrients
- unable to provide information on how dietary nutrients affect bacterial taxa
identify both the key nutrients and the taxa those nutrients affect
Chen and Li (2013) adopted a regression-based approach
OTU abundance data as multivariate count responses, and nutrients as covariates
The link function is a multinomial-Poisson transformation
might need to use Poissonization to simulate the data
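The Poissonization idea noted above rests on a standard fact: independent Poisson counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the Poisson rates. A minimal sketch in base R, assuming made-up rates `lambda` and a made-up number of samples; none of these names or values come from LDTM.

```r
set.seed(1)

# Hypothetical Poisson rates for four OTUs (illustration values only)
lambda <- c(50, 30, 15, 5)
n_samples <- 2000

# Independent Poisson counts: one row per sample, one column per OTU.
# rpois() recycles lambda, and byrow = TRUE lines each rate up with its column.
counts <- matrix(rpois(n_samples * length(lambda), lambda),
                 nrow = n_samples, byrow = TRUE)

# Per-sample totals play the role of sequencing depth
depth <- rowSums(counts)

# Pooled proportions should be close to lambda / sum(lambda)
p_expected <- lambda / sum(lambda)
p_observed <- colSums(counts) / sum(counts)
round(rbind(p_expected, p_observed), 3)
```

Simulating Poisson counts first and conditioning on the row totals afterwards gives multinomial data without calling `rmultinom()` directly, which can be convenient when the sequencing depth itself should vary across samples.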
- the multinomial distribution is not appropriate
- Dirichlet is a conjugate prior to the multinomial distribution
- the marginal (compound) distribution of the counts is the Dirichlet multinomial distribution, aka the Dirichlet compound multinomial distribution
- all components must share a common variance parameter
- components are mutually independent, up to the constraint that they must sum to 1
- the distribution fails to take into account the special and inherent property of microbiome count data (evolutionary relationships in the phylogenetic tree)
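To make the shared variance parameter concrete, here is a rough sketch that draws Dirichlet multinomial counts in base R and compares their variances with plain multinomial counts of the same mean. The helper names `rdirichlet_one` and `rdm`, the concentration vector `alpha`, and the depth are all hypothetical choices for illustration, not part of LDTM.

```r
set.seed(2)

# One Dirichlet draw, obtained by normalising independent gamma variates
rdirichlet_one <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# n Dirichlet multinomial draws: p ~ Dirichlet(alpha), y | p ~ Multinomial(depth, p)
rdm <- function(n, depth, alpha) {
  t(replicate(n, as.vector(rmultinom(1, size = depth,
                                     prob = rdirichlet_one(alpha)))))
}

alpha <- c(2, 1, 0.5, 0.5)   # hypothetical concentration parameters
depth <- 1000                # hypothetical sequencing depth
p <- alpha / sum(alpha)

y_dm <- rdm(2000, depth, alpha)
y_mn <- t(rmultinom(2000, size = depth, prob = p))

# Same means, but every Dirichlet multinomial variance is inflated by the
# same factor (depth + sum(alpha)) / (1 + sum(alpha))
rbind(multinomial = apply(y_mn, 2, var),
      dirichlet_multinomial = apply(y_dm, 2, var))
```

Because the inflation factor depends only on `sum(alpha)`, no single component can be made more or less dispersed than the others, which is the limitation the notes above point at.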
- the relationships among the components of the count vector can be represented as a tree, node by node
- each component has an independent variance
- components are correlated at subtree levels
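The tree idea above can be sketched by sampling node by node: each interior node splits the count it receives among its children with its own Dirichlet multinomial draw. The three-leaf tree, the parameter values, and the helper names `rdm_node` and `rdtm_one` below are assumptions made for illustration only.

```r
set.seed(3)

# Dirichlet multinomial draw for a single node: split `total` among its children
rdm_node <- function(total, alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  as.vector(rmultinom(1, size = total, prob = g / sum(g)))
}

# Hypothetical tree: root -> (A, otu3), A -> (otu1, otu2)
alpha_root <- c(A = 5, otu3 = 5)        # illustration values only
alpha_A    <- c(otu1 = 0.5, otu2 = 0.5)

# One count vector: split at the root, then split A's share again
rdtm_one <- function(depth) {
  at_root <- rdm_node(depth, alpha_root)     # counts sent to A and to otu3
  at_A    <- rdm_node(at_root[1], alpha_A)   # A's count split into otu1, otu2
  c(otu1 = at_A[1], otu2 = at_A[2], otu3 = at_root[2])
}

y <- t(replicate(1000, rdtm_one(500)))

apply(y / 500, 2, var)   # per-component variances are no longer tied together
cor(y)                   # dependence now reflects the subtree structure
```

Since each interior node carries its own parameter vector, the dispersion of `otu1` and `otu2` is governed by node `A`, separately from the dispersion of the split at the root.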
- a regression model with the effects of covariates
- a regularized method for selecting covariates (nutrients) that are associated with the count responses (OTUs)
- Billheimer et al. use Aitchison’s logistic normal distribution instead of the Dirichlet
- relates covariates to the count vector (Billheimer et al. 2001)
- links dietary nutrients with bacterial counts (Xia et al. 2013)
- cannot exploit the tree structure information; the logistic normal multinomial does not have a closed-form expression
- total number of counts, determined by the sequencing depth, as an ancillary statistic
- analysis conditioning on this number
- A tree representing the hierarchical structure over the count responses
- each component in the product corresponds to an interior node in the tree
- the Dirichlet multinomial distribution based on the accumulated counts along the branches of a given node
- parametrization for the Dirichlet Tree Multinomial Regression Model
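The product-over-interior-nodes structure described above can be written down directly: the log-likelihood is a sum, over interior nodes, of Dirichlet multinomial log probabilities evaluated at the counts accumulated along each branch of that node. The sketch below is an assumption-laden illustration in base R; the tree encoding, the parameter values, and the function names `ldm` and `ldtm` are hypothetical and are not taken from the LDTM package.

```r
# Log pmf of a Dirichlet multinomial for the child counts of one node
ldm <- function(y, alpha) {
  n <- sum(y)
  a <- sum(alpha)
  lgamma(n + 1) - sum(lgamma(y + 1)) +
    lgamma(a) - lgamma(n + a) +
    sum(lgamma(y + alpha) - lgamma(alpha))
}

# Hypothetical tree: root -> (A, otu3), A -> (otu1, otu2).
# Each interior node lists the leaves below each of its branches.
tree <- list(
  root = list(branches = list(c("otu1", "otu2"), "otu3"), alpha = c(5, 5)),
  A    = list(branches = list("otu1", "otu2"),            alpha = c(0.5, 0.5))
)

# Sum over interior nodes of the Dirichlet multinomial log pmf of the
# counts accumulated along each branch of that node
ldtm <- function(y, tree) {
  sum(vapply(tree, function(node) {
    branch_counts <- vapply(node$branches,
                            function(leaves) sum(y[leaves]),
                            numeric(1))
    ldm(branch_counts, node$alpha)
  }, numeric(1)))
}

y <- c(otu1 = 12, otu2 = 3, otu3 = 20)
ldtm(y, tree)
```

In a regression setting, each node's `alpha` would presumably be tied to covariates through a link function, so the parameters could be estimated node by node; that step is left out here because the notes do not spell out the parametrization.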