Federated Principal Component Analysis

Principal component analysis is conceptually the eigendecomposition of the covariance matrix. While there are efficient covariance free algorithms for the compuation of the PCA, for instance singular value decomposition of the data matrix, it is useful to use this traditional representation.

Data partionining

In federated learning the data is distribtued over several data holders ('sites'). The data can be distributed in different ways. Either the samples are distributed, but every site has all variable for all samples ('horizontal data partitioning'), or the variables are distributed so that every site has a different subset of measurements for all samples ('vertical data partitioning'). Depending on the data distribution, different types of principal component analysis are required, due to the way the covariance matrix is formed. For sample stratification this situation is even more complicated, because the PCA is done on the sample by sample covariance matrix, but the samples are distributed which means the PCA is conceptually done on vertically partionned data, although it looks like the data is horizontally partionned. See the figure for illustration.

Vertical data partititioning

The specific use case for principal component analysis handeled by this version of principal component analysis is patient stratification for medical data. This is a very special use case, where the patients are distributed over serveral sites, which cannot share their data. Unlike 'feature' reduction PCA, where one wants to compute a lower dimensional approximation of the feature-by-feature covariance matrix, the PCA in GWAS is computed on the sample-by-sample covariance matrix. The samples are distributed over several sites, therefore the covariance matrix cannot be computed directly. It is not desirable to compute this covariance matrix, because the number of samples in a GWAS cohort can become quite large. Therefore here, we describe a covariance free alternative.

Library code for the simulation of vertical federated PCA (for sample stratification) is available in vertical. The folder misc_scripts contains examples on how to run the code and a basic workflow. Dependencies are managed using a conda environment and can be installed using environment.yaml.

Horizontal data partitioning

PCA that deals with classicial feature reduction PCA has been evaluated. There are several algorithms available with different aims [2]. there are approximate algorithms and exact algorithms. Approximate algorithms have little communication overhead, exact algorithms have a higher number of required communication rounds, with the obvious advantage that they converge to the centralised solution. Simulation code for horizontal PCA is available in horizontal. Notebooks and figures for an illustration of federated horizontal using single-cell-transcriptome data is available in notebooks.

Efficient federated SVD

PCA can be applied to high dimensional data in a low number of communication rounds (depending only on the number of singular vectors, and in practice these usually are chosen low), and without computing the covariance matrix. For more information please refer to [3].

Citation

If you use this repository in your work please use one of the following references:

[1] Hartebrodt, A., Nasirigerdeh, R., Blumenthal, D. B., & Richard, R. (2021). Federated Principal Component Analysis for Genome-Wide Association Studies. Icdm, 1090–1095. https://doi.org/10.1109/ICDM51629.2021.00127

[2] Anne Hartebrodt, Richard Röttger, Federated horizontally partitioned principal component analysis for biomedical applications, Bioinformatics Advances, Volume 2, Issue 1, 2022, vbac026, https://doi.org/10.1093/bioadv/vbac026

[3] Hartebrodt, A., Röttger,R. and Blumenthal, D., 2022. Federated singular value decomposition for high dimensional data. arXiv preprint arXiv:2205.12109

Name		Name	Last commit message	Last commit date
Latest commit History 340 Commits
R		R
misc_scipts		misc_scipts
notebooks		notebooks
python		python
setup		setup
workflows		workflows
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
environment.yaml		environment.yaml
federated_pca.png		federated_pca.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Federated Principal Component Analysis

Data partionining

Vertical data partititioning

Horizontal data partitioning

Efficient federated SVD

Citation

About

Uh oh!

Releases

Packages

Languages

License

AnneHartebrodt/federated-pca-simulation

Folders and files

Latest commit

History

Repository files navigation

Federated Principal Component Analysis

Data partionining

Vertical data partititioning

Horizontal data partitioning

Efficient federated SVD

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages