Data Lake for the Platform for Analytical Models in Epidemiology (PAMEpi)

General info

Each folder in this directory contains all the descriptions for building a data lake that can enable studies on a specific infectious disease. Each folder has four subfolders: Data Collection, Data Curation, Data Description, and Data ETL.

Data Collection: contains the scripts for downloading data from open sources and for updating it when new versions become available at the original source.
Data Curation: contains the scripts for data harmonisation and cleansing for each data set. The scripts may change over time in response to changes detected after the records are updated.
Data Description: contains code for basic data analysis and data validation.
Data ETL: contains code to format data for modelling and visualization.
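
The four-subfolder layout can be checked programmatically before running any scripts. The following is a minimal sketch, assuming the subfolders are named exactly as listed in this README (the on-disk names in the repository may differ). The helper `missing_stages` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Stage names as listed in this README. The actual directory names
# in the repository may differ (assumption).
STAGES = ("Data Collection", "Data Curation", "Data Description", "Data ETL")


def missing_stages(disease_dir):
    """Return the stage subfolders that are absent from a disease folder."""
    root = Path(disease_dir)
    # A stage counts as present only if it exists and is a directory.
    return [stage for stage in STAGES if not (root / stage).is_dir()]
```

Calling `missing_stages("path/to/disease_folder")` returns an empty list when all four subfolders are present, and otherwise lists the ones that still need to be created or fetched.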

Installation

The library is currently in production, so the easiest way to use it is to clone our repository or copy the functions available in this directory.

Dependencies

Models were implemented using Python > 3.5 and depend on libraries such as Pandas, SciPy, NumPy, and Matplotlib. For the full list of dependencies, as well as library versions, check requirements.txt inside each folder.
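
Before running the scripts in a folder, it can help to confirm the interpreter version and see what that folder's requirements.txt declares. The following is a minimal sketch using only the standard library. The helper names `python_version_ok` and `read_requirements` are hypothetical, and the file path passed in is a placeholder for whichever folder you are working in.

```python
import sys
from pathlib import Path


def python_version_ok():
    """Return True if the running interpreter is newer than Python 3.5."""
    return sys.version_info[:2] > (3, 5)


def read_requirements(path):
    """Return the non-empty, non-comment lines of a requirements file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]
```

The declared dependencies can then typically be installed with `pip install -r requirements.txt` from inside the folder in question.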

License

MIT License

Citing the directory

Platform for Analytical Models in Epidemiology. (2022). GitHub directory: https://github.com/PAMepi/PAMepi_scripts_datalake.git. PAMepi/PAMepi_scripts_datalake: v1.0.0 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.6384641

Note that each folder contains a DOI linked to the dataset processed by the team, which can be cited in your work.

Support

This study was financed by:

* Bill and Melinda Gates Foundation and Minderoo Foundation HDR UK, through the Grand Challenges ICODA COVID-19 Data Science, with reference number 2021.0097.

* Fiocruz Innovation Promotion Program - Innovative ideas and products - COVID-19, orders and strategies INOVA-FIOCRUZ, with reference number VPPIS-005-FIO-20-2-40.

References

[1] Platform for Analytical Models in Epidemiology - PAMEpi (2020).

[2] Platform for Analytical Models in Epidemiology - PAMEpi-Covid-19: Data (2020).
