gesiscss/kodaqs-toolbox.gesis.org

kodaqs-toolbox.gesis.org

This repository contains the code for building the KODAQS Data Quality Toolbox website kodaqs-toolbox.gesis.org.

Based on the R package andrew (Aggregator for Navigatable Discoverable Reproducible and Educational work), reusable tools in the form of literate programming documents such as R Markdown, Quarto Documents, and Jupyter Notebooks are collected from different repositories, reproduced in containers, and compiled into a single static website. In our KODAQS Toolbox, we focus on tools and resources related to data quality. However, the approach is generic and can be applied to other domains as well.

The figure below shows the workflow of the build process.

[Figure: Workflow]

Dependencies

Dependencies installation

For Docker, follow the steps in https://docs.docker.com/engine/install/. Note that Docker must be configured to run without superuser privileges. You can achieve this by either:

For Quarto, download the latest release from https://github.com/quarto-dev/quarto-cli/releases.

Except for Docker and Quarto, all the dependencies can be installed with mamba or conda. Install micromamba following the Mamba Documentation and create the environment specified in env.yaml:

micromamba create -y -n andrew -f env.yaml
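
Before building, it can help to confirm that the tools named above are actually on the PATH. The following is a minimal sketch, not part of the repository; the function name missing_tools is hypothetical, and the executable names passed to it are assumptions.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# Hypothetical helper; it is not shipped with this repository.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command can be resolved
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling `missing_tools docker quarto micromamba` should print nothing when all three are installed. To check the superuser requirement separately, `docker info` succeeding without sudo is a reasonable sign that the current user can reach the Docker daemon.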

How to build the website

Make sure the environment andrew is activated, for example:

micromamba activate andrew

To build the KODAQS Toolbox website as a demo, run the following command in the root directory of the repository:

Rscript start.R

Then, render the website with Quarto:

./render.sh

The static website will be generated in the demo/_site/ folder.
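
To confirm that both steps produced output, a small check like the one below may be useful. It is a sketch only: the function name site_is_rendered is hypothetical, and it merely tests that the output folder exists and contains at least one file, not that the site is complete.

```shell
# Succeed only if the given directory exists and contains at least one regular file.
# Defaults to demo/_site, the output folder named in this README.
site_is_rendered() {
  site_dir=${1:-demo/_site}
  [ -d "$site_dir" ] && [ -n "$(find "$site_dir" -type f | head -n 1)" ]
}
```

Run from the root directory of the repository, `site_is_rendered && echo "site found"` prints the message only when demo/_site/ has content.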

Developer notes

The file main.R is the entrypoint for the pipeline. It consists of the following steps:

  • Downloading the contributions (cloning the repositories) in download_contributions.R. The clones are stored in directories named after the repositories, without underscore.
  • Compiling the contributions to markdown and removing all dynamic elements, so that only static markdown remains. This is done in render_contributions.R:
    1. Create a Docker container depending on the needs of the contribution (Python, R, etc.).
    2. Run the compilation scripts in the container (inst/docker-scripts), which map the different repository types and entry points.
    3. Copy the resulting static markdown, or move it using volumes, to the repository directories with underscore.
  • Automatically creating a Quarto structure that composes the different repositories into one website.
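
The container step above can be pictured as a docker run call that mounts the cloned contribution and an output directory as volumes. The sketch below only prints such a command instead of running it. The function name render_cmd, the mount points /src and /out, and all four arguments are hypothetical placeholders; the actual invocation in render_contributions.R may look different.

```shell
# Print (rather than run) a docker command that would compile one contribution.
# Every argument is a hypothetical placeholder, not a value used by the pipeline.
render_cmd() {
  src_dir=$1   # absolute path of the cloned contribution
  out_dir=$2   # absolute path that receives the static markdown
  image=$3     # container image providing R, Python, etc.
  script=$4    # compilation script to run inside the container
  # --rm removes the container afterwards; each -v bind-mounts a host directory
  printf 'docker run --rm -v %s:/src -v %s:/out %s %s\n' \
    "$src_dir" "$out_dir" "$image" "$script"
}
```

Printing the command first makes it easy to inspect the mounts before anything is executed. Docker expects absolute host paths for bind mounts; a bare name would be treated as a named volume.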

Using a minimal example for debugging

In the directory minimal_example/ there is a pipeline to build only one tool to test the process. It does not fulfill all the requirements of the main pipeline but it is a faster way of testing new tool integration. The corresponding scripts are start_minimal.R and render_minimal.sh.

Deployment

  • deploy.sh deploys the rendered website to /var/www/html/. (NOTE: all content in /var/www/html/ will be deleted before deployment!)
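
Because the target directory is emptied during deployment, archiving its current contents beforehand may be worthwhile. This is a sketch under assumptions, not part of deploy.sh; the function name backup_dir and the archive location are hypothetical.

```shell
# Archive the contents of a directory into a gzip-compressed tarball.
# Hypothetical helper; deploy.sh itself is not known to create backups.
backup_dir() {
  src_dir=$1    # directory to preserve, e.g. the current web root
  archive=$2    # path of the .tar.gz file to create
  # -C switches into src_dir so the archive holds relative paths only
  tar -czf "$archive" -C "$src_dir" .
}
```

For example, `backup_dir /var/www/html "$HOME/html-backup.tar.gz"` could be run before deploy.sh, given read access to the web root.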

Customization

To customize the set of tools, edit the entries in the following files:

  • content-contributions.json (with the git tag that fixes the version of each tool)
  • tags.json (to generate the link page)
  • zettelkasten.json (to generate the hierarchy)
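
A syntax error in one of these files may only surface late in the build, so checking them after editing can save time. The sketch below assumes python3 is on the PATH and tests JSON syntax only; it knows nothing about the entries each file is expected to contain. Both function names are hypothetical.

```shell
# Succeed only if the given file parses as JSON.
# Assumes python3 is on PATH; json.tool is part of Python's standard library.
is_valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Print the name of every file from the argument list that is not valid JSON.
invalid_json_files() {
  for f in "$@"; do
    is_valid_json "$f" || printf '%s\n' "$f"
  done
}
```

Running `invalid_json_files content-contributions.json tags.json zettelkasten.json` in the directory that holds the files should print nothing when all three parse.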

Similar projects

Contributing

To contribute to this repository, please fork the repository and create a pull request with your changes. We welcome contributions that improve the code, documentation, or add new features.

About KODAQS

The Competence Center Data Quality in the Social Sciences (KODAQS), a partnership between GESIS, the University of Mannheim, and LMU Munich, offers demand-oriented support for the evaluation and analysis of the quality of social science data. Learn more about the KODAQS project here.

About

Code to build the KODAQS Data Quality Toolbox. This repository is a mirror.
