This repository contains the code for building the KODAQS Data Quality Toolbox website kodaqs-toolbox.gesis.org.
Based on the R package andrew (Aggregator for Navigatable Discoverable Reproducible and Educational work), reusable tools in the form of literate programming documents such as R Markdown, Quarto Documents, and Jupyter Notebooks are collected from different repositories, reproduced in containers, and compiled into a single static website.
In our KODAQS Toolbox, we focus on tools and resources related to data quality. However, the approach is generic and can be applied to other domains as well.
Below is the workflow of the building process.
For Docker, follow the steps in https://docs.docker.com/engine/install/. Note that Docker must be configured to run without superuser privileges. You can achieve this by either:
- a rootless Docker installation (https://docs.docker.com/engine/security/rootless/)
- or adding your user to the docker group (https://docs.docker.com/engine/install/linux-postinstall/)
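To check whether Docker is already usable without superuser privileges, a small shell function along these lines may help. It is only a sketch: the function name docker_without_sudo and its three status strings are made up for illustration, and it relies on nothing beyond the standard `docker info` and `id -nG` commands.

```shell
#!/bin/sh
# Sketch: report whether the current user can reach the Docker daemon without sudo.
# The function name and the status strings are illustrative, not part of this repository.

docker_without_sudo() {
    # `docker info` talks to the daemon; it fails if docker is not installed
    # or if the current user is not allowed to use the daemon socket.
    if docker info >/dev/null 2>&1; then
        echo "ok"
    # The user is in the docker group but the daemon is still unreachable,
    # e.g. the daemon is not running or a re-login is still pending.
    elif id -nG | tr ' ' '\n' | grep -qx docker; then
        echo "in-group-but-unreachable"
    else
        echo "no-access"
    fi
}

docker_without_sudo
```

If the function prints anything other than ok, revisit one of the two options listed above before running the pipeline.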
For Quarto, download the latest release from https://github.com/quarto-dev/quarto-cli/releases.
Except for Docker and Quarto, all the dependencies can be installed with mamba or conda.
Install micromamba following the Mamba Documentation and create the environment specified in env.yaml:
    micromamba create -y -n andrew -f env.yaml
Make sure the environment andrew is activated, for example:
    micromamba activate andrew
To build the KODAQS Toolbox website as a demo, run the following command in the root directory of the repository:
    Rscript start.R
Then, render the website with Quarto:
    ./render.sh
The static website will be generated in the demo/_site/ folder.
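The setup and build commands above can be chained in one script. The following is a minimal sketch, not a script shipped with this repository: the helper names run, setup_env, and build_demo are hypothetical, and the DRY_RUN switch is an assumption added so the script only prints the commands by default. Instead of activating the environment interactively, it uses `micromamba run -n andrew`, which executes a single command inside the named environment.

```shell
#!/bin/sh
# Sketch: chain the documented setup and build steps.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
# Set DRY_RUN=0 to actually run them from the repository root.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One-time setup: create the environment described in env.yaml.
setup_env() {
    run micromamba create -y -n andrew -f env.yaml
}

# Build the demo and render it, both inside the andrew environment.
build_demo() {
    run micromamba run -n andrew Rscript start.R || return 1
    run micromamba run -n andrew ./render.sh || return 1
    echo "static site expected in demo/_site/"
}

setup_env && build_demo
```

Running it with DRY_RUN=0 stops at the first failing step, so a broken build does not go on to render a stale site.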
The file main.R is the entrypoint for the pipeline. It consists of the following steps:
- Downloading (cloning) the repositories in download_contributions.R. They are stored in directories with the repository names without underscore.
- Compiling the contributions to markdown and removing all dynamic elements (the result should be static md). This is done in render_contributions.R:
  - create a Docker container depending on the needs (Python, R, etc.)
  - run compilation scripts in the container (inst/docker-scripts) to map the different repository types and entry points
  - copy or use volumes to move the resulting static markdown to the repositories with underscore
- Automatically creating a Quarto structure for composing the different repositories into one website.
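The container step can be pictured as a `docker run` call with two volumes, one for the cloned contribution and one for the rendered output. The sketch below only prints such a command. The function name render_cmd, the mount points /src and /out, and all four arguments in the example call are placeholders for illustration; the actual images, paths, and scripts are defined in render_contributions.R and inst/docker-scripts.

```shell
#!/bin/sh
# Hypothetical sketch: print a `docker run` command for rendering one contribution.
# None of the argument values below are taken from this repository.

render_cmd() {
    src_dir="$1"   # cloned contribution, mounted read-only in the container
    out_dir="$2"   # host directory that receives the static markdown
    image="$3"     # container image matching the contribution (Python, R, ...)
    script="$4"    # compilation script to run inside the container
    printf 'docker run --rm -v %s:/src:ro -v %s:/out %s %s\n' \
        "$src_dir" "$out_dir" "$image" "$script"
}

render_cmd /tmp/clone /tmp/rendered example/r-image /scripts/render.sh
```

Mounting the source read-only keeps the clone untouched, so only the output directory changes between runs.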
In the directory minimal_example/ there is a pipeline to build only one tool to test the process. It does not fulfill all the requirements of the main pipeline but it is a faster way of testing new tool integration.
The corresponding scripts are start_minimal.R and render_minimal.sh.
deploy.sh deploys the rendered website to /var/www/html/. (NOTE: all content in /var/www/html/ will be deleted before deployment!)
Edit the entries in the following files for customized tools:
- content-contributions.json (with the git tag for a fixed version)
- tags.json (to generate the link page)
- zettelkasten.json (for the hierarchy generation)
To contribute to this repository, please fork the repository and create a pull request with your changes. We welcome contributions that improve the code, documentation, or add new features.
The Competence Center Data Quality in the Social Sciences (KODAQS), a partnership between GESIS, the University of Mannheim, and LMU Munich, offers demand-oriented support for the evaluation and analysis of the quality of social science data. Learn more about the KODAQS project here.

