
This project contains how to gather German firm / company data that are freely available on datarequests.org


akirawisnu/GermanFirmRepository


This repository contains open firm and registration data from datarequests.org/company

GDPR Leads: Transparent German Company Privacy Database

Inspired by the principles of OffeneRegister.de, this project aims to liberate public data from the datarequests.org web pages and transform it into a machine-readable format. We believe that company information, especially how companies handle our privacy, should be open, free, and easy to analyze for everyone.


The Motivation

Why scrape a database that is already public? Because public access is not the same as open data.

In Germany, the Handelsregister (Commercial Register) has long been hidden behind paywalls and restrictive interfaces. Projects like OffeneRegister.de fought to change that by providing bulk downloads for the benefit of investigative journalism and civil society. However, that project only contains company registry data from 2017 to 2019; therefore, a minor update will be needed for 2025/2026 data.

This project carries that torch into the realm of Privacy Rights. By scraping DataRequests.org, we are creating a structured "Privacy Map" of German firms. Whether you are a researcher studying GDPR compliance, a developer building privacy tools, or a citizen advocate, this tool gives you the power to see who is responsible for data at thousands of companies all at once.


About the Source: DataRequests.org

DataRequests.org is a non-profit project run by the Zweigstelle e.V. Their mission is to help citizens exercise their GDPR rights such as access, deletion, and rectification.

They maintain a large, high-quality database of company contact details. However, while their website works well for individual requests, it does not allow for bulk analysis of corporate privacy trends. This scraper fills that gap.

Ethics and Compliance

  • No robots.txt Violation: This scraper respects the rules set by the site. As of Jan 2026, the scraping paths used do not violate robots.txt instructions.
  • Polite Scraping: The script includes a built-in delay to ensure we do not overwhelm the non-profit's servers.
  • Purpose: This tool is for transparency and research. Please use the data responsibly.
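The two technical points above (checking robots.txt and pausing between requests) can be sketched as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the user agent string, the one-second delay, and the function names `build_robot_rules` and `polite_fetch` are all assumptions made for the example.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "GermanFirmRepository-example"  # hypothetical user agent string
DELAY_SECONDS = 1.0  # assumed value; the scraper's real delay is not stated here


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch(urls, rules, fetch, delay=DELAY_SECONDS, sleep=time.sleep):
    """Fetch each allowed URL, pausing after every request.

    `fetch` is any callable that takes a URL and returns the page body,
    for example a thin wrapper around requests.get. URLs that robots.txt
    disallows are skipped instead of requested.
    """
    pages = {}
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # respect the site's rules
        pages[url] = fetch(url)
        sleep(delay)  # give the server room between requests
    return pages
```

Passing `fetch` and `sleep` as arguments keeps the politeness logic separate from the network code, so it can be checked without sending a single request.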

Features

  • Dual-Level Extraction: First gathers the list of all companies, then visits each individual profile for deep data.
  • Comprehensive Data Points:
    • Company Name and Category
    • Full Postal Address formatted for mail merge
    • Phone and Fax
    • Direct Privacy Email
    • Official Website
    • Data Sources separated with pipe "|" for easy auditing
  • Excel Ready: Exports to CSV with utf-8-sig encoding to ensure German characters like ä, ö, ü, and ß display perfectly.
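The feature list above can be illustrated with a short sketch of the two-pass flow and the CSV export. It is a simplified stand-in under stated assumptions, not the code in scraper.py: the column name `sources`, the reduced set of columns, and the helpers `crawl`, `write_csv`, and `to_excel_bytes` are hypothetical, and the standard `csv` module is used here where the project itself may rely on pandas.

```python
import csv
import io

# Assumed column names; "sources" in particular is a hypothetical label.
FIELDS = ["company_name", "category", "address", "email", "website", "sources"]


def crawl(list_profiles, load_profile):
    """Two-pass extraction: collect profile URLs, then visit each one.

    Both arguments are callables supplied by the caller. They stand in
    for the real HTTP requests and HTML parsing.
    """
    rows = []
    for url in list_profiles():         # pass 1: the company index
        rows.append(load_profile(url))  # pass 2: one profile per company
    return rows


def write_csv(rows, stream):
    """Write rows as CSV, joining each list of sources with a pipe."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        flat["sources"] = "|".join(row.get("sources", []))
        writer.writerow(flat)


def to_excel_bytes(rows):
    """Encode the CSV as utf-8-sig, which prepends a byte order mark."""
    buffer = io.StringIO(newline="")
    write_csv(rows, buffer)
    return buffer.getvalue().encode("utf-8-sig")
```

The byte order mark written by `utf-8-sig` is what lets Excel detect the encoding, so umlauts and ß are not shown as mangled characters when the file is opened directly.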

Installation and Usage

  1. Clone the repo: git clone https://github.com/yourusername/gdpr-german-company-scraper.git then cd gdpr-german-company-scraper
  2. Install dependencies: pip install requests beautifulsoup4 pandas
  3. Run the scraper: python scraper.py

Data Preview

| company_name | category | address | email | website |
| --- | --- | --- | --- | --- |
| :data factory GmbH | Marketing | Dr.-Karl-Lexer-Weg B300, 86633 Neuburg | info@data-factory.net | https://data-factory.net/ |
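For anyone consuming the export, the sketch below shows one way to read such a file back and count companies per category. It assumes the CSV uses the five columns shown in the preview and the utf-8-sig encoding described under Features; the helper names `read_companies` and `count_by_category` are hypothetical, and the sample bytes simply restate the preview row.

```python
import csv
import io
from collections import Counter


def read_companies(data: bytes):
    """Decode utf-8-sig CSV bytes into a list of dict rows."""
    text = data.decode("utf-8-sig")  # drops the byte order mark if present
    return list(csv.DictReader(io.StringIO(text, newline="")))


def count_by_category(rows):
    """Count how many companies fall into each category."""
    return Counter(row["category"] for row in rows)


# The preview row, encoded the way the export is described.
SAMPLE = (
    "company_name,category,address,email,website\r\n"
    ':data factory GmbH,Marketing,"Dr.-Karl-Lexer-Weg B300, 86633 Neuburg",'
    "info@data-factory.net,https://data-factory.net/\r\n"
).encode("utf-8-sig")

rows = read_companies(SAMPLE)
categories = count_by_category(rows)
```

When reading from disk instead of bytes, opening the file with `open(path, encoding="utf-8-sig", newline="")` has the same effect. Decoding as plain `utf-8` would leave the byte order mark attached to the first column name.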

🤝 Contributing

In the spirit of Open Knowledge, contributions are welcome! If the website layout changes and the selectors break, please open an Issue or a Pull Request.


⚖️ License

Distributed under the GNU General Public License v3.0. See the LICENSE file for more information.
