Input Preparation

omerkahveciuf edited this page Dec 4, 2025 · 3 revisions

Before running the Exposome Geocoder, you need to prepare your input data in one of three supported formats.


Overview

You need to prepare only ONE of the following data elements per encounter:

  • Option 1: Address data (multi-column or single-column format)
  • Option 2: Coordinate data (latitude/longitude)
  • Option 3: OMOP CDM database tables

Option 1: Address Data

Format A: Multi-Column Address

Prepare a CSV file with the following columns:

street           city          state  zip    year  entity_id
1250 W 16th St   Jacksonville  FL     32209  2019  1
2001 SW 16th St  Gainesville   FL     32608  2019  2

Required Columns:

  • street - Street address (required for precise geocoding)
  • city - City name
  • state - State abbreviation (2 letters)
  • zip - 5-digit ZIP code (required for precise geocoding)
  • year - Year for the address
  • entity_id - Unique identifier for the entity

Important: Provide both street and zip for every row. Rows missing either field may be geocoded imprecisely.
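
The column rules above can be checked before running the geocoder. The following is a minimal sketch using only the Python standard library; the function name `check_address_rows` and the exact checks (non-empty street, 5-digit zip, 2-letter state) are illustrative assumptions drawn from the column descriptions, not part of the Exposome Geocoder itself.

```python
import csv
import io

REQUIRED_COLUMNS = ["street", "city", "state", "zip", "year", "entity_id"]


def check_address_rows(csv_file):
    """Return a list of (line_number, message) problems found in the CSV.

    Hypothetical helper: it mirrors the column rules described above and is
    not part of the geocoder.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [(1, "missing columns: " + ", ".join(missing))]

    problems = []
    # Data rows start on line 2 because line 1 is the header
    # (assumes no field contains an embedded newline).
    for line_number, row in enumerate(reader, start=2):
        if not (row["street"] or "").strip():
            problems.append((line_number, "empty street"))
        zip_code = (row["zip"] or "").strip()
        if not (len(zip_code) == 5 and zip_code.isdigit()):
            problems.append((line_number, "zip is not 5 digits"))
        state = (row["state"] or "").strip()
        if not (len(state) == 2 and state.isalpha()):
            problems.append((line_number, "state is not a 2-letter abbreviation"))
    return problems


sample = io.StringIO(
    "street,city,state,zip,year,entity_id\n"
    "1250 W 16th St,Jacksonville,FL,32209,2019,1\n"
    ",Gainesville,FL,3260,2019,2\n"
)
for line_number, message in check_address_rows(sample):
    print(f"line {line_number}: {message}")
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `open("patients_address.csv", newline="")`.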

Format B: Single Column Address

Alternatively, combine all address components into a single column:

address                               year  entity_id
1250 W 16th St Jacksonville FL 32209  2019  1
2001 SW 16th St Gainesville FL 32608  2019  2

Required Columns:

  • address - Full address as a single string
  • year - Year for the address
  • entity_id - Unique identifier for the entity
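
If your data is already in the multi-column layout, it can be collapsed into the single-column layout with a few lines of Python. This is a sketch only: `to_single_column` is a hypothetical helper, and it joins the parts with spaces because that is how the example rows above are written. Whether the geocoder accepts other separators is not covered here.

```python
import csv
import io


def to_single_column(multi_column_csv, single_column_csv):
    """Rewrite Format A rows (street, city, state, zip) as Format B rows.

    Hypothetical helper, not part of the geocoder.
    """
    reader = csv.DictReader(multi_column_csv)
    writer = csv.DictWriter(
        single_column_csv,
        fieldnames=["address", "year", "entity_id"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        parts = [row["street"], row["city"], row["state"], row["zip"]]
        # Skip empty parts so a missing field does not leave a double space.
        address = " ".join(p.strip() for p in parts if p and p.strip())
        writer.writerow(
            {"address": address, "year": row["year"], "entity_id": row["entity_id"]}
        )


source = io.StringIO(
    "street,city,state,zip,year,entity_id\n"
    "1250 W 16th St,Jacksonville,FL,32209,2019,1\n"
)
target = io.StringIO()
to_single_column(source, target)
print(target.getvalue())
```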

Sample Files


Option 2: Coordinate Data

If you already have geocoded coordinates, prepare a CSV file with latitude and longitude:

latitude   longitude  entity_id
30.353463  -81.6749   1
29.634219  -82.3433   2

Required Columns:

  • latitude - Latitude in decimal degrees
  • longitude - Longitude in decimal degrees
  • entity_id - Unique identifier for the entity
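
A quick range check can catch non-numeric values or out-of-range coordinates before geocoding. The sketch below assumes only what the column list states (decimal degrees); the function name `check_coordinate_rows` and the messages are illustrative.

```python
import csv
import io


def check_coordinate_rows(csv_file):
    """Return (line_number, message) pairs for rows with unusable coordinates.

    Hypothetical helper, not part of the geocoder.
    """
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            problems.append((line_number, "latitude/longitude missing or not numeric"))
            continue
        # Decimal degrees: latitude in [-90, 90], longitude in [-180, 180].
        if not -90 <= lat <= 90:
            problems.append((line_number, "latitude out of range"))
        if not -180 <= lon <= 180:
            problems.append((line_number, "longitude out of range"))
    return problems


sample = io.StringIO(
    "latitude,longitude,entity_id\n"
    "30.353463,-81.6749,1\n"
    "130.0,-82.3433,2\n"
)
for line_number, message in check_coordinate_rows(sample):
    print(f"line {line_number}: {message}")
```

A range check cannot detect swapped columns when both values happen to fall within the latitude range, so spot-checking a few rows on a map is still worthwhile.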

Sample Files


Option 3: OMOP CDM Data

If you're working with an OMOP Common Data Model database, the geocoder can extract data directly.

Required Tables and Columns

Table             Required Columns
person            person_id
visit_occurrence  visit_occurrence_id, visit_start_date, visit_end_date, person_id
location          location_id, address_1, address_2, city, state, zip, location_source_value, country_concept_id, country_source_value, latitude, longitude
location_history  location_id, relationship_type_concept_id, domain_id, entity_id, start_date, end_date
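
The table above can be turned into a simple pre-flight check. The sketch below compares the required columns against the columns you actually have. How you obtain the column names (a database catalog query, a CSV header, and so on) depends on your setup and is left out, and `find_missing_columns` is a hypothetical helper rather than part of the geocoder.

```python
# Required tables and columns, copied from the table above.
REQUIRED_OMOP_COLUMNS = {
    "person": ["person_id"],
    "visit_occurrence": [
        "visit_occurrence_id", "visit_start_date", "visit_end_date", "person_id",
    ],
    "location": [
        "location_id", "address_1", "address_2", "city", "state", "zip",
        "location_source_value", "country_concept_id", "country_source_value",
        "latitude", "longitude",
    ],
    "location_history": [
        "location_id", "relationship_type_concept_id", "domain_id",
        "entity_id", "start_date", "end_date",
    ],
}


def find_missing_columns(available):
    """Map each table name to the required columns it lacks.

    `available` maps a table name to an iterable of its column names.
    Column names are compared case-insensitively (an assumption).
    """
    missing = {}
    for table, columns in REQUIRED_OMOP_COLUMNS.items():
        present = {name.lower() for name in available.get(table, [])}
        absent = [name for name in columns if name not in present]
        if absent:
            missing[table] = absent
    return missing


# Example: a database where visit_occurrence lacks visit_end_date.
available = {table: list(cols) for table, cols in REQUIRED_OMOP_COLUMNS.items()}
available["visit_occurrence"].remove("visit_end_date")
print(find_missing_columns(available))
```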

Sample Files


Optional Supporting Files

Including these optional files helps streamline the end-to-end workflow between geocoding and exposome linkage:

LOCATION.csv

CDM-formatted location table with geocoded information:

location_id address_1 address_2 city state zip county location_source_value country_concept_id country_source_value latitude longitude
1 1248 N Blackstone Ave FRESNO CA 93703 UNITED STATES OF AMERICA UNITED STATES OF AMERICA 36.75891146 -119.7902719

LOCATION_HISTORY.csv

CDM-formatted location history table:

location_id  relationship_type_concept_id  domain_id  entity_id  start_date  end_date
1            32848                         1147314    3763       1998-01-01  2020-01-01

Important: Do not date-shift your LOCATION/LOCATION_HISTORY files before linkage. Date shifting (if used) should occur post-linkage in the GIS Linkage step.

If LOCATION.csv and LOCATION_HISTORY.csv are provided during geocoding:

  • Output automatically includes updated latitude/longitude information
  • Ready for immediate use with the postgis linkage container

If not provided:

  • You must manually update LOCATION files with geocoded lat/lon before linkage
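
The manual update can be scripted. The sketch below assumes you have already built a lookup from `location_id` to a geocoded (latitude, longitude) pair. How that lookup is derived from the geocoder output is not covered on this page, so `coords_by_location_id` and `fill_coordinates` are hypothetical names. It also assumes LOCATION.csv already has `latitude` and `longitude` columns, as in the example above.

```python
import csv
import io


def fill_coordinates(location_csv, output_csv, coords_by_location_id):
    """Copy LOCATION rows, filling latitude/longitude where a match exists.

    Hypothetical helper: rows without a match are written unchanged.
    """
    reader = csv.DictReader(location_csv)
    writer = csv.DictWriter(
        output_csv, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # CSV values are strings, so the lookup keys must be strings too.
        coords = coords_by_location_id.get(row["location_id"])
        if coords is not None:
            row["latitude"], row["longitude"] = coords
        writer.writerow(row)


source = io.StringIO(
    "location_id,city,state,latitude,longitude\n"
    "1,FRESNO,CA,,\n"
)
target = io.StringIO()
fill_coordinates(source, target, {"1": ("36.75891146", "-119.7902719")})
print(target.getvalue())
```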

Folder Structure

Organize your input files in a dedicated folder:

your_project/
├── input_address/              # For address-based data
│   ├── patients_address.csv
│   ├── LOCATION.csv           # Optional
│   └── LOCATION_HISTORY.csv   # Optional
│
├── input_coordinates/          # For coordinate-based data
│   ├── coordinates.csv
│   ├── LOCATION.csv           # Optional
│   └── LOCATION_HISTORY.csv   # Optional

⚠️ File Format: Only .csv files are supported. Convert .xlsx or other formats to CSV before running the tool.
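
Before running the tool, it can help to scan an input folder for files that are not CSV. The following is a small sketch using `pathlib`; `find_non_csv_files` is a hypothetical helper, and treating the extension case-insensitively is an assumption about what counts as a .csv file.

```python
import tempfile
from pathlib import Path


def find_non_csv_files(folder):
    """Return the sorted names of files in `folder` that do not end in .csv."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() != ".csv"
    )


# Demo on a throwaway folder so the example does not touch real data.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "patients_address.csv").write_text("address,year,entity_id\n")
    (Path(folder) / "patients_address.xlsx").write_bytes(b"")
    print(find_non_csv_files(folder))
```

Any file the check reports needs to be converted to CSV or moved out of the input folder.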


Next Steps

Once your input data is prepared:

  1. For CSV inputs (Options 1 and 2): Proceed to Geocoding Setup
  2. For OMOP inputs (Option 3): Review Running the Geocoder
