Input Preparation
Before running the Exposome Geocoder, you need to prepare your input data in one of three supported formats.
You need to prepare only ONE of the following data elements per encounter:
- Option 1: Address data (multi-column or single-column format)
- Option 2: Coordinate data (latitude/longitude)
- Option 3: OMOP CDM database tables
For address data in multi-column format, prepare a CSV file with the following columns:
| street | city | state | zip | year | entity_id |
|---|---|---|---|---|---|
| 1250 W 16th St | Jacksonville | FL | 32209 | 2019 | 1 |
| 2001 SW 16th St | Gainesville | FL | 32608 | 2019 | 2 |
Required Columns:
- `street` - Street address (required for precise geocoding)
- `city` - City name
- `state` - State abbreviation (2 letters)
- `zip` - 5-digit ZIP code (required for precise geocoding)
- `year` - Year for the address
- `entity_id` - Unique identifier for the entity
Important: Both `street` and `zip` are required. Missing these fields may lead to imprecise geocoding.
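The column requirements above can be checked before a run. The following is a minimal sketch, not part of the geocoder itself: the function name `check_address_csv` and the specific checks (non-empty street, 2-letter state, 5-digit ZIP) are assumptions derived from the column descriptions above.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["street", "city", "state", "zip", "year", "entity_id"]


def check_address_csv(handle):
    """Return a list of problems found in a multi-column address CSV.

    Hypothetical helper: the rules mirror the column descriptions in this
    page, not any validation the geocoder is known to perform.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data starts on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        street = (row["street"] or "").strip()
        state = (row["state"] or "").strip()
        zip_code = (row["zip"] or "").strip()
        if not street:
            problems.append(f"line {line_no}: street is empty")
        if not re.fullmatch(r"[A-Za-z]{2}", state):
            problems.append(f"line {line_no}: state {state!r} is not a 2-letter abbreviation")
        if not re.fullmatch(r"\d{5}", zip_code):
            problems.append(f"line {line_no}: zip {zip_code!r} is not a 5-digit code")
    return problems


# In-memory sample; for a real file use open(path, newline="").
sample = io.StringIO(
    "street,city,state,zip,year,entity_id\n"
    "1250 W 16th St,Jacksonville,FL,32209,2019,1\n"
    ",Gainesville,FL,3260,2019,2\n"
)
problems = check_address_csv(sample)
for problem in problems:
    print(problem)
```

A check like this is most useful for catching ZIP codes that lost a leading zero when the file was exported from a spreadsheet.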
Alternatively, combine all address components into a single column:
| address | year | entity_id |
|---|---|---|
| 1250 W 16th St Jacksonville FL 32209 | 2019 | 1 |
| 2001 SW 16th St Gainesville FL 32608 | 2019 | 2 |
Required Columns:
- `address` - Full address as a single string
- `year` - Year for the address
- `entity_id` - Unique identifier for the entity
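If your data is already split into components, the single-column layout can be produced by joining them. This is a sketch under the assumption that a space-separated string, as in the example rows above, is acceptable; the helper name `to_single_column` is hypothetical.

```python
import csv
import io

ADDRESS_PARTS = ["street", "city", "state", "zip"]


def to_single_column(row):
    """Collapse street/city/state/zip into one space-separated address."""
    parts = [(row.get(name) or "").strip() for name in ADDRESS_PARTS]
    return {
        # Empty components are skipped so no double spaces appear.
        "address": " ".join(part for part in parts if part),
        "year": row["year"],
        "entity_id": row["entity_id"],
    }


multi_column = io.StringIO(
    "street,city,state,zip,year,entity_id\n"
    "1250 W 16th St,Jacksonville,FL,32209,2019,1\n"
)
converted = [to_single_column(row) for row in csv.DictReader(multi_column)]

# Write the converted rows back out in the single-column layout.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["address", "year", "entity_id"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(converted)
print(out.getvalue())
```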
If you already have geocoded coordinates, prepare a CSV file with latitude and longitude:
| latitude | longitude | entity_id |
|---|---|---|
| 30.353463 | -81.6749 | 1 |
| 29.634219 | -82.3433 | 2 |
Required Columns:
- `latitude` - Latitude in decimal degrees
- `longitude` - Longitude in decimal degrees
- `entity_id` - Unique identifier for the entity
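Coordinate files can be sanity-checked with a simple range test. The sketch below assumes only that decimal-degree latitudes fall in [-90, 90] and longitudes in [-180, 180]; the function name `check_coordinates` is hypothetical, and a range test will not catch every swapped latitude/longitude pair.

```python
import csv
import io


def check_coordinates(handle):
    """Return a list of problems found in a latitude/longitude CSV."""
    reader = csv.DictReader(handle)
    problems = []
    # Data starts on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: latitude/longitude missing or not numeric")
            continue
        # NaN fails both comparisons, so it is reported as out of range.
        if not -90.0 <= lat <= 90.0:
            problems.append(f"line {line_no}: latitude {lat} is outside [-90, 90]")
        if not -180.0 <= lon <= 180.0:
            problems.append(f"line {line_no}: longitude {lon} is outside [-180, 180]")
    return problems


sample = io.StringIO(
    "latitude,longitude,entity_id\n"
    "30.353463,-81.6749,1\n"
    "129.634219,-82.3433,2\n"
)
problems = check_coordinates(sample)
for problem in problems:
    print(problem)
```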
If you're working with an OMOP Common Data Model database, the geocoder can extract data directly.
| Table | Required Columns |
|---|---|
| person | person_id |
| visit_occurrence | visit_occurrence_id, visit_start_date, visit_end_date, person_id |
| location | location_id, address_1, address_2, city, state, zip, location_source_value, country_concept_id, country_source_value, latitude, longitude |
| location_history | location_id, relationship_type_concept_id, domain_id, entity_id, start_date, end_date |
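Before pointing the geocoder at a database, it can help to compare the columns you actually have against the table above. The following sketch assumes you have already obtained each table's column names (for example from your database's catalog); the function `missing_omop_columns` and the shape of its input are hypothetical.

```python
REQUIRED_OMOP_COLUMNS = {
    "person": ["person_id"],
    "visit_occurrence": [
        "visit_occurrence_id", "visit_start_date", "visit_end_date", "person_id",
    ],
    "location": [
        "location_id", "address_1", "address_2", "city", "state", "zip",
        "location_source_value", "country_concept_id", "country_source_value",
        "latitude", "longitude",
    ],
    "location_history": [
        "location_id", "relationship_type_concept_id", "domain_id",
        "entity_id", "start_date", "end_date",
    ],
}


def missing_omop_columns(available):
    """Map each table name to the required columns it lacks.

    `available` is a dict of table name -> iterable of column names.
    A table absent from `available` is reported as missing every column.
    """
    report = {}
    for table, required in REQUIRED_OMOP_COLUMNS.items():
        present = {name.lower() for name in available.get(table, [])}
        missing = [name for name in required if name not in present]
        if missing:
            report[table] = missing
    return report


# Illustrative input: location_history lacks end_date, location is absent.
available = {
    "person": ["person_id", "year_of_birth"],
    "visit_occurrence": [
        "visit_occurrence_id", "visit_start_date", "visit_end_date", "person_id",
    ],
    "location_history": [
        "location_id", "relationship_type_concept_id", "domain_id",
        "entity_id", "start_date",
    ],
}
report = missing_omop_columns(available)
for table, columns in report.items():
    print(table, "is missing:", ", ".join(columns))
```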
Including the optional LOCATION.csv and LOCATION_HISTORY.csv files helps streamline the end-to-end workflow between geocoding and exposome linkage:
CDM-formatted location table with geocoded information:
| location_id | address_1 | address_2 | city | state | zip | county | location_source_value | country_concept_id | country_source_value | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1248 N Blackstone Ave | | FRESNO | CA | 93703 | | UNITED STATES OF AMERICA | | UNITED STATES OF AMERICA | 36.75891146 | -119.7902719 |
CDM-formatted location history table:
| location_id | relationship_type_concept_id | domain_id | entity_id | start_date | end_date |
|---|---|---|---|---|---|
| 1 | 32848 | 1147314 | 3763 | 1998-01-01 | 2020-01-01 |
Important: Do not date-shift your LOCATION/LOCATION_HISTORY files before linkage. Date shifting (if used) should occur post-linkage in the GIS Linkage step.
If LOCATION.csv and LOCATION_HISTORY.csv are provided during geocoding:
- Output automatically includes updated latitude/longitude information
- Ready for immediate use with the postgis linkage container
If not provided:
- You must manually update LOCATION files with geocoded lat/lon before linkage
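For the manual route, updating LOCATION amounts to copying geocoded coordinates into the matching rows. The sketch below is illustrative only: it assumes you have built a lookup keyed by `location_id` from the geocoder's output, whose actual layout is not described here. The helper `fill_coordinates` and the lookup shape are hypothetical.

```python
def fill_coordinates(location_rows, coords_by_location_id):
    """Return copies of LOCATION rows with latitude/longitude filled in.

    `coords_by_location_id` maps location_id -> (latitude, longitude).
    Rows without a match are returned unchanged.
    """
    updated = []
    for row in location_rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        coords = coords_by_location_id.get(row["location_id"])
        if coords is not None:
            latitude, longitude = coords
            new_row["latitude"] = str(latitude)
            new_row["longitude"] = str(longitude)
        updated.append(new_row)
    return updated


# Rows as csv.DictReader would yield them from LOCATION.csv (values are strings).
location_rows = [
    {"location_id": "1", "address_1": "1248 N Blackstone Ave", "latitude": "", "longitude": ""},
    {"location_id": "2", "address_1": "1250 W 16th St", "latitude": "", "longitude": ""},
]
coords = {"1": (36.75891146, -119.7902719)}

updated_rows = fill_coordinates(location_rows, coords)
for row in updated_rows:
    print(row["location_id"], row["latitude"], row["longitude"])
```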
Organize your input files in a dedicated folder:
your_project/
├── input_address/ # For address-based data
│ ├── patients_address.csv
│ ├── LOCATION.csv # Optional
│ └── LOCATION_HISTORY.csv # Optional
│
├── input_coordinates/ # For coordinate-based data
│ ├── coordinates.csv
│ ├── LOCATION.csv # Optional
│ └── LOCATION_HISTORY.csv # Optional
⚠️ File Format: Only `.csv` files are supported. Convert `.xlsx` or other formats to CSV before running the tool.
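A quick scan of the input folder can flag files that still need converting. This is a minimal sketch using only the Python standard library; the function name `find_non_csv_files` is hypothetical, and the temporary directory stands in for a folder such as `input_address/`.

```python
import tempfile
from pathlib import Path


def find_non_csv_files(folder):
    """Return the sorted names of files in `folder` that are not .csv."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() != ".csv"
    )


# Build a throwaway folder to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["patients_address.csv", "LOCATION.csv", "notes.xlsx"]:
        (Path(tmp) / name).touch()
    unsupported = find_non_csv_files(tmp)

print(unsupported)
```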
Once your input data is prepared:
- For CSV inputs (Option 1 & 2): Proceed to Geocoding Setup
- For OMOP inputs (Option 3): Review Running the Geocoder