Identify pregnancy episodes in OMOP CDM data using the HIPPS algorithm (Smith et al. 2024, doi:10.1093/jamia/ocae195).
Observational health data rarely include explicit `pregnancy_start` or
`pregnancy_end` variables. Instead, we usually see scattered pregnancy-related
events such as a live birth, a gestational-week-12 record, a delivery procedure,
or a miscarriage. PregnancyIdentifier turns these pregnancy-related codes
into:
- One row per pregnancy episode.
- Start and end dates (with a precision estimate) inferred from gestational timing evidence.
- Standard outcome categories (LB, SB, AB, SA, ECT, DELIV, PREG) for use in analyses or exports.
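Inferring a start date from gestational timing evidence can be illustrated with a short base-R sketch. It is a simplification, not the package's ESD method. It assumes that each timing record carries a gestational age in weeks on a known date and that the start date is the day on which gestational age would be zero. From there it back-calculates one candidate start date per record, takes the median as the estimate, and uses the width of the candidate range as a rough precision in days. The `timing` data and the `infer_start()` helper are hypothetical.

```r
# Hypothetical timing evidence for one episode: a gestational age in weeks
# observed on each record date. Simplified sketch, not the package's ESD step.
timing <- data.frame(
  record_date = as.Date(c("2021-03-10", "2021-05-05", "2021-07-14")),
  gest_weeks  = c(8, 15, 26)
)

# Hypothetical helper: back-calculate one candidate start date per record
# (record date minus gestational age in days), then summarise the candidates.
infer_start <- function(record_date, gest_weeks) {
  candidates <- record_date - gest_weeks * 7
  list(
    # median candidate as the point estimate
    start = as.Date(median(as.numeric(candidates)), origin = "1970-01-01"),
    # width of the candidate range, in days, as a rough precision
    precision_days = as.numeric(diff(range(candidates)))
  )
}

res <- infer_start(timing$record_date, timing$gest_weeks)
res$start           # estimated start date
res$precision_days  # spread of the candidate start dates, in days
```

Records that agree closely yield a narrow window. Here, two records point to the same day and one points to a day a week later, so the window is 7 days wide.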
The pipeline builds outcome-anchored episodes (HIP) and timing-anchored episodes (PPS), merges them (HIPPS), and then refines start dates (ESD). The result is a consistent definition of a pregnancy across sites and data sources.
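Merging the two episode sources can be pictured as an overlap join. The base-R sketch below is a toy and does not implement the package's HIPPS merge rules. It assumes that a HIP episode and a PPS episode for the same person describe one pregnancy when their date ranges overlap. The merged episode keeps the HIP outcome and spans the union of the two ranges. Episodes found by only one source are simply dropped here, and one-to-many matches are not resolved; see the Merge vignette for the package's actual rules. All data and column names (`hip_start`, `pps_start`, etc.) are hypothetical.

```r
# Hypothetical toy inputs; not the package's HIP/PPS output format.
hip <- data.frame(
  person_id = c(1, 1, 2),
  hip_start = as.Date(c("2020-01-01", "2022-02-01", "2021-06-01")),
  hip_end   = as.Date(c("2020-09-20", "2022-04-15", "2022-03-01")),
  outcome   = c("LB", "SA", "LB")
)
pps <- data.frame(
  person_id = c(1, 2, 3),
  pps_start = as.Date(c("2019-12-20", "2021-05-25", "2023-01-10")),
  pps_end   = as.Date(c("2020-09-01", "2022-02-20", "2023-05-01"))
)

# Pair every HIP episode with every PPS episode of the same person.
pairs <- merge(hip, pps, by = "person_id")

# Keep pairs whose date ranges overlap (each starts before the other ends).
overlapping <- pairs[pairs$hip_start <= pairs$pps_end &
                     pairs$pps_start <= pairs$hip_end, ]

# One merged episode per overlapping pair: union of the two ranges,
# outcome taken from the outcome-anchored (HIP) side.
merged <- data.frame(
  person_id     = overlapping$person_id,
  episode_start = pmin(overlapping$hip_start, overlapping$pps_start),
  episode_end   = pmax(overlapping$hip_end, overlapping$pps_end),
  outcome       = overlapping$outcome
)
merged
```

Person 1's 2022 HIP episode has no overlapping PPS episode, and person 3 has only a PPS episode. Both are therefore absent from `merged` in this toy version.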
Install (requires R ≥ 4.1 and CDMConnector):

```r
# From GitHub (DARWIN EU)
remotes::install_github("darwin-eu/PregnancyIdentifier")
```

Run the full pipeline (initializes concepts, runs HIP → PPS → merge → ESD, writes outputs):

```r
library(PregnancyIdentifier)
library(CDMConnector)

cdm <- mockPregnancyCdm() # or your real cdm_reference

runPregnancyIdentifier(
  cdm = cdm,
  outputDir = "pregnancy_output",
  startDate = as.Date("2000-01-01"),
  endDate = Sys.Date(),
  runExport = FALSE
)
```

Use the result:
`pregnancy_output/final_pregnancy_episodes.rds` contains a data frame with one
row per pregnancy episode: `person_id`, `final_episode_start_date`,
`final_episode_end_date`, `final_outcome_category`,
`esd_precision_days`, and other `esd_*` QA/concordance columns. Load it with
`readRDS()` for cohort definition, export, or further analysis.
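A typical first look at the output might resemble the sketch below. To stay self-contained, it writes a small hypothetical data frame to a temporary `.rds` file, using some of the documented column names. With real output, you would pass the path of `final_pregnancy_episodes.rds` to `readRDS()` instead. The cohort filter assumes that a smaller `esd_precision_days` means a tighter start-date estimate. The 14-day cutoff is arbitrary.

```r
# Hypothetical stand-in for pregnancy_output/final_pregnancy_episodes.rds:
# a toy data frame with some of the documented columns, saved to a temp file.
episodes_path <- tempfile(fileext = ".rds")
saveRDS(data.frame(
  person_id                = c(1, 2, 2, 3),
  final_episode_start_date = as.Date(c("2020-01-10", "2019-03-01", "2021-05-15", "2022-08-01")),
  final_episode_end_date   = as.Date(c("2020-10-12", "2019-11-20", "2021-07-20", "2022-09-10")),
  final_outcome_category   = c("LB", "LB", "SA", "AB"),
  esd_precision_days       = c(7, 14, 28, 28)
), episodes_path)

# With real output, point readRDS() at the file the pipeline wrote instead.
episodes <- readRDS(episodes_path)

# Episode length in days, from the final start and end dates.
episodes$length_days <- as.numeric(
  episodes$final_episode_end_date - episodes$final_episode_start_date
)

# Number of episodes per outcome category.
outcome_counts <- table(episodes$final_outcome_category)
outcome_counts

# Example cohort (assumption: smaller esd_precision_days = more precise start):
# live-birth episodes with a start-date precision of 14 days or less.
live_births <- subset(episodes,
                      final_outcome_category == "LB" & esd_precision_days <= 14)
live_births
```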
Optional: set `runExport = TRUE` to run the export step automatically after
ESD, or call `exportPregnancies()` yourself to produce de-identified summary CSVs
and a ZIP archive:

```r
runPregnancyIdentifier(cdm, outputDir = "pregnancy_output", runExport = TRUE)
# or:
exportPregnancies(cdm, outputDir = "pregnancy_output", exportDir = "pregnancy_output/export")
```

- Vignettes: Pipeline overview, HIP, PPS, Merge, ESD, Export.
- Reference: pkgdown site.
- Issues: GitHub issues.
License: Apache 2.0.