Skip to content

Last observation carried forward with time constraints #7

@prockenschaub

Description

@prockenschaub

Hi all,

I have created a function that performs a last observation carried forward that takes account of how long ago the last observation was recorded. Code for the function below.

library(tibble)
library(lubridate)
library(zoo)
library(testthat)

locf_window <- function(x, by, date, window, unit = "hours"){
  # Perform last observation carried forward (LOCF) based on the time difference 
  # to the last measured observation. Allow for stratification by identifier 
  # (e.g.patient ID) 
  #
  # Parameters 
  # x : character or numeric
  #    vector of measurements on which to perform LOCF 
  # by : character or numeric 
  #   vector indicating the group to stratify by 
  # date : datetime
  #   vector of times at which the value in 'x' was attempted to be measured. 
  # window : numeric 
  #   length of the time window 
  # units : character 
  #    units of the time window 
  
  if (is.character(x)) { 
    placeholder <- "-Infinity" 
  } else if (is.numeric(x)) { 
    placeholder <- -Inf 
  } else { 
    stop("vector 'x' must either be character or numeric") 
  }
  
  x <- if_else(is.na(x) & by != lag(by), placeholder, x) 
  date_measure <- as_datetime(ifelse(!is.na(x), date, NA)) 
  date_measure <- zoo::na.locf(date_measure) 
  n_measure <- unlist(tapply(!is.na(x), by, cumsum)) 
  date_measure <- as_datetime(ifelse(n_measure != 0, date_measure, NA)) 
  
  x <- if_else(is.na(x) & !is.na(date_measure) & 
                 time_length(lag(date_measure) %--% date, unit = unit) < window, 
               zoo::na.locf(x, na.rm = FALSE), x, x) 
  x[x == placeholder] <- NA 
  x 
}

test_locf <- tribble( 
  ~patid, ~start_date , ~value, 
  1, ymd_hms("2010-01-05 12:00:00"), 5, 
  1, ymd_hms("2010-01-05 13:00:00"), NA, 
  1, ymd_hms("2010-01-05 15:59:59"), NA, 
  1, ymd_hms("2010-01-05 17:00:00"), NA, 
  1, ymd_hms("2010-01-05 18:00:00"), 10, 
  2, ymd_hms("2010-01-05 13:00:00"), NA, 
  2, ymd_hms("2010-01-05 14:00:00"), NA, 
  2, ymd_hms("2010-01-05 15:00:00"), 2, 
  2, ymd_hms("2010-01-05 15:31:01"), NA, 
  2, ymd_hms("2010-01-06 16:00:00"), NA 
) %>% as.data.table()

with(test_locf, {
  expect_identical(locf_window(value, start_date, 4, by = patid), c(5, 5, 5, NA, 10, NA, NA, 2, 2, NA))
  expect_error(locf_window(list(2), ymd_hms("2010-01-06 16:00:00"), 4, by = patid))
})

The function generally seems to work, but it seems overly complicated. Does anyone have an idea of how to simplify or speed up the function?

P

Metadata

Metadata

Assignees

No one assigned

    Labels

    Ranything related to R programming languageoptimizationThis works, but could've been better

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions