Sample complete groups (“blocks”) from a grouped data frame. This package implements a simple block-bootstrap-style sampler: instead of sampling individual rows, you sample entire groups, preserving the structure within each group.
# install.packages("remotes")
remotes::install_github("numbats/blockstrap")
When observations belonging to the same experimental unit are spread across multiple rows (e.g. multiple measurements per subject, dose combinations, time series segments), ordinary row-wise sampling breaks these units apart. A block sampler keeps units intact by sampling at the group level.
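The idea can be sketched in base R without the package. `sample_blocks()` is a hypothetical helper, not blockstrap's implementation. It assumes the groups are defined by one or more columns. It splits the data by those columns, samples whole groups rather than rows, and then stacks the chosen groups back together.

```r
# Hypothetical helper (not part of blockstrap): sample n whole groups ("blocks").
sample_blocks <- function(df, group_cols, n, replace = FALSE) {
  # One data frame per observed combination of the grouping columns
  blocks <- split(df, df[group_cols], drop = TRUE)
  # Sample group indices, not row indices
  picked <- sample(seq_along(blocks), size = n, replace = replace)
  # Stack the chosen blocks; rows inside each block stay together
  out <- do.call(rbind, blocks[picked])
  rownames(out) <- NULL
  out
}

set.seed(1)
res <- sample_blocks(ToothGrowth, c("supp", "dose"), n = 2)
nrow(res)  # 20: two blocks of 10 rows each
```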
slice_block() works on a grouped data frame. If you call it on an
ungrouped data frame, it throws a helpful error.
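A guard like this can be sketched in base R. `assert_grouped()` is a hypothetical name. The sketch checks for the `grouped_df` class that dplyr's `group_by()` attaches, which is an assumption about how the package detects grouping. The real check and error message may differ.

```r
# Hypothetical guard; blockstrap's actual check and wording may differ.
assert_grouped <- function(x) {
  if (!inherits(x, "grouped_df")) {
    stop("`x` must be a grouped data frame; call dplyr::group_by() first.",
         call. = FALSE)
  }
  invisible(x)
}

# A plain (ungrouped) data frame is rejected with an explanatory message
msg <- tryCatch(assert_grouped(ToothGrowth), error = conditionMessage)
msg
```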
Key arguments:
- n: number of groups (blocks) to sample.
- replace: sample with replacement? Required when n exceeds the number of groups.
- weight_by: optional unquoted expression, evaluated per group, used to weight the sampling probabilities.
- ...: passed on to base sample().
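From the argument list, group selection appears to be a call to base `sample()` on group indices. The sketch below assumes `n` becomes `size`, `replace` is passed through, and the per-group `weight_by` values become `prob`. `draw_group_ids()` is hypothetical. The sketch also shows why `replace = TRUE` is needed when `n` exceeds the number of groups: base `sample()` refuses to draw more items than exist without replacement.

```r
# Hypothetical sketch: how n, replace and per-group weights could map onto base::sample().
draw_group_ids <- function(n_groups, n, replace = FALSE, weights = NULL) {
  # weights (if given) act as relative probabilities; sample() normalises them
  sample(seq_len(n_groups), size = n, replace = replace, prob = weights)
}

set.seed(1)
draw_group_ids(6, n = 3)                   # 3 distinct group ids out of 6
draw_group_ids(6, n = 10, replace = TRUE)  # repeats allowed

# Without replacement, asking for more groups than exist is an error
err <- tryCatch(draw_group_ids(6, n = 10), error = function(e) "error")
err
```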
We use the built-in ToothGrowth dataset and treat each supplement-dose
combination as a block.
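You can check the block structure in base R. ToothGrowth has two supplements (OJ, VC) and three doses (0.5, 1, 2), which gives six supplement-dose blocks of 10 rows each. That is why every sampled block contributes exactly 10 rows in the outputs that follow.

```r
# Rows per supp x dose combination (ToothGrowth ships with R's datasets package)
blocks <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
blocks
length(blocks)     # 6 blocks
all(blocks == 10)  # TRUE: every block has 10 rows
```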
library(dplyr)
library(blockstrap)
set.seed(1)
ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 2)
## # A tibble: 20 × 3
## # Groups: supp, dose [2]
## len supp dose
## <dbl> <fct> <dbl>
## 1 15.2 OJ 0.5
## 2 21.5 OJ 0.5
## 3 17.6 OJ 0.5
## 4 9.7 OJ 0.5
## 5 14.5 OJ 0.5
## 6 10 OJ 0.5
## 7 8.2 OJ 0.5
## 8 9.4 OJ 0.5
## 9 16.5 OJ 0.5
## 10 9.7 OJ 0.5
## 11 4.2 VC 0.5
## 12 11.5 VC 0.5
## 13 7.3 VC 0.5
## 14 5.8 VC 0.5
## 15 6.4 VC 0.5
## 16 10 VC 0.5
## 17 11.2 VC 0.5
## 18 11.2 VC 0.5
## 19 5.2 VC 0.5
## 20 7 VC 0.5
To sample more groups than exist, or to allow the same group to be drawn more than once, set replace = TRUE:
ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 10, replace = TRUE) |>
  count(supp, dose)
## # A tibble: 5 × 3
## # Groups: supp, dose [5]
## supp dose n
## <fct> <dbl> <int>
## 1 OJ 0.5 20
## 2 OJ 1 20
## 3 OJ 2 30
## 4 VC 1 20
## 5 VC 2 10
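Repeated blocks appear multiple times: a block drawn k times contributes k copies of its rows, so each count above is a multiple of 10 (the number of rows in every ToothGrowth block). Here VC 0.5 was never drawn.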
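The same bookkeeping can be reproduced in base R. This sketch samples block labels directly rather than calling slice_block(), so its counts will not match the table above. It only illustrates that the row counts are always draws × 10 and always total 100.

```r
# Sketch: draw 10 block labels with replacement, then convert draws to row counts
block_ids <- levels(interaction(ToothGrowth$supp, ToothGrowth$dose, sep = " "))
set.seed(2)
drawn <- sample(block_ids, size = 10, replace = TRUE)
times <- table(factor(drawn, levels = block_ids))  # draws per block (0 if never drawn)
rows <- times * 10                                  # each draw contributes 10 rows
rows
sum(rows)  # 100: 10 draws x 10 rows
```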
Weight blocks by a per-group statistic, e.g. mean tooth length, to favor groups with a larger mean response:
set.seed(42)
weighted <- ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 3, weight_by = mean(len))
To check that the weighting favors high-mean blocks, repeat the draw many times and tally how often each block is selected:
set.seed(99)
rep_draws <- replicate(500, {
  ToothGrowth |>
    group_by(supp, dose) |>
    slice_block(n = 3, weight_by = mean(len)) |>
    distinct(supp, dose)
}, simplify = FALSE)
freqs <- bind_rows(rep_draws) |>
  count(supp, dose, name = "times") |>
  arrange(desc(times))
freqs
## # A tibble: 6 × 3
## # Groups: supp, dose [6]
## supp dose times
## <fct> <dbl> <int>
## 1 OJ 2 334
## 2 VC 2 326
## 3 OJ 1 292
## 4 VC 1 224
## 5 OJ 0.5 205
## 6 VC 0.5 119
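To see roughly what weight_by = mean(len) implies, you can compute each block's mean tooth length and normalise the means into first-draw probabilities. This assumes the weights are passed to base sample() as prob. Note that base sample() applies prob sequentially when replace = FALSE: each later pick is proportional to the weights among the remaining items. Inclusion frequencies like those in freqs are therefore not exactly proportional to the weights. The sketch calls sample() directly, not slice_block().

```r
# Mean tooth length per supp x dose block (the statistic given to weight_by above)
means <- tapply(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), mean)

# Flatten to named weights; as.vector() and outer() both run column-major,
# so the labels line up with the values ("OJ 0.5", "VC 0.5", "OJ 1", ...)
w <- setNames(as.vector(means),
              as.vector(outer(rownames(means), colnames(means), paste)))
round(w / sum(w), 3)  # first-draw probabilities, assuming weights act like prob

# Draw 3 distinct blocks, 500 times, and tally how often each block is included
set.seed(99)
hits <- replicate(500, sample(names(w), size = 3, prob = w))
sort(table(hits), decreasing = TRUE)
```

The tallies will differ from freqs because this sketch bypasses slice_block(). The ordering should still broadly track the block means.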