numbats/blockstrap

blockstrap

Sample complete groups (“blocks”) from a grouped data frame. This package implements a simple block-bootstrap-style sampler: instead of sampling individual rows, you sample entire groups, preserving the structure within each group.

Installation

# install.packages("remotes")
remotes::install_github("numbats/blockstrap")

Motivation

When observations belonging to the same experimental unit are spread across multiple rows (e.g. multiple measurements per subject, dose combinations, time series segments), ordinary row-wise sampling breaks these units apart. A block sampler keeps units intact by sampling at the group level.

Core function

slice_block() works on a grouped data frame. If you call it on an ungrouped data frame, it throws a helpful error.

Key arguments:

  • n: number of groups (blocks) to sample.
  • replace: whether to sample with replacement. Required when n exceeds the number of groups.
  • weight_by: optional unquoted expression, evaluated once per group, that weights the sampling probabilities.
  • ...: passed to base sample().
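
To make the idea behind these arguments concrete, here is a minimal base R sketch of group-level sampling. The helper `sample_blocks()` and its `group_cols` and `weights` parameters are hypothetical names made up for illustration; this is not blockstrap's implementation, only one way such a sampler could be written.

```r
# Hypothetical helper, NOT blockstrap's implementation: sample whole groups
# from a data frame using base R only.
sample_blocks <- function(data, group_cols, n, replace = FALSE, weights = NULL) {
  # One integer id per distinct combination of the grouping columns
  block_id <- as.integer(interaction(data[group_cols], drop = TRUE))
  ids <- sort(unique(block_id))
  # Draw positions within `ids`; `weights` (if given) must follow the order of `ids`
  picked <- ids[sample.int(length(ids), size = n, replace = replace, prob = weights)]
  # Keep every row of each picked block; a block drawn twice is returned twice
  rows <- unlist(lapply(picked, function(id) which(block_id == id)))
  data[rows, , drop = FALSE]
}

set.seed(1)
out <- sample_blocks(ToothGrowth, c("supp", "dose"), n = 2)
nrow(out)  # two blocks of 10 rows each
```

The draw happens over group ids rather than row indices, which is what keeps each block intact.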

Basic example

We use the built-in ToothGrowth dataset and treat each supplement-dose combination as a block.

library(dplyr)
library(blockstrap)

set.seed(1)
ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 2)
## # A tibble: 20 × 3
## # Groups:   supp, dose [2]
##      len supp   dose
##    <dbl> <fct> <dbl>
##  1  15.2 OJ      0.5
##  2  21.5 OJ      0.5
##  3  17.6 OJ      0.5
##  4   9.7 OJ      0.5
##  5  14.5 OJ      0.5
##  6  10   OJ      0.5
##  7   8.2 OJ      0.5
##  8   9.4 OJ      0.5
##  9  16.5 OJ      0.5
## 10   9.7 OJ      0.5
## 11   4.2 VC      0.5
## 12  11.5 VC      0.5
## 13   7.3 VC      0.5
## 14   5.8 VC      0.5
## 15   6.4 VC      0.5
## 16  10   VC      0.5
## 17  11.2 VC      0.5
## 18  11.2 VC      0.5
## 19   5.2 VC      0.5
## 20   7   VC      0.5

Sampling with replacement

If you want to sample more groups than exist, or allow repeats:

ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 10, replace = TRUE) |>
  count(supp, dose)
## # A tibble: 5 × 3
## # Groups:   supp, dose [5]
##   supp   dose     n
##   <fct> <dbl> <int>
## 1 OJ      0.5    20
## 2 OJ      1      20
## 3 OJ      2      30
## 4 VC      1      20
## 5 VC      2      10

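A block drawn more than once appears that many times in the result, so its row count is a multiple of the block size (10 rows here). In the output above, OJ at dose 2 was drawn three times and contributes 30 rows.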
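
The need for replace = TRUE can be seen with base R alone. The sketch below assumes that the group draw behaves like base sample(), as the description of `...` suggests; it does not call blockstrap itself.

```r
n_groups <- 6  # ToothGrowth has 6 supp-dose combinations

# Without replacement, asking for more draws than groups is an error in base R
res <- tryCatch(
  sample.int(n_groups, size = 10),
  error = function(e) conditionMessage(e)
)
res

# With replacement the draw succeeds and some group ids repeat
set.seed(1)
draws <- sample.int(n_groups, size = 10, replace = TRUE)
table(draws) * 10  # rows contributed per group, at 10 rows per block
```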

Weighted sampling

Weight blocks by a per-group statistic, e.g. mean tooth length, to favor groups with a larger mean response:

set.seed(42)
weighted <- ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 3, weight_by = mean(len))

To check that the weighting biases the draws, repeat the sampling many times and tally how often each group is selected:

set.seed(99)
rep_draws <- replicate(500, {
  ToothGrowth |>
    group_by(supp, dose) |>
    slice_block(n = 3, weight_by = mean(len)) |>
    distinct(supp, dose)
}, simplify = FALSE)

freqs <- bind_rows(rep_draws) |>
  count(supp, dose, name = "times") |>
  arrange(desc(times))
freqs
## # A tibble: 6 × 3
## # Groups:   supp, dose [6]
##   supp   dose times
##   <fct> <dbl> <int>
## 1 OJ      2     334
## 2 VC      2     326
## 3 OJ      1     292
## 4 VC      1     224
## 5 OJ      0.5   205
## 6 VC      0.5   119
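
For comparison, the weights themselves can be computed directly in base R. This sketch assumes the per-group value of mean(len) is normalised and used as the selection probability, which is how base sample() treats its `prob` argument; the column name `first_draw_prob` is made up for illustration.

```r
# Per-group mean tooth length, the statistic used as the weight above
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Normalised weights: the chance of each group being picked on the first draw,
# ASSUMING the weights are handed to sample() as `prob`
means$first_draw_prob <- means$len / sum(means$len)

# Largest weight first
means[order(-means$first_draw_prob), ]
```

Because three groups are drawn without replacement, the tallies are inclusion frequencies rather than first-draw probabilities, so only the ordering is expected to roughly agree. VC at dose 0.5 has the smallest mean and is also the least frequently selected group.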

About

Block-based bootstrapping, like sampling a flock of sheep all at once

License

MIT (LICENSE.md). The repository also contains a LICENSE file whose license type is not recognized.