29 changes: 17 additions & 12 deletions 020-Data-generating-models.Rmd
@@ -242,6 +242,7 @@ Here is a plot of 30 observations from the bivariate Poisson distribution with m
#| fig.width: 6
#| fig.height: 4

set.seed( 40440 )
bvp_dat <- r_bivariate_Poisson(
30,
rho = 0.65,
@@ -252,9 +253,12 @@ ggplot(bvp_dat) +
aes(C1, C2) +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
scale_x_continuous(expand = expansion(0,c(0,1))) +
scale_y_continuous(expand = expansion(0,c(0,1))) +
geom_point(position = position_jitter(width = 0.2, height = 0.2)) +
scale_x_continuous(expand = expansion(0,c(0,1)),
breaks = c( 2, 4, 6, 8, 10, 12, 14, 16 )) +
scale_y_continuous(expand = expansion(0,c(0,1)),
breaks = c( 2, 4, 6, 8, 10, 12 )) +
coord_fixed() +
geom_point(position = position_jitter(width = 0.075, height = 0.075)) +
theme_minimal()
```
Plots like these are useful for building intuitions about a model. For instance, we can inspect Figure \@ref(fig:bivariate-Poisson-scatter) to get a sense of the order of magnitude and range of the observations, as well as the likelihood of obtaining multiple observations with identical counts.
@@ -444,9 +448,9 @@ The mathematical model gives us the details we need to execute with each of thes
Here is the skeleton of a DGP function with arguments for each of the parameters we might want to control, including defaults for each (see \@ref(default-arguments) for more on function defaults):

```{r, echo = FALSE}
source( "case_study_code/gen_cluster_RCT.R" )
source( "case_study_code/gen_cluster_RCT_bug.R" )

cluster_RCT_fun <- readLines("case_study_code/gen_cluster_RCT.R")
cluster_RCT_fun <- readLines("case_study_code/gen_cluster_RCT_bug.R")
skeleton <- c(cluster_RCT_fun[1:10], " # Code (see below) goes here", "}")
```

@@ -477,7 +481,7 @@ Next, we use the site characteristics to generate the individual-level variables

A key piece here is the `rep()` function that takes a list and repeats each element of the list a specified number of times.
In particular, `rep()` repeats each number ($1, 2, \ldots, J$), the corresponding number of times as listed in `nj`.
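For instance, here is a minimal sketch of this pattern, using made-up cluster sizes (the name `sid` follows the site-identifier naming used later in this book):

```{r}
nj <- c( 3, 1, 2 )             # made-up sizes for J = 3 sites
sid <- rep( 1:3, times = nj )  # site ID repeated once per individual
sid
# [1] 1 1 1 2 2 3
```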
Putting the code above into the function skeleton will produce a complete DGP function (view the [complete function here](/case_study_code/gen_cluster_RCT.R)).
Putting the code above into the function skeleton will produce a complete DGP function (view the [complete function here](/case_study_code/gen_cluster_RCT_bug.R)).
We can then call the function as so:
```{r}
dat <- gen_cluster_RCT(
@@ -486,7 +490,7 @@ dat <- gen_cluster_RCT(
sigma2_u = 0.4, sigma2_e = 1
)

dat
head( dat )
```

With this function, we can control the average size of the clusters (`n`), the number of clusters (`J`), the proportion treated (`p`), the average outcome in the control group (`gamma_0`), the average treatment effect (`gamma_1`), the site size by treatment interaction (`gamma_2`), the amount of cross site variation (`sigma2_u`), the residual variation (`sigma2_e`), and the amount of site size variation (`alpha`).
@@ -534,10 +538,12 @@ A further consequence of setting the overall scale of the outcome to 1 is that t
The standardized mean difference for a treatment impact is defined as the average impact over the standard deviation of the outcome among control observations.^[An alternative definition is based on the pooled standard deviation, but this is usually a bad choice if one suspects treatment variation. More treatment variation should not reduce the effect size for the same absolute average impact.]
Letting $\delta$ denote the standardized mean difference parameter,
$$ \delta = \frac{E(Y | Z_j = 1) - E(Y | Z_j = 0)}{SD( Y | Z_j = 0 )} = \frac{\gamma_1}{\sqrt{ \sigma^2_u + \sigma^2_\epsilon } } $$
Because we have constrained the total variance, $\gamma_1$ is equivalent to $\delta$. This equivalence holds for any value of $\gamma_0$, so we do not have to worry about manipulating $\gamma_0$ in the simulations---we can simply leave it at its default value.

Because we have constrained the total variance to 1, $\gamma_1$ is equivalent to $\delta$.
This equivalence holds for any value of $\gamma_0$, so we do not have to worry about manipulating $\gamma_0$ in the simulations---we can simply leave it at its default value.
The $\gamma_2$ parameter only impacts outcomes on the treatment side, with larger values inducing more variation. Standardizing on something stable (here the control-side variation) rather than something that changes in reaction to other parameters (as would happen if we standardized by the overall or pooled variation) leads to more interpretable quantities.
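As a quick numeric check of this equivalence (using assumed parameter values, chosen here purely for illustration):

```{r}
# With the total variance constrained to 1, delta equals gamma_1 exactly.
sigma2_u <- 0.3   # assumed cross-site variance
sigma2_e <- 0.7   # assumed residual variance; sigma2_u + sigma2_e = 1
gamma_1  <- 0.25  # assumed average treatment effect on the raw scale

delta <- gamma_1 / sqrt( sigma2_u + sigma2_e )
delta
```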
<!-- JEP: Do we want to discuss anything about interpretation of gamma_2? Or add an exercise about it? -->
<!-- LWM: I don't think there is a need -->
<!-- LWM: I added the prior sentence. Is this what you mean? Or what kind of exercise? -->


## Sometimes a DGP is all you need {#three-parameter-IRT}

@@ -719,8 +725,7 @@ The exercises below will give you further practice in developing, testing, and e
Of course, we have not covered every consideration and challenge that arises when writing DGPs for simulations---there is _much_ more to explore!

We will cover some further challenges in subsequent chapters.
<!-- JEP: Links to specific further chapters? -->
<!-- LWM: TODO later when book is settled. -->
See, for example, Chapters \@ref(sec:power) and \@ref(potential-outcomes).
Yet more challenges will surely arise as you apply simulations in your own work.
Even though such challenges might not align with the examples or contexts that we have covered here, the concepts and principles that we have introduced should allow you to reason about potential solutions.
In doing so, we encourage you to adopt the attitude of a chef in a test kitchen by trying out your ideas with code.
65 changes: 41 additions & 24 deletions 030-Estimation-procedures.Rmd
@@ -16,7 +16,7 @@ set.seed(20250606)
source("case_study_code/generate_ANOVA_data.R")
source("case_study_code/ANOVA_Welch_F.R")
source("case_study_code/r_bivariate_Poisson.R")
source("case_study_code/gen_cluster_RCT_rev.R")
source("case_study_code/gen_cluster_RCT.R")
source("case_study_code/analyze_cluster_RCT.R")
```

@@ -40,7 +40,7 @@ The easiest approach here is to implement each estimator in turn, and then bundl
We describe how to do this in Section \@ref(multiple-estimation-procedures).

In Section \@ref(validating-estimation-function), we describe strategies for validating the coded-up estimator before running a full simulation.
We offer a few strategies for validating one's code, including checking against existing implementations, checking theoretical properties, and using simulations to check for bugs.
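For instance, here is a sketch of the first strategy: checking a hand-coded difference in means and its Welch-style standard error against R's built-in `t.test()`. The data here are made up purely for the check:

```{r}
set.seed( 1234 )
y0 <- rnorm( 40, mean = 0 )    # made-up control outcomes
y1 <- rnorm( 40, mean = 0.3 )  # made-up treatment outcomes

# Hand-coded estimate and Welch standard error
ATE_hat <- mean( y1 ) - mean( y0 )
SE_hat  <- sqrt( var( y1 ) / 40 + var( y0 ) / 40 )

# The hand-coded t-statistic should match the existing implementation
tt <- t.test( y1, y0 )
all.equal( unname( tt$statistic ), ATE_hat / SE_hat )
```

If the two disagree beyond numerical tolerance, something is wrong with (at least) one of the implementations.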

For a full simulation, an estimator needs to be reliable: it should run without crashing and return answers across the wide range of datasets to which it is applied.
An estimator will need to be able to handle odd edge cases or pathological datasets that might be generated as we explore a full range of simulation scenarios.
@@ -119,11 +119,14 @@ Several different procedures might be used to estimate an overall average effect

All three of these methods are widely used and have some theoretical guarantees supporting their use.
Education researchers tend to be more comfortable using multi-level regression models, whereas economists tend to use OLS with clustered standard errors.
<!-- JEP: Any rationale/precedent for the averaging approach? -->

<!-- JEP: Do we want to add any citations for further reading about these strategies? Any further comments about these methods? -->
<!-- LWM: We can cite my working draft of the cluster rct paper when I post it? -->

We next develop estimation functions for each of these procedures, focusing on a simple model that does not include any covariates besides the treatment indicator.
Each function needs to produce a point estimate, standard error, and $p$-value for the average treatment effect.
To have data to practice on, we generate a sample dataset using [a revised version of `gen_cluster_RCT()`](/case_study_code/gen_cluster_RCT_rev.R), which corrects the bug discussed in Exercise \@ref(cluster-RCT-checks):
To have data to practice on, we generate a sample dataset using [a revised version of `gen_cluster_RCT()`](/case_study_code/gen_cluster_RCT.R), which corrects the bug discussed in Exercise \@ref(cluster-RCT-checks):
<!-- LWM: Do we put the solution to the exercise here? Or leave it out? -->
```{r}
dat <- gen_cluster_RCT(
J=16, n_bar = 30, alpha = 0.8, p = 0.5,
@@ -396,7 +399,6 @@ Furthermore, warnings can clutter up the console and slow down code execution, s
On a conceptual level, we need to decide how to use the information contained in errors and warnings, whether that be by further elaborating the estimators to address different contingencies or by evaluating the performance of the estimators in a way that appropriately accounts for these events.
We consider both these problems here, and then revisit the conceptual considerations in Chapter \@ref(performance-measures), where we discuss assessing estimator performance.


### Capturing errors and warnings

Some estimation functions will require complicated or stochastic calculations that can sometimes produce errors.
@@ -458,6 +460,7 @@ However, `quietly()` does not trap errors:
rs <- replicate(20, my_quiet_function( 7 ))
```
To handle both errors and warnings, we double-wrap the function, first with `possibly()` and then with `quietly()`:

```{r}
my_safe_quiet_function <- quietly( possibly( my_complex_function, otherwise = NA ) )
my_safe_quiet_function(7)
@@ -470,39 +473,53 @@ set.seed(101012) # (hand-picked to show a warning.)
dat <- gen_cluster_RCT( J = 50, n_bar = 100, sigma2_u = 0 )
mod <- analysis_MLM(dat)
```

Wrapping `lmer()` with `quietly()` makes it possible to catch such output and store it along with other results, as in the following:
```{r}
quiet_safe_lmer <- quietly( possibly( lmerTest::lmer, otherwise=NULL ) )
M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1|sid), data=dat )
M1
```
We then pick apart the pieces and construct a dataset of results:

In our analysis method, we then pick apart the pieces and assemble them into a data frame (we also specify a generally preferred optimizer, `bobyqa`, to try to avoid the warnings and errors in the first place):

```{r}
if ( is.null( M1$result ) ) {
# we had an error!
tibble( ATE_hat = NA, SE_hat = NA, p_value = NA,
message = M1$message,
warning = M1$warning,
error = TRUE )
} else {
sum <- summary( M1$result )
tibble(
ATE_hat = sum$coefficients["Z","Estimate"],
SE_hat = sum$coefficients["Z","Std. Error"],
p_value = sum$coefficients["Z", "Pr(>|t|)"],
message = list( M1$message ),
warning = list( M1$warning ),
error = FALSE )
analysis_MLM_safe <- function( dat ) {

M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1 | sid),
data=dat,
control = lme4::lmerControl(
optimizer = "bobyqa"
) )

if ( is.null( M1$result ) ) {
# we had an error!
tibble( ATE_hat = NA, SE_hat = NA, p_value = NA,
message = length( M1$messages ) > 0,
warning = length( M1$warnings ) > 0,
error = TRUE )
} else {
sum <- summary( M1$result )

tibble(
ATE_hat = sum$coefficients["Z","Estimate"],
SE_hat = sum$coefficients["Z","Std. Error"],
p_value = sum$coefficients["Z", "Pr(>|t|)"],
message = length( M1$messages ) > 0,
warning = length( M1$warnings ) > 0,
error = FALSE
)
}
}
```
Now we can plug in the above code to make a nicely quieted and safe function.
This version runs without extraneous messages:

Our improved function runs without extraneous messages and returns the estimation results along with any diagnostic information from messages or warnings:

```{r}
mod <- analysis_MLM_safe(dat)
mod
```

Now we have the estimation results along with any diagnostic information from messages or warnings.
Storing this information will let us evaluate what proportion of the time there was a warning or message, run additional analyses on the subset of replications where there was no such warning, or even modify the estimator to take the diagnostics into account.
We have solved the technical problem---our code will run and give results---but not the conceptual one: what does it mean when our estimator gives an NA or a convergence warning with a nominal answer?
How do we decide how good our estimator is when it does this?
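To illustrate the kind of accounting that storing diagnostics enables, here is a toy sketch with a hypothetical estimator (`flaky_estimator` is made up for illustration, standing in for a real analysis function that occasionally warns):

```{r}
library( purrr )
library( tibble )
library( dplyr )

# A made-up estimator that warns on a random 30% of calls
flaky_estimator <- function() {
  if ( runif( 1 ) < 0.3 ) warning( "convergence trouble" )
  rnorm( 1 )
}
quiet_flaky <- quietly( possibly( flaky_estimator, otherwise = NA ) )

set.seed( 303 )
runs <- map_dfr( 1:200, \( i ) {
  res <- quiet_flaky()
  tibble( est = res$result, warned = length( res$warnings ) > 0 )
} )

# Warning rate, and performance on the subset of clean replications
runs %>%
  summarise( warn_rate = mean( warned ),
             mean_est  = mean( est[ !warned ] ) )
```

The same pattern applies directly to the `warning` column returned by `analysis_MLM_safe()`.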