30 changes: 18 additions & 12 deletions 020-Data-generating-models.Rmd
@@ -242,6 +242,7 @@ Here is a plot of 30 observations from the bivariate Poisson distribution with m
#| fig.width: 6
#| fig.height: 4

+set.seed( 40440 )
bvp_dat <- r_bivariate_Poisson(
30,
rho = 0.65,
@@ -252,9 +253,12 @@ ggplot(bvp_dat) +
aes(C1, C2) +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
-scale_x_continuous(expand = expansion(0,c(0,1))) +
-scale_y_continuous(expand = expansion(0,c(0,1))) +
-geom_point(position = position_jitter(width = 0.2, height = 0.2)) +
+scale_x_continuous(expand = expansion(0,c(0,1)),
+                   breaks = seq(0,16,2)) +
+scale_y_continuous(expand = expansion(0,c(0,1)),
+                   breaks = seq(2,12,2)) +
+coord_fixed() +
+geom_point(position = position_jitter(width = 0.075, height = 0.075)) +
theme_minimal()
```
Plots like these are useful for building intuitions about a model. For instance, we can inspect Figure \@ref(fig:bivariate-Poisson-scatter) to get a sense of the order of magnitude and range of the observations, as well as the likelihood of obtaining multiple observations with identical counts.
@@ -444,9 +448,9 @@ The mathematical model gives us the details we need to execute with each of thes
Here is the skeleton of a DGP function with arguments for each of the parameters we might want to control, including defaults for each (see \@ref(default-arguments) for more on function defaults):

```{r, echo = FALSE}
-source( "case_study_code/gen_cluster_RCT.R" )
+source( "case_study_code/gen_cluster_RCT_bug.R" )

-cluster_RCT_fun <- readLines("case_study_code/gen_cluster_RCT.R")
+cluster_RCT_fun <- readLines("case_study_code/gen_cluster_RCT_bug.R")
skeleton <- c(cluster_RCT_fun[1:10], " # Code (see below) goes here", "}")
```

@@ -477,7 +481,7 @@ Next, we use the site characteristics to generate the individual-level variables

A key piece here is the `rep()` function, which takes a vector and repeats each element a specified number of times.
In particular, `rep()` repeats each cluster ID ($1, 2, \ldots, J$) the corresponding number of times listed in `nj`.
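For instance, a toy call (with hypothetical cluster sizes) behaves like this:

```{r}
# Hypothetical cluster sizes for J = 3 clusters:
nj <- c(3, 1, 2)

# Repeat each cluster ID as many times as that cluster has individuals:
rep( 1:3, nj )
#> [1] 1 1 1 2 3 3
```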
-Putting the code above into the function skeleton will produce a complete DGP function (view the [complete function here](/case_study_code/gen_cluster_RCT.R)).
+Putting the code above into the function skeleton will produce a complete DGP function (view the [complete function here](/case_study_code/gen_cluster_RCT_bug.R)).
We can then call the function like so:
```{r}
dat <- gen_cluster_RCT(
@@ -486,7 +490,7 @@ dat <- gen_cluster_RCT(
sigma2_u = 0.4, sigma2_e = 1
)

-dat
+head( dat )
```

With this function, we can control the average size of the clusters (`n`), the number of clusters (`J`), the proportion treated (`p`), the average outcome in the control group (`gamma_0`), the average treatment effect (`gamma_1`), the site size by treatment interaction (`gamma_2`), the amount of cross site variation (`sigma2_u`), the residual variation (`sigma2_e`), and the amount of site size variation (`alpha`).
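For example, a quick sketch (assuming, as in the analysis code later in the book, that the returned data has a site ID column `sid`) of how `alpha` changes the spread of cluster sizes:

```{r}
dat_a <- gen_cluster_RCT( J = 20, n_bar = 30, alpha = 0.2 )
dat_b <- gen_cluster_RCT( J = 20, n_bar = 30, alpha = 0.8 )

# Compare the spread of the realized cluster sizes under the two settings:
sd( table( dat_a$sid ) )
sd( table( dat_b$sid ) )
```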
@@ -534,10 +538,13 @@ A further consequence of setting the overall scale of the outcome to 1 is that t
The standardized mean difference for a treatment impact is defined as the average impact over the standard deviation of the outcome among control observations.^[An alternative definition is based on the pooled standard deviation, but this is usually a bad choice if one suspects treatment variation. More treatment variation should not reduce the effect size for the same absolute average impact.]
Letting $\delta$ denote the standardized mean difference parameter,
$$ \delta = \frac{E(Y | Z_j = 1) - E(Y | Z_j = 0)}{SD( Y | Z_j = 0 )} = \frac{\gamma_1}{\sqrt{ \sigma^2_u + \sigma^2_\epsilon } } $$
-Because we have constrained the total variance, $\gamma_1$ is equivalent to $\delta$. This equivalence holds for any value of $\gamma_0$, so we do not have to worry about manipulating $\gamma_0$ in the simulations---we can simply leave it at its default value.
+Because we have constrained the total variance to 1, $\gamma_1$ is equivalent to $\delta$.
+This equivalence holds for any value of $\gamma_0$, so we do not have to worry about manipulating $\gamma_0$ in the simulations---we can simply leave it at its default value.
The $\gamma_2$ parameter only influences outcomes for those in the treatment condition, with larger values inducing more variation.
Standardizing based on the total variance also makes $\gamma_2$ somewhat easier to interpret because this parameter has the same units as $\gamma_1$.
Specifically, $\gamma_2$ is the expected difference in the standardized treatment impact for schools that differ in size by $\bar{n}$ (the overall average school size).
More broadly, standardizing by a stable parameter (the variation among those in the control condition) rather than something that changes in reaction to other parameters (which would be the case if we standardized by overall variation or pooled variation) will lead to more interpretable quantities.
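To make the constraint concrete, here is a quick check with hypothetical parameter values (any values satisfying $\sigma^2_u + \sigma^2_\epsilon = 1$ would do):

```{r}
# Hypothetical parameter values satisfying the variance constraint:
gamma_1  <- 0.25
sigma2_u <- 0.4
sigma2_e <- 0.6

# The standardized mean difference equals gamma_1:
delta <- gamma_1 / sqrt( sigma2_u + sigma2_e )
delta
#> [1] 0.25
```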

+<!-- JEP: Do we want to discuss anything about interpretation of gamma_2? Or add an exercise about it? -->
+<!-- LWM: I don't think there is a need -->

## Sometimes a DGP is all you need {#three-parameter-IRT}

@@ -719,8 +726,7 @@ The exercises below will give you further practice in developing, testing, and e
Of course, we have not covered every consideration and challenge that arises when writing DGPs for simulations---there is _much_ more to explore!

We will cover some further challenges in subsequent chapters.
-<!-- JEP: Links to specific further chapters? -->
-<!-- LWM: TODO later when book is settled. -->
+See, for example, Chapters \@ref(sec:power) and \@ref(potential-outcomes).
Yet more challenges will surely arise as you apply simulations in your own work.
Even though such challenges might not align with the examples or contexts that we have covered here, the concepts and principles that we have introduced should allow you to reason about potential solutions.
In doing so, we encourage you to adopt the attitude of a chef in a test kitchen by trying out your ideas with code.
73 changes: 46 additions & 27 deletions 030-Estimation-procedures.Rmd
@@ -11,7 +11,7 @@ set.seed(20250606)
source("case_study_code/generate_ANOVA_data.R")
source("case_study_code/ANOVA_Welch_F.R")
source("case_study_code/r_bivariate_Poisson.R")
-source("case_study_code/gen_cluster_RCT_rev.R")
+source("case_study_code/gen_cluster_RCT.R")
source("case_study_code/analyze_cluster_RCT.R")
```

@@ -114,11 +114,15 @@ Several different procedures might be used to estimate an overall average effect

All three of these methods are widely used and have some theoretical guarantees supporting their use.
Education researchers tend to be more comfortable using multi-level regression models, whereas economists tend to use OLS with clustered standard errors.
<!-- JEP: Any rationale/precedent for the averaging approach? -->

<!-- JEP: Do we want to add any citations for further reading about these strategies? Any further comments about these methods? -->
<!-- LWM: We can cite my working draft of the cluster rct paper when I post it? -->

We next develop estimation functions for each of these procedures, focusing on a simple model that does not include any covariates besides the treatment indicator.
Each function needs to produce a point estimate, standard error, and $p$-value for the average treatment effect.
-To have data to practice on, we generate a sample dataset using [a revised version of `gen_cluster_RCT()`](/case_study_code/gen_cluster_RCT_rev.R), which corrects the bug discussed in Exercise \@ref(cluster-RCT-checks):
+To write estimation functions, it is useful to work with an example dataset that has the same structure as the data that will be simulated.
+To that end, we generate a sample dataset using [a revised version of `gen_cluster_RCT()`](/case_study_code/gen_cluster_RCT.R), which corrects the bug discussed in Exercise \@ref(cluster-RCT-checks):
<!-- LWM: Do we put the solution to the exercise here? Or leave it out? -->
<!-- JEP: I guess leave it out? -->
```{r}
dat <- gen_cluster_RCT(
J=16, n_bar = 30, alpha = 0.8, p = 0.5,
@@ -391,12 +395,11 @@ Furthermore, warnings can clutter up the console and slow down code execution, s
On a conceptual level, we need to decide how to use the information contained in errors and warnings, whether that be by further elaborating the estimators to address different contingencies or by evaluating the performance of the estimators in a way that appropriately accounts for these events.
We consider both these problems here, and then revisit the conceptual considerations in Chapter \@ref(performance-measures), where we discuss assessing estimator performance.


-### Capturing errors and warnings
+### Capturing errors and warnings {#error-handling}

Some estimation functions will require complicated or stochastic calculations that can sometimes produce errors.
Intermittent errors can really be annoying and time-consuming if not addressed.
-To protect yourself, it is good practice to anticipate potential errors, preventing them from stopping code execution and allowing your simulations to keep running.
+To protect yourself, it is good practice to anticipate potential errors, so that you can prevent them from stopping code execution and allow your simulations to keep running.
We next demonstrate some techniques for error-handling using tools from the `purrr` package.

For illustrative purposes, consider the following error-prone function that sometimes returns what we want, sometimes returns `NaN` due to taking the square root of a negative number, and sometimes crashes completely because `broken_code()` does not exist:
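(The actual definition is collapsed out of this diff; the following is a hypothetical stand-in of our own, not the book's code, sketching what such a function might look like.)

```{r}
# A hypothetical stand-in for the error-prone function described above:
my_complex_function <- function( x ) {
  if ( runif( 1 ) < 0.1 ) {
    broken_code( x )   # broken_code() does not exist, so this throws an error
  } else {
    sqrt( rnorm( 1, mean = x, sd = 5 ) )   # NaN (with a warning) if the draw is negative
  }
}
```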
@@ -453,6 +456,7 @@ However, `quietly()` does not trap errors:
rs <- replicate(20, my_quiet_function( 7 ))
```
To handle both errors and warnings, we double-wrap the function, first with `possibly()` and then with `quietly()`:

```{r}
my_safe_quiet_function <- quietly( possibly( my_complex_function, otherwise = NA ) )
my_safe_quiet_function(7)
@@ -465,39 +469,54 @@ set.seed(101012) # (hand-picked to show a warning.)
dat <- gen_cluster_RCT( J = 50, n_bar = 100, sigma2_u = 0 )
mod <- analysis_MLM(dat)
```

Wrapping `lmer()` with `quietly()` makes it possible to catch such output and store it along with other results, as in the following:
```{r}
-quiet_safe_lmer <- quietly( possibly( lmerTest::lmer, otherwise=NULL ) )
+quiet_safe_lmer <- quietly( possibly( lmerTest::lmer, otherwise = NULL ) )
M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1|sid), data=dat )
M1
```
-We then pick apart the pieces and construct a dataset of results:

+In our estimation function, we replace the original `lmer()` call with our quiet-and-safe version, `quiet_safe_lmer()`.
+We then pick apart the pieces of its output to return a tidy dataset of results.
+(Separately, we also add in a generally preferred optimizer, bobyqa, to try to avoid the warnings and errors in the first place.)
+The resulting function is:

```{r}
-if ( is.null( M1$result ) ) {
-  # we had an error!
-  tibble( ATE_hat = NA, SE_hat = NA, p_value = NA,
-          message = M1$message,
-          warning = M1$warning,
-          error = TRUE )
-} else {
-  sum <- summary( M1$result )
-  tibble(
-    ATE_hat = sum$coefficients["Z","Estimate"],
-    SE_hat = sum$coefficients["Z","Std. Error"],
-    p_value = sum$coefficients["Z", "Pr(>|t|)"],
-    message = list( M1$message ),
-    warning = list( M1$warning ),
-    error = FALSE )
+analysis_MLM_safe <- function( dat ) {
+
+  M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1 | sid),
+                         data = dat,
+                         control = lme4::lmerControl(
+                           optimizer = "bobyqa"
+                         ) )
+
+  if ( is.null( M1$result ) ) {
+    # we had an error!
+    tibble( ATE_hat = NA, SE_hat = NA, p_value = NA,
+            message = length( M1$messages ) > 0,
+            warning = length( M1$warnings ) > 0,
+            error = TRUE )
+  } else {
+    # no error!
+    sum <- summary( M1$result )
+
+    tibble(
+      ATE_hat = sum$coefficients["Z","Estimate"],
+      SE_hat = sum$coefficients["Z","Std. Error"],
+      p_value = sum$coefficients["Z", "Pr(>|t|)"],
+      message = length( M1$messages ) > 0,
+      warning = length( M1$warnings ) > 0,
+      error = FALSE
+    )
+  }
+}
```
-Now we can plug in the above code to make a nicely quieted and safe function.
-This version runs without extraneous messages:

+Our improved function runs without extraneous messages and returns the estimation results along with any diagnostic information from messages or warnings:

```{r}
mod <- analysis_MLM_safe(dat)
mod
```

-Now we have the estimation results along with any diagnostic information from messages or warnings.
Storing this information will let us evaluate what proportion of the time there was a warning or message, run additional analyses on the subset of replications where there was no such warning, or even modify the estimator to take the diagnostics into account.
We have solved the technical problem---our code will run and give results---but not the conceptual one: what does it mean when our estimator gives an NA or a convergence warning with a nominal answer?
How do we decide how good our estimator is when it does this?
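As a sketch of the kind of tabulation described above (assuming the `analysis_MLM_safe()` and `gen_cluster_RCT()` functions defined earlier are loaded), we might summarize diagnostics across replications like so:

```{r}
runs <- purrr::map_dfr(
  1:100,
  \(i) analysis_MLM_safe( gen_cluster_RCT( J = 20, n_bar = 30 ) )
)

mean( runs$warning )   # proportion of replications with a warning
mean( runs$error )     # proportion of replications that errored entirely
```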
@@ -575,7 +594,7 @@ We end up with a 20-entry list, with each element consisting of a pair of the re
length( resu )
resu[[3]]
```
-We can massage our data to a more easy to parse format:
+We can massage our data into a format that is easier to parse:
```{r}
resu <- transpose( resu )
unlist(resu$result)