-
Notifications
You must be signed in to change notification settings - Fork 21
Description
After running a panel model (5475 units, 13 years) with stats::lm(), I've been trying to estimate heteroskedasticity robust standard errors with estimatr::lm_robust(). It turns out that some coefficients estimated are (very) different in each function - which shouldn't happen (only se, t.test and p-values should change).
df <- readRDS(url("https://github.com/victorrssx/victorrssx/raw/main/df.RDS")) %>%
dplyr::mutate(ano = as.factor(ano))
lm_mod <- stats::lm(tx_hom_tot ~ pos2018 + res_ou:pos2018 + ti:pos2018 + res_ou:ti:pos2018 + ano, data = df)
summary(lm_mod)
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.827 0.382 70.25 < 0.0000000000000002 ***
pos2018 2.927 0.538 5.44 0.00000005226909 ***
ano2011 0.220 0.540 0.41 0.68385
ano2012 1.182 0.536 2.20 0.02754 *
ano2013 0.892 0.537 1.66 0.09640 .
ano2014 1.960 0.534 3.67 0.00025 ***
ano2015 2.985 0.533 5.60 0.00000002131005 ***
ano2016 5.556 0.530 10.48 < 0.0000000000000002 ***
ano2017 6.835 0.528 12.95 < 0.0000000000000002 ***
ano2018 4.841 0.531 9.12 < 0.0000000000000002 ***
ano2019 -1.633 0.520 -3.14 0.00169 **
ano2020 0.280 0.518 0.54 0.58944
ano2021 -0.272 0.517 -0.53 0.59925
ano2022 NA NA NA NA
pos2018:res_ou 0.862 0.643 1.34 0.18005
pos2018:ti 2.874 0.629 4.57 0.00000498294524 ***
pos2018:res_ou:ti 9.824 1.365 7.20 0.00000000000063 ***
lm_mod <- estimatr::lm_robust(tx_hom_tot ~ pos2018 + res_ou:pos2018 + ti:pos2018 + res_ou:ti:pos2018 + ano, data = df, se_type = "HC1")
summary(lm_mod)
Standard error type: HC1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.930 10407.875 0.00258749 0.9979355
pos2018 2842202543499.796 324275566161917504.000 0.00000876 0.9999930
ano2011 0.222 196.496 0.00112810 0.9990999
ano2012 1.179 330.843 0.00356436 0.9971561
ano2013 0.890 397.685 0.00223694 0.9982152
ano2014 1.957 125.512 0.01558929 0.9875621
ano2015 2.982 396.852 0.00751471 0.9940042
ano2016 5.553 470.073 0.01181333 0.9905746
ano2017 6.832 355.406 0.01922433 0.9846622
ano2018 4.838 355.006 0.01362761 0.9891271
ano2019 -2842202543498.518 264769891052461504.000 -0.00001073 0.9999914
ano2020 -2842202543497.091 374441170828414656.000 -0.00000759 0.9999939
ano2021 -2842202543497.189 324275566161908160.000 -0.00000876 0.9999930
ano2022 -2842202543496.979 324275566161925760.000 -0.00000876 0.9999930
pos2018:res_ou 0.863 5.743 0.15017708 0.8806255
pos2018:ti 2.874 0.677 4.24755884 0.0000216
pos2018:res_ou:ti 9.852 2739.434 0.00359654 0.9971304
Specifically, the problem happens sharply on pos2018, ano2019, ano2020, ano2021 which are all dummies. The first is set to 1 for years after 2018 and 0 otherwise; the remainder are year-specific dummies. Even setting fixed_effects = ~ ano produce strange results.
lm_mod <- estimatr::lm_robust(tx_hom_tot ~ pos2018 + res_ou:pos2018 + ti:pos2018 + res_ou:ti:pos2018, data = df, se_type = "HC1", fixed_effects = ~ ano)
summary(lm_mod)
Standard error type: HC1
Coefficients: (1 not defined because the design matrix is rank deficient)
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
pos2018 NA NA NA NA NA NA NA
pos2018:res_ou 0.862 0.707 1.22 0.22251985 -0.523 2.25 50693
pos2018:ti 2.874 0.640 4.49 0.00000709 1.620 4.13 50693
pos2018:res_ou:ti 9.824 2.252 4.36 0.00001292 5.409 14.24 50693
When using lmtest::coeftest(lm_mod, vcov. = plm::vcovHC(lm_mod, type = "HC1")) it seems to produce correct results. So why is this happening with lm_robust? Could it be due to the interpretation of the NAs in the data by the function?