0 1
459 304
PS 813 - Causal Inference
February 2, 2026
\[ \require{cancel} \]
\[X_i {\perp \! \! \! \perp} D_i\]
\[E[X_i | D_i = 1] = E[X_i | D_i = 0]\]
0 1
459 304
# Key pre-treatment covariates from baseline survey (t1_) and voter file (vf_)
baseline_covs <- c(
# Demographics from voter file
"vf_age", # Age
# Baseline survey measures (t1_ = time 1, pre-treatment)
"t1_pid7", # 7-point party ID (1 = Strong Dem to 7 = Strong Rep)
"t1_ideo_self", # Self-reported ideology
"t1_therm_trump", # Feeling thermometer: Trump
"t1_therm_biden", # Feeling thermometer: Biden
"t1_therm_fox", # Feeling thermometer: Fox News
"t1_therm_cnn", # Feeling thermometer: CNN
"t1_trust_fox", # Trust in Fox News
"t1_trust_cnn", # Trust in CNN
# Pre-treatment TV viewership (from set-top box data)
"pre_treat_fox_minutes", # Minutes watching Fox pre-treatment
"pre_treat_cnn_minutes" # Minutes watching CNN pre-treatment
)[1] 695
0 1
417 278
# Use cobalt to create a balance table (at household level)
bal_tab <- bal.tab(
x = bk_hh %>% select(all_of(baseline_covs)), # Covariate data
treat = bk_hh$treat, # Treatment indicator
binary = "std", # Standardize binary variables
continuous = "std", # Standardize continuous variables
s.d.denom = "pooled" # Use pooled SD for standardization
)
print(bal_tab)Balance Measures
Type Diff.Un
vf_age Contin. 0.0352
t1_pid7 Contin. 0.0367
t1_ideo_self Contin. 0.0130
t1_ideo_self:<NA> Binary 0.0000
t1_therm_trump Contin. 0.1688
t1_therm_biden Contin. -0.0855
t1_therm_fox Contin. -0.0224
t1_therm_cnn Contin. -0.0086
t1_trust_fox Contin. 0.0421
t1_trust_cnn Contin. 0.0755
pre_treat_fox_minutes Contin. 0.0480
pre_treat_cnn_minutes Contin. -0.1350
Sample sizes
Control Treated
All 417 278
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 81.758593 1.040406 78.583337 0.00000000 79.7158668 83.801319 693
treat 3.360112 1.519283 2.211643 0.02731688 0.3771622 6.343062 693
set.seed(53706)
iterations <- 10000
tstat_null <- rep(NA, iterations)
for (i in 1:iterations){
bk_hh$treat_perm <- sample(bk_hh$treat)
treat_reg_null <- lm_robust(treat_perm ~ vf_age + t1_pid7 + t1_ideo_self + t1_therm_trump + t1_therm_biden +
t1_therm_fox + t1_therm_cnn + t1_trust_fox + t1_trust_cnn + pre_treat_fox_minutes +
pre_treat_cnn_minutes, data=bk_hh)
tstat_null[i] <- treat_reg_null$fstatistic[1]
}\[\{M_i(1), M_i(0)\} {\perp \! \! \! \perp} D_i\]
\[E[Y_i | D_i = 1, M_i = 1] - E[Y_i | D_i = 0, M_i = 1]\]
\[E[Y_i(1) | D_i = 1, M_i(1) = 1] - E[Y_i(0) | D_i = 0, M_i(0) = 1]\]
\[E[Y_i(1) | M_i(1) = 1] - E[Y_i(0) | M_i(0) = 1]\]
| Stratum | \(M_i(1)\) | \(M_i(0)\) |
|---|---|---|
| “Always-takers” | \(1\) | \(1\) |
| “Never-takers” | \(0\) | \(0\) |
| “Compliers” | \(1\) | \(0\) |
| “Defiers” | \(0\) | \(1\) |
\[E[Y_i(1) | M_i(1) = 1] - E[Y_i(0) | M_i(0) = 1]\]
\[M_i(1) \ge M_i(0) \text{ } \forall i \quad \text{or} \quad M_i(1) \le M_i(0) \text{ } \forall i\]
Additionally, the share of survivors is balanced between treated and control
\[Pr(M_i(1) = M_i(0) = 1 | D_i = 0) = Pr(M_i(1) = M_i(0) = 1 | D_i = 1)\]
All observed units in the control group are survivors
\[Pr(M_i = 1 | D_i = 0) = Pr(M_i(1) = M_i(0) = 1 | D_i = 0)\]
And observed units in the treated group are survivors + compliers
\[Pr(M_i = 1 | D_i = 1) = Pr(M_i(1) = M_i(0) = 1 | D_i = 1) + Pr(M_i(1) = 1, M_i(0) = 0 | D_i = 1)\]
\[q = \frac{p_1 - p_0}{p_1}\]
\[\tau^{LB} = \mathbb{E}[Y_i | D_i = 1, M_i = 1, Y_i \le y^{1-q}_1] - \mathbb{E}[Y_i | D_i = 0, M_i = 1]\]
\[\tau^{UB} = E[Y_i | D_i = 1, M_i = 1, Y_i \ge y^{q}_1] - E[Y_i | D_i = 0, M_i = 1]\]
Let’s run a quick simulation to show how the bounds work.
Consider a case with \(0\) treatment effect, standard normal outcome, but selection associated with both treatment and outcome
Marginally, half of all observations are compliers and half are survivors.
Generate \(5000\) observations, and look at their outcome distributions?
set.seed(53706)
N <- 5000
lee_df <- data.frame(Y = rnorm(n = N), D = rbinom(N, 1, .5))
lee_df <- lee_df %>% mutate(treatment = case_when(D == 1 ~ "Treated",
D == 0 ~ "Control"))
lee_df <- lee_df %>% mutate(survivor_prob = case_when(Y > 0 ~ .7,
Y < 0 ~ .3))
lee_df <- lee_df %>% mutate(survivor = rbinom(N, 1, survivor_prob))
lee_df <- lee_df %>% mutate(M = as.numeric(D == 0)*survivor + as.numeric(D == 1))[1] 0.4968127
49.68127% 50.31873%
0.02370004 0.03848793
When targeting the average treatment effect, we (try to be) entirely agnostic about the variation in \(\tau_i\) between units in the sample
\[\tau_{\text{ATE}} = \mathbb{E}[Y_i(1) - Y_i(0)]\]
The Conditional Average Treatment Effect (CATE)
\[\tau(x) = \underbrace{E[Y_i(1) | X_i = x]}_{\text{Mean P.O. under treatment among units with } X_i = x} - \underbrace{E[Y_i(0) | X_i = x]}_{\text{Mean P.O. under control among units with } X_i = x}\]
\[\hat{\tau}(x) = \frac{1}{N_{t,x}}\sum_{i: X_i = x}^N Y_i D_i - \frac{1}{N_{c,x}}\sum_{i: X_i = x}^N Y_i (1 - D_i)\]
where \(N_{t, x}\) is the number of units with \(D_i = 1, X_i = x\) and \(N_{c, x}\) is the number of units with \(D_i = 0, X_i = x\)
[1] 0.09209231
[1] 0.07612996
# Estimate the sampling variance
# At least one member of the household voted in Primary 2004
var_ate_voters = var(data_hh$voted[data_hh$treatment == 3&data_hh$voted_p2004 > 0])/sum(data_hh$treatment == 3&data_hh$voted_p2004 > 0) +
var(data_hh$voted[data_hh$treatment == 0&data_hh$voted_p2004 > 0])/sum(data_hh$treatment == 0&data_hh$voted_p2004 > 0)
# No member of the household voted in Primary 2004
var_ate_nonvoters = var(data_hh$voted[data_hh$treatment == 3&data_hh$voted_p2004 == 0])/sum(data_hh$treatment == 3&data_hh$voted_p2004 == 0) +
var(data_hh$voted[data_hh$treatment == 0&data_hh$voted_p2004 == 0])/sum(data_hh$treatment == 0&data_hh$voted_p2004 == 0)[1] 0.08272279 0.10146183
[1] 0.06678196 0.08547795
For inference on the population difference in CATEs, remember that the variances are additive
\[\widehat{Var}(\hat{\tau}_{\text{voter}} - \hat{\tau}_{\text{non-voter}}) = \frac{s_{t, \text{voter}}^2}{N_{t, \text{voter}}} + \frac{s_{c, \text{voter}}^2}{N_{c, \text{voter}}} + \frac{s_{t, \text{non-voter}}^2}{N_{t, \text{non-voter}}} + \frac{s_{c, \text{non-voter}}^2}{N_{c, \text{non-voter}}}\]
[1] 0.01596235
[1] 0.002727064 0.029197645
Estimate Std. Error t value
(Intercept) 0.25896339 0.001821941 142.135971
I(treatment == 3)TRUE 0.07612996 0.004769472 15.961924
I(voted_p2004 > 0)TRUE 0.08847971 0.002625902 33.694977
I(treatment == 3)TRUE:I(voted_p2004 > 0)TRUE 0.01596235 0.006752823 2.363805
Pr(>|t|) CI Lower
(Intercept) 0.000000e+00 0.255392418
I(treatment == 3)TRUE 2.696797e-57 0.066781869
I(voted_p2004 > 0)TRUE 9.922548e-248 0.083332984
I(treatment == 3)TRUE:I(voted_p2004 > 0)TRUE 1.808993e-02 0.002726931
CI Upper DF
(Intercept) 0.26253437 119995
I(treatment == 3)TRUE 0.08547805 119995
I(voted_p2004 > 0)TRUE 0.09362643 119995
I(treatment == 3)TRUE:I(voted_p2004 > 0)TRUE 0.02919778 119995
Often also characterized in terms of generalization and transportability
Findley, Kikuta, Denly (2021) “External Validity” Annual Review of Political Science
PS 813 - University of Wisconsin - Madison