Week 11: Panel Data Causal Inference

PS 813 - Causal Inference

Anton Strezhnev

University of Wisconsin-Madison

April 6, 2026


Last two weeks

  • Differences-in-differences designs for dealing with unobserved confounding.

  • Identification under Parallel trends

    \[\mathbb{E}[Y_{it}(0) - Y_{it^{\prime}}(0) \mid D_i = 1] - \mathbb{E}[Y_{it}(0) - Y_{it^{\prime}}(0) \mid D_i = 0] = 0\]

  • Estimation via first-differences regression (Callaway and Sant’Anna, 2021)

    • Can adjust for time-varying and time-invariant threats to parallel trends.
  • Pre-treatment placebo test diagnostics

  • Be careful with software defaults!

This week

  • In DiD, our identifying assumptions rule out outcome-treatment feedback
    • Treatment is exogenous conditional on observed and unobserved (but structurally constrained) factors
    • But past outcomes do not directly affect present treatment.
  • Not uncommon to have dynamic treatment settings where this assumption is violated.
    • Doctors change treatment regimen in response to patient outcomes.
    • Political campaigns adjust their strategy in response to observed poll numbers.
    • Economic growth affects democratization (which affects economic growth)
  • In all of these settings, we have \(Y_{t-1}\) affecting treatment \(D_{t}\)
    • And typically also lagged treatment \(D_{t-1}\) affecting current treatment \(D_{t}\)

Review: DiD and Strict Exogeneity

  • For this section, we’re going to work with binary treatment \(D_{it}\) observed at each time period \(t\)

    • For simplicity, we’ll consider two time periods generically as \(t\) and \(t-1\)
    • Let \(D_i = \{D_{i1}, D_{i2}, D_{i3}, \dotsc, D_{iT}\}\) denote the entire treatment history for a unit
  • In DiD, our parallel trends assumption is a special case of a particular kind of exogeneity/ignorability assumption

    • We assume that treatment history is as-good-as-randomly assigned given observed covariates and some latent factor
    • Let \(\mathbf{X}_i = \{X_{i1}, X_{i2}, X_{i3}, \dotsc, X_{iT}\}\) denote the unit’s observed covariate history
    • Let \(\mathbf{U}_i = \{U_{i1}, U_{i2}, U_{i3}, \dotsc, U_{iT}\}\) denote the unit’s latent factor history
  • Strict exogeneity

    \[\{Y_{it}(0), Y_{it}(1) \} \perp\!\!\!\perp D_{i} \mid \mathbf{X}_{i}, \mathbf{U}_i\]

Strict exogeneity

Adapted from Xu (2023) ‘Causal Inference with Time-Series Cross-Sectional Data: A Reflection’

Sequential ignorability

  • But in many panel data settings, ruling out the arrow from \(Y_{t-1}\) to \(D_{t}\) is implausible!

    • We’ll need to make a different assumption
    • Before that though, we need to define the potential outcomes in a new way!
  • We now have to consider potential outcomes in terms of the entire treatment history assigned up to time \(t\)

    \[Y_{it}(d_1, d_2, \dotsc, d_t) = Y_{it}(d_{1:t})\]

  • We can partition this joint potential outcome into the contemporaneous treatment \(d_t\) and past values \(d_1, d_2, \dotsc, d_{t-1}\)

    \[Y_{it}(d_{1:t}) = Y_{it}(d_{t}; d_{1:t-1})\]

  • We’ll define the Contemporaneous Effect of treatment at time \(t\) as:

    \[Y_{it}(1; d_{1:t-1}) - Y_{it}(0; d_{1:t-1}) = Y_{it}(1) - Y_{it}(0)\]

  • Later on when we talk about mediation, we’ll introduce other kinds of estimands for cumulative effects over time.

Sequential ignorability

  • Identification of causal effects under outcome-treatment feedback can be done under a sequential ignorability assumption (Robins, 1986; Blackwell and Glynn, 2018).

  • At each time period, treatment is as-if random conditional on observed covariate and outcome history

    \[\{Y_{it}(1), Y_{it}(0)\} \perp\!\!\!\perp D_{it} \mid X_{it}, Y_{it-1}, D_{it-1}\]

  • We can relax this by conditioning on a longer history of outcomes, treatments and covariates.

  • Here we’re focusing on the version of sequential ignorability needed for the contemporaneous treatment effect, but it can be stated more generally

    \[\{Y_{is}(d_{1:s}) : s = t, \dotsc, T\} \perp\!\!\!\perp D_{it} \mid X_{it}, Y_{it-1}, D_{it-1}\]

  • We’ll cover this more when we talk about mediation

Sequential ignorability

Adapted from Blackwell and Glynn (2018) ‘How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables’

Estimation of treatment effects

  • Under sequential ignorability, we can motivate estimation of the contemporaneous effect of treatment using an Autoregressive Distributed Lag (ADL) model framework.

    • Adjust for some number of lagged outcomes (autoregressive)
    • Adjust for some number of lagged treatments (distributed lag)
  • For example, a one-lag linear model

    \[Y_{it} = X_{it}^\prime\beta + \alpha Y_{it-1} + \tau_0 D_{it} + \tau_1 D_{it-1} + \varepsilon_{it}\]

  • If we believe the parametric assumptions behind this model (e.g. constant effects), \(\tau_0\) identifies the average contemporaneous effect of treatment at time \(t\).

  • But does \(\tau_1\) identify the effect one period out?

    • No! We have post-treatment bias (Blackwell and Glynn, 2018).
    • This is the Table 2 fallacy all over again.
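
As a quick sketch (simulated data with hypothetical parameter values, not from the lecture): treatment responds to the lagged outcome, so strict exogeneity fails, but the one-lag ADL regression still recovers the contemporaneous effect \(\tau_0\) because sequential ignorability holds by construction.

```r
# Hypothetical two-period DGP with outcome-treatment feedback
set.seed(813)
n <- 5000
X     <- rnorm(n)                          # covariate at time t
D_lag <- rbinom(n, 1, 0.5)                 # treatment at t-1
Y_lag <- 0.5 * D_lag + rnorm(n)            # outcome at t-1
# Feedback: current treatment depends on lagged outcome and lagged treatment
D     <- rbinom(n, 1, plogis(Y_lag + D_lag))
# Truth: tau_0 = 1 (contemporaneous), tau_1 = 0.2, alpha = 0.6
Y     <- X + 0.6 * Y_lag + 1.0 * D + 0.2 * D_lag + rnorm(n)

# One-lag ADL regression: adjusts for Y_lag and D_lag
fit <- lm(Y ~ X + Y_lag + D + D_lag)
coef(fit)["D"]  # close to the true tau_0 = 1 in large samples
```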

Sequential ignorability

Adapted from Blackwell and Glynn (2018) ‘How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables’

LDV vs. DiD

  • We’ll talk about how to estimate these long-term and cumulative effects under sequential ignorability when we get to mediation in a few weeks.
    • The relevant estimands and estimators come from this literature.
    • Future treatment and covariates are mediators of past treatment!
  • But let’s consider the simple setting with two time periods and two treatment groups
    • Everyone is under control at time \(t-1\), some units are treated at time \(t\)
  • Two possible ways to estimate the treatment effect.
    1. Difference-in-differences - Regress \(Y_{it} - Y_{it-1}\) on \(D_{it}\)
    2. Lagged-DV adjustment - Regress \(Y_{it}\) on \(D_{it}\) and \(Y_{it-1}\)

Controlling for LDV under strict exogeneity

  • What happens if we control for the lagged dependent variable under strict exogeneity?
  • What we actually want to do is adjust for the expected lagged outcome.
    • But the actual observed outcome contains random error
    • \(Y_{it-1}\) can be high because of the latent factor or the error
  • We are controlling for an imperfect proxy of the true confounder
  • This measurement error induces a bias due to reversion to the mean
    • If treated units tend to be higher in their outcomes than control (on average), then control units with similar observed outcomes are outliers
    • We’d expect the subsequent outcome to be closer to the mean if there’s independent noise at \(t\) versus \(t-1\)

Simulation: Controlling for LDV

  • To see the bias from controlling for lagged \(Y\), consider the following DGP:
    • \(N = 2000\) units, two periods (\(t{-}1\) and \(t\)), true treatment effect \(\tau = 0\)
    • Time-invariant confounder: \(\alpha_i \sim N(0, 1)\)
    • Treatment correlated with confounder: \(D_i \sim \text{Bernoulli}(\text{logit}^{-1}(\alpha_i))\)
    • Outcome at \(t{-}1\): \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\), where \(\varepsilon_{it-1} \sim N(0, \sigma_\varepsilon^2)\)
    • Outcome at \(t\): \(Y_{it} = \alpha_i + \tau D_i + \varepsilon_{it}\), where \(\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)\)
  • Parallel trends holds by construction: \(\alpha_i\) is time-invariant
  • Vary \(\sigma_\varepsilon \in \{0.5, 1, 2, 4, 8\}\) to control how noisy the proxy \(Y_{it-1}\) is for \(\alpha_i\)
  • Compare three estimators: simple difference-in-means post-treatment, DiD, and LDV

Simulation: LDV bias from measurement error

Code
library(tidyverse)  # map_dfr (purrr), pivot_longer (tidyr), ggplot (ggplot2)
set.seed(53703)
N <- 2000
n_sims <- 500
noise_levels <- c(0.5, 1, 2, 4, 8)

sim_results <- map_dfr(noise_levels, function(sigma_e) {
  map_dfr(1:n_sims, function(s) {
    # Time-invariant confounder
    alpha_i <- rnorm(N, 0, 1)
    # Treatment correlated with confounder
    D_i <- rbinom(N, 1, plogis(alpha_i))
    # True treatment effect is ZERO
    beta <- 0
    # Pre and post outcomes: parallel trends holds
    Y_pre <- alpha_i + rnorm(N, 0, sigma_e)
    Y_post <- alpha_i + beta * D_i + rnorm(N, 0, sigma_e)

    # Difference-in-means (biased by alpha_i)
    dim_est <- coef(lm(Y_post ~ D_i))["D_i"]
    # DiD (unbiased)
    did_est <- coef(lm(I(Y_post - Y_pre) ~ D_i))["D_i"]
    # LDV (biased -- errors in variables)
    ldv_est <- coef(lm(Y_post ~ D_i + Y_pre))["D_i"]

    tibble(sigma_e = sigma_e, sim = s,
           `Diff-in-means` = dim_est, DiD = did_est, LDV = ldv_est)
  })
})

sim_summary <- sim_results %>%
  pivot_longer(cols = c(`Diff-in-means`, DiD, LDV),
               names_to = "Estimator", values_to = "estimate") %>%
  group_by(sigma_e, Estimator) %>%
  summarize(mean_est = mean(estimate),
            ci_low = quantile(estimate, 0.025),
            ci_high = quantile(estimate, 0.975), .groups = "drop")

sim_summary$Estimator <- factor(sim_summary$Estimator,
                                levels = c("Diff-in-means", "LDV", "DiD"))

ggplot(sim_summary, aes(x = factor(sigma_e), y = mean_est,
                        color = Estimator, group = Estimator)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                width = 0.3, position = position_dodge(width = 0.4)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = expression(paste("Noise in pre-treatment outcome (", sigma[epsilon], ")")),
       y = "Estimate",
       title = "LDV bias under the DiD model (true effect = 0)",
       subtitle = "More noise in Y(pre) = worse proxy for confounder = more residual bias") +
  scale_color_manual(values = c("Diff-in-means" = "gray60",
                                "LDV" = "dodgerblue",
                                "DiD" = "indianred")) +
  theme_bw() + theme(legend.position = "bottom")

Simulation: What if outcomes are autocorrelated?

  • In the previous simulation, \(\varepsilon_{it-1}\) and \(\varepsilon_{it}\) are independent — the only thing linking outcomes over time is \(\alpha_i\)
  • In practice, outcomes are often autocorrelated: a unit that is high at \(t{-}1\) (beyond \(\alpha_i\)) tends to stay high at \(t\)
  • Modified DGP: let \(\varepsilon_{it} = \rho \cdot \varepsilon_{it-1} + \nu_{it}\) where \(\nu_{it} \sim N(0, \sigma_\varepsilon^2)\)
    • When \(\rho > 0\), \(Y_{it-1}\) is a better proxy for \(Y_{it}\)’s non-treatment component
    • LDV should perform better because controlling for \(Y_{it-1}\) captures more of the variation
  • We fix \(\sigma_\varepsilon = 2\) and vary \(\rho \in \{0, 0.25, 0.5, 0.75, 0.95\}\)

Simulation: Autocorrelation reduces LDV bias

Code
set.seed(53703)
N <- 2000
n_sims <- 500
sigma_e <- 2
rho_levels <- c(0, 0.25, 0.5, 0.75, 0.95)

sim_results_rho <- map_dfr(rho_levels, function(rho) {
  map_dfr(1:n_sims, function(s) {
    alpha_i <- rnorm(N, 0, 1)
    D_i <- rbinom(N, 1, plogis(alpha_i))
    beta <- 0

    eps_pre <- rnorm(N, 0, sigma_e)
    eps_post <- rho * eps_pre + rnorm(N, 0, sigma_e)

    Y_pre <- alpha_i + eps_pre
    Y_post <- alpha_i + beta * D_i + eps_post

    dim_est <- coef(lm(Y_post ~ D_i))["D_i"]
    did_est <- coef(lm(I(Y_post - Y_pre) ~ D_i))["D_i"]
    ldv_est <- coef(lm(Y_post ~ D_i + Y_pre))["D_i"]

    tibble(rho = rho, sim = s,
           `Diff-in-means` = dim_est, DiD = did_est, LDV = ldv_est)
  })
})

sim_summary_rho <- sim_results_rho %>%
  pivot_longer(cols = c(`Diff-in-means`, DiD, LDV),
               names_to = "Estimator", values_to = "estimate") %>%
  group_by(rho, Estimator) %>%
  summarize(mean_est = mean(estimate),
            ci_low = quantile(estimate, 0.025),
            ci_high = quantile(estimate, 0.975), .groups = "drop")

sim_summary_rho$Estimator <- factor(sim_summary_rho$Estimator,
                                    levels = c("Diff-in-means", "LDV", "DiD"))

ggplot(sim_summary_rho, aes(x = factor(rho), y = mean_est,
                            color = Estimator, group = Estimator)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                width = 0.3, position = position_dodge(width = 0.4)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = expression(paste("Autocorrelation (", rho, ")")),
       y = "Estimate",
       title = expression(paste("LDV bias with autocorrelated errors (", sigma[epsilon], " = 2, true effect = 0)")),
       subtitle = "More autocorrelation = lagged Y is better proxy = less LDV bias") +
  scale_color_manual(values = c("Diff-in-means" = "gray60",
                                "LDV" = "dodgerblue",
                                "DiD" = "indianred")) +
  theme_bw() + theme(legend.position = "bottom")

Illustration: The Effects of the Shale Shock

  • Let’s return to our example of the effect of the “shale shock” on coal county voting for the Republican party (Gazmararian, 2025).
    • Last week we showed concerning evidence of a parallel trends violation - coal-heavy regions were already trending Republican prior to 2008.
    • Extrapolating the trend (triple-differences in time) eliminates the observed effect.
  • What if, instead of DiD, we controlled for the lagged Republican party vote share in 2004 and 2000?

Lagged DV approach: Setup

Code
dff <- readRDS("assets/data/matched_df.rds")
dff <- dff %>% filter(!is.na(oilgas_prop))

# Create wide-format lagged outcomes
lag_vars <- dff %>%
  select(county_state, year, RepVotesMajorPercent) %>%
  pivot_wider(names_from = year, values_from = RepVotesMajorPercent,
              names_prefix = "Y_")

dff <- dff %>% left_join(lag_vars, by = "county_state")

Lagged DV: Estimates by year

Code
# LDV regressions for each post-treatment year
ldv_years <- c(2008, 2012, 2016, 2020)
ldv_results <- map_dfr(ldv_years, function(yr) {
  df_yr <- dff %>% filter(year == yr)
  fit <- estimatr::lm_robust(
    RepVotesMajorPercent ~ treat + Y_2004 + Y_2000 + oilgas_prop,
    data = df_yr, clusters = county_state, se_type = "stata")
  tidy_fit <- broom::tidy(fit) %>% filter(term == "treat")
  tidy_fit$year <- yr
  tidy_fit
})

ggplot(ldv_results, aes(x = year, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 2006, linetype = "dashed", color = "red") +
  labs(x = "Year", y = "Estimate (pp)",
       title = "LDV: Effect of coal county on Republican vote share",
       subtitle = "Controlling for Y[2004], Y[2000], and oil/gas employment") +
  scale_x_continuous(breaks = ldv_years) +
  theme_bw()

DiD: Estimates by year

Code
# DiD regressions for each post-treatment year
did_results <- map_dfr(ldv_years, function(yr) {
  df_yr <- dff %>%
    filter(year == yr) %>%
    mutate(dY = RepVotesMajorPercent - Y_2004)
  fit <- estimatr::lm_robust(
    dY ~ treat + oilgas_prop,
    data = df_yr, clusters = county_state, se_type = "stata")
  tidy_fit <- broom::tidy(fit) %>% filter(term == "treat")
  tidy_fit$year <- yr
  tidy_fit
})

ggplot(did_results, aes(x = year, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 2006, linetype = "dashed", color = "red") +
  labs(x = "Year", y = "Estimate (pp)",
       title = "DiD: Effect of coal county on change in Republican vote share",
       subtitle = "Outcome: Y[t] - Y[2004], controlling for oil/gas employment") +
  scale_x_continuous(breaks = ldv_years) +
  theme_bw()

Comparing LDV and DiD

Placebo test: Pre-treatment hold-out

  • Let’s construct a placebo test for the lagged DV approach.
    • Because we have many pre-treatment periods, we can apply the LDV approach to pre-treatment outcomes
  • Placebo 1: Outcome year 2004, controlling for \(Y_{2000}\) and \(Y_{1996}\)
  • Placebo 2: Outcome year 2000, controlling for \(Y_{1996}\) and \(Y_{1992}\)
  • Placebo 3: Outcome year 1996, controlling for \(Y_{1992}\) and \(Y_{1988}\)

Placebo: LDV estimates

Code
# Placebo LDV regressions (2 lagged controls, matching main spec)
placebo_ldv_results <- bind_rows(
  # 2004 outcome, controlling for 2000, 1996
  {
    df_yr <- dff %>% filter(year == 2004)
    fit <- estimatr::lm_robust(
      RepVotesMajorPercent ~ treat + Y_2000 + Y_1996 + oilgas_prop,
      data = df_yr, clusters = county_state, se_type = "stata")
    broom::tidy(fit) %>% filter(term == "treat") %>% mutate(year = 2004)
  },
  # 2000 outcome, controlling for 1996, 1992
  {
    df_yr <- dff %>% filter(year == 2000)
    fit <- estimatr::lm_robust(
      RepVotesMajorPercent ~ treat + Y_1996 + Y_1992 + oilgas_prop,
      data = df_yr, clusters = county_state, se_type = "stata")
    broom::tidy(fit) %>% filter(term == "treat") %>% mutate(year = 2000)
  },
  # 1996 outcome, controlling for 1992, 1988
  {
    df_yr <- dff %>% filter(year == 1996)
    fit <- estimatr::lm_robust(
      RepVotesMajorPercent ~ treat + Y_1992 + Y_1988 + oilgas_prop,
      data = df_yr, clusters = county_state, se_type = "stata")
    broom::tidy(fit) %>% filter(term == "treat") %>% mutate(year = 1996)
  }
)

ggplot(placebo_ldv_results, aes(x = year, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Year", y = "Estimate (pp)",
       title = "Placebo test: LDV estimates in pre-treatment periods",
       subtitle = "Each placebo controls for two prior election outcomes + oil/gas share") +
  scale_x_continuous(breaks = c(1996, 2000, 2004)) +
  theme_bw()

Bracketing: Angrist and Pischke (2009)

  • Angrist and Pischke (2009, Ch. 5.4) show that LDV and DiD estimates bracket the true causal effect if one of the two identifying assumptions holds.
    • Focus on the simple linear model case - Ding and Li (2019) extend to the non-parametric setting.
  • Two-period DiD. All units under control at time \(0\)
    • Assume \(\tau > 0\) (positive effect)
  • For exposition, assume time-effects are already removed via differencing and no covariates.

Bracketing: Case 1 – DiD is correct, LDV is misspecified

  • Suppose the FE model is the truth:

    \[Y_{it} = \alpha_i + \tau D_{it} + \varepsilon_{it}\]

    where \(\varepsilon_{it}\) is serially uncorrelated and independent of \(\alpha_i\) and \(D_{it}\). Since \(D_{it-1} = 0\): \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\)

  • You mistakenly estimate the LDV regression (controlling for \(Y_{it-1}\) instead of \(\alpha_i\)):

    \[Y_{it} = a + b \cdot Y_{it-1} + \tau^{\text{LDV}} D_{it} + u_{it}\]

Bracketing: Case 1 – FWL decomposition

  • By FWL, the LDV estimator satisfies \(\hat{\tau}^{\text{LDV}} \xrightarrow{p} \frac{\text{Cov}(Y_{it}, \tilde{D}_{it})}{\text{Var}(\tilde{D}_{it})}\)

    • \(\tilde{D}_{it} = D_{it} - \hat{\gamma} Y_{it-1}\) is the residual from the auxiliary regression of \(D_{it}\) on \(Y_{it-1}\)
  • To evaluate this, substitute \(\alpha_i = Y_{it-1} - \varepsilon_{it-1}\) into the true model:

    \[Y_{it} = \alpha_i + \tau D_{it} + \varepsilon_{it} = (Y_{it-1} - \varepsilon_{it-1}) + \tau D_{it} + \varepsilon_{it}\]

    \[\Rightarrow Y_{it} = Y_{it-1} + \tau D_{it} + (\varepsilon_{it} - \varepsilon_{it-1})\]

Bracketing: Case 1 – Deriving the bias

  • Substitute in and use properties of covariances

    \[\frac{\text{Cov}(Y_{it}, \tilde{D}_{it})}{\text{Var}(\tilde{D}_{it})} = \frac{\text{Cov}(Y_{it-1} + \tau D_{it} + \varepsilon_{it} - \varepsilon_{it-1},\; \tilde{D}_{it})}{\text{Var}(\tilde{D}_{it})}\]

  • \(\text{Cov}(Y_{it-1}, \tilde{D}_{it}) = 0\) by construction (FWL residualizes out \(Y_{it-1}\))

  • \(\text{Cov}(D_{it}, \tilde{D}_{it}) = \text{Var}(\tilde{D}_{it})\)

  • \(\text{Cov}(\varepsilon_{it}, \tilde{D}_{it}) = 0\) since \(\varepsilon_{it} \perp\!\!\!\perp (D_{it}, Y_{it-1})\)

  • But \(\text{Cov}(\varepsilon_{it-1}, \tilde{D}_{it}) = \text{Cov}(\varepsilon_{it-1}, D_{it}) - \hat{\gamma}\text{Cov}(\varepsilon_{it-1}, Y_{it-1})\)

    • \(\text{Cov}(\varepsilon_{it-1}, D_{it}) = 0\) since \(\varepsilon_{it-1}\) is independent of treatment
    • And since \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\), \(\text{Cov}(\varepsilon_{it-1}, Y_{it-1}) = \sigma^2_{\varepsilon}\)
    • So \(\text{Cov}(\varepsilon_{it-1}, \tilde{D}_{it}) = -\hat{\gamma}\sigma^2_{\varepsilon}\)
  • So: \(\hat{\tau}^{\text{LDV}} \xrightarrow{p} \tau + \frac{\hat{\gamma} \sigma^2_\varepsilon}{\text{Var}(\tilde{D}_{it})}\), with bias having the same sign as \(\hat{\gamma}\) (the sign of selection)

Bracketing: Case 1 – Why is LDV biased?

  • \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\) is a noisy proxy for the confounder \(\alpha_i\)
    • Controlling for a noisy proxy does not fully remove confounding
  • With positive selection (high outcome \(\leadsto\) more likely to be treated) (\(\hat{\gamma} > 0\))
    • LDV overestimates \(\tau\)
  • With negative selection (\(\hat{\gamma} < 0\))
    • LDV underestimates \(\tau\)
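
The Case 1 result can be checked numerically; a minimal simulation sketch (assuming the fixed-effects DGP above with \(\tau = 1\), \(\sigma_\varepsilon = 1\), and positive selection on \(\alpha_i\); all names illustrative):

```r
# Case 1: fixed-effects model is the truth; LDV regression is misspecified
set.seed(813)
n <- 1e5
sigma_eps <- 1
alpha <- rnorm(n)                                # time-invariant confounder
D     <- rbinom(n, 1, plogis(alpha))             # positive selection on alpha
Y_lag <- alpha + rnorm(n, 0, sigma_eps)          # Y_{t-1} = alpha + eps_{t-1}
Y     <- alpha + 1 * D + rnorm(n, 0, sigma_eps)  # Y_t, true tau = 1

tau_ldv <- coef(lm(Y ~ D + Y_lag))["D"]          # biased upward
gamma   <- coef(lm(D ~ Y_lag))["Y_lag"]          # auxiliary regression slope
D_tilde <- residuals(lm(D ~ Y_lag))              # FWL residual
bias    <- gamma * sigma_eps^2 / var(D_tilde)
c(tau_ldv, 1 + bias)  # LDV estimate matches tau + gamma * sigma^2 / Var(D_tilde)
```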

Bracketing: Case 2 – LDV is correct, DiD is misspecified

  • Now suppose the LDV model is the truth:

    \[Y_{it} = \alpha + \theta Y_{it-1} + \tau D_{it} + \varepsilon_{it}\]

  • where \(\varepsilon_{it}\) is serially uncorrelated, exogenous, and \(0 < \theta < 1\) (stationarity)

  • You mistakenly estimate the first-differenced regression (no control for \(Y_{it-1}\)):

    \[\Delta Y_{it} = a + \tau^{\text{DiD}} D_{it} + u_{it}\]

    where \(\Delta Y_{it} = Y_{it} - Y_{it-1}\) and \(D_{it-1} = 0\) for everyone

Bracketing: Case 2 – FWL decomposition

  • The DiD estimator satisfies \(\hat{\tau}^{\text{DiD}} \xrightarrow{p} \frac{\text{Cov}(\Delta Y_{it}, D_{it})}{\text{Var}(D_{it})}\)

  • Subtract \(Y_{it-1}\) from both sides of the true model:

    \[Y_{it} - Y_{it-1} = \alpha + (\theta - 1)Y_{it-1} + \tau D_{it} + \varepsilon_{it}\]

  • So \(\Delta Y_{it} = \alpha + (\theta - 1)Y_{it-1} + \tau D_{it} + \varepsilon_{it}\). Substituting:

    \[\frac{\text{Cov}(\Delta Y_{it}, D_{it})}{\text{Var}(D_{it})} = \frac{\text{Cov}(\alpha + (\theta - 1)Y_{it-1} + \tau D_{it} + \varepsilon_{it}, D_{it})}{\text{Var}(D_{it})} \]

    \[\frac{\text{Cov}(\Delta Y_{it}, D_{it})}{\text{Var}(D_{it})} = \tau + (\theta - 1)\frac{\text{Cov}(Y_{it-1}, D_{it})}{\text{Var}(D_{it})} + \frac{\text{Cov}(\varepsilon_{it}, D_{it})}{\text{Var}(D_{it})}\]

  • \(\text{Cov}(\varepsilon_{it}, D_{it}) = 0\) by assumption (LDV model has no unobserved confounding)

Bracketing: Case 2 – Deriving the bias

  • Therefore:

    \[\hat{\tau}^{\text{DiD}} \xrightarrow{p} \tau + (\theta - 1)\frac{\text{Cov}(Y_{it-1}, D_{it})}{\text{Var}(D_{it})}\]

  • The bias is \((\theta - 1)\frac{\text{Cov}(Y_{it-1}, D_{it})}{\text{Var}(D_{it})}\)

  • The bias has the opposite sign from the selection:

    • With positive selection (\(\text{Cov}(Y_{it-1}, D_{it}) > 0\)) and \(\theta < 1\): bias is negative → DiD underestimates
    • With negative selection: bias is positive → DiD overestimates
  • Under this model, LDV is unbiased: it correctly controls for \(Y_{it-1}\)
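
The same kind of numerical check works for Case 2 (a simulated sketch assuming the LDV DGP above with \(\tau = 1\), \(\theta = 0.5\), and positive selection on \(Y_{it-1}\); names illustrative):

```r
# Case 2: LDV model is the truth; first-differencing is misspecified
set.seed(813)
n <- 1e5
theta <- 0.5
Y_lag <- rnorm(n)
D     <- rbinom(n, 1, plogis(Y_lag))       # selection on the lagged outcome
Y     <- theta * Y_lag + 1 * D + rnorm(n)  # true tau = 1

tau_did <- coef(lm(I(Y - Y_lag) ~ D))["D"]       # biased downward
bias    <- (theta - 1) * cov(Y_lag, D) / var(D)
c(tau_did, 1 + bias)  # DiD estimate matches tau + (theta - 1) * Cov / Var
```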

Bracketing

  • Combining both cases with positive selection (\(\text{Cov}(\alpha_i, D_i) > 0\)):

    • If strict exogeneity is correct: LDV overestimates, DiD is exact
    • If sequential ignorability is correct: DiD underestimates, LDV is exact
  • The true effect is bracketed between the two estimates if one of these two assumptions is true.

  • This is a useful result, but important to be aware of its limitations

    • It guarantees nothing if neither assumption is true!
  • In many cases, we’re skeptical of parallel trends but we don’t think sequential ignorability is right either

    • Rather, we think the unobserved latent factor structure might be more complicated than \(U_{it} = \alpha_i + \delta_t\)
    • What do we do then?

Synthetic Controls

The Synthetic Control Method

  • Comparative case studies in the social sciences try to assess the impact of some event by comparing the case of interest to some unaffected units
    • We ideally want to pick units that are similar enough that they share the same latent characteristics shaping the evolution of the outcome.
  • The Synthetic Control Method (SCM) builds on this intuition (Abadie and Gardeazabal, 2003; Abadie, Diamond, and Hainmueller, 2010).
    • What if, instead of using a single comparison unit, we use a weighted average of several control units?
    • Find weights on the control units that reproduce the pre-treatment outcome trajectory of the treated unit
    • Compare the observed outcome trajectory to the synthetic control average.
  • Justified under two different outcome models
    • Under sequential ignorability, we get an unbiased estimator (because past outcomes are direct confounders)
    • Under a latent factor model, the bias goes away as the number of time periods gets large.

SCM: Setup and Notation

  • Observe outcomes \(Y_{it}\) for units \(i = 1, \dotsc, J+1\) over periods \(t = 1, \dotsc, T\)

    • Let \(T_0\) denote the number of pre-treatment periods, \(T_1\) the number of post-treatment
    • Unit \(i = 1\) is the treated unit, exposed to treatment starting at \(T_0 + 1\)
    • Units \(i = 2, \dotsc, J+1\) are donor units (never treated)
  • We want to estimate:

    \[\tau_{1t} = Y_{1t}(1) - Y_{1t}(0) \quad \text{for } t > T_0\]

  • We define a set of weights on the donor units \(\mathbf{W} = (w_2, \dotsc, w_{J+1})\) with \(w_j \geq 0\) and \(\sum_j w_j = 1\)

  • Estimate the effect by comparing the observed outcome to the imputed counterfactual

    \[\hat{\tau}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt}\]

SCM: The data-generating process

  • The SCM is typically motivated by an underlying latent factor model for the outcome

    \[Y_{it}(0) = \delta_t + \boldsymbol{\theta}_t' \mathbf{Z}_i + \boldsymbol{\lambda}_t' \boldsymbol{\mu}_i + \varepsilon_{it}\]

  • \(\mathbf{Z}_i\) are the observed covariates (with time-varying coefficients \(\boldsymbol{\theta}_t\)).

  • \(\boldsymbol{\lambda}_t\) are \(F\)-dimensional time-varying factors and \(\boldsymbol{\mu}_i\) are unit-specific loadings

  • This is a more general model of latent unobserved confounding than DiD

    • In DiD, the factor structure reduces to a unit fixed effect plus a common trend: \(\boldsymbol{\lambda}_t = \lambda\) is constant over time, so \(\boldsymbol{\lambda}_t' \boldsymbol{\mu}_i = \alpha_i\)
    • The latent factor model instead allows several distinct time trends (the elements of \(\boldsymbol{\lambda}_t\)) that affect each unit differently (with weights given by \(\boldsymbol{\mu}_i\)).

SCM: Choosing the weights

  • The goal of the synthetic control method is to find weights such that the weighted average of donor units reproduces the covariates and the pre-treatment trajectory of the treated unit

    \[\sum_{j=2}^{J+1} w^*_j Y_{jt} = Y_{1t} \quad \text{for } t \in \{1, 2, \dotsc, T_0\}\]

    \[\sum_{j=2}^{J+1} w^*_j \mathbf{Z}_j = \mathbf{Z}_{1}\]

  • For notational simplicity, we’ll denote the \(K \times (J+1)\) matrix of combined covariates and pre-treatment outcomes as \(\mathbf{X}\)

    • \(\mathbf{X}_1\) is the \(K \times 1\) vector for the treated unit
    • \(\mathbf{X}_0\) is the \(K \times J\) matrix for the control units.
  • The ideal synthetic control has weights \(\mathbf{W}^*\) such that \(\mathbf{X}_1 = \mathbf{X}_0 \mathbf{W}^*\)

SCM: Bias

  • Supposing that we find \(\mathbf{W}^*\), what is the error in our imputed counterfactual?

    \[Y_{1t}(0) - \sum_{j=2}^{J+1} w^*_j Y_{jt}\]

  • Abadie, Diamond and Hainmueller (2010) show that under the latent factor model, the expected error goes to zero

    • …when the number of pre-treatment periods is large
    • …relative to the size of the pre-treatment shocks!
  • Intuition: Remember our earlier errors-in-variables example when we controlled for a lagged dependent variable!

    • The lagged outcomes are proxies for the latent factor - the less noisy the proxy, the better it reproduces the true unobserved latent factor.
  • Under an autoregressive outcome model, the bias is zero!

    • “Sequential ignorability” - past \(Y\) directly affects future \(Y\).

SCM: Estimating the weights

  • In practice, we can only get \(\mathbf{X}_1 \approx \mathbf{X}_0 \mathbf{W}^*\)

    • How do we find \(\mathbf{W}^*\)?
  • We solve a constrained minimization problem that minimizes a distance metric

  • For example, the \(\mathbf{V}\)-weighted Euclidean norm:

    \[\|\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W}\|_{\mathbf{V}} = \sqrt{\sum_{h=1}^K v_h(X_{h1} - w_2 X_{h2} - w_3 X_{h3} - \dotsc - w_{J+1}X_{hJ+1})^2}\]

  • We impose constraints:

    • \(\sum_{j=2}^{J+1} w_j = 1\) Weights sum to 1
    • \(w_j \ge 0 \quad \forall j \in \{2, 3, \dotsc, J+1\}\) Weights are non-negative
  • Note that we also get to choose how much importance to put on each dimension (covariate) \(\mathbf{V} = \{v_1, v_2, \dotsc, v_K\}\)

    • Abadie, Diamond and Hainmueller (2010) use an out-of-sample validation approach against the pre-treatment outcomes to choose an optimal \(\mathbf{V}\)
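
As a toy illustration of the weight optimization (with \(\mathbf{V}\) fixed to the identity; the softmax reparameterization below is just one convenient way to impose the simplex constraints in base R, not how the Synth package solves the nested \(\mathbf{V}\)/\(\mathbf{W}\) problem):

```r
# Toy donor pool: K = 8 characteristics, J = 5 donors
set.seed(813)
K <- 8; J <- 5
X0 <- matrix(rnorm(K * J), K, J)   # K x J donor matrix
w_true <- c(0.6, 0.4, 0, 0, 0)
X1 <- X0 %*% w_true                # treated unit lies in the convex hull

# Softmax reparameterization keeps weights non-negative and summing to 1
softmax <- function(z) exp(z) / sum(exp(z))
obj <- function(z) sum((X1 - X0 %*% softmax(z))^2)
opt <- optim(rep(0, J), obj, method = "BFGS", control = list(maxit = 500))
w_hat <- softmax(opt$par)
round(w_hat, 2)  # weight concentrates on the first two donors
```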

SCM: Notes on the weights

  • The non-negativity and sum-to-one constraints have some useful properties:

    • Sparsity - most weights are exactly 0
    • Interpretability - we can characterize the precise contribution of each donor unit to the synthetic control
    • No extrapolation outside the convex hull of the donor units
  • But there are problems!

    • If the treated unit is outside the convex hull of the controls, it’s impossible to reproduce the pre-treatment trajectory!
  • In general, the bias \(\to 0\) results are derived under perfect fit - if the synthetic control is a poor match, bias can be huge!

  • Possible solutions

    • Add an intercept (Doudchenko and Imbens, 2017; Ferman and Pinto, 2021)
    • Allow for negative weights (Doudchenko and Imbens, 2017)
    • Add an outcome model to correct for imperfect fit (Ben-Michael, Feller and Rothstein, 2021)

Application: Basque Country Terrorism

  • Abadie and Gardeazabal (2003): Effect of ETA terrorism on the Basque Country’s GDP per capita
    • Treatment: onset of political terrorism in the late 1960s (the analysis dates treatment to roughly 1970)
    • Donor pool: 16 other Spanish regions
    • Outcome: real GDP per capita (thousands of 1986 USD)
    • Time: 1955-1997
  • Replication example in the Synth package

Basque Country: Data Preparation

Code
library(Synth)
data(basque)

# Prepare data for Synth
dataprep.out <- dataprep(
  foo = basque,
  predictors = c("school.illit", "school.prim", "school.med",
                 "school.high", "school.post.high", "invest"),
  predictors.op = "mean",
  time.predictors.prior = 1964:1969,
  special.predictors = list(
    list("gdpcap", 1960:1969, "mean"),
    list("sec.agriculture", seq(1961, 1969, 2), "mean"),
    list("sec.energy", seq(1961, 1969, 2), "mean"),
    list("sec.industry", seq(1961, 1969, 2), "mean"),
    list("sec.construction", seq(1961, 1969, 2), "mean"),
    list("sec.services.venta", seq(1961, 1969, 2), "mean"),
    list("sec.services.nonventa", seq(1961, 1969, 2), "mean"),
    list("popdens", 1969, "mean")
  ),
  dependent = "gdpcap",
  unit.variable = "regionno",
  unit.names.variable = "regionname",
  time.variable = "year",
  treatment.identifier = 17,  # Basque Country
  controls.identifier = c(2:16, 18),  # All other regions (excluding Spain aggregate = 1)
  time.optimize.ssr = 1960:1969,
  time.plot = 1955:1997
)

Basque Country: Fitting the Synthetic Control

# Run the synth optimization
synth.out <- synth(data.prep.obj = dataprep.out, method = "BFGS")
# Display the non-zero donor weights
synth.tables <- synth.tab(dataprep.res = dataprep.out, synth.res = synth.out)
w <- synth.tables$tab.w
print(w[w$w.weights > 0.01, ])
   w.weights            unit.names unit.numbers
10     0.851              Cataluna           10
14     0.149 Madrid (Comunidad De)           14

Basque Country: Synthetic vs. Actual

Code
# Extract path data
years <- dataprep.out$tag$time.plot
Y_treat <- dataprep.out$Y1plot
Y_synth <- dataprep.out$Y0plot %*% synth.out$solution.w

plot_df <- data.frame(
  year = years,
  Treated = as.numeric(Y_treat),
  Synthetic = as.numeric(Y_synth)
) %>%
  pivot_longer(cols = c(Treated, Synthetic), names_to = "Unit", values_to = "gdpcap")

ggplot(plot_df, aes(x = year, y = gdpcap, color = Unit, linetype = Unit)) +
  geom_line(linewidth = 1.2) +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  annotate("text", x = 1971, y = 3, label = "Terrorism\nonset", hjust = 0, size = 3.5) +
  labs(x = "Year", y = "Real per capita GDP\n(1986 USD, thousands)",
       title = "Basque Country vs. Synthetic Basque Country",
       subtitle = "Abadie & Gardeazabal (2003) — Effect of ETA terrorism on GDP") +
  scale_color_manual(values = c("Treated" = "black", "Synthetic" = "firebrick")) +
  scale_linetype_manual(values = c("Treated" = "solid", "Synthetic" = "dashed")) +
  theme_bw() + theme(legend.position = "bottom")

Basque Country: Treatment Effect (Gap Plot)

Code
gap <- as.numeric(Y_treat) - as.numeric(Y_synth)
gap_df <- data.frame(year = years, gap = gap)

ggplot(gap_df, aes(x = year, y = gap)) +
  geom_line(linewidth = 1.2) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  annotate("text", x = 1971, y = 0.4, label = "Terrorism onset", hjust = 0, size = 3.5) +
  labs(x = "Year", y = "Gap in real per capita GDP\n(Treated − Synthetic)",
       title = "Estimated effect of terrorism on Basque Country GDP",
       subtitle = "Negative gap = terrorism reduced GDP relative to synthetic control") +
  theme_bw()

SCM: Inference

  • With a single treated unit, standard inference is not straightforward. Abadie et al. (2010) propose a permutation testing approach:
    • Apply SCM to each donor unit as if it were treated
    • If the treated unit’s gap is unusually large relative to these “placebo” gaps, the effect is unlikely due to chance
    • Test statistic: post/pre RMSPE ratio
    • Common practice is to drop donor units whose pre-treatment fit is far worse than the treated unit’s.
  • Recent work in statistics moves beyond this:
    • “Conformal inference” approach in Chernozhukov, Wüthrich, and Zhu (2019)
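The core idea behind the conformal approach can be sketched in a few lines. This is an illustrative simplification, not the authors’ implementation: under the null of no effect, the post-treatment gaps should look exchangeable with the pre-treatment gaps, so we compare the observed post-period statistic against its value under every cyclic shift of the gap series. The function name, the choice of test statistic (mean absolute post-period gap), and the toy example are all my own for illustration.

```r
# Sketch of a cyclic-permutation p-value in the spirit of
# Chernozhukov, Wuthrich, and Zhu (illustrative, not their implementation)
cyclic_pvalue <- function(gaps, n_post) {
  T_total <- length(gaps)
  # Test statistic: mean absolute gap over the final n_post periods
  stat <- function(g) mean(abs(tail(g, n_post)))
  observed <- stat(gaps)
  # Recompute the statistic under every cyclic shift (including identity)
  shifted_stats <- sapply(seq_len(T_total) - 1, function(s) {
    stat(gaps[((seq_len(T_total) - 1 + s) %% T_total) + 1])
  })
  mean(shifted_stats >= observed)
}

# Toy example: 9 quiet pre-periods, 1 large post-period gap
cyclic_pvalue(c(rep(0, 9), 10), n_post = 1)  # -> 0.1
```

Only the identity shift places the large gap in the post period, so 1 of 10 shifts is as extreme as the observed statistic.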

Basque Country: Permutation tests

Code
# Run placebo tests for each control region
control_ids <- c(2:16, 18)
placebo_gaps <- list()

for (ctrl_id in control_ids) {
  donor_ids <- setdiff(control_ids, ctrl_id)

  tryCatch({
    dp <- dataprep(
      foo = basque,
      predictors = c("school.illit", "school.prim", "school.med",
                     "school.high", "school.post.high", "invest"),
      predictors.op = "mean",
      time.predictors.prior = 1964:1969,
      special.predictors = list(
        list("gdpcap", 1960:1969, "mean"),
        list("sec.agriculture", seq(1961, 1969, 2), "mean"),
        list("sec.energy", seq(1961, 1969, 2), "mean"),
        list("sec.industry", seq(1961, 1969, 2), "mean"),
        list("sec.construction", seq(1961, 1969, 2), "mean"),
        list("sec.services.venta", seq(1961, 1969, 2), "mean"),
        list("sec.services.nonventa", seq(1961, 1969, 2), "mean"),
        list("popdens", 1969, "mean")
      ),
      dependent = "gdpcap",
      unit.variable = "regionno",
      unit.names.variable = "regionname",
      time.variable = "year",
      treatment.identifier = ctrl_id,
      controls.identifier = donor_ids,
      time.optimize.ssr = 1960:1969,
      time.plot = 1955:1997
    )

    so <- synth(data.prep.obj = dp, method = "BFGS")

    y1 <- as.numeric(dp$Y1plot)
    y0 <- as.numeric(dp$Y0plot %*% so$solution.w)
    placebo_gaps[[as.character(ctrl_id)]] <- data.frame(
      year = 1955:1997, gap = y1 - y0, unit = ctrl_id
    )
  }, error = function(e) {
    # Skip units where optimization fails
  })
}

(Verbose optimization output from the 16 placebo synth() fits omitted — each fit prints its MSPE (loss V) and the solution.v / solution.w weight vectors.)
Code
placebo_df <- do.call(rbind, placebo_gaps)

# Add treated unit
treated_gap_df <- data.frame(year = years, gap = gap, unit = 17)

Basque Country: Permutation tests

Code
ggplot() +
  geom_line(data = placebo_df, aes(x = year, y = gap, group = unit),
            color = "gray70", alpha = 0.6) +
  geom_line(data = treated_gap_df, aes(x = year, y = gap),
            color = "black", linewidth = 1.2) +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Year", y = "Gap in real per capita GDP",
       title = "Placebo tests: Basque Country (black) vs. donor regions (gray)",
       subtitle = "Each gray line = SCM applied to a donor region as if it were treated") +
  theme_bw()

Permutation Tests: Dropping Poor Fits

Code
# Compute pre-treatment RMSPE for each unit
treated_pre_rmspe <- sqrt(mean((treated_gap_df %>% filter(year < 1970))$gap^2))

placebo_pre_rmspe <- placebo_df %>%
  filter(year < 1970) %>%
  group_by(unit) %>%
  summarize(pre_rmspe = sqrt(mean(gap^2))) %>%
  ungroup()

# Keep only units with pre-RMSPE <= 5x the treated unit
keep_units <- placebo_pre_rmspe %>%
  filter(pre_rmspe <= 5 * treated_pre_rmspe) %>%
  pull(unit)

placebo_df_trimmed <- placebo_df %>% filter(unit %in% keep_units)

n_dropped <- n_distinct(placebo_df$unit) - length(keep_units)

ggplot() +
  geom_line(data = placebo_df_trimmed, aes(x = year, y = gap, group = unit),
            color = "gray70", alpha = 0.6) +
  geom_line(data = treated_gap_df, aes(x = year, y = gap),
            color = "black", linewidth = 1.2) +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Year", y = "Gap in real per capita GDP",
       title = paste0("Placebo tests: Dropping donors with pre-RMSPE > 5x treated (",
                      n_dropped, " dropped)"),
       subtitle = "Basque Country (black) vs. donor regions with adequate pre-treatment fit (gray)") +
  theme_bw()

Permutation Tests: RMSPE Ratios

Code
# Compute post/pre RMSPE ratios
compute_rmspe_ratio <- function(gap_df, treatment_year = 1970) {
  pre <- gap_df %>% filter(year < treatment_year)
  post <- gap_df %>% filter(year >= treatment_year)
  rmspe_pre <- sqrt(mean(pre$gap^2))
  rmspe_post <- sqrt(mean(post$gap^2))
  rmspe_post / rmspe_pre
}

# Treated unit ratio
treated_ratio <- compute_rmspe_ratio(treated_gap_df)

# Placebo ratios
placebo_ratios <- placebo_df %>%
  group_by(unit) %>%
  group_modify(~ tibble(ratio = compute_rmspe_ratio(.x))) %>%
  ungroup()

all_ratios <- bind_rows(
  placebo_ratios,
  tibble(unit = 17, ratio = treated_ratio)
) %>%
  mutate(is_treated = unit == 17)

# Compute p-value
pval <- mean(all_ratios$ratio >= treated_ratio)

ggplot(all_ratios, aes(x = ratio, fill = is_treated)) +
  geom_histogram(bins = 15, color = "white", alpha = 0.8) +
  geom_vline(xintercept = treated_ratio, linetype = "dashed", color = "red", linewidth = 1) +
  scale_fill_manual(values = c("FALSE" = "gray70", "TRUE" = "black"),
                    labels = c("Donor regions", "Basque Country"),
                    name = "") +
  labs(x = "Post/Pre RMSPE Ratio",
       y = "Count",
       title = "Distribution of RMSPE ratios: Permutation inference",
       subtitle = paste0("Basque Country ratio = ", round(treated_ratio, 2),
                         " | p-value = ", round(pval, 3))) +
  theme_bw() + theme(legend.position = "bottom")
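One caveat worth keeping in mind about this permutation p-value: with a finite donor pool it is bounded below. A quick sanity check, assuming all 16 placebo fits in the Basque application converge:

```r
# With J donor placebos plus the treated unit, the smallest attainable
# permutation p-value is 1/(J + 1)
J <- 16                  # donor regions in the Basque application
min_pval <- 1 / (J + 1)
round(min_pval, 3)       # -> 0.059
```

So even a maximally extreme treated-unit RMSPE ratio cannot produce a p-value below roughly 0.059 here.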

Staggered Adoption SCM

  • Conventional SCM is designed for one treated unit — what if multiple units are treated?

    • Should we construct weights to match each treated unit separately, or to match the average of the treated units?
  • We might get perfect balance for the average of the units but terrible fit for individual treated units!

  • Ben-Michael, Feller and Rothstein (2022) show that the bias is a combination of these two imbalances.

    • Propose a partial pooling approach that minimizes a combination of the two.
  • Implemented in the augsynth R package (along with their outcome-augmented synthetic control)
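A rough sketch of the partial-pooling objective (notation is mine, loosely following Ben-Michael, Feller, and Rothstein): let \(\hat{q}^{\text{pool}}\) be the imbalance between the average treated unit and the average synthetic control, \(\hat{q}^{\text{sep}}_j\) be treated unit \(j\)’s own imbalance, and \(\nu \in [0, 1]\) be a pooling hyperparameter. The weights \(\{w_j\}\) solve

\[\min_{\{w_j\}} \; \nu \left(\hat{q}^{\text{pool}}(\{w_j\})\right)^2 + (1 - \nu)\,\frac{1}{J}\sum_{j=1}^{J}\left(\hat{q}^{\text{sep}}_j(w_j)\right)^2\]

Setting \(\nu = 1\) matches only the average of the treated units; \(\nu = 0\) fits a separate synthetic control for each treated unit.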

Application: Shale Shock and Republican Voting

  • Let’s apply the augmented/multi-unit synthetic control to the Gazmararian (2025) shale shock data
    • Earlier, we saw that DiD and LDV gave different estimates, and placebo tests raised concerns
    • Is SCM the solution?
  • Review of the setup:
    • 116 treated coal counties, 97 control counties
    • Treatment onset: 2008 (shale gas boom undermines coal industry)
    • Outcome: Republican vote share
    • Pre-treatment periods: 1972–2004 (8 election cycles)
    • Post-treatment periods: 2008–2020 (4 election cycles)

multisynth: Average Effect Across Treated Units

Code
# Use multisynth for the multi-unit case
# multisynth infers treatment timing from the binary indicator
msyn <- multisynth(
  RepVotesMajorPercent ~ treated,
  unit = county_state, time = year,
  data = dff
)

msyn_summ <- summary(msyn)

# Extract average ATT for plotting
avg_att <- msyn_summ$att %>%
  filter(Level == "Average")

ggplot(avg_att, aes(x = Time, y = Estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower_bound, ymax = upper_bound), width = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dashed", color = "red") +
  labs(x = "Time relative to treatment (election cycles)",
       y = "Estimated ATT (pp)",
       title = "Multisynth: Effect of shale shock on Republican vote share",
       subtitle = "Average across treated coal counties (partially pooled SCM)") +
  theme_bw()

Placebo testing

  • We’ve fit the synthetic control model to minimize pre-treatment outcome discrepancies.
    • A perfect synthetic control will reproduce the trajectory exactly
    • But this doesn’t mean that our estimator is unbiased - especially if we have a short panel.
    • How do we assess whether the bias is small?
      • Perfect pre-treatment fit is no longer a valid placebo test!
  • We can use a placebo test approach where we backdate the treatment to an earlier period and check whether the synthetic control produces spurious “effects” in held-out pre-treatment periods
    • Same intuition as our discussion last week - we want our placebo tests to involve held-out comparisons.

Backdating falsification tests

Code
# In-time falsification: backdate treatment to 2000 and 1996
# Fit SCM using only pre-placebo data, check for "effects" in held-out periods

placebo_years <- c(1996, 2000, 2004)

falsification_results <- list()

for (placebo_yr in placebo_years) {
  # Create dataset: only pre-2008 data, with treatment backdated
  # Note: `treat` here appears to be the time-invariant ever-treated group
  # indicator, distinct from the time-varying `treated` used in the main fit
  falsification_df <- dff %>%
    filter(year < 2008) %>%
    mutate(placebo_treat = as.integer(treat == 1 & year >= placebo_yr))

  tryCatch({
    plac_fit <- multisynth(
      RepVotesMajorPercent ~ placebo_treat,
      unit = county_state, time = year,
      data = falsification_df
    )

    plac_summ <- summary(plac_fit)
    plac_att <- plac_summ$att %>%
      filter(Level == "Average") %>%
      mutate(placebo_year = placebo_yr)

    falsification_results[[as.character(placebo_yr)]] <- plac_att
  }, error = function(e) {
    cat("Error for placebo year", placebo_yr, ":", conditionMessage(e), "\n")
  })
}

all_results <- bind_rows(falsification_results) %>%
  mutate(placebo_label = paste0("Placebo treatment = ", placebo_year))

ggplot(all_results, aes(x = Time, y = Estimate,
                         color = factor(placebo_year))) +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = lower_bound, ymax = upper_bound), width = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted") +
  facet_wrap(~ placebo_label, scales = "free_x") +
  labs(x = "Time relative to (placebo) treatment",
       y = "Estimated ATT (pp)",
       title = "Placebo tests: backdated treatment vs. actual",
       subtitle = "Placebo treatments should show no effect in held-out pre-2008 periods") +
  scale_color_manual(values = c("1996" = "steelblue", "2000" = "darkorange",
                                 "2004" = "black")) +
  theme_bw() + theme(legend.position = "none")

Conclusion

  • Causal inference in time-series data is hard!

    • Even the estimands can get tricky - we’ll come back to this again when we talk about mediation
  • The core problem centers around how we deal with unobserved confounding and the role that past outcomes play in identification/estimation

  • Three scenarios:

    • Unobserved confounding but simple functional form (DiD)
      • Adjust via differencing - works in short panels
    • Observed confounding driven by past \(Y\)
      • Adjust by controlling for past \(Y\) - also fine in short panels
    • Unobserved confounding more complex than DiD
      • Adjust for past \(Y\) to indirectly control for the latent factor - need many pre-treatment periods!

Next week

  • The last of the “holy trinity” - regression discontinuity designs

  • What happens when we have arbitrary unobserved confounding…

    • …but treatment is assigned via a threshold
  • In this case, even if treatment and outcome are confounded, that confounding might be negligible in a window around the threshold!

  • Two ways to think about this

    • “As-if-random” assignment at the cut-off
    • Continuity in potential outcomes at the cut-off
  • Estimation by extrapolation to the threshold.

    • Works best when we have lots of units near the cut-off.