Week 11: Panel Data Causal Inference

PS 813 - Causal Inference

Anton Strezhnev

University of Wisconsin-Madison

April 6, 2026


Last two weeks

  • Differences-in-differences designs for dealing with unobserved confounding.

  • Identification under Parallel trends

    \[\mathbb{E}[Y_{it}(0) - Y_{it^{\prime}}(0) \mid D_i = 1] - \mathbb{E}[Y_{it}(0) - Y_{it^{\prime}}(0) \mid D_i = 0] = 0\]

  • Estimation via first-differences regression (Callaway and Sant’Anna, 2021)

    • Can adjust for time-varying and time-invariant threats to parallel trends.
  • Pre-treatment placebo test diagnostics

  • Be careful with software defaults!

This week

  • In DiD, our identifying assumptions rule out outcome-treatment feedback
    • Treatment is exogenous conditional on observed and unobserved (but structurally constrained) factors
    • But past outcomes do not directly affect present treatment.
  • Not uncommon to have dynamic treatment settings where this assumption is violated.
    • Doctors change treatment regimen in response to patient outcomes.
    • Political campaigns adjust their strategy in response to observed poll numbers.
    • Economic growth affects democratization (which affects economic growth)
  • In all of these settings, we have \(Y_{t-1}\) affecting treatment \(D_{t}\)
    • And typically also lagged treatment \(D_{t-1}\) affecting current treatment \(D_{t}\)

Review: DiD and Strict Exogeneity

  • For this section, we’re going to work with binary treatment \(D_{it}\) observed at each time period \(t\)

    • For simplicity, we’ll consider two time periods generically as \(t\) and \(t-1\)
    • Let \(D_i = \{D_{i1}, D_{i2}, D_{i3}, \dotsc, D_{iT}\}\) denote the entire treatment history for a unit
  • In DiD, our parallel trends assumption is a special case of a particular kind of exogeneity/ignorability assumption

    • We assume that treatment history is as-good-as-randomly assigned given observed covariates and some latent factor
    • Let \(\mathbf{X}_i = \{X_{i1}, X_{i2}, X_{i3}, \dotsc, X_{iT}\}\) denote the unit’s observed covariate history
    • Let \(\mathbf{U}_i = \{U_{i1}, U_{i2}, U_{i3}, \dotsc, U_{iT}\}\) denote the unit’s latent factor history
  • Strict exogeneity

    \[\{Y_{it}(0), Y_{it}(1) \} \perp\!\!\!\perp D_{i} \mid \mathbf{X}_{i}, \mathbf{U}_i\]

Strict exogeneity

Adapted from Xu (2023) ‘Causal Inference with Time-Series Cross-Sectional Data: A Reflection’

Sequential ignorability

  • But in many panel data settings, ruling out the arrow from \(Y_{t-1}\) to \(D_{t}\) is implausible!

    • We’ll need to make a different assumption
    • Before that though, we need to define the potential outcomes in a new way!
  • We now have to consider potential outcomes in terms of the entire treatment history assigned up to time \(t\)

    \[Y_{it}(d_1, d_2, \dotsc, d_t) = Y_{it}(d_{1:t})\]

  • We can partition this joint potential outcome into the contemporaneous treatment \(d_t\) and past values \(d_1, d_2, \dotsc, d_{t-1}\)

    \[Y_{it}(d_{1:t}) = Y_{it}(d_{t}; d_{1:t-1})\]

  • We’ll define the Contemporaneous Effect of treatment at time \(t\) as:

    \[Y_{it}(1; d_{1:t-1}) - Y_{it}(0; d_{1:t-1}) = Y_{it}(1) - Y_{it}(0)\]

  • Later on when we talk about mediation, we’ll introduce other kinds of estimands for cumulative effects over time.

Sequential ignorability

  • Identification of causal effects under outcome-treatment feedback can be done under a sequential ignorability assumption (Robins, 1986; Blackwell and Glynn, 2018).

  • At each time period, treatment is as-if random conditional on observed covariate and outcome history

    \[\{Y_{it}(1), Y_{it}(0)\} \perp\!\!\!\perp D_{it} \mid X_{it}, Y_{it-1}, D_{it-1}\]

  • We can relax this by conditioning on a longer history of outcomes, treatments and covariates.

  • Here we’re focusing on the version of sequential ignorability needed for the contemporaneous treatment effect, but it can be stated more generally

    \[\{Y_{is}(d_{1:s}) : s = t, \dotsc, T\} \perp\!\!\!\perp D_{it} \mid X_{it}, Y_{it-1}, D_{it-1}\]

  • We’ll cover this more when we talk about mediation

Sequential ignorability

Adapted from Blackwell and Glynn (2018) ‘How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables’

Estimation of treatment effects

  • Under sequential ignorability, we can motivate estimation of the contemporaneous effect of treatment using an Autoregressive Distributed Lag (ADL) model framework.

    • Adjust for some number of lagged outcomes (autoregressive)
    • Adjust for some number of lagged treatments (distributed lag)
  • For example, a one-lag linear model

    \[Y_{it} = X_{it}^\prime\beta + \alpha Y_{it-1} + \tau_0 D_{it} + \tau_1 D_{it-1} + \varepsilon_{it}\]

  • If we believe the parametric assumptions behind this model (e.g. constant effects), \(\tau_0\) identifies the average contemporaneous effect of treatment at time \(t\).

  • But does \(\tau_1\) identify the effect one period out?

    • No! We have post-treatment bias (Blackwell and Glynn, 2018).
    • This is the Table 2 fallacy all over again.
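
As a quick sketch (simulated data with hypothetical parameter values, not from the lecture): treatment responds to the lagged outcome, so strict exogeneity fails, but the one-lag ADL regression still recovers the contemporaneous effect \(\tau_0\) because sequential ignorability holds by construction.

```r
# Hypothetical two-period DGP with outcome-treatment feedback
set.seed(813)
n <- 5000
X     <- rnorm(n)                          # covariate at time t
D_lag <- rbinom(n, 1, 0.5)                 # treatment at t-1
Y_lag <- 0.5 * D_lag + rnorm(n)            # outcome at t-1
# Feedback: current treatment depends on lagged outcome and lagged treatment
D     <- rbinom(n, 1, plogis(Y_lag + D_lag))
# Truth: tau_0 = 1 (contemporaneous), tau_1 = 0.2, alpha = 0.6
Y     <- X + 0.6 * Y_lag + 1.0 * D + 0.2 * D_lag + rnorm(n)

# One-lag ADL regression: adjusts for Y_lag and D_lag
fit <- lm(Y ~ X + Y_lag + D + D_lag)
coef(fit)["D"]  # close to the true tau_0 = 1 in large samples
```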

Sequential ignorability

Adapted from Blackwell and Glynn (2018) ‘How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables’

LDV vs. DiD

  • We’ll talk about how to estimate these long-term and cumulative effects under sequential ignorability when we get to mediation in a few weeks.
    • The relevant estimands and estimators come from this literature.
    • Future treatment and covariates are mediators of past treatment!
  • But let’s consider the simple setting with two time periods and two treatment groups
    • Everyone is under control at time \(t-1\), some units are treated at time \(t\)
  • Two possible ways to estimate the treatment effect.
    1. Difference-in-differences - Regress \(Y_{it} - Y_{it-1}\) on \(D_{it}\)
    2. Lagged-DV adjustment - Regress \(Y_{it}\) on \(D_{it}\) and \(Y_{it-1}\)

Controlling for LDV under strict exogeneity

  • What happens if we control for the lagged dependent variable under strict exogeneity?
  • What we actually want to do is adjust for the expected lagged outcome.
    • But the actual observed outcome contains random error
    • \(Y_{it-1}\) can be high because of the latent factor or the error
  • We are controlling for an imperfect proxy of the true confounder
  • This measurement error induces a bias due to reversion to the mean
    • If treated units tend to be higher in their outcomes than control (on average), then control units with similar observed outcomes are outliers
    • We’d expect the subsequent outcome to be closer to the mean if there’s independent noise at \(t\) versus \(t-1\)

Simulation: Controlling for LDV

  • To see the bias from controlling for lagged \(Y\), consider the following DGP:
    • \(N = 2000\) units, two periods (\(t{-}1\) and \(t\)), true treatment effect \(\tau = 0\)
    • Time-invariant confounder: \(\alpha_i \sim N(0, 1)\)
    • Treatment correlated with confounder: \(D_i \sim \text{Bernoulli}(\text{logit}^{-1}(\alpha_i))\)
    • Outcome at \(t{-}1\): \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\), where \(\varepsilon_{it-1} \sim N(0, \sigma_\varepsilon^2)\)
    • Outcome at \(t\): \(Y_{it} = \alpha_i + \tau D_i + \varepsilon_{it}\), where \(\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)\)
  • Parallel trends holds by construction: \(\alpha_i\) is time-invariant
  • Vary \(\sigma_\varepsilon \in \{0.5, 1, 2, 4, 8\}\) to control how noisy the proxy \(Y_{it-1}\) is for \(\alpha_i\)
  • Compare three estimators: simple difference-in-means post-treatment, DiD, and LDV

Simulation: LDV bias from measurement error

Code
library(tidyverse)  # map_dfr (purrr), pivot_longer (tidyr), ggplot (ggplot2)
set.seed(53703)
N <- 2000
n_sims <- 500
noise_levels <- c(0.5, 1, 2, 4, 8)

sim_results <- map_dfr(noise_levels, function(sigma_e) {
  map_dfr(1:n_sims, function(s) {
    # Time-invariant confounder
    alpha_i <- rnorm(N, 0, 1)
    # Treatment correlated with confounder
    D_i <- rbinom(N, 1, plogis(alpha_i))
    # True treatment effect is ZERO
    beta <- 0
    # Pre and post outcomes: parallel trends holds
    Y_pre <- alpha_i + rnorm(N, 0, sigma_e)
    Y_post <- alpha_i + beta * D_i + rnorm(N, 0, sigma_e)

    # Difference-in-means (biased by alpha_i)
    dim_est <- coef(lm(Y_post ~ D_i))["D_i"]
    # DiD (unbiased)
    did_est <- coef(lm(I(Y_post - Y_pre) ~ D_i))["D_i"]
    # LDV (biased -- errors in variables)
    ldv_est <- coef(lm(Y_post ~ D_i + Y_pre))["D_i"]

    tibble(sigma_e = sigma_e, sim = s,
           `Diff-in-means` = dim_est, DiD = did_est, LDV = ldv_est)
  })
})

sim_summary <- sim_results %>%
  pivot_longer(cols = c(`Diff-in-means`, DiD, LDV),
               names_to = "Estimator", values_to = "estimate") %>%
  group_by(sigma_e, Estimator) %>%
  summarize(mean_est = mean(estimate),
            ci_low = quantile(estimate, 0.025),
            ci_high = quantile(estimate, 0.975), .groups = "drop")

sim_summary$Estimator <- factor(sim_summary$Estimator,
                                levels = c("Diff-in-means", "LDV", "DiD"))

ggplot(sim_summary, aes(x = factor(sigma_e), y = mean_est,
                        color = Estimator, group = Estimator)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                width = 0.3, position = position_dodge(width = 0.4)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = expression(paste("Noise in pre-treatment outcome (", sigma[epsilon], ")")),
       y = "Estimate",
       title = "LDV bias under the DiD model (true effect = 0)",
       subtitle = "More noise in Y(pre) = worse proxy for confounder = more residual bias") +
  scale_color_manual(values = c("Diff-in-means" = "gray60",
                                "LDV" = "dodgerblue",
                                "DiD" = "indianred")) +
  theme_bw() + theme(legend.position = "bottom")

Simulation: What if outcomes are autocorrelated?

  • In the previous simulation, \(\varepsilon_{it-1}\) and \(\varepsilon_{it}\) are independent — the only thing linking outcomes over time is \(\alpha_i\)
  • In practice, outcomes are often autocorrelated: a unit that is high at \(t{-}1\) (beyond \(\alpha_i\)) tends to stay high at \(t\)
  • Modified DGP: let \(\varepsilon_{it} = \rho \cdot \varepsilon_{it-1} + \nu_{it}\) where \(\nu_{it} \sim N(0, \sigma_\varepsilon^2)\)
    • When \(\rho > 0\), \(Y_{it-1}\) is a better proxy for \(Y_{it}\)’s non-treatment component
    • LDV should perform better because controlling for \(Y_{it-1}\) captures more of the variation
  • We fix \(\sigma_\varepsilon = 2\) and vary \(\rho \in \{0, 0.25, 0.5, 0.75, 0.95\}\)

Simulation: Autocorrelation reduces LDV bias

Code
set.seed(53703)
N <- 2000
n_sims <- 500
sigma_e <- 2
rho_levels <- c(0, 0.25, 0.5, 0.75, 0.95)

sim_results_rho <- map_dfr(rho_levels, function(rho) {
  map_dfr(1:n_sims, function(s) {
    alpha_i <- rnorm(N, 0, 1)
    D_i <- rbinom(N, 1, plogis(alpha_i))
    beta <- 0

    eps_pre <- rnorm(N, 0, sigma_e)
    eps_post <- rho * eps_pre + rnorm(N, 0, sigma_e)

    Y_pre <- alpha_i + eps_pre
    Y_post <- alpha_i + beta * D_i + eps_post

    dim_est <- coef(lm(Y_post ~ D_i))["D_i"]
    did_est <- coef(lm(I(Y_post - Y_pre) ~ D_i))["D_i"]
    ldv_est <- coef(lm(Y_post ~ D_i + Y_pre))["D_i"]

    tibble(rho = rho, sim = s,
           `Diff-in-means` = dim_est, DiD = did_est, LDV = ldv_est)
  })
})

sim_summary_rho <- sim_results_rho %>%
  pivot_longer(cols = c(`Diff-in-means`, DiD, LDV),
               names_to = "Estimator", values_to = "estimate") %>%
  group_by(rho, Estimator) %>%
  summarize(mean_est = mean(estimate),
            ci_low = quantile(estimate, 0.025),
            ci_high = quantile(estimate, 0.975), .groups = "drop")

sim_summary_rho$Estimator <- factor(sim_summary_rho$Estimator,
                                    levels = c("Diff-in-means", "LDV", "DiD"))

ggplot(sim_summary_rho, aes(x = factor(rho), y = mean_est,
                            color = Estimator, group = Estimator)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                width = 0.3, position = position_dodge(width = 0.4)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = expression(paste("Autocorrelation (", rho, ")")),
       y = "Estimate",
       title = expression(paste("LDV bias with autocorrelated errors (", sigma[epsilon], " = 2, true effect = 0)")),
       subtitle = "More autocorrelation = lagged Y is better proxy = less LDV bias") +
  scale_color_manual(values = c("Diff-in-means" = "gray60",
                                "LDV" = "dodgerblue",
                                "DiD" = "indianred")) +
  theme_bw() + theme(legend.position = "bottom")

Illustration: The Effects of the Shale Shock

  • Let’s return to our example of the effect of the “shale shock” on coal county voting for the Republican party (Gazmararian, 2025).
    • Last week we showed concerning evidence of a parallel trends violation - coal-heavy regions were already trending Republican prior to 2008.
    • Extrapolating the trend (triple-differences in time) eliminates the observed effect.
  • What if, instead of DiD, we controlled for the lagged Republican party vote share in 2004 and 2000?

Lagged DV approach: Setup

Code
dff <- readRDS("assets/data/matched_df.rds")
dff <- dff %>% filter(!is.na(oilgas_prop))

# Create wide-format lagged outcomes
lag_vars <- dff %>%
  select(county_state, year, RepVotesMajorPercent) %>%
  pivot_wider(names_from = year, values_from = RepVotesMajorPercent,
              names_prefix = "Y_")

dff <- dff %>% left_join(lag_vars, by = "county_state")

Lagged DV: Estimates by year

Code
# LDV regressions for each post-treatment year
ldv_years <- c(2008, 2012, 2016, 2020)
ldv_results <- map_dfr(ldv_years, function(yr) {
  df_yr <- dff %>% filter(year == yr)
  fit <- estimatr::lm_robust(
    RepVotesMajorPercent ~ treat + Y_2004 + Y_2000 + oilgas_prop,
    data = df_yr, clusters = county_state, se_type = "stata")
  tidy_fit <- broom::tidy(fit) %>% filter(term == "treat")
  tidy_fit$year <- yr
  tidy_fit
})

ggplot(ldv_results, aes(x = year, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 2006, linetype = "dashed", color = "red") +
  labs(x = "Year", y = "Estimate (pp)",
       title = "LDV: Effect of coal county on Republican vote share",
       subtitle = "Controlling for Y[2004], Y[2000], and oil/gas employment") +
  scale_x_continuous(breaks = ldv_years) +
  theme_bw()

DiD: Estimates by year

Code
# DiD regressions for each post-treatment year
did_results <- map_dfr(ldv_years, function(yr) {
  df_yr <- dff %>%
    filter(year == yr) %>%
    mutate(dY = RepVotesMajorPercent - Y_2004)
  fit <- estimatr::lm_robust(
    dY ~ treat + oilgas_prop,
    data = df_yr, clusters = county_state, se_type = "stata")
  tidy_fit <- broom::tidy(fit) %>% filter(term == "treat")
  tidy_fit$year <- yr
  tidy_fit
})

ggplot(did_results, aes(x = year, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 2006, linetype = "dashed", color = "red") +
  labs(x = "Year", y = "Estimate (pp)",
       title = "DiD: Effect of coal county on change in Republican vote share",
       subtitle = "Outcome: Y[t] - Y[2004], controlling for oil/gas employment") +
  scale_x_continuous(breaks = ldv_years) +
  theme_bw()

Comparing LDV and DiD

Placebo test: Pre-treatment hold-out

  • Let’s construct a placebo test for the lagged DV approach.
    • Because we have many pre-treatment periods, we can apply the LDV approach to pre-treatment outcomes
  • Placebo 1: Outcome year 2004, controlling for \(Y_{2000}\) and \(Y_{1996}\)
  • Placebo 2: Outcome year 2000, controlling for \(Y_{1996}\) and \(Y_{1992}\)
  • Placebo 3: Outcome year 1996, controlling for \(Y_{1992}\) and \(Y_{1988}\)

Placebo: LDV estimates

Code
# Placebo LDV regressions (2 lagged controls, matching main spec)
placebo_ldv_results <- bind_rows(
  # 2004 outcome, controlling for 2000, 1996
  {
    df_yr <- dff %>% filter(year == 2004)
    fit <- estimatr::lm_robust(
      RepVotesMajorPercent ~ treat + Y_2000 + Y_1996 + oilgas_prop,
      data = df_yr, clusters = county_state, se_type = "stata")
    broom::tidy(fit) %>% filter(term == "treat") %>% mutate(year = 2004)
  },
  # 2000 outcome, controlling for 1996, 1992
  {
    df_yr <- dff %>% filter(year == 2000)
    fit <- estimatr::lm_robust(
      RepVotesMajorPercent ~ treat + Y_1996 + Y_1992 + oilgas_prop,
      data = df_yr, clusters = county_state, se_type = "stata")
    broom::tidy(fit) %>% filter(term == "treat") %>% mutate(year = 2000)
  },
  # 1996 outcome, controlling for 1992, 1988
  {
    df_yr <- dff %>% filter(year == 1996)
    fit <- estimatr::lm_robust(
      RepVotesMajorPercent ~ treat + Y_1992 + Y_1988 + oilgas_prop,
      data = df_yr, clusters = county_state, se_type = "stata")
    broom::tidy(fit) %>% filter(term == "treat") %>% mutate(year = 1996)
  }
)

ggplot(placebo_ldv_results, aes(x = year, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Year", y = "Estimate (pp)",
       title = "Placebo test: LDV estimates in pre-treatment periods",
       subtitle = "Each placebo controls for two prior election outcomes + oil/gas share") +
  scale_x_continuous(breaks = c(1996, 2000, 2004)) +
  theme_bw()

Bracketing: Angrist and Pischke (2009)

  • Angrist and Pischke (2009, Ch. 5.4) show that LDV and DiD estimates bracket the true causal effect if one of the two identifying assumptions holds.
    • Focus on the simple linear model case - Ding and Li (2019) extend to the non-parametric setting.
  • Two-period DiD. All units under control at time \(0\)
    • Assume \(\tau > 0\) (positive effect)
  • For exposition, assume time-effects are already removed via differencing and no covariates.

Bracketing: Case 1 – DiD is correct, LDV is misspecified

  • Suppose the FE model is the truth:

    \[Y_{it} = \alpha_i + \tau D_{it} + \varepsilon_{it}\]

    where \(\varepsilon_{it}\) is serially uncorrelated and independent of \(\alpha_i\) and \(D_{it}\). Since \(D_{it-1} = 0\): \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\)

  • You mistakenly estimate the LDV regression (controlling for \(Y_{it-1}\) instead of \(\alpha_i\)):

    \[Y_{it} = a + b \cdot Y_{it-1} + \tau^{\text{LDV}} D_{it} + u_{it}\]

Bracketing: Case 1 – FWL decomposition

  • By FWL, the LDV estimator satisfies \(\hat{\tau}^{\text{LDV}} \xrightarrow{p} \frac{\text{Cov}(Y_{it}, \tilde{D}_{it})}{\text{Var}(\tilde{D}_{it})}\)

    • \(\tilde{D}_{it} = D_{it} - \hat{\gamma} Y_{it-1}\) is the residual from the auxiliary regression of \(D_{it}\) on \(Y_{it-1}\)
  • To evaluate this, substitute \(\alpha_i = Y_{it-1} - \varepsilon_{it-1}\) into the true model:

    \[Y_{it} = \alpha_i + \tau D_{it} + \varepsilon_{it} = (Y_{it-1} - \varepsilon_{it-1}) + \tau D_{it} + \varepsilon_{it}\]

    \[\Rightarrow Y_{it} = Y_{it-1} + \tau D_{it} + (\varepsilon_{it} - \varepsilon_{it-1})\]

Bracketing: Case 1 – Deriving the bias

  • Substitute in and use properties of covariances

    \[\frac{\text{Cov}(Y_{it}, \tilde{D}_{it})}{\text{Var}(\tilde{D}_{it})} = \frac{\text{Cov}(Y_{it-1} + \tau D_{it} + \varepsilon_{it} - \varepsilon_{it-1},\; \tilde{D}_{it})}{\text{Var}(\tilde{D}_{it})}\]

  • \(\text{Cov}(Y_{it-1}, \tilde{D}_{it}) = 0\) by construction (FWL residualizes out \(Y_{it-1}\))

  • \(\text{Cov}(D_{it}, \tilde{D}_{it}) = \text{Var}(\tilde{D}_{it})\)

  • \(\text{Cov}(\varepsilon_{it}, \tilde{D}_{it}) = 0\) since \(\varepsilon_{it} \perp\!\!\!\perp (D_{it}, Y_{it-1})\)

  • But \(\text{Cov}(\varepsilon_{it-1}, \tilde{D}_{it}) = \text{Cov}(\varepsilon_{it-1}, D_{it}) - \hat{\gamma}\text{Cov}(\varepsilon_{it-1}, Y_{it-1})\)

    • \(\text{Cov}(\varepsilon_{it-1}, D_{it}) = 0\) since \(\varepsilon_{it-1}\) is independent of treatment
    • And since \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\), \(\text{Cov}(\varepsilon_{it-1}, Y_{it-1}) = \sigma^2_{\varepsilon}\)
    • So \(\text{Cov}(\varepsilon_{it-1}, \tilde{D}_{it}) = -\hat{\gamma}\sigma^2_{\varepsilon}\)
  • So: \(\hat{\tau}^{\text{LDV}} \xrightarrow{p} \tau + \frac{\hat{\gamma} \sigma^2_\varepsilon}{\text{Var}(\tilde{D}_{it})}\), with bias having the same sign as \(\hat{\gamma}\) (the sign of selection)

Bracketing: Case 1 – Why is LDV biased?

  • \(Y_{it-1} = \alpha_i + \varepsilon_{it-1}\) is a noisy proxy for the confounder \(\alpha_i\)
    • Controlling for a noisy proxy does not fully remove confounding
  • With positive selection (high outcome \(\leadsto\) more likely to be treated) (\(\hat{\gamma} > 0\))
    • LDV overestimates \(\tau\)
  • With negative selection (\(\hat{\gamma} < 0\))
    • LDV underestimates \(\tau\)
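
The Case 1 result can be checked numerically; a minimal simulation sketch (assuming the fixed-effects DGP above with \(\tau = 1\), \(\sigma_\varepsilon = 1\), and positive selection on \(\alpha_i\); all names illustrative):

```r
# Case 1: fixed-effects model is the truth; LDV regression is misspecified
set.seed(813)
n <- 1e5
sigma_eps <- 1
alpha <- rnorm(n)                                # time-invariant confounder
D     <- rbinom(n, 1, plogis(alpha))             # positive selection on alpha
Y_lag <- alpha + rnorm(n, 0, sigma_eps)          # Y_{t-1} = alpha + eps_{t-1}
Y     <- alpha + 1 * D + rnorm(n, 0, sigma_eps)  # Y_t, true tau = 1

tau_ldv <- coef(lm(Y ~ D + Y_lag))["D"]          # biased upward
gamma   <- coef(lm(D ~ Y_lag))["Y_lag"]          # auxiliary regression slope
D_tilde <- residuals(lm(D ~ Y_lag))              # FWL residual
bias    <- gamma * sigma_eps^2 / var(D_tilde)
c(tau_ldv, 1 + bias)  # LDV estimate matches tau + gamma * sigma^2 / Var(D_tilde)
```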

Bracketing: Case 2 – LDV is correct, DiD is misspecified

  • Now suppose the LDV model is the truth:

    \[Y_{it} = \alpha + \theta Y_{it-1} + \tau D_{it} + \varepsilon_{it}\]

  • where \(\varepsilon_{it}\) is serially uncorrelated, exogenous, and \(0 < \theta < 1\) (stationarity)

  • You mistakenly estimate the first-differenced regression (no control for \(Y_{it-1}\)):

    \[\Delta Y_{it} = a + \tau^{\text{DiD}} D_{it} + u_{it}\]

    where \(\Delta Y_{it} = Y_{it} - Y_{it-1}\) and \(D_{it-1} = 0\) for everyone

Bracketing: Case 2 – FWL decomposition

  • The DiD estimator satisfies \(\hat{\tau}^{\text{DiD}} \xrightarrow{p} \frac{\text{Cov}(\Delta Y_{it}, D_{it})}{\text{Var}(D_{it})}\)

  • Subtract \(Y_{it-1}\) from both sides of the true model:

    \[Y_{it} - Y_{it-1} = \alpha + (\theta - 1)Y_{it-1} + \tau D_{it} + \varepsilon_{it}\]

  • So \(\Delta Y_{it} = \alpha + (\theta - 1)Y_{it-1} + \tau D_{it} + \varepsilon_{it}\). Substituting:

    \[\frac{\text{Cov}(\Delta Y_{it}, D_{it})}{\text{Var}(D_{it})} = \frac{\text{Cov}(\alpha + (\theta - 1)Y_{it-1} + \tau D_{it} + \varepsilon_{it}, D_{it})}{\text{Var}(D_{it})} \]

    \[\frac{\text{Cov}(\Delta Y_{it}, D_{it})}{\text{Var}(D_{it})} = \tau + (\theta - 1)\frac{\text{Cov}(Y_{it-1}, D_{it})}{\text{Var}(D_{it})} + \frac{\text{Cov}(\varepsilon_{it}, D_{it})}{\text{Var}(D_{it})}\]

  • \(\text{Cov}(\varepsilon_{it}, D_{it}) = 0\) by assumption (LDV model has no unobserved confounding)

Bracketing: Case 2 – Deriving the bias

  • Therefore:

    \[\hat{\tau}^{\text{DiD}} \xrightarrow{p} \tau + (\theta - 1)\frac{\text{Cov}(Y_{it-1}, D_{it})}{\text{Var}(D_{it})}\]

  • The bias is \((\theta - 1)\frac{\text{Cov}(Y_{it-1}, D_{it})}{\text{Var}(D_{it})}\)

  • The bias has the opposite sign from the selection:

    • With positive selection (\(\text{Cov}(Y_{it-1}, D_{it}) > 0\)) and \(\theta < 1\): bias is negative → DiD underestimates
    • With negative selection: bias is positive → DiD overestimates
  • Under this model, LDV is unbiased: it correctly controls for \(Y_{it-1}\)
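
The same kind of numerical check works for Case 2 (a simulated sketch assuming the LDV DGP above with \(\tau = 1\), \(\theta = 0.5\), and positive selection on \(Y_{it-1}\); names illustrative):

```r
# Case 2: LDV model is the truth; first-differencing is misspecified
set.seed(813)
n <- 1e5
theta <- 0.5
Y_lag <- rnorm(n)
D     <- rbinom(n, 1, plogis(Y_lag))       # selection on the lagged outcome
Y     <- theta * Y_lag + 1 * D + rnorm(n)  # true tau = 1

tau_did <- coef(lm(I(Y - Y_lag) ~ D))["D"]       # biased downward
bias    <- (theta - 1) * cov(Y_lag, D) / var(D)
c(tau_did, 1 + bias)  # DiD estimate matches tau + (theta - 1) * Cov / Var
```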

Bracketing

  • Combining both cases with positive selection (\(\text{Cov}(\alpha_i, D_i) > 0\)):

    • If strict exogeneity is correct: LDV overestimates, DiD is exact
    • If sequential ignorability is correct: DiD underestimates, LDV is exact
  • The true effect is bracketed between the two estimates if one of these two assumptions is true.

  • This is a useful result, but important to be aware of its limitations

    • It guarantees nothing if neither assumption is true!
  • In many cases, we’re skeptical of parallel trends but we don’t think sequential ignorability is right either

    • Rather, we think the unobserved latent factor structure might be more complicated than \(U_{it} = \alpha_i + \delta_t\)
    • What do we do then?

Synthetic Controls

The Synthetic Control Method

  • Comparative case studies in the social sciences try to assess the impact of some event by comparing the case of interest to some unaffected units
    • We ideally want to pick units that are similar enough that they share the same latent characteristics shaping the evolution of the outcome.
  • The Synthetic Control Method (SCM) builds on this intuition (Abadie and Gardeazabal, 2003; Abadie, Diamond, and Hainmueller, 2010).
    • What if, instead of using a single comparison unit, we use a weighted average of several control units?
    • Find weights on the control units that reproduce the pre-treatment outcome trajectory of the treated unit
    • Compare the observed outcome trajectory to the synthetic control average.
  • Justified under two different outcome models
    • Under sequential ignorability, we get an unbiased estimator (because past outcomes are direct confounders)
    • Under a latent factor model, the bias goes away as the number of time periods gets large.

SCM: Setup and Notation

  • Observe outcomes \(Y_{it}\) for units \(i = 1, \dotsc, J+1\) over periods \(t = 1, \dotsc, T\)

    • Let \(T_0\) denote the number of pre-treatment periods, \(T_1\) the number of post-treatment
    • Unit \(i = 1\) is the treated unit, exposed to treatment starting at \(T_0 + 1\)
    • Units \(i = 2, \dotsc, J+1\) are donor units (never treated)
  • We want to estimate:

    \[\tau_{1t} = Y_{1t}(1) - Y_{1t}(0) \quad \text{for } t > T_0\]

  • We define a set of weights on the donor units \(\mathbf{W} = (w_2, \dotsc, w_{J+1})\) with \(w_j \geq 0\) and \(\sum_j w_j = 1\)

  • Estimate the effect by comparing the observed outcome to the imputed counterfactual

    \[\hat{\tau}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt}\]

SCM: The data-generating process

  • The SCM is typically motivated by an underlying latent factor model for the outcome

    \[Y_{it}(0) = \delta_t + \boldsymbol{\theta}_t' \mathbf{Z}_i + \boldsymbol{\lambda}_t' \boldsymbol{\mu}_i + \varepsilon_{it}\]

  • \(\mathbf{Z}_i\) are the observed covariates (with time-varying coefficients \(\boldsymbol{\theta}_t\)).

  • \(\boldsymbol{\lambda}_t\) are \(F\)-dimensional time-varying factors and \(\boldsymbol{\mu}_i\) are unit-specific loadings

  • This is a more general model of latent unobserved confounding than DiD

    • In DiD, the factor structure reduces to a unit fixed effect plus a common trend: \(\boldsymbol{\lambda}_t = \lambda\) is constant over time, so \(\boldsymbol{\lambda}_t' \boldsymbol{\mu}_i = \alpha_i\)
    • The latent factor model instead allows several distinct time trends (the elements of \(\boldsymbol{\lambda}_t\)) that affect each unit differently (with weights given by \(\boldsymbol{\mu}_i\)).

SCM: Choosing the weights

  • The goal of the synthetic control method is to find weights such that the weighted average of donor units reproduces the covariates and the pre-treatment trajectory of the treated unit

    \[\sum_{j=2}^{J+1} w^*_j Y_{jt} = Y_{1t} \quad \text{for } t \in \{1, 2, \dotsc, T_0\}\]

    \[\sum_{j=2}^{J+1} w^*_j \mathbf{Z}_j = \mathbf{Z}_{1}\]

  • For notational simplicity, we’ll denote the \(K \times (J+1)\) matrix of combined covariates and pre-treatment outcomes as \(\mathbf{X}\)

    • \(\mathbf{X}_1\) is the \(K \times 1\) vector for the treated unit
    • \(\mathbf{X}_0\) is the \(K \times J\) matrix for the control units.
  • The ideal synthetic control has weights \(\mathbf{W}^*\) such that \(\mathbf{X}_1 = \mathbf{X}_0 \mathbf{W}^*\)

SCM: Bias

  • Supposing that we find \(\mathbf{W}^*\), what is the error in our imputed counterfactual?

    \[Y_{1t}(0) - \sum_{j=2}^{J+1} w^*_j Y_{jt}\]

  • Abadie, Diamond and Hainmueller (2010) show that under the latent factor model, the expected error goes to zero

    • …when the number of pre-treatment periods is large
    • …relative to the size of the pre-treatment shocks!
  • Intuition: Remember our earlier errors-in-variables example when we controlled for a lagged dependent variable!

    • The lagged outcomes are proxies for the latent factor - the less noisy the proxy, the better it reproduces the true unobserved latent factor.
  • Under an autoregressive outcome model, the bias is zero!

    • “Sequential ignorability” - past \(Y\) directly affects future \(Y\).

SCM: Estimating the weights

  • In practice, we can only get \(\mathbf{X}_1 \approx \mathbf{X}_0 \mathbf{W}^*\)

    • How do we find \(\mathbf{W}^*\)?
  • We solve a constrained minimization problem that minimizes a distance metric

  • For example, the \(\mathbf{V}\)-weighted Euclidean norm:

    \[\|\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W}\|_{\mathbf{V}} = \sqrt{\sum_{h=1}^K v_h(X_{h1} - w_2 X_{h2} - w_3 X_{h3} - \dotsc - w_{J+1}X_{hJ+1})^2}\]

  • We impose constraints:

    • \(\sum_{j=2}^{J+1} w_j = 1\) Weights sum to 1
    • \(w_j \ge 0 \quad \forall j \in \{2, 3, \dotsc, J+1\}\) Weights are non-negative
  • Note that we also get to choose how much importance to put on each dimension (covariate) \(\mathbf{V} = \{v_1, v_2, \dotsc, v_K\}\)

    • Abadie, Diamond and Hainmueller (2010) use an out-of-sample validation approach against the pre-treatment outcomes to choose an optimal \(\mathbf{V}\)
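
As a toy illustration of the weight optimization (with \(\mathbf{V}\) fixed to the identity; the softmax reparameterization below is just one convenient way to impose the simplex constraints in base R, not how the Synth package solves the nested \(\mathbf{V}\)/\(\mathbf{W}\) problem):

```r
# Toy donor pool: K = 8 characteristics, J = 5 donors
set.seed(813)
K <- 8; J <- 5
X0 <- matrix(rnorm(K * J), K, J)   # K x J donor matrix
w_true <- c(0.6, 0.4, 0, 0, 0)
X1 <- X0 %*% w_true                # treated unit lies in the convex hull

# Softmax reparameterization keeps weights non-negative and summing to 1
softmax <- function(z) exp(z) / sum(exp(z))
obj <- function(z) sum((X1 - X0 %*% softmax(z))^2)
opt <- optim(rep(0, J), obj, method = "BFGS", control = list(maxit = 500))
w_hat <- softmax(opt$par)
round(w_hat, 2)  # weight concentrates on the first two donors
```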

SCM: Notes on the weights

  • The non-negativity and sum-to-one constraints have some useful properties:

    • Sparsity - most weights are exactly 0
    • Interpretability - we can characterize the precise contribution of each donor unit to the synthetic control
    • No extrapolation outside the convex hull of the donor units
  • But there are problems!

    • If the treated unit is outside the convex hull of the controls, it’s impossible to reproduce the pre-treatment trajectory!
  • In general, the bias \(\to 0\) results are derived under perfect fit - if the synthetic control is a poor match, bias can be huge!

  • Possible solutions

    • Add an intercept (Doudchenko and Imbens, 2017; Ferman and Pinto, 2021)
    • Allow for negative weights (Doudchenko and Imbens, 2017)
    • Add an outcome model to correct for imperfect fit (Ben-Michael, Feller and Rothstein, 2021)

Application: Basque Country Terrorism

  • Abadie and Gardeazabal (2003): Effect of ETA terrorism on the Basque Country’s GDP per capita
    • Treatment: onset of political terrorism in the late 1960s (the analysis dates treatment to roughly 1970)
    • Donor pool: 16 other Spanish regions
    • Outcome: real GDP per capita (thousands of 1986 USD)
    • Time: 1955-1997
  • Replication example in the Synth package

Basque Country: Data Preparation

Code
library(Synth)
data(basque)

# Prepare data for Synth
dataprep.out <- dataprep(
  foo = basque,
  predictors = c("school.illit", "school.prim", "school.med",
                 "school.high", "school.post.high", "invest"),
  predictors.op = "mean",
  time.predictors.prior = 1964:1969,
  special.predictors = list(
    list("gdpcap", 1960:1969, "mean"),
    list("sec.agriculture", seq(1961, 1969, 2), "mean"),
    list("sec.energy", seq(1961, 1969, 2), "mean"),
    list("sec.industry", seq(1961, 1969, 2), "mean"),
    list("sec.construction", seq(1961, 1969, 2), "mean"),
    list("sec.services.venta", seq(1961, 1969, 2), "mean"),
    list("sec.services.nonventa", seq(1961, 1969, 2), "mean"),
    list("popdens", 1969, "mean")
  ),
  dependent = "gdpcap",
  unit.variable = "regionno",
  unit.names.variable = "regionname",
  time.variable = "year",
  treatment.identifier = 17,  # Basque Country
  controls.identifier = c(2:16, 18),  # All other regions (excluding Spain aggregate = 1)
  time.optimize.ssr = 1960:1969,
  time.plot = 1955:1997
)

Basque Country: Fitting the Synthetic Control

# Run the synth optimization
synth.out <- synth(data.prep.obj = dataprep.out, method = "BFGS")
# Display the non-zero donor weights
synth.tables <- synth.tab(dataprep.res = dataprep.out, synth.res = synth.out)
w <- synth.tables$tab.w
print(w[w$w.weights > 0.01, ])
   w.weights            unit.names unit.numbers
10     0.851              Cataluna           10
14     0.149 Madrid (Comunidad De)           14

Basque Country: Synthetic vs. Actual

Code
# Extract path data
years <- dataprep.out$tag$time.plot
Y_treat <- dataprep.out$Y1plot
Y_synth <- dataprep.out$Y0plot %*% synth.out$solution.w

plot_df <- data.frame(
  year = years,
  Treated = as.numeric(Y_treat),
  Synthetic = as.numeric(Y_synth)
) %>%
  pivot_longer(cols = c(Treated, Synthetic), names_to = "Unit", values_to = "gdpcap")

ggplot(plot_df, aes(x = year, y = gdpcap, color = Unit, linetype = Unit)) +
  geom_line(linewidth = 1.2) +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  annotate("text", x = 1971, y = 3, label = "Terrorism\nonset", hjust = 0, size = 3.5) +
  labs(x = "Year", y = "Real per capita GDP\n(1986 USD, thousands)",
       title = "Basque Country vs. Synthetic Basque Country",
       subtitle = "Abadie & Gardeazabal (2003) — Effect of ETA terrorism on GDP") +
  scale_color_manual(values = c("Treated" = "black", "Synthetic" = "firebrick")) +
  scale_linetype_manual(values = c("Treated" = "solid", "Synthetic" = "dashed")) +
  theme_bw() + theme(legend.position = "bottom")

Basque Country: Treatment Effect (Gap Plot)

Code
gap <- as.numeric(Y_treat) - as.numeric(Y_synth)
gap_df <- data.frame(year = years, gap = gap)

ggplot(gap_df, aes(x = year, y = gap)) +
  geom_line(linewidth = 1.2) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  annotate("text", x = 1971, y = 0.4, label = "Terrorism onset", hjust = 0, size = 3.5) +
  labs(x = "Year", y = "Gap in real per capita GDP\n(Treated − Synthetic)",
       title = "Estimated effect of terrorism on Basque Country GDP",
       subtitle = "Negative gap = terrorism reduced GDP relative to synthetic control") +
  theme_bw()

SCM: Inference

  • With a single treated unit, standard inference is not straightforward. Abadie et al. (2010) propose a permutation testing approach:
    • Apply SCM to each donor unit as if it were treated
    • If the treated unit’s gap is unusually large relative to these “placebo” gaps, the effect is unlikely due to chance
    • Test statistic: post/pre RMSPE ratio
    • Common practice is to drop donor units whose pre-treatment fit is far worse than the treated unit’s.
  • Recent work in statistics moves beyond this:
    • “Conformal inference” approach in Chernozhukov, Wüthrich, and Zhu (2019)
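The core idea behind the conformal approach can be sketched in a few lines. This is an illustrative simplification, not the authors’ implementation: under the null of no effect, the post-treatment gaps should look exchangeable with the pre-treatment gaps, so we compare the observed post-period statistic against its value under every cyclic shift of the gap series. The function name, the choice of test statistic (mean absolute post-period gap), and the toy example are all my own for illustration.

```r
# Sketch of a cyclic-permutation p-value in the spirit of
# Chernozhukov, Wuthrich, and Zhu (illustrative, not their implementation)
cyclic_pvalue <- function(gaps, n_post) {
  T_total <- length(gaps)
  # Test statistic: mean absolute gap over the final n_post periods
  stat <- function(g) mean(abs(tail(g, n_post)))
  observed <- stat(gaps)
  # Recompute the statistic under every cyclic shift (including identity)
  shifted_stats <- sapply(seq_len(T_total) - 1, function(s) {
    stat(gaps[((seq_len(T_total) - 1 + s) %% T_total) + 1])
  })
  mean(shifted_stats >= observed)
}

# Toy example: 9 quiet pre-periods, 1 large post-period gap
cyclic_pvalue(c(rep(0, 9), 10), n_post = 1)  # -> 0.1
```

Only the identity shift places the large gap in the post period, so 1 of 10 shifts is as extreme as the observed statistic.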

Basque Country: Permutation tests

Code
# Run placebo tests for each control region
control_ids <- c(2:16, 18)
placebo_gaps <- list()

for (ctrl_id in control_ids) {
  donor_ids <- setdiff(control_ids, ctrl_id)

  tryCatch({
    dp <- dataprep(
      foo = basque,
      predictors = c("school.illit", "school.prim", "school.med",
                     "school.high", "school.post.high", "invest"),
      predictors.op = "mean",
      time.predictors.prior = 1964:1969,
      special.predictors = list(
        list("gdpcap", 1960:1969, "mean"),
        list("sec.agriculture", seq(1961, 1969, 2), "mean"),
        list("sec.energy", seq(1961, 1969, 2), "mean"),
        list("sec.industry", seq(1961, 1969, 2), "mean"),
        list("sec.construction", seq(1961, 1969, 2), "mean"),
        list("sec.services.venta", seq(1961, 1969, 2), "mean"),
        list("sec.services.nonventa", seq(1961, 1969, 2), "mean"),
        list("popdens", 1969, "mean")
      ),
      dependent = "gdpcap",
      unit.variable = "regionno",
      unit.names.variable = "regionname",
      time.variable = "year",
      treatment.identifier = ctrl_id,
      controls.identifier = donor_ids,
      time.optimize.ssr = 1960:1969,
      time.plot = 1955:1997
    )

    so <- synth(data.prep.obj = dp, method = "BFGS")

    y1 <- as.numeric(dp$Y1plot)
    y0 <- as.numeric(dp$Y0plot %*% so$solution.w)
    placebo_gaps[[as.character(ctrl_id)]] <- data.frame(
      year = 1955:1997, gap = y1 - y0, unit = ctrl_id
    )
  }, error = function(e) {
    # Skip units where optimization fails
  })
}

(Verbose optimization output from the 16 placebo synth() fits omitted — each fit prints its MSPE (loss V) and the solution.v / solution.w weight vectors.)
Code
placebo_df <- do.call(rbind, placebo_gaps)

# Add treated unit
treated_gap_df <- data.frame(year = years, gap = gap, unit = 17)

Basque Country: Permutation tests

Code
ggplot() +
  geom_line(data = placebo_df, aes(x = year, y = gap, group = unit),
            color = "gray70", alpha = 0.6) +
  geom_line(data = treated_gap_df, aes(x = year, y = gap),
            color = "black", linewidth = 1.2) +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Year", y = "Gap in real per capita GDP",
       title = "Placebo tests: Basque Country (black) vs. donor regions (gray)",
       subtitle = "Each gray line = SCM applied to a donor region as if it were treated") +
  theme_bw()

Permutation Tests: Dropping Poor Fits

Code
# Compute pre-treatment RMSPE for each unit
treated_pre_rmspe <- sqrt(mean((treated_gap_df %>% filter(year < 1970))$gap^2))

placebo_pre_rmspe <- placebo_df %>%
  filter(year < 1970) %>%
  group_by(unit) %>%
  summarize(pre_rmspe = sqrt(mean(gap^2))) %>%
  ungroup()

# Keep only units with pre-RMSPE <= 5x the treated unit
keep_units <- placebo_pre_rmspe %>%
  filter(pre_rmspe <= 5 * treated_pre_rmspe) %>%
  pull(unit)

placebo_df_trimmed <- placebo_df %>% filter(unit %in% keep_units)

n_dropped <- n_distinct(placebo_df$unit) - length(keep_units)

ggplot() +
  geom_line(data = placebo_df_trimmed, aes(x = year, y = gap, group = unit),
            color = "gray70", alpha = 0.6) +
  geom_line(data = treated_gap_df, aes(x = year, y = gap),
            color = "black", linewidth = 1.2) +
  geom_vline(xintercept = 1970, linetype = "dashed", color = "gray40") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Year", y = "Gap in real per capita GDP",
       title = paste0("Placebo tests: Dropping donors with pre-RMSPE > 5x treated (",
                      n_dropped, " dropped)"),
       subtitle = "Basque Country (black) vs. donor regions with adequate pre-treatment fit (gray)") +
  theme_bw()

Permutation Tests: RMSPE Ratios

Code
# Compute post/pre RMSPE ratios
compute_rmspe_ratio <- function(gap_df, treatment_year = 1970) {
  pre <- gap_df %>% filter(year < treatment_year)
  post <- gap_df %>% filter(year >= treatment_year)
  rmspe_pre <- sqrt(mean(pre$gap^2))
  rmspe_post <- sqrt(mean(post$gap^2))
  rmspe_post / rmspe_pre
}

# Treated unit ratio
treated_ratio <- compute_rmspe_ratio(treated_gap_df)

# Placebo ratios
placebo_ratios <- placebo_df %>%
  group_by(unit) %>%
  group_modify(~ tibble(ratio = compute_rmspe_ratio(.x))) %>%
  ungroup()

all_ratios <- bind_rows(
  placebo_ratios,
  tibble(unit = 17, ratio = treated_ratio)
) %>%
  mutate(is_treated = unit == 17)

# Compute p-value
pval <- mean(all_ratios$ratio >= treated_ratio)

ggplot(all_ratios, aes(x = ratio, fill = is_treated)) +
  geom_histogram(bins = 15, color = "white", alpha = 0.8) +
  geom_vline(xintercept = treated_ratio, linetype = "dashed", color = "red", linewidth = 1) +
  scale_fill_manual(values = c("FALSE" = "gray70", "TRUE" = "black"),
                    labels = c("Donor regions", "Basque Country"),
                    name = "") +
  labs(x = "Post/Pre RMSPE Ratio",
       y = "Count",
       title = "Distribution of RMSPE ratios: Permutation inference",
       subtitle = paste0("Basque Country ratio = ", round(treated_ratio, 2),
                         " | p-value = ", round(pval, 3))) +
  theme_bw() + theme(legend.position = "bottom")
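One caveat worth keeping in mind about this permutation p-value: with a finite donor pool it is bounded below. A quick sanity check, assuming all 16 placebo fits in the Basque application converge:

```r
# With J donor placebos plus the treated unit, the smallest attainable
# permutation p-value is 1/(J + 1)
J <- 16                  # donor regions in the Basque application
min_pval <- 1 / (J + 1)
round(min_pval, 3)       # -> 0.059
```

So even a maximally extreme treated-unit RMSPE ratio cannot produce a p-value below roughly 0.059 here.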

Staggered Adoption SCM

  • Conventional SCM is designed for one treated unit — what if multiple units are treated?

    • Should we construct weights to match each treated unit separately, or to match the average of the treated units?
  • We might get perfect balance for the average of the units but terrible fit for individual treated units!

  • Ben-Michael, Feller and Rothstein (2022) show that the bias is a combination of these two imbalances.

    • Propose a partial pooling approach that minimizes a combination of the two.
  • Implemented in the augsynth R package (along with their outcome-augmented synthetic control)
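A rough sketch of the partial-pooling objective (notation is mine, loosely following Ben-Michael, Feller, and Rothstein): let \(\hat{q}^{\text{pool}}\) be the imbalance between the average treated unit and the average synthetic control, \(\hat{q}^{\text{sep}}_j\) be treated unit \(j\)’s own imbalance, and \(\nu \in [0, 1]\) be a pooling hyperparameter. The weights \(\{w_j\}\) solve

\[\min_{\{w_j\}} \; \nu \left(\hat{q}^{\text{pool}}(\{w_j\})\right)^2 + (1 - \nu)\,\frac{1}{J}\sum_{j=1}^{J}\left(\hat{q}^{\text{sep}}_j(w_j)\right)^2\]

Setting \(\nu = 1\) matches only the average of the treated units; \(\nu = 0\) fits a separate synthetic control for each treated unit.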

Application: Shale Shock and Republican Voting

  • Let’s apply the augmented/multi-unit synthetic control to the Gazmararian (2025) shale shock data
    • Earlier, we saw that DiD and LDV gave different estimates, and placebo tests raised concerns
    • Is SCM the solution?
  • Review of the setup:
    • 116 treated coal counties, 97 control counties
    • Treatment onset: 2008 (shale gas boom undermines coal industry)
    • Outcome: Republican vote share
    • Pre-treatment periods: 1972–2004 (8 election cycles)
    • Post-treatment periods: 2008–2020 (4 election cycles)

multisynth: Average Effect Across Treated Units

Code
# Use multisynth for the multi-unit case
# multisynth infers treatment timing from the binary indicator
msyn <- multisynth(
  RepVotesMajorPercent ~ treated,
  unit = county_state, time = year,
  data = dff
)

msyn_summ <- summary(msyn)

# Extract average ATT for plotting
avg_att <- msyn_summ$att %>%
  filter(Level == "Average")

ggplot(avg_att, aes(x = Time, y = Estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower_bound, ymax = upper_bound), width = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dashed", color = "red") +
  labs(x = "Time relative to treatment (election cycles)",
       y = "Estimated ATT (pp)",
       title = "Multisynth: Effect of shale shock on Republican vote share",
       subtitle = "Average across treated coal counties (partially pooled SCM)") +
  theme_bw()

Placebo testing

  • We’ve fit the synthetic control model to minimize pre-treatment outcome discrepancies.
    • A perfect synthetic control will reproduce the trajectory exactly
    • But this doesn’t mean that our estimator is unbiased - especially if we have a short panel.
    • How do we assess whether the bias is small?
      • Perfect pre-treatment fit is no longer a valid placebo test!
  • We can use a placebo test approach where we backdate the treatment to an earlier period and check whether the synthetic control produces spurious “effects” in held-out pre-treatment periods
    • Same intuition as our discussion last week - we want our placebo tests to involve held-out comparisons.

Backdating falsification tests

Code
# In-time falsification: backdate treatment to 2000 and 1996
# Fit SCM using only pre-placebo data, check for "effects" in held-out periods

placebo_years <- c(1996, 2000, 2004)

falsification_results <- list()

for (placebo_yr in placebo_years) {
  # Create dataset: only pre-2008 data, with treatment backdated
  # Note: `treat` here appears to be the time-invariant ever-treated group
  # indicator, distinct from the time-varying `treated` used in the main fit
  falsification_df <- dff %>%
    filter(year < 2008) %>%
    mutate(placebo_treat = as.integer(treat == 1 & year >= placebo_yr))

  tryCatch({
    plac_fit <- multisynth(
      RepVotesMajorPercent ~ placebo_treat,
      unit = county_state, time = year,
      data = falsification_df
    )

    plac_summ <- summary(plac_fit)
    plac_att <- plac_summ$att %>%
      filter(Level == "Average") %>%
      mutate(placebo_year = placebo_yr)

    falsification_results[[as.character(placebo_yr)]] <- plac_att
  }, error = function(e) {
    cat("Error for placebo year", placebo_yr, ":", conditionMessage(e), "\n")
  })
}

all_results <- bind_rows(falsification_results) %>%
  mutate(placebo_label = paste0("Placebo treatment = ", placebo_year))

ggplot(all_results, aes(x = Time, y = Estimate,
                         color = factor(placebo_year))) +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = lower_bound, ymax = upper_bound), width = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted") +
  facet_wrap(~ placebo_label, scales = "free_x") +
  labs(x = "Time relative to (placebo) treatment",
       y = "Estimated ATT (pp)",
       title = "Placebo tests: backdated treatment vs. actual",
       subtitle = "Placebo treatments should show no effect in held-out pre-2008 periods") +
  scale_color_manual(values = c("1996" = "steelblue", "2000" = "darkorange",
                                 "2004" = "black")) +
  theme_bw() + theme(legend.position = "none")

Conclusion

  • Causal inference in time-series data is hard!

    • Even the estimands can get tricky - we’ll come back to this again when we talk about mediation
  • The core problem centers around how we deal with unobserved confounding and the role that past outcomes play in identification/estimation

  • Three scenarios:

    • Unobserved confounding but simple functional form (DiD)
      • Adjust via differencing - works in short panels
    • Observed confounding driven by past \(Y\)
      • Adjust by controlling for past \(Y\) - also fine in short panels
    • Unobserved confounding more complex than DiD
      • Adjust for past \(Y\) to indirectly control for the latent factor - need many pre-treatment periods!

Next week

  • The last of the “holy trinity” - regression discontinuity designs

  • What happens when we have arbitrary unobserved confounding…

    • …but treatment is assigned via a threshold
  • In this case, even if treatment and outcome are confounded, that confounding might be negligible in a window around the threshold!

  • Two ways to think about this

    • “As-if-random” assignment at the cut-off
    • Continuity in potential outcomes at the cut-off
  • Estimation by extrapolation to the threshold.

    • Works best when we have lots of units near the cut-off.