Week 13, Part 2: Sensitivity Analysis

PS 813 - Causal Inference

Anton Strezhnev

University of Wisconsin-Madison

April 22, 2026

Today

  • Back to selection on observables!
  • How can we diagnose our no unobserved confounders assumption?
    • What sorts of confounders would threaten our results (cause them to go to \(0\))?
  • Omitted variable bias formula
    • Allows us to “sign the bias” of a proposed confounder based on the outcome-confounder and treatment-confounder relationships
  • Sensitivity analysis
    • How bad of a hypothetical confounder would we need to break the result?
    • Provides a benchmark for any critiques of a selection-on-observables assumption.

Omitted Variable Bias

OVB in linear models

  • Suppose we want to identify the effect of \(D\) on \(Y\) conditional on pre-treatment covariates \(\mathbf{X}\). Assume a linear model for the outcome and that there exists one omitted covariate \(Z\):

    \[Y = \hat{\tau} D + \mathbf{X}\hat{\beta} + \hat{\gamma} Z + \hat{\epsilon}\]

  • What happens if we instead estimate the “restricted” model with \(Z\) omitted?

    \[Y = \hat{\tau}_{\text{res}} D + \mathbf{X}\hat{\beta}_{\text{res}} + \hat{\epsilon}_{\text{res}}\]

  • What’s the relationship between \(\hat{\tau}_{\text{res}}\) and \(\hat{\tau}\)?

OVB in linear models

  • Let’s define \(D^{\perp \mathbf{X}}\) as the “partialled-out” value of \(D\) (the residuals from a regression of \(D\) on \(\mathbf{X}\)).

    • Similarly define \(Y^{\perp \mathbf{X}}\) as the “partialled-out” value of \(Y\) given \(\mathbf{X}\)
  • By the Frisch-Waugh-Lovell theorem, we can write any regression coefficient in terms of the “partialled” bivariate regression

    \[\hat{\tau}_{\text{res}} = \frac{\text{cov}(D^{\perp \mathbf{X}}, Y^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]

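  • Using our definition of \(Y\) (by the linear model), and noting that the residual \(\hat{\epsilon}\) is orthogonal to \(D^{\perp \mathbf{X}}\) by construction, so its covariance term drops out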

    \[\hat{\tau}_{\text{res}} = \frac{\text{cov}(D^{\perp \mathbf{X}}, \hat{\tau}D^{\perp \mathbf{X}} + \hat{\gamma}Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]

  • Properties of covariance

    \[\hat{\tau}_{\text{res}} = \hat{\tau}\frac{\text{cov}(D^{\perp \mathbf{X}}, D^{\perp \mathbf{X}})}{{\text{var}(D^{\perp \mathbf{X}})}} + \hat{\gamma}\frac{\text{cov}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]

OVB in linear models

  • Simplifying

    \[\hat{\tau}_{\text{res}} = \hat{\tau} + \hat{\gamma}\frac{\text{cov}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]

  • We can recognize that the last term is the coefficient on \(D\) from a regression of \(Z\) on \(D\) and \(\mathbf{X}\) (again using FWL) - let’s call this \(\hat{\delta}\)

    \[\hat{\tau}_{\text{res}} = \hat{\tau} + \hat{\gamma}\hat{\delta}\]

  • So the discrepancy between the “restricted” and “unrestricted” models can be written as the product of two coefficients - the relationship between \(Z\) and \(Y\) (given \(\mathbf{X}\) and \(D\)) and the relationship between \(Z\) and \(D\) (given \(\mathbf{X}\))

    \[\widehat{\text{bias}} = \hat{\gamma}\hat{\delta}\]
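
  • A minimal sketch in R of the identity above, assuming simulated data. The variable names, sample size, and coefficient values are illustrative assumptions and do not come from any study. It checks that the restricted coefficient equals the unrestricted coefficient plus \(\hat{\gamma}\hat{\delta}\), and that FWL reproduces the restricted coefficient.

```r
# Simulated data: all names and coefficient values are illustrative assumptions
set.seed(813)
n <- 1000
X <- rnorm(n)                                  # observed covariate
Z <- 0.5 * X + rnorm(n)                        # confounder that will be omitted
D <- 0.8 * X + 0.6 * Z + rnorm(n)              # treatment depends on X and Z
Y <- 1.0 * D + 0.5 * X + 2.0 * Z + rnorm(n)    # outcome depends on D, X and Z

full <- lm(Y ~ D + X + Z)   # unrestricted model
res  <- lm(Y ~ D + X)       # restricted model (Z omitted)
aux  <- lm(Z ~ D + X)       # auxiliary regression of Z on D and X

tau_hat     <- unname(coef(full)["D"])
gamma_hat   <- unname(coef(full)["Z"])
tau_res_hat <- unname(coef(res)["D"])
delta_hat   <- unname(coef(aux)["D"])

# FWL: the restricted coefficient is the slope of partialled-out Y on partialled-out D
D_perp  <- resid(lm(D ~ X))
Y_perp  <- resid(lm(Y ~ X))
tau_fwl <- cov(D_perp, Y_perp) / var(D_perp)

# The three quantities should agree up to floating point error
c(tau_res = tau_res_hat,
  tau_plus_bias = tau_hat + gamma_hat * delta_hat,
  tau_fwl = tau_fwl)
```

    • The identity is algebraic, so it holds exactly in any sample, not just on average.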

Signing the bias

  • Gentzkow, Shapiro, Sinkinson (2011: AER) examine the effect of newspaper entry on political competitiveness in US counties from 1869 to 1928.
    • Outcome: Presidential/congressional turnout
    • Treatment: Number of new newspapers
    • Finding: More newspapers increase turnout.
  • Consider some hypothetical confounders, in what direction would they bias the estimate?
    • Population growth: How would population growth affect newspaper entry? How would it affect turnout?
    • Income growth: How would income growth affect newspaper entry and turnout?
  • How would we expect either of these confounders to alter our estimate?

Sensitivity analyses

Sensitivity analysis

  • Sensitivity analyses ask the question “how bad of a violation of our identification assumptions would break our result?”
    • We use a parameter (or parameters) to represent the violation and re-do the analysis.
    • Vary the parameter over a (sensible) range - how often do our results appreciably change (e.g. become zero or flip sign)?
  • Challenge: How do we define a suitable sensitivity parameter that has actual interpretability?

Confounding function

  • A general approach to thinking about sensitivity parameters in a binary treatment setting comes from Blackwell (2014)

  • Define the “confounding function”

    \[q(d, x) = E[Y_i(d) | D_i = d, X_i = x] - E[Y_i(d) | D_i = 1 - d, X_i = x]\]

  • The confounding function captures the extent to which the potential outcomes differ between a treated unit and a control unit with \(X_i = x\).

    • Under ignorability, \(q(d, x) = 0\)
  • We could set the confounding function to have a particular form:

    • For example, \(q(d, x) = \alpha\) implies the selection bias is constant at all levels of \(X_i\)

Sensitivity analyses

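  • Given a value of \(q(d, x)\), we can straightforwardly implement a sensitivity analysis by de-biasing the outcome, where \(Pr(1 - D_i | X_i)\) is the probability that unit \(i\) receives the opposite of its observed treatment status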

    \[Y_i^q = Y_i - q(D_i, X_i) \times Pr(1 - D_i | X_i)\]

  • Then, run the analysis on \(Y_i^q\)

    • Vary the sensitivity parameter \(\alpha\) and see what magnitude of confounding is enough to “break” the results.
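
  • A rough sketch in R of this procedure, assuming simulated data and the constant confounding function \(q(d, x) = \alpha\). The data-generating process, the logistic propensity model, and the use of a linear regression on \(D\) and \(X\) as “the analysis” are all assumptions made for illustration.

```r
# Simulated data: names and parameter values are illustrative assumptions
set.seed(813)
n <- 2000
X <- rnorm(n)
D <- rbinom(n, 1, plogis(-1 + 0.5 * X))   # binary treatment
Y <- 1 + 0.5 * D + X + rnorm(n)           # observed outcome

# Estimated propensity score Pr(D_i = 1 | X_i) from a logistic regression
pscore <- fitted(glm(D ~ X, family = binomial))

# Probability of the opposite treatment status, Pr(1 - D_i | X_i)
p_opposite <- ifelse(D == 1, 1 - pscore, pscore)

# De-bias the outcome for a constant q(d, x) = alpha, then re-run the analysis
adjusted_estimate <- function(alpha) {
  Yq <- Y - alpha * p_opposite
  unname(coef(lm(Yq ~ D + X))["D"])
}

# Sweep alpha over a range and record the adjusted estimate at each value
alphas  <- seq(-1, 1, by = 0.5)
results <- data.frame(alpha = alphas,
                      estimate = sapply(alphas, adjusted_estimate))
results
```

    • At \(\alpha = 0\) the adjusted estimate coincides with the original one. The value of \(\alpha\) at which the estimate crosses zero is the amount of confounding needed to “break” the result.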

Sensitivity analyses

  • The intuition for the debiasing comes from our selection-into-treatment bias decomposition. Without covariates:

\[\begin{align*} E[Y_i(0)] &= E[Y_i(0) |D_i = 0] Pr(D_i = 0) + E[Y_i(0) | D_i = 1] Pr(D_i = 1)\\ &= E[Y_i(0) |D_i = 0] - \bigg(E[Y_i(0) | D_i = 0] - E[Y_i(0) | D_i = 1]\bigg) \times Pr(D_i = 1)\\ &= E[Y_i |D_i = 0] - q(0) \times Pr(D_i = 1)\\ &= E[Y_i^q | D_i = 0] \end{align*}\]
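
  • The decomposition can be checked numerically. The sketch below is in R and assumes a simulation in which both potential outcomes are visible, which is never the case with real data. The unobserved confounder `U` and all parameter values are illustrative assumptions.

```r
# Simulated data: both potential outcomes are known only because we generate them
set.seed(813)
n  <- 5000
U  <- rnorm(n)                    # unobserved confounder
D  <- rbinom(n, 1, plogis(U))     # selection into treatment depends on U
Y0 <- U + rnorm(n)                # potential outcome under control
Y1 <- Y0 + 1                      # potential outcome under treatment
Y  <- ifelse(D == 1, Y1, Y0)      # observed outcome

# True value of the confounding function q(0) in this sample
q0 <- mean(Y0[D == 0]) - mean(Y0[D == 1])
p1 <- mean(D)                     # Pr(D_i = 1)

naive    <- mean(Y[D == 0])              # control-group mean, biased for E[Y(0)]
debiased <- mean(Y[D == 0]) - q0 * p1    # E[Y^q | D = 0]

c(truth = mean(Y0), naive = naive, debiased = debiased)
```

    • With the true \(q(0)\) plugged in, the de-biased control mean recovers the sample mean of \(Y_i(0)\) exactly. In practice \(q\) is unknown, which is why it is varied as a sensitivity parameter.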

Sensitivity analysis in linear models

  • An alternative approach is to think about confounding in terms of two quantities
    • The relationship between treatment and confounder
    • The relationship between outcome and confounder
  • In a linear model setting, we could construct a sensitivity analysis in terms of two parameters:
    • The partial regression coefficient between \(Z\) and \(Y\), \(\hat{\gamma}\)
    • The partial regression coefficient between \(Z\) and \(D\), \(\hat{\delta}\)
  • Slightly annoying since each of these depends on the scale of \(Z\) and \(Y\) - can we re-write in terms of parameters with the same range irrespective of the outcome?
    • Cinelli and Hazlett (2020) provide a reparameterization in terms of the partial \(R^2\) of two regressions involving \(Z\) (which are always between \(0\) and \(1\))

Rewriting the bias

  • Start by defining \(R^2_{Z \sim D}\) as the \(R^2\) from a regression of \(Z\) on \(D\).

    • For OLS: \(R^2_{Z \sim D} = 1 - \frac{\text{Var}(Z^{\perp D})}{\text{Var}(Z)} = \text{cor}(Z, D)^2 = \left(\frac{\text{cov}(Z, D)}{\text{sd}(Z)\text{sd}(D)}\right)^2\)
    • Same thing for the partial \(R^2\): \(R^2_{Z \sim D | \mathbf{X}} = \text{cor}(Z^{\perp \mathbf{X}}, D^{\perp \mathbf{X}})^2\). Because a squared correlation is symmetric in its arguments, \(R^2_{Z \sim D | \mathbf{X}} = R^2_{D \sim Z | \mathbf{X}}\)
  • Now write our bias term

    \[\widehat{\text{bias}} = \left(\frac{\text{cov}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\right) \left(\frac{\text{cov}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})}{\text{var}(Z^{\perp \mathbf{X}, D})}\right)\]

  • Convert covariance to correlation

    \[\widehat{\text{bias}} = \left(\frac{\text{cor}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}}) \text{sd}(Z^{\perp \mathbf{X}})}{\text{sd}(D^{\perp \mathbf{X}})}\right) \left(\frac{\text{cor}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})\text{sd}(Y^{\perp \mathbf{X}, D})}{\text{sd}(Z^{\perp \mathbf{X}, D})}\right)\]

Rewriting the bias

  • Rearrange terms

    \[\widehat{\text{bias}} = \left(\frac{\text{cor}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}}) \text{cor}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})}{\frac{\text{sd}(Z^{\perp \mathbf{X},D})}{\text{sd}(Z^{\perp \mathbf{X}})}}\right) \left(\frac{\text{sd}(Y^{\perp \mathbf{X}, D})}{\text{sd}(D^{\perp \mathbf{X}})}\right)\]

  • Square everything

    \[\widehat{\text{bias}}^2 = \left(\frac{\text{cor}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})^2 \text{cor}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})^2}{\frac{\text{var}(Z^{\perp \mathbf{X},D})}{\text{var}(Z^{\perp \mathbf{X}})}}\right) \left(\frac{\text{var}(Y^{\perp \mathbf{X}, D})}{\text{var}(D^{\perp \mathbf{X}})}\right)\]

  • Substitute in the (partial) \(R^2\) parameters

    \[\widehat{\text{bias}}^2 = \left(\frac{R^2_{D \sim Z | \mathbf{X}} R^2_{Y \sim Z | \mathbf{X}, D}}{1- R^2_{D \sim Z|\mathbf{X}}}\right) \left(\frac{\text{var}(Y^{\perp \mathbf{X}, D})}{\text{var}(D^{\perp \mathbf{X}})}\right)\]

Rewriting the bias

  • Take the square root to get the absolute bias

    \[|\widehat{\text{bias}}|= \sqrt{\frac{R^2_{D \sim Z | \mathbf{X}} R^2_{Y \sim Z | \mathbf{X}, D}}{1- R^2_{D \sim Z|\mathbf{X}}}}\left(\frac{\text{sd}(Y^{\perp \mathbf{X}, D})}{\text{sd}(D^{\perp \mathbf{X}})}\right)\]

  • Some important intuitions

    • The bias is a product of the magnitude of the two \(R^2\). An unobserved confounder that explains very little of the treatment needs to explain a lot of the outcome to induce a sizeable bias (and vice-versa)
    • The bias is smaller when the amount of variation in the outcome given \(X\) and \(D\) is low (not much \(Y\) left to explain)
    • The bias is amplified when the variability in \(D\) given \(X\) is low.
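
  • A minimal sketch in R, assuming simulated data, that checks the partial \(R^2\) expression for the absolute bias against the coefficient product \(|\hat{\gamma}\hat{\delta}|\). All variable names and parameter values are illustrative assumptions.

```r
# Simulated data: all names and coefficient values are illustrative assumptions
set.seed(813)
n <- 1000
X <- rnorm(n)
Z <- 0.5 * X + rnorm(n)
D <- 0.8 * X + 0.6 * Z + rnorm(n)
Y <- 1.0 * D + 0.5 * X + 2.0 * Z + rnorm(n)

# Bias written as the product of the two regression coefficients
gamma_hat <- unname(coef(lm(Y ~ D + X + Z))["Z"])
delta_hat <- unname(coef(lm(Z ~ D + X))["D"])
bias_coef <- gamma_hat * delta_hat

# Partialled-out variables
D_x  <- resid(lm(D ~ X))        # D given X
Z_x  <- resid(lm(Z ~ X))        # Z given X
Y_xd <- resid(lm(Y ~ X + D))    # Y given X and D
Z_xd <- resid(lm(Z ~ X + D))    # Z given X and D

# Partial R^2 values as squared partial correlations
r2_dz_x  <- cor(D_x, Z_x)^2       # R^2 of D ~ Z given X
r2_yz_xd <- cor(Y_xd, Z_xd)^2     # R^2 of Y ~ Z given X and D

# Absolute bias from the partial R^2 formula
bias_r2 <- sqrt(r2_dz_x * r2_yz_xd / (1 - r2_dz_x)) * sd(Y_xd) / sd(D_x)

c(abs_bias_coef = abs(bias_coef), bias_r2 = bias_r2)
```

    • The two numbers agree up to floating point error, since the reparameterization is an algebraic identity rather than an approximation.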

Illustration: Hazlett (2019)

  • Hazlett (2019; JCR) examines the impact of exposure to violence on attitudes towards peace in the context of the war in Darfur.
    • Key finding: Refugees with greater exposure to violence are more likely to express support for peace - support for a “war-weariness” theory of attitudes during conflict.
    • Identification strategy: Selection-on-observables conditional on village and gender (plus other covariates).
    • Argues that exposure to violence by pro-government militias across villages was non-random but within-village often indiscriminate.
  • How bad does the residual confounding need to be to break the result?

Illustration: Hazlett (2019)

library(sensemakr)
library(broom)   # provides tidy()
library(dplyr)   # provides %>% and filter()
data('darfur')

# Regress attitudes toward peace on direct exposure to violence, plus village and covariates
darfur.reg <- lm(peacefactor ~ directlyharmed + village + female + age + farmer_dar +
                   herder_dar + pastvoted + hhsize_darfur, data=darfur)
tidy(darfur.reg) %>% filter(term == "directlyharmed")
#> # A tibble: 1 × 5
#>   term           estimate std.error statistic   p.value
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>
#> 1 directlyharmed   0.0973    0.0233      4.18 0.0000318
sd(darfur$peacefactor)
#> [1] 0.348

Illustration: Hazlett (2019)

  • We would like to generate a plot of how the results would change as we vary the two \(R^2\) parameters by calculating the bias across each of the possible parameter values.
    • Need 3 dimensions (each \(R^2\) plus the estimate)
    • Can do this manually…but extremely tedious - luckily Cinelli and Hazlett provide an R package for this, sensemakr
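sensitivity <- sensemakr(model = darfur.reg,
                         treatment = "directlyharmed",
                         benchmark_covariates = "female",  # benchmark against gender
                         kd = 1:3,       # confounder 1x, 2x, 3x as strongly related to treatment as the benchmark
                         ky = 1:3,       # same multiples for the relationship with the outcome
                         q = 1,          # bias large enough to reduce the estimate by 100%
                         alpha = .05,    # significance level used for the robustness value
                         reduce = TRUE)  # confounding assumed to shrink the absolute estimate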
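
  • For intuition about what such a plot contains, here is a rough base R sketch of the manual calculation. It is not the package's implementation. It uses the rounded estimate, standard error, and degrees of freedom reported for this regression. It also assumes the classical OLS identity \(\text{sd}(Y^{\perp \mathbf{X}, D}) / \text{sd}(D^{\perp \mathbf{X}}) = \text{se} \times \sqrt{\text{dof}}\), which the slides do not state, and that confounding pulls the estimate toward zero.

```r
# Rounded values reported for the directlyharmed coefficient
estimate <- 0.0973
se       <- 0.0233
dof      <- 783

# Absolute bias implied by a confounder with the given partial R^2 values,
# using the assumed identity sd(Y given X, D) / sd(D given X) = se * sqrt(dof)
abs_bias <- function(r2dz_x, r2yz_dx) {
  sqrt(r2yz_dx * r2dz_x / (1 - r2dz_x)) * se * sqrt(dof)
}

# Grid of hypothetical partial R^2 values for the two sensitivity parameters
r2_grid <- seq(0, 0.4, by = 0.01)

# Adjusted estimate at each grid point (rows: treatment R^2, columns: outcome R^2)
adjusted <- outer(r2_grid, r2_grid,
                  function(r2dz, r2yz) estimate - abs_bias(r2dz, r2yz))

# Contour plot of the adjusted estimate, with the zero contour highlighted
contour(r2_grid, r2_grid, adjusted,
        xlab = "Partial R2 of confounder with treatment",
        ylab = "Partial R2 of confounder with outcome")
contour(r2_grid, r2_grid, adjusted, levels = 0, col = "red", lwd = 2, add = TRUE)
```

    • Points beyond the highlighted contour correspond to hypothetical confounders strong enough to flip the sign of the estimate.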

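Illustration: Hazlett (2019)

  [Figure: contour plot of the estimate as the two partial \(R^2\) parameters vary, with benchmark points for gender]
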
Illustration: Hazlett (2019)

  • “Robustness Value” - consider a confounder that has an equal partial \(R^2\) with the outcome and the treatment. What is the smallest such \(R^2\) necessary to drive the estimate to \(0\) (rv_q in the output) or to statistical insignificance (rv_qa)?
sensitivity$sensitivity_stats
       treatment estimate     se t_statistic r2yd.x  rv_q  rv_qa f2yd.x dof
1 directlyharmed   0.0973 0.0233        4.18 0.0219 0.139 0.0763 0.0224 783
  • While the \(R^2\) stats do have an intuitive interpretation, there is no absolute scale for what constitutes a “robust” vs. “non-robust” result.
    • It depends also on how much unexplained variability there is in the outcome.
    • Useful to benchmark the unobserved confounding against other known confounders
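
  • A small sketch in R of where the rv_q number comes from, using the rounded estimate, standard error, and degrees of freedom reported above. Setting both partial \(R^2\) values equal in the bias formula and solving the resulting quadratic gives the closed form used below, following Cinelli and Hazlett (2020). The identity \(\text{sd}(Y^{\perp \mathbf{X}, D}) / \text{sd}(D^{\perp \mathbf{X}}) = \text{se} \times \sqrt{\text{dof}}\) is an assumption of this sketch that the slides do not state.

```r
# Rounded values reported for the directlyharmed coefficient
estimate <- 0.0973
se       <- 0.0233
dof      <- 783
q        <- 1      # share of the estimate the confounder must explain away

# Partial Cohen's f of the treatment with the outcome, scaled by q
f_q <- q * abs(estimate / se) / sqrt(dof)

# Robustness value: smallest equal partial R^2 that yields a bias of q * estimate
rv_q <- 0.5 * (sqrt(f_q^4 + 4 * f_q^2) - f_q^2)
round(rv_q, 3)

# Check: plugging rv_q into the bias formula for both R^2 values returns q * estimate
bias_at_rv <- sqrt(rv_q * rv_q / (1 - rv_q)) * se * sqrt(dof)
c(bias_at_rv = bias_at_rv, target = q * estimate)
```

    • Because the inputs are rounded, the result can differ slightly in later decimal places from the value the package reports.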

Illustration: Hazlett (2019)

  • Consider the previous contour plot - the red points indicate bias under hypothetical confounding that is \(1\times\), \(2\times\), and \(3\times\) as strong as gender
  • Why pick gender (versus any other observed confounder)?
    • We have prior theoretical reasons to believe it’s strongly associated with both outcome and treatment.
  • Caution with informal benchmarking - Cinelli and Hazlett (2020) show that just calculating the observed partial \(R^2\)s for the benchmarks can be inaccurate.
    • Estimates of how \(X\) relates to \(Y\) may be biased due to omission of \(Z\).
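    • Also, \(D\) is a collider: conditioning on it can induce dependence between \(X\) and \(Z\) even if they are otherwise unrelated.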
  • C+H (2020) derive formal bounds for the “benchmark” exercise with a set of observed covariates.

Summary

  • Sensitivity analyses are a tool for arguing
    • You can always find values of the sensitivity parameters for which the results fail to hold
    • You can always find values of the sensitivity parameters for which the results do hold.
  • What sensitivity analyses do is describe how severe a violation of selection-on-observables needs to be in order to threaten the main conclusions.
    • (e.g.) Smoking and cancer – even if there were some unobserved confounder it would need to explain an enormous amount of variance for us to conclude no effect.
  • Norms are developing about sensitivity analyses and reporting in observational designs
    • “Robustness Values” and contour plots
    • How best to “benchmark” – what to benchmark against
  • Don’t be surprised if your reviewers start requesting these!