PS 813 - Causal Inference
April 22, 2026
Suppose we want to identify the effect of \(D\) on \(Y\) conditional on pre-treatment covariates \(\mathbf{X}\), and that we’re willing to assume a linear model for the outcome with a single omitted covariate \(Z\)
\[Y = \hat{\tau} D + \mathbf{X}\hat{\beta} + \hat{\gamma} Z + \hat{\epsilon}\]
What happens if we instead estimate the “restricted” model with \(Z\) omitted?
\[Y = \hat{\tau}_{\text{res}} D + \mathbf{X}\hat{\beta}_{\text{res}} + \hat{\epsilon}_{\text{res}}\]
What’s the relationship between \(\hat{\tau}_{\text{res}}\) and \(\hat{\tau}\)?
Let’s define \(D^{\perp \mathbf{X}}\) as the “partialled-out” value of \(D\) (the residuals from a regression of \(D\) on \(\mathbf{X}\)).
By the Frisch-Waugh-Lovell theorem, we can write any regression coefficient in terms of the “partialled” bivariate regression
\[\hat{\tau}_{\text{res}} = \frac{\text{cov}(D^{\perp \mathbf{X}}, Y^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]
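To see the theorem in action, here is a minimal R sketch on simulated data (all variable names are invented for illustration): the coefficient on \(D\) from the full regression equals the coefficient from the bivariate regression of the residualized \(Y\) on the residualized \(D\).

```r
# Frisch-Waugh-Lovell check on simulated data (all names hypothetical)
set.seed(813)
n <- 1000
X <- rnorm(n)
D <- 0.5 * X + rnorm(n)
Y <- 2 * D + X + rnorm(n)

# Coefficient on D from the full regression of Y on D and X
coef(lm(Y ~ D + X))["D"]

# Identical coefficient from the "partialled" bivariate regression
D_perp <- resid(lm(D ~ X))
Y_perp <- resid(lm(Y ~ X))
coef(lm(Y_perp ~ D_perp))["D_perp"]
```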
Substituting in our linear model for \(Y\) (the OLS residual \(\hat{\epsilon}\) is orthogonal to \(D\) and \(\mathbf{X}\), and hence to \(D^{\perp \mathbf{X}}\), so it drops out of the covariance)
\[\hat{\tau}_{\text{res}} = \frac{\text{cov}(D^{\perp \mathbf{X}}, \hat{\tau}D^{\perp \mathbf{X}} + \hat{\gamma}Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]
By linearity of covariance
\[\hat{\tau}_{\text{res}} = \hat{\tau}\frac{\text{cov}(D^{\perp \mathbf{X}}, D^{\perp \mathbf{X}})}{{\text{var}(D^{\perp \mathbf{X}})}} + \hat{\gamma}\frac{\text{cov}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]
Simplifying
\[\hat{\tau}_{\text{res}} = \hat{\tau} + \hat{\gamma}\frac{\text{cov}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\]
We can recognize the fraction in the last term as the coefficient on \(D\) from a regression of \(Z\) on \(D\) and \(\mathbf{X}\) (again using FWL) - let’s call this \(\hat{\delta}\)
\[\hat{\tau}_{\text{res}} = \hat{\tau} + \hat{\gamma}\hat{\delta}\]
So the discrepancy between the “restricted” and “unrestricted” models can be written as the product of two coefficients - the relationship between \(Z\) and \(Y\) (given \(\mathbf{X}\) and \(D\)) and the relationship between \(Z\) and \(D\) (given \(\mathbf{X}\))
\[\widehat{\text{bias}} = \hat{\gamma}\hat{\delta}\]
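As a sanity check, a small simulation sketch (simulated data; names hypothetical) confirms that \(\hat{\tau}_{\text{res}} = \hat{\tau} + \hat{\gamma}\hat{\delta}\) holds exactly in any given sample, not just in expectation.

```r
# Verify tau_res = tau_hat + gamma_hat * delta_hat in one simulated sample
set.seed(813)
n <- 1000
X <- rnorm(n)
Z <- 0.4 * X + rnorm(n)                    # the omitted confounder
D <- 0.5 * X + 0.6 * Z + rnorm(n)
Y <- 2 * D + X + 1.5 * Z + rnorm(n)

tau_hat   <- coef(lm(Y ~ D + X + Z))["D"]  # "unrestricted" model
gamma_hat <- coef(lm(Y ~ D + X + Z))["Z"]  # Z-Y relationship (given D, X)
delta_hat <- coef(lm(Z ~ D + X))["D"]      # Z-D relationship (given X)
tau_res   <- coef(lm(Y ~ D + X))["D"]      # "restricted" model (Z omitted)

tau_res - (tau_hat + gamma_hat * delta_hat)  # zero up to numerical error
```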
A general approach to thinking about sensitivity parameters in a binary treatment setting comes from Blackwell (2014)
Define the “confounding function”
\[q(d, x) = E[Y_i(d) | D_i = d, X_i = x] - E[Y_i(d) | D_i = 1 - d, X_i = x]\]
The confounding function captures the extent to which the potential outcome \(Y_i(d)\) differs, on average, between treated and control units that share \(X_i = x\). Under no unmeasured confounding given \(X_i\), it equals zero.
We could set the confounding function to have a particular form - for example, a constant \(q(d, x) = \alpha\), under which units’ potential outcomes differ from those of their opposite-treatment counterparts by a fixed amount \(\alpha\), regardless of covariates.
Given a value of \(q(d, x)\), we can straightforwardly implement a sensitivity analysis by de-biasing the outcome
\[Y_i^q = Y_i - q(D_i, X_i) \times Pr(1 - D_i | X_i)\]
Then, run the analysis on \(Y_i^q\). To see why this de-biasing works, consider \(E[Y_i(0)]\) (suppressing the conditioning on \(X_i\) for brevity)
\[\begin{align*} E[Y_i(0)] &= E[Y_i(0) |D_i = 0] Pr(D_i = 0) + E[Y_i(0) | D_i = 1] Pr(D_i = 1)\\ &= E[Y_i(0) |D_i = 0] - \bigg(E[Y_i(0) | D_i = 0] - E[Y_i(0) | D_i = 1]\bigg) \times Pr(D_i = 1)\\ &= E[Y_i |D_i = 0] - q(0) \times Pr(D_i = 1)\\ &= E[Y_i^q | D_i = 0] \end{align*}\]
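A minimal R sketch of the full procedure, assuming the one-parameter constant form \(q(d, x) = \alpha\) (the data and function names here are hypothetical): de-bias the outcome, then re-run the analysis at several assumed values of \(\alpha\).

```r
# Confounding-function sensitivity analysis with q(d, x) = alpha
# (a sketch under the constant-q assumption; all names hypothetical)
set.seed(813)
n <- 1000
X <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * X))   # binary treatment
Y <- 2 * D + X + rnorm(n)

confound_adjust <- function(Y, D, X, alpha) {
  ps <- fitted(glm(D ~ X, family = binomial))  # Pr(D = 1 | X)
  pr_opposite <- ifelse(D == 1, 1 - ps, ps)    # Pr(opposite status | X)
  Yq <- Y - alpha * pr_opposite                # de-biased outcome
  coef(lm(Yq ~ D + X))["D"]                    # re-run the analysis on Yq
}

# Trace the adjusted estimate across assumed amounts of confounding
sapply(c(-0.5, 0, 0.5), function(a) confound_adjust(Y, D, X, a))
```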
An alternative approach, from Cinelli and Hazlett (2020), parameterizes the bias in terms of partial \(R^2\) values. Start by defining notation: \(R^2_{Z \sim D}\) denotes the \(R^2\) from a regression of \(Z\) on \(D\), and \(R^2_{Z \sim D | \mathbf{X}}\) denotes the partial \(R^2\) from that regression after partialling \(\mathbf{X}\) out of both variables.
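Concretely, a partial \(R^2\) can be computed by residualizing both variables and squaring their correlation - a brief sketch on simulated data (names hypothetical):

```r
# Partial R^2 between D and Z given X, from residualized variables
set.seed(813)
n <- 1000
X <- rnorm(n)
Z <- 0.4 * X + rnorm(n)
D <- 0.5 * X + 0.6 * Z + rnorm(n)

D_perp_X <- resid(lm(D ~ X))
Z_perp_X <- resid(lm(Z ~ X))
cor(D_perp_X, Z_perp_X)^2   # R^2_{D ~ Z | X} (symmetric in D and Z)
```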
Now write our bias term, using FWL to express \(\hat{\delta}\) and \(\hat{\gamma}\) as partialled bivariate regression coefficients
\[\widehat{\text{bias}} = \left(\frac{\text{cov}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})}{\text{var}(D^{\perp \mathbf{X}})}\right) \left(\frac{\text{cov}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})}{\text{var}(Z^{\perp \mathbf{X}, D})}\right)\]
Convert covariance to correlation
\[\widehat{\text{bias}} = \left(\frac{\text{cor}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}}) \text{sd}(Z^{\perp \mathbf{X}})}{\text{sd}(D^{\perp \mathbf{X}})}\right) \left(\frac{\text{cor}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})\text{sd}(Y^{\perp \mathbf{X}, D})}{\text{sd}(Z^{\perp \mathbf{X}, D})}\right)\]
Rearrange terms
\[\widehat{\text{bias}} = \left(\frac{\text{cor}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}}) \text{cor}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})}{\frac{\text{sd}(Z^{\perp \mathbf{X},D})}{\text{sd}(Z^{\perp \mathbf{X}})}}\right) \left(\frac{\text{sd}(Y^{\perp \mathbf{X}, D})}{\text{sd}(D^{\perp \mathbf{X}})}\right)\]
Square everything
\[\widehat{\text{bias}}^2 = \left(\frac{\text{cor}(D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}})^2 \text{cor}(Y^{\perp \mathbf{X}, D}, Z^{\perp \mathbf{X}, D})^2}{\frac{\text{var}(Z^{\perp \mathbf{X},D})}{\text{var}(Z^{\perp \mathbf{X}})}}\right) \left(\frac{\text{var}(Y^{\perp \mathbf{X}, D})}{\text{var}(D^{\perp \mathbf{X}})}\right)\]
Substitute in the (partial) \(R^2\) parameters, noting that \(\text{var}(Z^{\perp \mathbf{X},D})/\text{var}(Z^{\perp \mathbf{X}}) = 1 - R^2_{D \sim Z|\mathbf{X}}\)
\[\widehat{\text{bias}}^2 = \left(\frac{R^2_{D \sim Z | \mathbf{X}} R^2_{Y \sim Z | \mathbf{X}, D}}{1- R^2_{D \sim Z|\mathbf{X}}}\right) \left(\frac{\text{var}(Y^{\perp \mathbf{X}, D})}{\text{var}(D^{\perp \mathbf{X}})}\right)\]
Take the square root to get the absolute bias
\[|\widehat{\text{bias}}|= \sqrt{\frac{R^2_{D \sim Z | \mathbf{X}} R^2_{Y \sim Z | \mathbf{X}, D}}{1- R^2_{D \sim Z|\mathbf{X}}}}\left(\frac{\text{sd}(Y^{\perp \mathbf{X}, D})}{\text{sd}(D^{\perp \mathbf{X}})}\right)\]
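The final expression translates directly into code. In this sketch, the two \(R^2\) values are hypothetical sensitivity parameters supplied by the analyst, while the standard-deviation ratio is estimated from the observed data.

```r
# Bias bound as a function of the two sensitivity parameters (a sketch)
ovb_bias <- function(r2_dz, r2_yz, sd_y_perp, sd_d_perp) {
  sqrt(r2_dz * r2_yz / (1 - r2_dz)) * (sd_y_perp / sd_d_perp)
}

# The sd ratio is estimable: residualize Y on D and X, and D on X
set.seed(813)
n <- 1000
X <- rnorm(n)
D <- 0.5 * X + rnorm(n)
Y <- 2 * D + X + rnorm(n)
sd_y_perp <- sd(resid(lm(Y ~ D + X)))  # sd(Y perp X, D)
sd_d_perp <- sd(resid(lm(D ~ X)))      # sd(D perp X)

# Absolute bias if Z explained 10% of the residual variance of each
ovb_bias(r2_dz = 0.10, r2_yz = 0.10, sd_y_perp, sd_d_perp)
```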
Some important intuitions: the bias is increasing in both partial \(R^2\) parameters; the \(1 - R^2_{D \sim Z|\mathbf{X}}\) term in the denominator means a confounder strongly related to treatment (given \(\mathbf{X}\)) is especially dangerous; and the standard-deviation ratio involves only observable quantities, so the analyst need only posit the two \(R^2\) values. As an illustration, consider the Darfur application that accompanies the sensemakr package (treatment: directlyharmed)
# A tibble: 1 × 5
  term           estimate std.error statistic   p.value
  <chr>             <dbl>     <dbl>     <dbl>     <dbl>
1 directlyharmed   0.0973    0.0233      4.18 0.0000318
[1] 0.348
Sensitivity statistics (sensemakr):
       treatment estimate     se t_statistic r2yd.x  rv_q  rv_qa f2yd.x dof
1 directlyharmed   0.0973 0.0233        4.18 0.0219 0.139 0.0763 0.0224 783
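Output in this form can be produced with the sensemakr package (Cinelli, Ferwerda, and Hazlett), which implements these formulas. Here is a sketch along the lines of the package’s Darfur example - the exact covariate list below is an assumption, not taken from the slides:

```r
# Sensitivity analysis with sensemakr on the package's darfur data
# (covariate specification assumed for illustration)
library(sensemakr)
data("darfur")

model <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +
              pastvoted + hhsize_darfur + female + village, data = darfur)

sens <- sensemakr(model, treatment = "directlyharmed",
                  benchmark_covariates = "female", kd = 1:3)
sens$sensitivity_stats  # sensitivity statistics like those shown above
```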