\(\hat{\tau}\) is a weighted average over 2x2 differences-in-differences.
Only some of those are valid under our identification assumptions
Others require an additional constant effects assumption in order to identify an ATT.
Intuition:
The TWFE estimator under staggered adoption incorporates 2x2 DiD terms where the “baseline” period is in the future (where both units are under treatment)
Two-way FE w/ staggered adoption
[Figure: Good 2x2]
[Figure: Bad 2x2]
Consider the “forbidden comparison” with
\(t^\prime > t\) (baseline period is in the “future”)
\(g \le t < t^\prime\) (timing group of interest is already treated in both periods)
\(t < g^\prime \le t^\prime\) (comparison group is untreated at \(t\) but treated by the baseline period)
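A small worked example may help make the forbidden comparison concrete. The sketch below uses made-up potential outcomes (the functions `y0`, `att`, `y` and `did_2x2` are hypothetical helpers, not from any package) in which parallel trends hold exactly and the effect grows by one unit for each period since adoption. The 2x2 with a future baseline equals \(ATT_g(t) - ATT_g(t^\prime) + ATT_{g^\prime}(t^\prime)\), which only collapses to \(ATT_g(t)\) when effects are constant.

```r
# Toy potential outcomes (hypothetical numbers): parallel trends hold exactly
y0  <- function(alpha, t) alpha + 0.5 * t               # untreated outcome: unit effect + common trend
att <- function(g, t) ifelse(t >= g, t - g + 1, 0)      # assumed effect: grows with time since adoption
y   <- function(alpha, g, t) y0(alpha, t) + att(g, t)   # observed outcome

# 2x2 DiD: change from t_base to t for group 1, minus the same change for group 2
did_2x2 <- function(a1, g1, a2, g2, t, t_base) {
  (y(a1, g1, t) - y(a1, g1, t_base)) - (y(a2, g2, t) - y(a2, g2, t_base))
}

g <- 2; g_prime <- 4       # early adopter (group of interest) and later adopter
true_att <- att(g, 3)      # target: ATT for group g at t = 3

# Good 2x2: baseline t' = 1, before either group is treated
good <- did_2x2(1, g, 3, g_prime, t = 3, t_base = 1)
# Forbidden 2x2: baseline t' = 5 is in the future, so g <= t < t' and t < g' <= t'
bad  <- did_2x2(1, g, 3, g_prime, t = 3, t_base = 5)

print(c(true_att = true_att, good = good, bad = bad))
```

In this toy setup the good comparison recovers the target exactly, while the forbidden one does not, even though parallel trends is satisfied by construction.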
Both \(\tau_l\) incorporate ATTs from other relative-times \(\neq l\) (“contamination bias”)
The dynamic specification is only valid if we believe each relative-time treatment effect is constant across timing groups.
Li and Strezhnev (2024) develop some more of the intuition
Each \(\hat{\tau}_l\) incorporates 2x2 DiDs with units that are also already treated but at different times
The bias due to the “contaminating” treatment effects gets differenced out by estimates of those relative time effects with different units
Dynamic TWFE
[Figure: Dynamic specification - good 2x2s]
[Figure: Dynamic specification - contaminated 2x2s]
[Figure: Dynamic specification - (some) de-contamination]
Example: Paglayan (2019, AJPS)
Let’s fit the dynamic specification on the Paglayan (2019) dataset.
# Keep all
union_rep2 <- union
union_rep2 <- union_rep2 %>%
  mutate(yearFromCB = year - YearCBrequired)
# Make the never-treateds "infinity" (ensure that their dummy will be dropped as well)
union_rep2$yearFromCB[is.na(union_rep2$YearCBrequired)] <- Inf
# Make the dummy variables using the factor syntax - make -1 the reference period
union_rep2$yearFromCBFactor <- relevel(as.factor(union_rep2$yearFromCB), ref = "-1")
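To show what the relative-time factor does once it enters a regression, here is a minimal sketch on a hypothetical toy panel (not the Paglayan data; all variable names and numbers are made up). It uses base R `lm()` with unit and year dummies rather than a dedicated fixed-effects routine, and assumes a constant treatment effect of 2 with no noise, so the coefficients can be checked by eye.

```r
# Hypothetical toy panel: 6 units, 8 years, no noise
toy <- expand.grid(unit = 1:6, year = 1:8)
adopt <- c(3, 3, 5, 5, NA, NA)                   # NA = never treated
toy$adopt_year <- adopt[toy$unit]
toy$rel_time <- toy$year - toy$adopt_year
toy$rel_time[is.na(toy$adopt_year)] <- Inf       # never-treated coded as Inf
treated <- as.numeric(is.finite(toy$rel_time) & toy$rel_time >= 0)
toy$y <- toy$unit + 0.3 * toy$year + 2 * treated # assumed constant effect of 2

# Relative-time dummies with -1 as the reference period
toy$rel_time_f <- relevel(as.factor(toy$rel_time), ref = "-1")

# Fixed effects come first so that lm() drops the aliased Inf dummy
# (it is a sum of never-treated unit dummies), not one of the unit dummies
fit <- lm(y ~ factor(unit) + factor(year) + rel_time_f, data = toy)
b <- coef(fit)
event_coefs <- b[grepl("^rel_time_f", names(b))]
print(round(event_coefs, 3))
```

The `Inf` coefficient comes back as `NA` because it is collinear with the unit effects, the leads are zero, and the lags equal the assumed effect. With fixest, the analogous call on this toy data would be `feols(y ~ rel_time_f | unit + year, data = toy, cluster = "unit")`.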
Sun and Abraham (2021) Estimator
One solution to the problem of contamination bias is to estimate the relative treatment time effects separately for each unique treatment timing group
Control group is the never treated
Easy to implement in a single TWFE regression with interactions between cohort-indicators and relative-treatment time indicators.
Aggregate into average effects for each relative-treatment time by averaging over the cohorts with that particular relative treatment time.
Implemented in the fixest R package using the sunab() syntax
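The interaction-and-aggregate logic can be sketched by hand on a hypothetical toy panel. This is an illustration of the idea under made-up data (cohort-specific constant effects of 1 and 3, no noise), using base R `lm()`; it is not the internals of `sunab()`, and the `cell` labels are an assumed naming scheme.

```r
# Hypothetical toy panel: effects differ by cohort (1 for cohort 3, 3 for cohort 5)
toy <- expand.grid(unit = 1:6, year = 1:8)
adopt <- c(3, 3, 5, 5, NA, NA)                       # NA = never treated
toy$adopt_year <- adopt[toy$unit]
toy$rel_time <- toy$year - toy$adopt_year
ever <- !is.na(toy$adopt_year)
post <- ever & toy$rel_time >= 0
effect <- ifelse(toy$adopt_year == 3, 1, 3)
toy$y <- toy$unit + 0.3 * toy$year + ifelse(post, effect, 0)

# One dummy per cohort x relative-time cell; never-treated units and
# relative time -1 form the reference category
toy$cell <- ifelse(ever & toy$rel_time != -1,
                   paste0("g", toy$adopt_year, "_l", toy$rel_time), "ref")
toy$cell <- relevel(factor(toy$cell), ref = "ref")
fit <- lm(y ~ factor(unit) + factor(year) + cell, data = toy)
est <- coef(fit)

# Aggregate to relative time 0: weight each cohort by its share of the
# observations that are at relative time 0
at_l0 <- toy[ever & toy$rel_time == 0, ]
w <- prop.table(table(at_l0$adopt_year))
tau_0 <- sum(as.numeric(w) * est[paste0("cellg", names(w), "_l0")])
print(c(cohort3 = unname(est["cellg3_l0"]),
        cohort5 = unname(est["cellg5_l0"]),
        tau_0 = tau_0))
```

Each cohort's effect is recovered separately, and the relative-time-0 estimate is their share-weighted average.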
Example: Paglayan (2019, AJPS)
Implementing the Sun and Abraham (2021) estimator
### Per-pupil expenditure
# Code never-treated (NA) as Inf so sunab treats them as the control cohort
union_rep2_sa <- union_rep2
union_rep2_sa$YearCBrequired[is.na(union_rep2_sa$YearCBrequired)] <- Inf
dyn_reg_rep2_sa <- feols(lnppexpend ~ sunab(YearCBrequired, year, ref.p = -1) | year + State,
                         data = union_rep2_sa, cluster = "State")
Callaway and Sant’Anna (2021) Estimator
Rather than correcting the TWFE regression, directly estimate each group-time ATT via a simple 2x2 DiD
Compare each treated cohort \(g\) at time \(t\) to a not-yet-treated (or never-treated) control group
Use the period just before treatment (\(g - 1\)) as the baseline
Step 4: Aggregate the \(\hat{\tau}_{it}\) into relevant quantities (e.g. relative time effects, overall average, etc…)
Implemented in fect (among other packages)
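The group-time 2x2 described above can be written out directly. The sketch below is a hand-rolled illustration on a hypothetical balanced toy panel; `att_gt_manual` is a made-up helper name (not a function from any package), it handles only post-treatment periods \(t \ge g\), and it uses never-treated plus not-yet-treated units as controls with no covariates.

```r
# Hypothetical toy panel: cohort 3 has effect 1, cohort 5 has effect 3, no noise
toy <- expand.grid(unit = 1:6, year = 1:8)
adopt <- c(3, 3, 5, 5, NA, NA)                     # NA = never treated
toy$adopt_year <- adopt[toy$unit]
post <- !is.na(toy$adopt_year) & toy$year >= toy$adopt_year
toy$y <- toy$unit + 0.3 * toy$year +
  ifelse(post, ifelse(toy$adopt_year == 3, 1, 3), 0)

# Group-time ATT as a single 2x2 DiD with baseline g - 1 (assumes t >= g)
att_gt_manual <- function(df, g, t) {
  base <- g - 1
  mean_change <- function(units) {
    mean(sapply(units, function(u) {
      df$y[df$unit == u & df$year == t] - df$y[df$unit == u & df$year == base]
    }))
  }
  treated_units <- unique(df$unit[!is.na(df$adopt_year) & df$adopt_year == g])
  # Controls: never treated, or not yet treated as of period t
  control_units <- unique(df$unit[is.na(df$adopt_year) | df$adopt_year > t])
  mean_change(treated_units) - mean_change(control_units)
}

print(c(att_3_4 = att_gt_manual(toy, 3, 4),
        att_3_6 = att_gt_manual(toy, 3, 6),
        att_5_6 = att_gt_manual(toy, 5, 6)))
```

Note how the control group shrinks as \(t\) grows: at \(t = 4\) the 5-cohort still serves as a control for the 3-cohort, but at \(t = 6\) only the never-treated units remain.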
Regression Imputation Estimator
The regression imputation estimator for a given group-time ATT, \(ATT_{g}(t)\), can be expressed as an average over 2x2 differences-in-differences
Using all the pre-treatment periods as baselines (\(t^\prime < g\))
And using all units that adopt treatment after \(g\) (\(g^\prime > g\))
This creates a kind of sequential structure to the imputations for each post-treatment period
\(ATT_{g}(g)\) is a simple average over DiDs…
…but \(ATT_{g}(g+1)\) incorporates the estimates from \(ATT_{g+1}(g+1)\)…
…and so on…
Units under treatment are used as “controls” with the model-imputed counterfactual acting as \(Y_{it}\) for that period.
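The imputation logic itself is short enough to sketch by hand. The example below assumes a hypothetical noise-free toy panel and a plain two-way fixed effects outcome model fit with base R `lm()`; it illustrates the idea and is not a reproduction of any package's implementation.

```r
# Hypothetical toy panel: cohort 3 has effect 1, cohort 5 has effect 3, no noise
toy <- expand.grid(unit = 1:6, year = 1:8)
adopt <- c(3, 3, 5, 5, NA, NA)                     # NA = never treated
toy$adopt_year <- adopt[toy$unit]
post <- !is.na(toy$adopt_year) & toy$year >= toy$adopt_year
toy$y <- toy$unit + 0.3 * toy$year +
  ifelse(post, ifelse(toy$adopt_year == 3, 1, 3), 0)

# Step 1: fit the TWFE model on untreated observations only
fit0 <- lm(y ~ factor(unit) + factor(year), data = toy[!post, ])

# Step 2: impute the untreated potential outcome for treated observations
treated_obs <- toy[post, ]
treated_obs$y0_hat <- predict(fit0, newdata = treated_obs)

# Step 3: unit-period effect estimates are observed minus imputed outcomes
treated_obs$tau_hat <- treated_obs$y - treated_obs$y0_hat

# Step 4: aggregate, e.g. overall and by cohort
att_overall <- mean(treated_obs$tau_hat)
att_by_cohort <- tapply(treated_obs$tau_hat, treated_obs$adopt_year, mean)
print(att_overall)
print(att_by_cohort)
```

The unit effects for the 3-cohort are learned from only two pre-treatment years, while the year effects in later periods come entirely from units that are still untreated, which is the source of the sequential structure described above.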
Regression Imputation Estimator
[Figure: Regression Imputation Group 3, Time 3]
[Figure: Regression Imputation Group 4, Time 4]
[Figure: Regression Imputation Group 3, Time 4]
Example: Paglayan (2019, AJPS)
Implementing the regression imputation estimator using fect
library(fect)
union_fect <- union
union_fect$CBrequired_SY <- as.numeric(union_fect$CBrequired_SY)
fect_fit <- fect(lnppexpend ~ CBrequired_SY, data = union_fect,
                 index = c("State", "year"), method = "fe",
                 force = "two-way", se = TRUE)
New DiD is just old DiD
All of the “heterogeneity-robust” difference-in-differences methods work in basically the same way
Estimate the group-time ATTs using only the 2x2 comparisons that are valid w/o additional homogeneity assumptions
There are basically two classes of estimators
Either “fix the TWFE regression” (Sun and Abraham, regression imputation, ‘extended’ TWFE)
Or construct the “first-differences” regression (Callaway/Sant’Anna)
Implementations of these methods will differ on other options, but these are researcher choices
Which units are used as “controls”
Which periods act as the baseline
How to estimate standard errors (some form of asymptotic cluster SEs or bootstrap)
Interpreting event study plots w/ “new” DiD estimators
One downside of abandoning the traditional “event study plot” is that there is no clear convention for how to construct the plot of the treatment effects and the pre-treatment placebos using the new DiD estimators.
Common software implementations don’t generate figures that have the same interpretation as the dynamic TWFE (Roth, 2026)
Sun and Abraham (2021) is the most direct equivalent to the original event study plot
All treatment effects and placebos are estimated from the same held out common baseline.
But remember that the composition of each relative treatment time varies
Late adopters don’t contribute to many of the treatment effects, early adopters don’t contribute to many of the placebos.
Callaway/Sant’Anna - Perils of the “varying” baseline
One option in the Callaway-Sant’Anna package is to compute placebos using a varying baseline.
This generates event study plots that are asymmetric
Post-treatment estimates all use \(-1\) as the baseline time period
But the pre-treatment estimates are always relative to adjacent periods
“Short” DiDs only, not “long”
Visually there is a kink even in the absence of any treatment effect!
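A stylized calculation shows where the kink comes from. The sketch below assumes zero treatment effect and a hypothetical linear violation of parallel trends (`delta`, with a made-up slope of 0.5 per period); it computes what each baseline convention would report in expectation and is not output from any package.

```r
# Assumed violation of parallel trends: treated group drifts by 0.5 per period
slope <- 0.5
delta <- function(l) slope * l
rel <- -4:3                                  # relative time; treatment starts at 0

# Fixed baseline: every estimate is a "long" difference relative to -1
fixed_base <- delta(rel) - delta(-1)

# Varying baseline: pre-periods are "short" differences relative to the
# adjacent earlier period, post-periods are still relative to -1
varying_base <- ifelse(rel < 0,
                       delta(rel) - delta(rel - 1),
                       delta(rel) - delta(-1))

print(data.frame(rel, fixed_base, varying_base))
```

With the fixed baseline the estimates fall on a straight line through zero at \(-1\), which makes the pre-trend obvious. With the varying baseline the pre-period estimates are flat at 0.5 and the post-period estimates climb, so the plot bends at the treatment date despite there being no effect.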
Regression imputation estimators exhibit an even more troubling problem
How do you construct a placebo test if all of the control observations are being used to estimate the treatment effects?
Trilemma - You have to give up one of…
Not using the same observations twice (imputing for units in the imputation regression)
Using the same baselines as the treatment effect estimates
Imputing for all of the pre-treatment periods
What you give up depends on how you plan to use the pre-treatment placebo estimates.
But…you should definitely avoid imputing from the same regression as you used for the post-treatment observations (the default in fect)
Example with the fect defaults
Regression Imputation
Li and Strezhnev (2026) show that the in-sample imputation approach suffers from two biases if you care about the magnitudes of the pre-treatment coefficients
Attenuation bias - Some of the component DiD comparisons are zero by construction since they re-use the same unit or period twice
Contamination bias - Under staggered adoption, placebo estimates for periods further away from treatment incorporate estimates for periods closer to treatment
Possible solution
“Double-leave-one-out” approach - Estimate the placebo effects using separate models for each treatment timing group and time period.
Leave out all units that adopt treatment prior to the cohort of interest and the time period being imputed-for.
In-sample imputation bias
We can write the regression imputation estimator for a pre-treatment period as
Gazmararian, Alexander F. “Sources of partisan change: Evidence from the shale gas shock in American coal country.” The Journal of Politics 87, no. 2 (2025): 601-615.
Examines whether the 2008 shale gas shock (fracking boom) shifted Republican presidential vote share in U.S. coal counties
Gas substitutes for coal, accelerating coal’s decline
Voters dependent on coal shift toward Republicans who promised looser environmental regulation
Treatment: coal county (\(\geq\) 1% coal employment pre-2008) \(\times\) post-2008
Researcher selects the \(\epsilon\) - what size of an effect is considered “negligible”
Existing recommendations mostly come from the literature on balance checking (Hartman and Hidalgo, 2018).
For assessing pre-trends, most intuitive to just benchmark against observed effects.
Easy implementation - Two One-Sided Test (TOST) approach
Reject (in favor of a negligible difference) if the 90% confidence interval is entirely within the “equivalence region”
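The TOST decision rule is easy to code directly. The sketch below assumes a normal approximation for the placebo estimate and a symmetric equivalence region \([-\epsilon, \epsilon]\); `tost_equivalent` is a hypothetical helper name and the numbers in the calls are made up for illustration.

```r
# Two one-sided tests at level alpha are equivalent to checking whether the
# (1 - 2 * alpha) confidence interval lies entirely inside (-eps, eps)
tost_equivalent <- function(est, se, eps, alpha = 0.05) {
  z <- qnorm(1 - alpha)                      # 1.645 for alpha = 0.05
  ci <- c(lower = est - z * se, upper = est + z * se)   # 90% CI by default
  list(ci = ci, equivalent = unname(ci["lower"] > -eps && ci["upper"] < eps))
}

# Precise placebo estimate: the 90% CI sits inside the equivalence region
precise <- tost_equivalent(est = 0.01, se = 0.02, eps = 0.05)
# Same point estimate but noisier: cannot conclude the pre-trend is negligible
noisy   <- tost_equivalent(est = 0.01, se = 0.04, eps = 0.05)

print(precise)
print(noisy)
```

The second call illustrates the key difference from a conventional pre-trend test: an imprecise placebo that is "not significantly different from zero" does not pass the equivalence test.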
Replication: In-sample vs. leave-one-out
Replication: Fixed baseline estimates
HonestDiD: Partial identification
An alternative approach to incorporating the actual pre-treatment placebos is to use them to bound the potential violation of parallel trends in the post-treatment period.
Rambachan and Roth (2023) propose a partial identification approach based on such a user specified bound
Develop an approach for constructing confidence sets under this violation
In our conventional event-study regression, we get estimates of…
\(\beta_{l}^{\text{post}} = \tau_{l}^{\text{post}} + \delta_l^{\text{post}}\) - the treatment effects - a combination of the “true” effect and the parallel trends violation
\(\beta_{l}^{\text{pre}} = \delta_l^{\text{pre}}\) - the pre-treatment “placebos” - these are only capturing the parallel trends violation
We don’t want to assume \(\delta = 0\).
Instead, we’ll relax that assumption by assuming \(\delta \in \Delta\), some user-specified set of restrictions
HonestDiD: Partial identification
One popular approach recommended in Rambachan and Roth (2023) is to bound the relative magnitudes
The per-period change in the violation of parallel trends in the post-treatment period is no more than \(\bar{M}\) times the largest observed per-period violation in the pre-treatment period
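Written out, the relative-magnitudes restriction takes roughly the following form. The indexing here is an assumption: it follows the convention in which the reference period is labelled 0 with \(\delta_0 = 0\), pre-treatment periods are \(s < 0\) and post-treatment periods are \(t > 0\), so it may need shifting by one period to match the relative-time labels used elsewhere in these slides.

```latex
\Delta^{RM}(\bar{M}) = \left\{ \delta \; : \; \forall t \geq 0, \;
  \left| \delta_{t+1} - \delta_{t} \right|
  \leq \bar{M} \cdot \max_{s < 0} \left| \delta_{s+1} - \delta_{s} \right| \right\}
```

Setting \(\bar{M} = 1\) says that post-treatment violations of parallel trends change by no more than the worst per-period change seen before treatment; \(\bar{M} = 0\) rules out any change after the reference period, and larger values allow progressively worse violations.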