Writing Extensions to the did Package

Brantly Callaway and Pedro H.C. Sant’Anna



This vignette provides an example of extending the ideas of the did package to some other cases.

DiD with Anticipation

The example we give next is one where individuals anticipate participating in the treatment and adjust their behavior (or at least their outcomes change) before they actually participate in the treatment. This is a feature of some applications in economics – for example, in labor applications such as job training or job displacement, individual’s earnings often “dip” just before they actually participate in the treatment. One solution to this is to impose that parallel trends holds, but adjust the periods over which it holds. One version of this is to make the assumption that

Parallel Trends with Anticipation For all groups and time periods such that \(t \ge g-1\) \[ E[Y_t(0) - Y_{g-2}(0) | G=g] = E[Y_t(0) - Y_{g-2}(0) | C=1] \]

In other words, the path of outcomes, in the absence of participating in the treatment, between periods \((g-2)\) and \(t\) is the same for individuals in group \(g\) (which is not observed) as for untreated individuals (which is observed).

In this case, it is straightforward to show that, for all \(t \ge g-1\), \[ ATT(g,t) = E[Y_t - Y_{g-2} | G=g] - E[Y_t - Y_{g-2} | C=1] \]

This is very similar to the main case covered in the did package (see discussion in our Introduction to DiD with Multiple Time Periods vignette), except that the “base period” here is \((g-2)\) (two periods before individuals in group \(g\) become treated) rather than \((g-1)\) (one period before individuals in group \(g\) become treated) which is the setup used in the did package.

If you are worried that that units anticipate treatment participation two periods before treatment actually starts, you could address this by setting the base period to be \((g-3)\) instead. It is worth pointing out, however, that there is a tradeoff. Allowing for more periods of anticipation will require that the parallel trends assumption hold over more periods in order to identify the group-time average treatment effects. At any rate, for this vignette, we stick with the case where a researcher wants to allow anticipation exactly one period before units participate in the treatment.

Computing Treatment Effects under DiD with Anticipation

To start with, we want to point out that the important part for this vignette is not so much this particular application, but rather just to demonstrate how to write the code for this particular case.

To start with, we’ve constructed a dataset (called dta) that has anticipation. All groups follow parallel trends (in the absence of treatment) except in the period immediately before treatment (due to treatment anticipation). In those periods, outcomes for individuals who participate in the treatment “dip” – here, they decrease by 1 in the pre-treatment period. In post-treatment periods, the effect of participating in the treatment is equal to 1. This is simulated data, but it has features that would be common in, for example, an application evaluating the effect of a job training program.

Here is what the data looks like

where G defines the period when an individual first becomes treated, X is a covariate but we’ll ignore it (parallel trends holds in this example without having to condition on X), and Y is the outcome.

As a first step, we’ll try estimating the effect of participating in the treatment (focus on dynamic effects as in an event study plot) while just ignoring the possibility of anticipation.

# estimate group-time average treatment effects using att_gt method
# (and ignoring pre-treatment "dip")
attgt.ignoredip <- att_gt(yname = "Y",
                          tname = "period",
                          idname = "id",
                          gname = "G",
                          xformla = ~1,
                          data = dta,

# summarize the results
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     xformla = ~1, data = dta)
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      3    2  -0.9867     0.0944       -1.2515     -0.7219 *
#>      3    3   1.9663     0.0633        1.7889      2.1437 *
#>      3    4   1.9931     0.0876        1.7475      2.2387 *
#>      3    5   2.0575     0.1245        1.7083      2.4066 *
#>      4    2   0.0267     0.0533       -0.1228      0.1762  
#>      4    3  -0.8470     0.1224       -1.1903     -0.5037 *
#>      4    4   1.9373     0.0670        1.7494      2.1252 *
#>      4    5   2.0648     0.0944        1.8001      2.3296 *
#>      5    2   0.0019     0.0564       -0.1562      0.1601  
#>      5    3   0.0115     0.0552       -0.1434      0.1665  
#>      5    4  -0.7937     0.1469       -1.2057     -0.3817 *
#>      5    5   2.1602     0.0679        1.9697      2.3506 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> P-value for pre-test of parallel trends assumption:  0
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# make dynamic effects plot
p <- ggdid(aggte(attgt.ignoredip, "dynamic"))

# add actual treatment effects to the plot
truth <- cbind.data.frame(e = seq(-3,2), att.e = c(0,0,-1,1,1,1))
p <- p + geom_line(data = truth, aes(x = e, y = att.e), inherit.aes = FALSE, color = "blue")
p <- p + geom_point(data = truth, aes(x = e, y = att.e), inherit.aes = FALSE, color = "blue")

In the figure, the blue line is the “truth”. You can see that anticipation leads to two things. First, it leads to rejecting the parallel trends assumptions – you can correctly visualize the anticipation effects here and this is quite helpful for understanding what is going on. Second, however, the anticipation effects feed into the treatment effect estimates. Here, we over-estimate the effect of participating in the treatment due to the anticipation. Also, note that this example is really simple, and it would be easy to come up with cases where the results were much more complicated by anticipation.

Next we want to write our own code for getting around anticipation. A few notes:

Now to the code. We’ll start by writing a function to compute group-time average treatment effects and dynamic effects given some dataset (we’ll rely on the dataset having variables called G, id, period, and Y, but this would be easy to modify). Again, this function is a simplified version of what’s going on behind the scenes in the did package.

compute.attgt <- function(dta) {
  # pick up all groups
  groups <- unique(dta$G)

  # pick up all time periods
  time.periods <- unique(dta$period)

  # sort the groups and drop the untreated group
  groups <- sort(groups)[-1]

  # sort the time periods and drop the first two
  # (can't compute treatment effects for these two
  # periods with one-period anticipation -- w/o anticipation
  # we would just drop one period here)
  time.periods <- sort(time.periods)[-c(1,2)]

  # drop last time period (because we don't know if
  # these units are affected by anticipation or not
  # and we are being very careful)
  # (if you were worried about more than one anticipation
  # period here, would need to drop more time periods
  # from the end)
  time.periods <- time.periods[-length(time.periods)]

  # list to store all group-time average treatment effects
  # that we calculate
  attgt.list <- list()
  counter <- 1

  # loop over all groups
  for (g in groups) {

    # get the correct "base" period for this group
    # (subtract 2 to avoid anticipation)
    main.base.period <- g - 2
    # loop over all time periods
    for (tp in time.periods) {

      # if it's a pre-treatment time period (used for the
      # pre-test, we need to adjust the base period)

      # group not treated yet
      if (tp < g) {
        # move two periods before
        base.period <- tp - 2
      } else {
        # this is a post-treatment period
        base.period <- main.base.period

      # now, all we need to do is collect the right subset
      # of the data and estimate a 2x2 DiD

      # get group g and untreated group
      this.data <- subset(dta, G==g | G==0)

      # get current period and base period data
      this.data <- subset(this.data, period==tp | period==base.period)

      # set up to compute 2x2 estimator
      Ypost <- subset(this.data, period==tp)$Y
      Ypre <- subset(this.data, period==base.period)$Y

      # dummy variable being in group g
      G <- 1*(subset(this.data, period==tp)$G == g)

      # compute 2x2 estimator using DRDID package
      # (in this unconditional case, it would be straightforward
      # to calculate the 2x2 att just using averages, but we
      # like the DRDID package as it will work for more complicated
      # cases as well)
      attgt <- DRDID::reg_did_panel(Ypost, Ypre, G, covariates=NULL)$ATT

      # save results
      attgt.list[[counter]] <- list(att=attgt, group=g, time.period=tp)
      counter <- counter+1


  # aggregate into dynamic effects
  # turn results into a data.frame
  attgt.results <- do.call("rbind.data.frame", attgt.list)

  # add event time to the results
  attgt.results$e <- attgt.results$time.period - attgt.results$group
  # calculate relative sizes of each group
  # (will be used as weights)
  n.group <- sapply(groups, function(gg) nrow(subset(dta, G==gg)))
  # merge in group sizes
  ngroup.mat <- cbind(groups, n.group)
  attgt.results <- merge(attgt.results, ngroup.mat, by.x = "group", by.y = "groups")

  # event times to calculate dynamic effects
  eseq <- unique(attgt.results$e) 
  eseq <- sort(eseq)

  # calculate average effects by event time
  att.e <- c()
  counter <- 1
  for (this.e in eseq) {
    # get subset of results at this event time
    res.e <- subset(attgt.results, e==this.e)

    # calculate weights by group size
    res.e$weight <- res.e$n.group / sum(res.e$n.group)

    # calculate dynamic effect as weighted average
    att.e[counter] <- sum(res.e$att * res.e$weight)

    # on to the next one
    counter <- counter+1

  # store dynamic effects results
  dyn.results <- cbind.data.frame(e = eseq, att.e = att.e)

  # return group-time average treatment effects and dynamic effects

Now, we have a function to compute group-time average treatment effects and dynamic effects. Let’s use it on the data that we have

Finally, let’s compute some standard errors using the nonparametric bootstrap (which is different and much slower than the multiplier bootstrap that we use in the did package); we’ll just focus on the dynamic effects estimator

Finally, we can translate these into a plot. [One last thing here, to keep things very simple, we are going to plot pointwise confidence intervals, but we strongly suggest computing simultaneous confidence bands in practice.]

Here, the black line is our estimate and the blue line is the “truth”. You can immediately see that we get things right here when we account for anticipation. The only other thing to notice is that we do not estimate dynamic effects at as many different lengths of exposure, but that is the price to pay for accounting for anticipation.


In this vignette, we illustrated how one could apply the principles in the did package to the case where units anticipate participating in the treatment. This is a relatively straightforward exercise, and we think that showing this sort of extension is potentially useful for other situations that do not fit exactly into the cases covered by the did package.