Getting Started with the did Package

Brantly Callaway and Pedro H.C. Sant’Anna

2022-07-19

Introduction

This vignette discusses the basics of using Difference-in-Differences (DiD) designs to identify and estimate the average effect of participating in a treatment with a particular focus on tools from the did package. The background article for it is Callaway and Sant’Anna (2021), “Difference-in-Differences with Multiple Time Periods”.

We use some notation in this vignette that is fully explained in our Introduction to DiD with Multiple Time Periods vignette.

Examples with simulated data

Let’s start with a really simple example with simulated data. Here, there are going to be 4 time periods. There are 4000 units in the treated group that are randomly (with equal probability) assigned to first participate in the treatment (a group) in each time period. And there are 4000 ``never treated’’ units. The data generating process for untreated potential outcomes

\[ Y_{it}(0) = \theta_t + \eta_i + X_i'\beta_t + v_{it} \]

This is an example of a very simple model for untreated potential outcomes that is compatible with a conditional parallel trends assumption. In particular,

Estimating Group-Time Average Treatment Effects

Building the dataset

Estimating Group-Time Average Treatment Effects

The main function to estimate group-time average treatment effects is the att_gt function. See documentation here. The most basic call to att_gt is given in the following

The summary of example_attgt provides estimates of the group-time average treatment effects in the column labeled att, and the corresponding bootstrapped-based standard errors are given in the column se. The corresponding groups and times are given in the columns group and time. Under the no-anticipation and parallel trends assumptions, group-time average treatment effects are identified in periods when \(t \geq g\) (i.e., post-treatment periods for each group). The table also reports pseudo group-time average treatment effects when \(t < g\) (i.e., pre-treatment periods for group \(g\)). These can be used as a pre-test for the parallel trends assumption (as long as we assume that the no-anticipation assumption indeed holds). In addition, the results of a Wald pre-test of the parallel trends assumption is reported in the summary of the results. A much more detailed discussion of using the did package for pre-testing is available here.

Next, we’ll demonstrate how to plot group-time average treatment effects. To plot these, use the ggdid function which builds off the ggplot2 package.

The resulting figure is one that contains separate plots for each group. Notice in the figure above, the first plot is labeled “Group 2”, the second “Group 3”, etc. Then, the figure contains estimates of group-time average treatment effects for each group in each time period along with a simultaneous confidence interval. The red dots in the plots are pre-treatment pseudo group-time average treatment effects and are most useful for pre-testing the parallel trends assumption. The blue dots are post-treatment group-time average treatment effects and should be interpreted as the average effect of participating in the treatment for units in a particular group at a particular point in time.

Other features of the did package

The above discussion covered only the most basic case for using the did package. There are a number of simple extensions that are useful in applications.

Adjustments for Multiple Hypothesis Testing

By default, the did package reports simultaneous confidence bands in plots of group-time average treatment effects with multiple time periods – these are confidence bands that are robust to multiple hypothesis testing [essentially, the idea here is to use the same standard errors but make an adjustment to the critical value to account for multiple testing – in the example in this section, the critical value for a 95% uniform confidence band is 2.73 instead of 1.96]. You can turn this off and compute analytical standard errors and corresponding figures with pointwise confidence intervals by setting bstrap=FALSE, cband=FALSE in the call to att_gtbut we don’t recommend it!

Aggregating group-time average treatment effects

In many applications, there can be a large number of groups and time periods. In this case, it may be infeasible to interpret plots of group-time average treatment effects. The did package provides a number of ways to aggregate group-time average treatment effects using the aggte function.

Simple Aggregation

One idea that is likely to immediately come to mind is to just return a weighted average of all group-time average treatment effects with weights proportional to the group size. This is available by calling the aggte function with type = simple.

This sort of aggregation immediately avoids the negative weights issue that two-way fixed effects regressions can suffer from, but we often think that there are better alternatives. In particular, this simple aggregation tends to overweight the effect of early-treated groups simply because we observe more of them during post-treatment periods. We think there are likely to be better alternatives in most applications.

Dynamic Effects and Event Studies

One of the most common alternative approaches is to aggregate group-time effects into an event study plot. Group-time average treatment effects can immediately be averaged into average treatment effects at different lengths of exposure to the treatment using the following code:

In this figure, the x-axis is the length of exposure to the treatment. Length of exposure equal to 0 provides the average effect of participating in the treatment across groups in the time period when they first participate in the treatment (instantaneous treatment effect). Length of exposure equal to -1 corresponds to the time period before groups first participate in the treatment, and length of exposure equal to 1 corresponds to the first time period after initial exposure to the treatment.

As we would expect based on the data that we generated, it looks like parallel trends holds in pre-treatment periods and the effect of participating in the treatment is increasing with length of exposure of the treatment.

The Overall ATT here averages the average treatment effects across all lengths of exposure to the treatment.

Group-Specific Effects

Another idea is to look at average effects specific to each group. It is also straightforward to aggregate group-time average treatment effects into group-specific average treatment effects using the following code:

In this figure, the y-axis is categorized by group. The x-axis provides estimates of the average effect of participating in the treatment for units in each group averaged across all time periods after that group becomes treated. In our example, the average effect of participating in the treatment is largest for group 2 – this is because the effect of treatment here increases with length of exposure to the treatment, and they are the group that is exposed to the treatment for the longest.

The Overall ATT averages the group-specific treatment effects across groups. In our view, this parameter is a leading choice as an overall summary effect of participating in the treatment. It is the average effect of participating in the treatment that was experienced across all units that participate in the treatment in any period. In this sense, it has a similar interpretation to the ATT in the textbook case where there are exactly two periods and two groups.

Small Group Sizes

Small group sizes can sometimes cause estimation problems in the did package. To give an example, if there are any groups that have fewer observations than the number of covariates in the model, the code will error. The did package reports a warning if there are any groups that have fewer than 5 observations.

In addition, statistical inference, particularly on group-time average treatment may become more tenuous with small groups. For example, the effective sample size for estimating the change in outcomes over time for individuals in a particular group is equal to the number of observations in that group and asymptotic results are unlikely to provide good approximations to the sampling distribution of group-time average treatment effects when the number of units in a group is small. In these cases, one should be very cautious about interpreting the \(ATT(g,t)\) results for these small groups.

A reasonable alternative approach in this case is to just focus on aggregated treatment effect parameters (i.e., to run aggte(..., type = "group") or aggte(...,type = "dynamic")). For each of these cases, the effective sample size is the total number of units that are ever treated. As long as the total number of ever treated units is “large” (which should be the case for many DiD application), then the statistical inference results provided by the did package should be more stable.

Selecting Alternative Control Groups

By default, the did package uses the group of units that never participate in the treatment as the control group. In this case, if there is no group that never participates, then the did package will drop the last period and set units that do not become treated until the last period as the control group (this will also throw a warning). The other option for the control group is to use the “not yet treated”. The “not yet treated” include the never treated as well as those units that, for a particular point in time, have not been treated yet (though they eventually become treated). This group is at least as large as the never treated group though it changes across time periods. To use the “not yet treated” as the control, set the option control_group="notyettreated".

Repeated cross sections

The did package can also work with repeated cross section rather than panel data. If the data is repeated cross sections, simply set the option panel = FALSE. In this case, idname is also ignored. From a usage standpoint, everything else is identical to the case with panel data.

Unbalanced Panel Data

By default, the did package takes in panel data and, if it is not balanced, coerces it into being a balanced panel by dropping units with observations that are missing in any time period. However, if the user specifies the option allow_unbalanced_panel = TRUE, then the did package will not coerce the data into being balanced. Practically, the main cost here is that the computation time will increase in this case. We also recommend that users think carefully about why they have an unbalanced panel before proceeding this direction.

Alternative Estimation Methods

The did package implements all the \(2 \times 2\) DiD estimators that are in the DRDID package. By default, the did package uses “doubly robust” estimators that are based on first step linear regressions for the outcome variable and logit for the generalized propensity score. The other options are “ipw” for inverse probability weighting and “reg” for regression.

The argument est_method is also available to pass in a custom function for estimating DiD with 2 periods and 2 groups. See its documentation for more details. Fair Warning: this is very advanced use of the did package and should be done with caution.

An example with real data

Next, we use a subset of data that comes from Callaway and Sant’Anna (2020). This is a dataset that contains county-level teen employment rates from 2003-2007. The data can be loaded by

data(mpdta)

mpdta is a balanced panel with 2500 observations. And the dataset looks like

head(mpdta)
#>     year countyreal     lpop     lemp first.treat treat
#> 866 2003       8001 5.896761 8.461469        2007     1
#> 841 2004       8001 5.896761 8.336870        2007     1
#> 842 2005       8001 5.896761 8.340217        2007     1
#> 819 2006       8001 5.896761 8.378161        2007     1
#> 827 2007       8001 5.896761 8.487352        2007     1
#> 937 2003       8019 2.232377 4.997212        2007     1

Data Requirements

In particular applications, the dataset should look like this with the key parts being:

The Effect of the Minimum Wage on Youth Employment

Next, we walk through a straightforward, but realistic way to use the did package to carry out an application.

Side Comment: This is just an example of how to use our method in a real-world setup. To really evaluate the effect of the minimum wage on teen employment, one would need to be more careful along a number of dimensions. Thus, results displayed here should be interpreted as illustrative only.

We’ll consider two cases. For the first case, we will not condition on any covariates. For the second, we will condition on the log of county population (in a “real” application, one might want to condition on more covariates).

# estimate group-time average treatment effects without covariates
mw.attgt <- att_gt(yname = "lemp",
                   gname = "first.treat",
                   idname = "countyreal",
                   tname = "year",
                   xformla = ~1,
                   data = mpdta,
                   )

# summarize the results
summary(mw.attgt)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = ~1, data = mpdta)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>   2004 2004  -0.0105     0.0248       -0.0777      0.0567  
#>   2004 2005  -0.0704     0.0319       -0.1566      0.0158  
#>   2004 2006  -0.1373     0.0399       -0.2453     -0.0292 *
#>   2004 2007  -0.1008     0.0365       -0.1995     -0.0021 *
#>   2006 2004   0.0065     0.0225       -0.0543      0.0674  
#>   2006 2005  -0.0028     0.0203       -0.0577      0.0522  
#>   2006 2006  -0.0046     0.0179       -0.0529      0.0437  
#>   2006 2007  -0.0412     0.0202       -0.0959      0.0135  
#>   2007 2004   0.0305     0.0156       -0.0117      0.0727  
#>   2007 2005  -0.0027     0.0160       -0.0460      0.0405  
#>   2007 2006  -0.0311     0.0177       -0.0789      0.0167  
#>   2007 2007  -0.0261     0.0172       -0.0725      0.0204  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.16812
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# plot the results
# set ylim so that all plots have the same scale along y-axis
ggdid(mw.attgt, ylim = c(-.3,.3))

There are a few things to notice in this case

These continue to be simultaneous confidence bands for dynamic effects. The results are broadly similar to the ones from the group-time average treatment effects: one fails to reject parallel trends in pre-treatment periods and it looks like somewhat negative effects of the minimum wage on youth employment.

One potential issue with these dynamic effect estimators is that the composition of the groups changes with different lengths of exposure in the event study plots. For example, for the group of states who increased their minimum wage in 2007, we can only identify the instantaneous average effect of the minimum wage (\(e=0\)), whereas for states that raised their minimum wage in 2004 (2006), we can identify the average effect of the minimum wage on event-times \(e=0,1,2,3\) (\(e=0,1\)). When computing the event-study plot for \(e=0\), we would aggregate the effects for all three groups, but this is not the case when \(e=1,2,3\). If the effects of the minimum wage are systematically different across groups (here, there is not much evidence of this as the effect for all groups seems to be close to 0 on impact and perhaps becoming more negative over time), then this can lead to confounding dynamics and selective treatment timing among different groups.

One way to combat this is to balance the sample by (i) only including groups that are exposed to the treatment for at least a certain number of time periods and (ii) only look at dynamic effects in those time periods. In the did package, one can do this by specifying the balance_e option. Here, we set balance_e = 1 – what this does is to only consider groups of states that are treated in 2004 and 2006 (so we can compute event-study-type parameters for them for \(e=0,1\)), drops the group treated in 2007 (as we can not compute the event-study-type parameter with \(e=1\) for this group)), and then only looks at instantaneous average effects and the average effect one period after states raised the minimum wage.

Finally, we can run all the same results including a covariate. In this application the results turn out to be nearly identical, and here we provide just the code for estimating the group-time average treatment effects while including covariates. The other steps are otherwise the same.

Common Issues with the did package

We update this section with common issues that people run into when using the did package. Please feel free to contact us with questions or comments.

Bugs

We also welcome (and encourage) users to report any bugs / difficulties they have using our code.