Package 'did' reference manual

Title:	Treatment Effects with Multiple Periods and Groups
Description:	The standard Difference-in-Differences (DID) setup involves two periods and two groups -- a treated group and an untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference-in-Differences setups with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects, which are the average treatment effect for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference-in-Differences assumption and for plotting group-time average treatment effects.
Authors:	Brantly Callaway [aut, cre], Pedro H. C. Sant'Anna [aut]
Maintainer:	Brantly Callaway <[email protected]>
License:	GPL-2
Version:	2.2.1.910
Built:	2025-03-26 21:33:27 UTC
Source:	https://github.com/bcallaway11/did

Aggregate Group-Time Average Treatment Effects

Description

A function to take group-time average treatment effects and aggregate them into a smaller number of parameters. There are several possible aggregations including "simple", "dynamic", "group", and "calendar".

Usage

aggte(
  MP,
  type = "group",
  balance_e = NULL,
  min_e = -Inf,
  max_e = Inf,
  na.rm = FALSE,
  bstrap = NULL,
  biters = NULL,
  cband = NULL,
  alp = NULL,
  clustervars = NULL
)

Arguments

`MP`	an MP object (i.e., the results of the `att_gt()` method)
`type`	Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period).
`balance_e`	If set (and if one computes dynamic effects), it balances the sample with respect to event time. For example, if `balance_e=2`, `aggte` will drop groups that are not exposed to treatment for at least three periods (the initial period when `e=0` as well as the next two periods when `e=1` and `e=2`). This ensures that the composition of groups does not change when event time changes.
`min_e`	For event studies, this is the smallest event time to compute dynamic effects for. By default, `min_e = -Inf` so that effects at all lengths of exposure are computed.
`max_e`	For event studies, this is the largest event time to compute dynamic effects for. By default, `max_e = Inf` so that effects at all lengths of exposure are computed.
`na.rm`	Logical value indicating whether to remove missing values from the analysis. Default is `FALSE`.
`bstrap`	Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set `bstrap=TRUE`. The default is the value set in the MP object. If `bstrap` is `FALSE`, then analytical standard errors are reported.
`biters`	The number of bootstrap iterations to use. The default is the value set in the MP object, and this is only applicable if `bstrap=TRUE`.
`cband`	Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability `1-alp`. In order to compute uniform confidence bands, `bstrap` must also be set to `TRUE`. The default is the value set in the MP object.
`alp`	The significance level. The default is the value set in the MP object.
`clustervars`	A vector of variables to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as `idname`, which allows for clustering at the individual level. The default is the variables set in the MP object.
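
As a rough illustration of the `balance_e` rule described above, the following base R sketch checks which groups would remain in a balanced event study. The group and period values mirror the `mpdta` examples in this manual; the variable names are hypothetical and this is not the package's internal code.

```r
# Sketch of the balancing rule: keep only groups observed at event times 0..balance_e.
groups      <- c(2004, 2006, 2007)  # first treatment period of each group
last_period <- 2007                 # last period observed in the data
balance_e   <- 1

# Largest event time each group reaches by the last period
max_event_time <- last_period - groups

# Groups exposed for at least balance_e + 1 periods (e = 0, ..., balance_e)
kept_groups <- groups[max_event_time >= balance_e]
print(kept_groups)
```

With the package itself, the corresponding request would be along the lines of `aggte(out, type = "dynamic", balance_e = 1)`.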

Value

An AGGTEobj object that holds the results from the aggregation

Examples

Initial ATT(g,t) estimates from att_gt()

data(mpdta)
set.seed(09152024)
out <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=NULL,
               data=mpdta)

You can aggregate the ATT(g,t) in many ways.

Overall ATT:

aggte(out, type = "simple")
#> 
#> Call:
#> aggte(MP = out, type = "simple")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#>    ATT    Std. Error     [ 95%  Conf. Int.]  
#>  -0.04        0.0123     -0.064     -0.0159 *
#> 
#> 
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
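
To make the "simple" weighting concrete, the sketch below recomputes the overall ATT by hand from the post-treatment ATT(g,t) estimates printed by `summary(out1)` in the `att_gt` examples of this manual. The group sizes (20, 40, and 131 treated counties) are an assumption about `mpdta` that is not stated in this section, and only the point estimate is reproduced, not the standard error.

```r
# Post-treatment ATT(g,t) estimates (time >= group), copied from the att_gt() output
attgt <- data.frame(
  group = c(2004, 2004, 2004, 2004, 2006, 2006, 2007),
  time  = c(2004, 2005, 2006, 2007, 2006, 2007, 2007),
  att   = c(-0.0105, -0.0704, -0.1373, -0.1008, -0.0046, -0.0412, -0.0261)
)

# Assumed number of units in each treated group (not given in this section)
group_size <- c("2004" = 20, "2006" = 40, "2007" = 131)

# Weight every post-treatment cell by the size of its group
w <- unname(group_size[as.character(attgt$group)])
simple_att <- sum(w * attgt$att) / sum(w)
print(round(simple_att, 4))
```

Up to rounding of the inputs, this should agree with the overall ATT reported above.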

Dynamic ATT (Event-Study):

aggte(out, type = "dynamic")
#> 
#> Call:
#> aggte(MP = out, type = "dynamic")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on event-study/dynamic aggregation:  
#>      ATT    Std. Error     [ 95%  Conf. Int.]  
#>  -0.0772        0.0209    -0.1181     -0.0363 *
#> 
#> 
#> Dynamic Effects:
#>  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
#>          -3   0.0305     0.0158       -0.0103      0.0713  
#>          -2  -0.0006     0.0134       -0.0351      0.0340  
#>          -1  -0.0245     0.0145       -0.0617      0.0128  
#>           0  -0.0199     0.0125       -0.0521      0.0122  
#>           1  -0.0510     0.0161       -0.0926     -0.0093 *
#>           2  -0.1373     0.0386       -0.2368     -0.0377 *
#>           3  -0.1008     0.0345       -0.1899     -0.0117 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
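
The event-study aggregation can be sketched in the same spirit: average the ATT(g,t) estimates that share an event time `e = t - g`, weighting by group size, and then average the post-treatment event times with equal weight. The ATT(g,t) values are copied from the `summary(out1)` output in the `att_gt` examples; the group sizes are an assumption about `mpdta`, and standard errors are not reproduced.

```r
# All ATT(g,t) estimates from the att_gt() output (rows: groups 2004, 2006, 2007)
attgt <- data.frame(
  group = rep(c(2004, 2006, 2007), each = 4),
  time  = rep(2004:2007, times = 3),
  att   = c(-0.0105, -0.0704, -0.1373, -0.1008,
             0.0065, -0.0028, -0.0046, -0.0412,
             0.0305, -0.0027, -0.0311, -0.0261)
)

# Assumed number of units in each treated group (not given in this section)
group_size <- c("2004" = 20, "2006" = 40, "2007" = 131)
attgt$n <- unname(group_size[as.character(attgt$group)])

# Event time: periods elapsed since the group was first treated
attgt$e <- attgt$time - attgt$group

# Group-size-weighted average of ATT(g,t) at each event time
by_e <- sapply(split(attgt, attgt$e), function(d) weighted.mean(d$att, d$n))

# Overall effect: simple average over event times e >= 0
overall <- mean(by_e[as.numeric(names(by_e)) >= 0])

print(round(by_e, 4))
print(round(overall, 4))
```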

ATT for each group:

aggte(out, type = "group")
#> 
#> Call:
#> aggte(MP = out, type = "group")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on group/cohort aggregation:  
#>     ATT    Std. Error     [ 95%  Conf. Int.]  
#>  -0.031        0.0126    -0.0558     -0.0062 *
#> 
#> 
#> Group Effects:
#>  Group Estimate Std. Error [95% Simult.  Conf. Band]  
#>   2004  -0.0797     0.0281       -0.1407     -0.0188 *
#>   2006  -0.0229     0.0156       -0.0568      0.0110  
#>   2007  -0.0261     0.0172       -0.0634      0.0113  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
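
For the group aggregation, a minimal sketch is to average each group's post-treatment ATT(g,t) estimates and then combine the group effects with weights proportional to group size. The inputs are copied from the `summary(out1)` output in the `att_gt` examples; the group sizes are an assumption about `mpdta`, and standard errors are not reproduced.

```r
# Post-treatment ATT(g,t) estimates (time >= group), copied from the att_gt() output
post <- data.frame(
  group = c(2004, 2004, 2004, 2004, 2006, 2006, 2007),
  time  = c(2004, 2005, 2006, 2007, 2006, 2007, 2007),
  att   = c(-0.0105, -0.0704, -0.1373, -0.1008, -0.0046, -0.0412, -0.0261)
)

# Assumed number of units in each treated group (not given in this section)
group_size <- c("2004" = 20, "2006" = 40, "2007" = 131)

# Effect for each group: equal-weight average over its post-treatment periods
by_group <- tapply(post$att, post$group, mean)

# Overall effect: group effects weighted by group size
sizes   <- group_size[names(by_group)]
overall <- sum(by_group * sizes) / sum(sizes)

print(round(by_group, 4))
print(round(overall, 4))
```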

ATT for each calendar year:

aggte(out, type = "calendar")
#> 
#> Call:
#> aggte(MP = out, type = "calendar")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on calendar time aggregation:  
#>      ATT    Std. Error     [ 95%  Conf. Int.]  
#>  -0.0417        0.0172    -0.0755     -0.0079 *
#> 
#> 
#> Time Effects:
#>  Time Estimate Std. Error [95% Simult.  Conf. Band]  
#>  2004  -0.0105     0.0251       -0.0701      0.0490  
#>  2005  -0.0704     0.0320       -0.1464      0.0056  
#>  2006  -0.0488     0.0213       -0.0994      0.0018  
#>  2007  -0.0371     0.0139       -0.0700     -0.0041 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
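
The calendar-time aggregation can be sketched analogously: within each period, average the ATT(g,t) of the groups already treated in that period (weighted by group size), then average across periods with equal weight. Inputs are copied from the `summary(out1)` output in the `att_gt` examples; the group sizes are an assumption about `mpdta`, and standard errors are not reproduced.

```r
# Post-treatment ATT(g,t) estimates (time >= group), copied from the att_gt() output
post <- data.frame(
  group = c(2004, 2004, 2004, 2004, 2006, 2006, 2007),
  time  = c(2004, 2005, 2006, 2007, 2006, 2007, 2007),
  att   = c(-0.0105, -0.0704, -0.1373, -0.1008, -0.0046, -0.0412, -0.0261)
)

# Assumed number of units in each treated group (not given in this section)
group_size <- c("2004" = 20, "2006" = 40, "2007" = 131)
post$n <- unname(group_size[as.character(post$group)])

# Effect in each period: group-size-weighted average over treated groups
by_time <- sapply(split(post, post$time), function(d) weighted.mean(d$att, d$n))

# Overall effect: equal-weight average across periods
overall <- mean(by_time)

print(round(by_time, 4))
print(round(overall, 4))
```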

AGGTEobj

Description

Objects of this class hold results on aggregated group-time average treatment effects.

Usage

AGGTEobj(
  overall.att = NULL,
  overall.se = NULL,
  type = "simple",
  egt = NULL,
  att.egt = NULL,
  se.egt = NULL,
  crit.val.egt = NULL,
  inf.function = NULL,
  min_e = NULL,
  max_e = NULL,
  balance_e = NULL,
  call = NULL,
  DIDparams = NULL
)

Arguments

`overall.att`	The estimated overall ATT
`overall.se`	Standard error for overall ATT
`type`	Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option in `aggte` and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period).
`egt`	Holds the length of exposure (for dynamic effects), the group (for selective treatment timing), or the time period (for calendar time effects)
`att.egt`	The ATT specific to egt
`se.egt`	The standard error specific to egt
`crit.val.egt`	A critical value for computing uniform confidence bands for dynamic effects, selective treatment timing, or time period effects.
`inf.function`	The influence function of the chosen aggregated parameters
`min_e`	For event studies, this is the smallest event time to compute dynamic effects for. By default, `min_e = -Inf` so that effects at all lengths of exposure are computed.
`max_e`	For event studies, this is the largest event time to compute dynamic effects for. By default, `max_e = Inf` so that effects at all lengths of exposure are computed.
`balance_e`	If set (and if one computes dynamic effects), it balances the sample with respect to event time. For example, if `balance_e=2`, `aggte` will drop groups that are not exposed to treatment for at least three periods (the initial period when `e=0` as well as the next two periods when `e=1` and `e=2`). This ensures that the composition of groups does not change when event time changes.
`call`	The function call to aggte
`DIDparams`	A DIDparams object
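
As an illustration of how `att.egt`, `se.egt`, and `crit.val.egt` fit together, the sketch below assumes the uniform band is formed as estimate plus or minus critical value times standard error, and backs out the implied critical value from the dynamic effects printed in the `aggte(out, type = "dynamic")` example. The variable names are hypothetical, and the numbers are the rounded values from that printout.

```r
# Dynamic effects as printed in the aggte(out, type = "dynamic") example
att_egt <- c(0.0305, -0.0006, -0.0245, -0.0199, -0.0510, -0.1373, -0.1008)
se_egt  <- c(0.0158,  0.0134,  0.0145,  0.0125,  0.0161,  0.0386,  0.0345)
lower   <- c(-0.0103, -0.0351, -0.0617, -0.0521, -0.0926, -0.2368, -0.1899)
upper   <- c( 0.0713,  0.0340,  0.0128,  0.0122, -0.0093, -0.0377, -0.0117)

# Critical value implied by each printed band: half-width divided by standard error
implied_crit <- (upper - lower) / (2 * se_egt)
crit_val     <- mean(implied_crit)

# Rebuild the band from the three stored pieces
band <- cbind(lower = att_egt - crit_val * se_egt,
              upper = att_egt + crit_val * se_egt)

print(round(crit_val, 2))
print(round(band, 4))
```

The implied critical value is larger than `qnorm(0.975)`, which is what one would expect for a band meant to cover all event times simultaneously rather than one at a time.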

Value

an AGGTEobj

Group-Time Average Treatment Effects

Description

`att_gt` computes group-time average treatment effects in DID setups with more than two periods of data, allowing for treatment to occur at different points in time and allowing for treatment effect heterogeneity and dynamics. See Callaway and Sant'Anna (2021) for a detailed description.
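
As a minimal sketch of what a single group-time effect measures in the simplest case (no covariates, never-treated comparison group, no anticipation), the base R code below computes ATT(g,t) as the change in mean outcomes from period g-1 to period t for group g, minus the same change for never-treated units. The toy panel and the helper names are hypothetical; this is not the package's estimator, which also handles covariates and inference.

```r
# Toy balanced panel: units 1-2 are first treated in period 2, units 3-4 never
panel <- expand.grid(id = 1:4, period = 1:3)
panel$G <- ifelse(panel$id <= 2, 2, 0)
# Outcome: unit effect + common trend + a treatment effect of 1.5 once treated
panel$Y <- panel$id + 0.5 * panel$period +
  ifelse(panel$G == 2 & panel$period >= 2, 1.5, 0)

# Change in the mean outcome between a base period and period t for selected rows
mean_change <- function(df, keep, t, base) {
  mean(df$Y[keep & df$period == t]) - mean(df$Y[keep & df$period == base])
}

# Difference-in-differences for group g at time t, using g - 1 as the base period
att_gt_by_hand <- function(df, g, t) {
  base <- g - 1
  mean_change(df, df$G == g, t, base) - mean_change(df, df$G == 0, t, base)
}

att_22 <- att_gt_by_hand(panel, g = 2, t = 2)
att_23 <- att_gt_by_hand(panel, g = 2, t = 3)
print(c(att_22, att_23))
```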

Usage

att_gt(
  yname,
  tname,
  idname = NULL,
  gname,
  xformla = NULL,
  data,
  panel = TRUE,
  allow_unbalanced_panel = FALSE,
  control_group = c("nevertreated", "notyettreated"),
  anticipation = 0,
  weightsname = NULL,
  alp = 0.05,
  bstrap = TRUE,
  cband = TRUE,
  biters = 1000,
  clustervars = NULL,
  est_method = "dr",
  base_period = "varying",
  faster_mode = FALSE,
  print_details = FALSE,
  pl = FALSE,
  cores = 1
)

Arguments

`yname`	The name of the outcome variable
`tname`	The name of the column containing the time periods
`idname`	The individual (cross-sectional unit) id name
`gname`	The name of the variable in `data` that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It should be 0 for units in the untreated group.
`xformla`	A formula for the covariates to include in the model. It should be of the form `~ X1 + X2`. Default is NULL which is equivalent to `xformla=~1`. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in `est_method`.
`data`	The name of the data.frame that contains the data
`panel`	Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When using a panel dataset, the variable `idname` must be set. When `panel=FALSE`, the data is treated as repeated cross sections.
`allow_unbalanced_panel`	Whether or not to allow an unbalanced panel rather than "balancing" the panel with respect to time and id. The default value is `FALSE`, which means that `att_gt()` will drop all units where data is not observed in all periods. The advantage of this is that the computations are faster (sometimes substantially).
`control_group`	Which units to use as the control group. The default is "nevertreated", which sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option is to set `control_group="notyettreated"`. In this case, the control group is set to the group of units that have not yet participated in the treatment in that time period. This includes all never treated units, but it also includes additional units that eventually participate in the treatment but have not participated yet.
`anticipation`	The number of time periods before participating in the treatment in which units can anticipate participating in the treatment, so that it can affect their untreated potential outcomes.
`weightsname`	The name of the column containing the sampling weights. If not set, all observations have same weight.
`alp`	the significance level, default is 0.05
`bstrap`	Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set `bstrap=TRUE`. Default is `TRUE` (in addition, `cband` is also `TRUE` by default, indicating that uniform confidence bands will be returned). If `bstrap` is `FALSE`, then analytical standard errors are reported.
`cband`	Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability `1-alp`. In order to compute uniform confidence bands, `bstrap` must also be set to `TRUE`. The default is `TRUE`.
`biters`	The number of bootstrap iterations to use. The default is 1000, and this is only applicable if `bstrap=TRUE`.
`clustervars`	A vector of variable names to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as `idname`, which allows for clustering at the individual level. By default, we cluster at the individual level (when `bstrap=TRUE`).
`est_method`	The method to compute group-time average treatment effects. The default is "dr", which uses the doubly robust approach in the `DRDID` package. Other built-in methods include "ipw" for inverse probability weighting and "reg" for first step regression estimators. The user can also pass their own function for estimating group-time average treatment effects. This should be a function `f(Y1,Y0,treat,covariates)` where `Y1` is an `n` x `1` vector of outcomes in the post-treatment period, `Y0` is an `n` x `1` vector of pre-treatment outcomes, `treat` is a vector indicating whether or not an individual participates in the treatment, and `covariates` is an `n` x `k` matrix of covariates. The function should return a list that includes `ATT` (an estimated average treatment effect) and `inf.func` (an `n` x `1` influence function). The function can return other things as well, but these are the only two that are required. `est_method` is only used if covariates are included.
`base_period`	Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each pre-treatment period by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period.
`faster_mode`	This option enables a faster version of `did`, optimizing computation time for large datasets by improving data management within the package. The default is set to `FALSE`. While the difference is minimal for small datasets, it is recommended for use with large datasets.
`print_details`	Whether or not to show details/progress of computations. Default is `FALSE`.
`pl`	Whether or not to use parallel processing
`cores`	The number of cores to use for parallel processing
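
To illustrate the `base_period` description above, the following sketch lists which period serves as the base period for each (g, t) pair under the two settings. The helper `base_period_for` is hypothetical and only encodes the rule as described in this section (with post-treatment periods always compared to g-anticipation-1); it is not code from the package.

```r
# Hypothetical helper: base period used when estimating ATT(g, t)
base_period_for <- function(g, t, anticipation = 0,
                            base_period = c("varying", "universal")) {
  base_period <- match.arg(base_period)
  universal_base <- g - anticipation - 1
  if (base_period == "universal" || t >= g - anticipation) {
    universal_base   # fixed reference period
  } else {
    t - 1            # pre-treatment, varying: compare with the previous period
  }
}

periods <- 2004:2007
varying_base   <- sapply(periods, base_period_for, g = 2007)
universal_base <- sapply(periods, base_period_for, g = 2007,
                         base_period = "universal")

print(data.frame(t = periods, varying = varying_base, universal = universal_base))
```

For group 2007, the universal setting compares every period with 2006, so the estimate at t = 2006 is zero by construction, while the varying setting compares each pre-treatment period with the one immediately before it.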

Value

an MP object containing all the results for group-time average treatment effects

Examples

Basic att_gt() call:

# Example data
data(mpdta)
set.seed(09152024)
out1 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=NULL,
               data=mpdta)
summary(out1)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = NULL, data = mpdta)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>   2004 2004  -0.0105     0.0246       -0.0755      0.0545  
#>   2004 2005  -0.0704     0.0346       -0.1621      0.0212  
#>   2004 2006  -0.1373     0.0397       -0.2422     -0.0323 *
#>   2004 2007  -0.1008     0.0366       -0.1976     -0.0040 *
#>   2006 2004   0.0065     0.0226       -0.0532      0.0663  
#>   2006 2005  -0.0028     0.0193       -0.0538      0.0483  
#>   2006 2006  -0.0046     0.0185       -0.0536      0.0444  
#>   2006 2007  -0.0412     0.0208       -0.0962      0.0137  
#>   2007 2004   0.0305     0.0146       -0.0081      0.0692  
#>   2007 2005  -0.0027     0.0162       -0.0457      0.0402  
#>   2007 2006  -0.0311     0.0182       -0.0793      0.0172  
#>   2007 2007  -0.0261     0.0174       -0.0722      0.0201  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.16812
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

Using covariates:

out2 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=~lpop,
               data=mpdta)
summary(out2)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = ~lpop, data = mpdta)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>   2004 2004  -0.0145     0.0249       -0.0817      0.0527  
#>   2004 2005  -0.0764     0.0307       -0.1592      0.0064  
#>   2004 2006  -0.1404     0.0370       -0.2403     -0.0406 *
#>   2004 2007  -0.1069     0.0331       -0.1962     -0.0176 *
#>   2006 2004  -0.0005     0.0215       -0.0583      0.0574  
#>   2006 2005  -0.0062     0.0181       -0.0549      0.0425  
#>   2006 2006   0.0010     0.0190       -0.0502      0.0521  
#>   2006 2007  -0.0413     0.0207       -0.0971      0.0145  
#>   2007 2004   0.0267     0.0143       -0.0117      0.0652  
#>   2007 2005  -0.0046     0.0153       -0.0459      0.0368  
#>   2007 2006  -0.0284     0.0197       -0.0816      0.0247  
#>   2007 2007  -0.0288     0.0157       -0.0712      0.0136  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.23267
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

Specify comparison units:

out3 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=~lpop,
               control_group = "notyettreated",
               data=mpdta)
summary(out3)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = ~lpop, data = mpdta, control_group = "notyettreated")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>   2004 2004  -0.0212     0.0219       -0.0797      0.0374  
#>   2004 2005  -0.0816     0.0299       -0.1617     -0.0015 *
#>   2004 2006  -0.1382     0.0375       -0.2387     -0.0376 *
#>   2004 2007  -0.1069     0.0354       -0.2016     -0.0122 *
#>   2006 2004  -0.0075     0.0216       -0.0653      0.0504  
#>   2006 2005  -0.0046     0.0184       -0.0539      0.0448  
#>   2006 2006   0.0087     0.0167       -0.0362      0.0535  
#>   2006 2007  -0.0413     0.0192       -0.0927      0.0101  
#>   2007 2004   0.0269     0.0146       -0.0122      0.0661  
#>   2007 2005  -0.0042     0.0160       -0.0470      0.0386  
#>   2007 2006  -0.0284     0.0182       -0.0773      0.0204  
#>   2007 2007  -0.0288     0.0176       -0.0759      0.0184  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.23326
#> Control Group:  Not Yet Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
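
The difference between the two `control_group` settings can be sketched with a few hypothetical units. The code below assumes, as in the description of `gname`, that a first-treatment value of 0 marks never-treated units, and it treats a unit as "not yet treated" in period `t` when it is first treated after `t`; the package's exact rule (for example with anticipation) may be more involved.

```r
# Hypothetical units and the period in which each is first treated (0 = never)
first_treat <- c(a = 0, b = 2004, c = 2006, d = 2007, e = 0)
t <- 2005   # period for which comparison units are needed

# "nevertreated": the same comparison units in every period
never_treated <- names(first_treat)[first_treat == 0]

# "notyettreated": never-treated units plus units first treated after period t
not_yet_treated <- names(first_treat)[first_treat == 0 | first_treat > t]

print(never_treated)
print(not_yet_treated)
```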

References

Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. doi:10.1016/j.jeconom.2020.12.001, https://arxiv.org/abs/1803.09015

build_sim_dataset

Description

A function for building simulated data

Usage

build_sim_dataset(sp_list, panel = TRUE)

Arguments

`sp_list`	A list of simulation parameters. See `reset.sim` to generate some default values for parameters
`panel`	whether to construct panel data (the default) or repeated cross sections data

Value

a data.frame with the following columns

`G`	observation's group
`X`	value of covariate
`id`	observation's id
`cluster`	observation's cluster (by construction there is no within-cluster correlation)
`period`	time period for current observation
`Y`	outcome
`treat`	whether or not this unit is ever treated
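
To show the long-format layout these columns imply, here is a small hand-built data.frame with the same column names. The values are made up for illustration and are not output of `build_sim_dataset`.

```r
# Hand-built toy panel: 3 units observed in 2 periods, one row per unit-period
toy <- data.frame(
  id      = rep(1:3, each = 2),
  period  = rep(1:2, times = 3),
  G       = rep(c(2, 2, 0), each = 2),        # first treated period, 0 = never
  X       = rep(c(0.3, -1.1, 0.8), each = 2), # time-invariant covariate
  cluster = rep(c(1, 1, 2), each = 2),
  Y       = c(1.0, 2.5, 0.4, 1.7, 0.9, 1.2),
  treat   = rep(c(1, 1, 0), each = 2)         # 1 if the unit is ever treated
)
print(toy)
```

Data with this layout would map onto the `att_gt` arguments as `yname = "Y"`, `tname = "period"`, `idname = "id"`, `gname = "G"`, and, if the covariate is used, `xformla = ~X`.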

Pre-Test of Conditional Parallel Trends Assumption

Description

An integrated moments test for the conditional parallel trends assumption holding in all pre-treatment time periods for all groups

Usage

conditional_did_pretest(
  yname,
  tname,
  idname = NULL,
  gname,
  xformla = NULL,
  data,
  panel = TRUE,
  allow_unbalanced_panel = FALSE,
  control_group = c("nevertreated", "notyettreated"),
  weightsname = NULL,
  alp = 0.05,
  bstrap = TRUE,
  cband = TRUE,
  biters = 1000,
  clustervars = NULL,
  est_method = "ipw",
  print_details = FALSE,
  pl = FALSE,
  cores = 1
)

Arguments

`yname`	The name of the outcome variable
`tname`	The name of the column containing the time periods
`idname`	The individual (cross-sectional unit) id name
`gname`	The name of the variable in `data` that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It should be 0 for units in the untreated group.
`xformla`	A formula for the covariates to include in the model. It should be of the form `~ X1 + X2`. Default is NULL which is equivalent to `xformla=~1`. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in `est_method`.
`data`	The name of the data.frame that contains the data
`panel`	Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When using a panel dataset, the variable `idname` must be set. When `panel=FALSE`, the data is treated as repeated cross sections.
`allow_unbalanced_panel`	Whether or not to allow an unbalanced panel rather than "balancing" the panel with respect to time and id. The default value is `FALSE`, which means that `att_gt()` will drop all units where data is not observed in all periods. The advantage of this is that the computations are faster (sometimes substantially).
`control_group`	Which units to use as the control group. The default is "nevertreated", which sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option is to set `control_group="notyettreated"`. In this case, the control group is set to the group of units that have not yet participated in the treatment in that time period. This includes all never treated units, but it also includes additional units that eventually participate in the treatment but have not participated yet.
`weightsname`	The name of the column containing the sampling weights. If not set, all observations have same weight.
`alp`	the significance level, default is 0.05
`bstrap`	Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set `bstrap=TRUE`. Default is `TRUE` (in addition, `cband` is also `TRUE` by default, indicating that uniform confidence bands will be returned). If `bstrap` is `FALSE`, then analytical standard errors are reported.
`cband`	Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability `1-alp`. In order to compute uniform confidence bands, `bstrap` must also be set to `TRUE`. The default is `TRUE`.
`biters`	The number of bootstrap iterations to use. The default is 1000, and this is only applicable if `bstrap=TRUE`.
`clustervars`	A vector of variable names to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as `idname`, which allows for clustering at the individual level. By default, we cluster at the individual level (when `bstrap=TRUE`).
`est_method`	The method to compute group-time average treatment effects. The default shown in the Usage above is "ipw" (inverse probability weighting). Other built-in methods include "dr", which uses the doubly robust approach in the `DRDID` package, and "reg" for first step regression estimators. The user can also pass their own function for estimating group-time average treatment effects. This should be a function `f(Y1,Y0,treat,covariates)` where `Y1` is an `n` x `1` vector of outcomes in the post-treatment period, `Y0` is an `n` x `1` vector of pre-treatment outcomes, `treat` is a vector indicating whether or not an individual participates in the treatment, and `covariates` is an `n` x `k` matrix of covariates. The function should return a list that includes `ATT` (an estimated average treatment effect) and `inf.func` (an `n` x `1` influence function). The function can return other things as well, but these are the only two that are required. `est_method` is only used if covariates are included.
`print_details`	Whether or not to show details/progress of computations. Default is `FALSE`.
`pl`	Whether or not to use parallel processing
`cores`	The number of cores to use for parallel processing
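
As a hedged sketch of the kind of user-supplied function the `est_method` entry describes, the code below implements a plain difference in mean changes together with its influence function, following the signature `f(Y1,Y0,treat,covariates)` and the return elements `ATT` and `inf.func` named above. The function name, the `...` argument, and the simulated data are assumptions for illustration; the covariates are ignored, and the argument names the package actually passes to a custom function should be checked against its source before relying on this.

```r
# Hypothetical custom estimator following the documented signature.
# `...` absorbs any extra arguments a caller might pass (an assumption).
did_by_means <- function(Y1, Y0, treat, covariates = NULL, ...) {
  dY <- Y1 - Y0                 # change in outcome for each unit
  p  <- mean(treat)             # share of treated units
  m1 <- mean(dY[treat == 1])    # mean change among treated units
  m0 <- mean(dY[treat == 0])    # mean change among comparison units
  # Influence function of the difference in mean changes
  inf_func <- treat * (dY - m1) / p - (1 - treat) * (dY - m0) / (1 - p)
  list(ATT = m1 - m0, inf.func = inf_func)
}

# Simulated two-period data: common trend of 1, true ATT of 2
set.seed(1)
n     <- 2000
treat <- rbinom(n, 1, 0.5)
Y0    <- rnorm(n)
Y1    <- Y0 + 1 + 2 * treat + rnorm(n)

res <- did_by_means(Y1, Y0, treat, covariates = matrix(1, n, 1))
se  <- sd(res$inf.func) / sqrt(n)   # standard error from the influence function
print(c(ATT = res$ATT, se = se))
```

Returning the influence function is what lets analytical or bootstrap standard errors be computed from the estimator's output.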

Value

an MP.TEST object

References

Callaway, Brantly and Sant'Anna, Pedro H. C. "Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and Employment." Working Paper https://arxiv.org/abs/1803.09015v2 (2018).

Examples

## Not run: 
data(mpdta)
pre.test <- conditional_did_pretest(yname="lemp",
                                    tname="year",
                                    idname="countyreal",
                                    gname="first.treat",
                                    xformla=~lpop,
                                    data=mpdta)
summary(pre.test)

## End(Not run)

DIDparams

Description

Object to hold did parameters that are passed across functions

Usage

DIDparams(
  yname,
  tname,
  idname = NULL,
  gname,
  xformla = NULL,
  data,
  control_group,
  anticipation = 0,
  weightsname = NULL,
  alp = 0.05,
  bstrap = TRUE,
  biters = 1000,
  clustervars = NULL,
  cband = TRUE,
  print_details = TRUE,
  faster_mode = FALSE,
  pl = FALSE,
  cores = 1,
  est_method = "dr",
  base_period = "varying",
  panel = TRUE,
  true_repeated_cross_sections,
  n = NULL,
  nG = NULL,
  nT = NULL,
  tlist = NULL,
  glist = NULL,
  call = NULL
)

Arguments

`yname`	The name of the outcome variable
`tname`	The name of the column containing the time periods
`idname`	The individual (cross-sectional unit) id name
`gname`	The name of the variable in `data` that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It should be 0 for units in the untreated group.
`xformla`	A formula for the covariates to include in the model. It should be of the form `~ X1 + X2`. Default is NULL which is equivalent to `xformla=~1`. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in `est_method`.
`data`	The name of the data.frame that contains the data
`control_group`	Which units to use as the control group. The default is "nevertreated", which sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option is to set `control_group="notyettreated"`. In this case, the control group is set to the group of units that have not yet participated in the treatment in that time period. This includes all never-treated units, plus additional units that eventually participate in the treatment but have not participated yet.
`anticipation`	The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes
`weightsname`	The name of the column containing the sampling weights. If not set, all observations have same weight.
`alp`	the significance level, default is 0.05
`bstrap`	Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set `bstrap=TRUE`. Default is `TRUE` (in addition, `cband` is also `TRUE` by default, indicating that uniform confidence bands will be returned). If `bstrap` is `FALSE`, then analytical standard errors are reported.
`biters`	The number of bootstrap iterations to use. The default is 1000, and this is only applicable if `bstrap=TRUE`.
`clustervars`	A vector of variable names to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as `idname`, which allows for clustering at the individual level. By default, we cluster at the individual level (when `bstrap=TRUE`).
`cband`	Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability `1-alp`. In order to compute uniform confidence bands, `bstrap` must also be set to `TRUE`. The default is `TRUE`.
`print_details`	Whether or not to show details/progress of computations. The default in the signature above is `TRUE`.
`faster_mode`	This option enables a faster version of `did`, optimizing computation time for large datasets by improving data management within the package. The default is set to `FALSE`. While the difference is minimal for small datasets, it is recommended for use with large datasets.
`pl`	Whether or not to use parallel processing
`cores`	The number of cores to use for parallel processing
`est_method`	The method used to compute group-time average treatment effects. The default is "dr", which uses the doubly robust approach in the `DRDID` package. Other built-in methods include "ipw" for inverse probability weighting and "reg" for first-step regression estimators. The user can also pass their own function for estimating group-time average treatment effects. This should be a function `f(Y1,Y0,treat,covariates)` where `Y1` is an `n` x `1` vector of outcomes in the post-treatment period, `Y0` is an `n` x `1` vector of pre-treatment outcomes, `treat` is a vector indicating whether or not an individual participates in the treatment, and `covariates` is an `n` x `k` matrix of covariates. The function should return a list that includes `ATT` (an estimated average treatment effect) and `inf.func` (an `n` x `1` influence function). The function can return other things as well, but these are the only two that are required. `est_method` is only used if covariates are included.
`base_period`	Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each of those periods by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period.
`panel`	Whether or not the data is a panel dataset. The panel dataset should be provided in long format, that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When using a panel dataset, the variable `idname` must be set. When `panel=FALSE`, the data is treated as repeated cross sections.
`true_repeated_cross_sections`	Whether or not the data really is repeated cross sections. (We include this because unbalanced panel code runs through the repeated cross sections code)
`n`	The number of observations. This is equal to the number of units (which may be different from the number of rows in a panel dataset).
`nG`	The number of groups
`nT`	The number of time periods
`tlist`	a vector containing each time period
`glist`	a vector containing each group
`call`	Function call to att_gt
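
The `est_method` contract described above (a function of `Y1`, `Y0`, `treat`, and `covariates` that returns `ATT` and `inf.func`) can be illustrated with a minimal base-R sketch. The function name `simple_did` and the toy data are hypothetical, and the estimator deliberately ignores `covariates`; it is an unconditional 2x2 difference in means, not one of the package's built-in methods.

```r
# Hypothetical custom estimator following the documented signature.
# It ignores `covariates` and computes an unconditional 2x2 DID.
simple_did <- function(Y1, Y0, treat, covariates = NULL) {
  dY <- as.numeric(Y1) - as.numeric(Y0)   # change in outcomes for each unit
  treat <- as.numeric(treat)
  n <- length(dY)
  p <- mean(treat)                         # share of treated units

  mu1 <- mean(dY[treat == 1])              # mean change, treated units
  mu0 <- mean(dY[treat == 0])              # mean change, comparison units
  att <- mu1 - mu0

  # Influence function of a difference in means (it averages to zero)
  inf.func <- treat / p * (dY - mu1) - (1 - treat) / (1 - p) * (dY - mu0)

  list(ATT = att, inf.func = matrix(inf.func, nrow = n, ncol = 1))
}

# Toy data (assumption): common trend of 0.5 and a true effect of 1
set.seed(1)
n <- 200
treat <- rbinom(n, 1, 0.4)
Y0 <- rnorm(n)
Y1 <- Y0 + 0.5 + treat * 1 + rnorm(n, sd = 0.1)

res <- simple_did(Y1, Y0, treat, covariates = matrix(1, n, 1))
res$ATT
```

The returned list carries exactly the two required elements; anything else a real estimator wants to report could be added as extra list entries.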

Plot `did` objects using `ggplot2`

Description

Function to plot objects from the did package

Usage

ggdid(object, ...)

Arguments

`object`	either an `MP` object or an `AGGTEobj` object. See `help(ggdid.MP)` and `help(ggdid.AGGTEobj)`.
`...`	other arguments

Plot `AGGTEobj` objects

Description

A function to plot AGGTEobj objects

Usage

## S3 method for class 'AGGTEobj'
ggdid(
  object,
  ylim = NULL,
  xlab = NULL,
  ylab = NULL,
  title = "",
  xgap = 1,
  legend = TRUE,
  ref_line = 0,
  theming = TRUE,
  ...
)

Arguments

`object`	either a `MP` object or `AGGTEobj` object. See `help(ggdid.MP)` and `help(ggdid.AGGTEobj)`.
`ylim`	optional y limits for the plot; setting here makes the y limits the same across different plots
`xlab`	optional x-axis label
`ylab`	optional y-axis label
`title`	optional plot title
`xgap`	optional gap between the labels on the x-axis. For example, `xgap=3` indicates that the labels should show up for every third value on the x-axis. The default is 1.
`legend`	Whether or not to include a legend (which will indicate color of pre- and post-treatment estimates). Default is `TRUE`.
`ref_line`	A reference line at this value, usually to compare confidence intervals to 0. Set to NULL to omit.
`theming`	Set to FALSE to skip all theming so you can do it yourself.
`...`	other arguments
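
As a usage sketch, an event-study plot can be built from the `mpdta` data used elsewhere in this manual. The block is guarded so that it does nothing when the `did` package is not installed; the plot title is an arbitrary choice for illustration.

```r
p <- NULL
if (requireNamespace("did", quietly = TRUE)) {
  library(did)
  data(mpdta)

  # Group-time average treatment effects without covariates
  out <- att_gt(yname = "lemp",
                tname = "year",
                idname = "countyreal",
                gname = "first.treat",
                xformla = ~1,
                data = mpdta)

  # Aggregate into an event study and plot it
  es <- aggte(out, type = "dynamic")
  p <- ggdid(es, xgap = 1, ref_line = 0, title = "Event study")
}
```

Setting `theming = FALSE` in the same call would leave the styling to the caller.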

Plot `MP` objects using `ggplot2`

Description

A function to plot MP objects

Usage

## S3 method for class 'MP'
ggdid(
  object,
  ylim = NULL,
  xlab = NULL,
  ylab = NULL,
  title = "Group",
  xgap = 1,
  ncol = 1,
  legend = TRUE,
  group = NULL,
  ref_line = 0,
  theming = TRUE,
  grtitle = "Group",
  ...
)

Arguments

`object`	either a `MP` object or `AGGTEobj` object. See `help(ggdid.MP)` and `help(ggdid.AGGTEobj)`.
`ylim`	optional y limits for the plot; setting here makes the y limits the same across different plots
`xlab`	optional x-axis label
`ylab`	optional y-axis label
`title`	optional plot title
`xgap`	optional gap between the labels on the x-axis. For example, `xgap=3` indicates that the labels should show up for every third value on the x-axis. The default is 1.
`ncol`	The number of columns to include in the resulting plot. The default is 1.
`legend`	Whether or not to include a legend (which will indicate color of pre- and post-treatment estimates). Default is `TRUE`.
`group`	Vector for which groups to include in the plots of ATT(g,t). Default is NULL, and, in this case, plots for all groups will be included (`ggdid.MP` only).
`ref_line`	A reference line at this value, usually to compare confidence intervals to 0. Set to NULL to omit.
`theming`	Set to FALSE to skip all theming so you can do it yourself.
`grtitle`	Title to append before each group name (`ggdid.MP` only).
`...`	other arguments

glance model characteristics from AGGTEobj objects

Description

glance model characteristics from AGGTEobj objects

Usage

## S3 method for class 'AGGTEobj'
glance(x, ...)

Arguments

`x`	a model of class AGGTEobj produced by the `aggte()` function
`...`	other arguments passed to methods

glance model characteristics from MP objects

Description

glance model characteristics from MP objects

Usage

## S3 method for class 'MP'
glance(x, ...)

Arguments

`x`	a model of class MP produced by the `att_gt()` function
`...`	other arguments passed to methods

indicator

Description

indicator weighting function

Usage

indicator(X, u)

Arguments

`X`	matrix of X's from the data
`u`	a particular value to compare X's to

Value

numeric vector

Examples

data(mpdta)
dta <- subset(mpdta, year==2007)
X <- model.matrix(~lpop, data=dta)
X <- indicator(X, X[1,])
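
To make the idea of an indicator weighting function concrete, here is a base-R sketch. It assumes the comparison is "every element of a row of `X` is less than or equal to the matching element of `u`"; that rule and the name `indicator_sketch` are assumptions for illustration, not a statement of how the package implements `indicator()`.

```r
# Assumed rule: weight is 1 when the whole row of X is <= u, else 0
indicator_sketch <- function(X, u) {
  X <- as.matrix(X)
  cmp <- sweep(X, 2, u, FUN = "<=")   # compare each column with its entry of u
  as.numeric(apply(cmp, 1, all))      # 1 only if every comparison holds
}

# Toy design matrix: an intercept and one covariate
X <- cbind(1, c(1, 2, 3))
w <- indicator_sketch(X, X[2, ])
w  # 1 1 0
```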

Multiplier Bootstrap

Description

A function to take an influence function and use the multiplier bootstrap to compute standard errors and critical values for uniform confidence bands.

Usage

mboot(inf.func, DIDparams, pl = FALSE, cores = 1)

Arguments

`inf.func`	an influence function
`DIDparams`	DIDparams object
`pl`	whether or not to use parallel processing in the multiplier bootstrap, default=FALSE
`cores`	the number of cores to use with parallel processing, default=1

Value

list with elements

`bres`	results from each bootstrap iteration
`V`	variance matrix
`se`	standard errors
`crit.val`	a critical value for computing uniform confidence bands
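
The general technique can be sketched in base R. This is not the package's implementation: the function name `mboot_sketch`, the Rademacher multipliers, and the interquartile-range scaling are assumptions chosen to keep the example short. It takes an `n` x `k` influence function and returns the same four elements listed above.

```r
# Sketch only: Rademacher multipliers are an assumption of this example.
mboot_sketch <- function(inf.func, biters = 1000, alp = 0.05) {
  inf.func <- as.matrix(inf.func)
  n <- nrow(inf.func)
  k <- ncol(inf.func)

  bres <- matrix(NA_real_, nrow = biters, ncol = k)
  for (b in seq_len(biters)) {
    v <- sample(c(-1, 1), n, replace = TRUE)       # one multiplier per unit
    bres[b, ] <- sqrt(n) * colMeans(v * inf.func)  # perturbed, rescaled mean
  }

  V <- cov(bres)                                   # bootstrap variance matrix

  # Robust scale estimate from the interquartile range of the draws
  iqr_scale <- qnorm(0.75) - qnorm(0.25)
  b_sigma <- apply(bres, 2, function(x) {
    unname(quantile(x, 0.75) - quantile(x, 0.25)) / iqr_scale
  })
  se <- b_sigma / sqrt(n)

  # Critical value for a uniform band: quantile of the max absolute t-statistic
  sup_t <- apply(bres, 1, function(x) max(abs(x / b_sigma)))
  crit.val <- unname(quantile(sup_t, 1 - alp))

  list(bres = bres, V = V, se = se, crit.val = crit.val)
}

# Simulated influence function (assumption): 500 units, 3 parameters
set.seed(42)
inf.func <- matrix(rnorm(500 * 3, sd = 2), nrow = 500, ncol = 3)
out <- mboot_sketch(inf.func)
out$se
out$crit.val
```

Because the critical value is taken over the maximum across all columns, it is larger than the pointwise normal critical value, which is what makes the resulting band uniform.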

MP

Description

Multi-period objects that hold results for group-time average treatment effects

Usage

MP(
  group,
  t,
  att,
  V_analytical,
  se,
  c,
  inffunc,
  n = NULL,
  W = NULL,
  Wpval = NULL,
  aggte = NULL,
  alp = 0.05,
  DIDparams = NULL
)

Arguments

`group`	which group (defined by period first treated) a group-time average treatment effect is for
`t`	which time period a group-time average treatment effect is for
`att`	the group-time average treatment effect for group `group` and time period `t`
`V_analytical`	Analytical estimator for the asymptotic variance-covariance matrix for group-time average treatment effects
`se`	standard errors for group-time average treatment effects. If bootstrap is set to TRUE, this provides bootstrap-based se.
`c`	simultaneous critical value if one is obtaining simultaneous confidence bands. Otherwise it reports the critical value based on pointwise normal approximation.
`inffunc`	the influence function for estimating group-time average treatment effects
`n`	the number of unique cross-sectional units (unique values of idname)
`W`	the Wald statistic for pre-testing the common trends assumption
`Wpval`	the p-value of the Wald statistic for pre-testing the common trends assumption
`aggte`	an aggregate treatment effects object
`alp`	the significance level, default is 0.05
`DIDparams`	a `DIDparams` object. A way to optionally return the parameters of the call to `att_gt()` or `conditional_did_pretest()`.

Value

MP object
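
The way `att`, `se`, and `c` combine into confidence bands can be shown with a short base-R sketch. All numbers below are made up for illustration, and the helper `band` is hypothetical; the only thing taken from the argument descriptions is that `c` is either a simultaneous critical value or the pointwise normal one.

```r
# Hypothetical estimates and standard errors for four (g, t) pairs
att <- c(-0.01, 0.02, -0.04, -0.07)
se  <- c(0.02, 0.02, 0.015, 0.02)

c_unif  <- 2.8                  # hypothetical simultaneous critical value
c_point <- qnorm(1 - 0.05 / 2)  # pointwise normal critical value

# Band of the form estimate +/- critical value * standard error
band <- function(att, se, crit) {
  data.frame(att = att, lower = att - crit * se, upper = att + crit * se)
}

unif  <- band(att, se, c_unif)
point <- band(att, se, c_point)

# Which estimates have a uniform band that covers zero?
covers_zero <- unif$lower <= 0 & unif$upper >= 0
covers_zero  # TRUE TRUE TRUE FALSE
```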

MP.TEST

Description

An object that holds results from computing pre-test of the conditional parallel trends assumption

Usage

MP.TEST(
  CvM = NULL,
  CvMb = NULL,
  CvMcval = NULL,
  CvMpval = NULL,
  KS = NULL,
  KSb = NULL,
  KScval = NULL,
  KSpval = NULL,
  clustervars = NULL,
  xformla = NULL
)

Arguments

`CvM`	Cramer von Mises test statistic
`CvMb`	a vector of bootstrapped Cramer von Mises test statistics
`CvMcval`	CvM critical value
`CvMpval`	p-value for CvM test
`KS`	Kolmogorov-Smirnov test statistic
`KSb`	a vector of bootstrapped KS test statistics
`KScval`	KS critical value
`KSpval`	p-value for KS test
`clustervars`	vector of which variables were clustered on for the test
`xformla`	formula for the X variables used in the test
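
The relationship between a test statistic, its bootstrapped draws, the critical value, and the p-value can be sketched in base R. This follows the usual bootstrap convention and is an assumption about the general approach, not a description of the package's exact computation; the chi-squared draws merely stand in for real bootstrapped statistics.

```r
set.seed(123)
CvMb <- rchisq(999, df = 3)   # stand-in for bootstrapped test statistics
CvM  <- 9.5                   # hypothetical observed test statistic
alp  <- 0.05

# Critical value: upper quantile of the bootstrap distribution
CvMcval <- unname(quantile(CvMb, probs = 1 - alp))

# p-value: share of bootstrap draws at least as large as the observed statistic
CvMpval <- mean(CvMb >= CvM)

# Reject when the observed statistic exceeds the critical value
reject <- CvM > CvMcval
```

The same three lines apply to the KS fields by swapping in `KS` and `KSb`.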

County Teen Employment Dataset

Description

A dataset containing (the log of) teen employment in 500 counties in the U.S. from 2004 to 2007. This is a subset of the dataset used in Callaway and Sant'Anna (2021). See that paper for additional descriptions.

Usage

mpdta

Format

A data frame with 2000 rows and 6 variables:

year: the year of the observation
countyreal: a unique identifier for a particular county
lpop: the log of 1000s of population for the county
lemp: the log of teen employment in the county
first.treat: the year that the state where the county is located raised its minimum wage, it is set equal to 0 for counties that have minimum wages equal to the federal minimum wage over the entire period.
treat: whether or not a particular county is treated in that year

Source

Callaway and Sant'Anna (2021)

Process `did` Function Arguments

Description

Function to process arguments passed to the main methods in the did package. It also runs some checks to make sure the data is in the proper format and tries to throw helpful error messages.

Usage

pre_process_did(
  yname,
  tname,
  idname,
  gname,
  xformla = NULL,
  data,
  panel = TRUE,
  allow_unbalanced_panel,
  control_group = c("nevertreated", "notyettreated"),
  anticipation = 0,
  weightsname = NULL,
  alp = 0.05,
  bstrap = FALSE,
  cband = FALSE,
  biters = 1000,
  clustervars = NULL,
  est_method = "dr",
  base_period = "varying",
  print_details = TRUE,
  faster_mode = FALSE,
  pl = FALSE,
  cores = 1,
  call = NULL
)

Arguments

`yname`	The name of the outcome variable
`tname`	The name of the column containing the time periods
`idname`	The individual (cross-sectional unit) id name
`gname`	The name of the variable in `data` that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It should be 0 for units in the untreated group.
`xformla`	A formula for the covariates to include in the model. It should be of the form `~ X1 + X2`. Default is NULL which is equivalent to `xformla=~1`. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in `est_method`.
`data`	The name of the data.frame that contains the data
`panel`	Whether or not the data is a panel dataset. The panel dataset should be provided in long format, that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When using a panel dataset, the variable `idname` must be set. When `panel=FALSE`, the data is treated as repeated cross sections.
`allow_unbalanced_panel`	Whether or not the function should "balance" the panel with respect to time and id. The default value is `FALSE`, which means that `att_gt()` will drop all units where data is not observed in all periods. The advantage of this is that the computations are faster (sometimes substantially).
`control_group`	Which units to use as the control group. The default is "nevertreated", which sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option is to set `control_group="notyettreated"`. In this case, the control group is set to the group of units that have not yet participated in the treatment in that time period. This includes all never-treated units, plus additional units that eventually participate in the treatment but have not participated yet.
`anticipation`	The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes
`weightsname`	The name of the column containing the sampling weights. If not set, all observations have same weight.
`alp`	the significance level, default is 0.05
`bstrap`	Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set `bstrap=TRUE`. The default in the signature above is `FALSE`. If `bstrap` is `FALSE`, then analytical standard errors are reported.
`cband`	Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability `1-alp`. In order to compute uniform confidence bands, `bstrap` must also be set to `TRUE`. The default in the signature above is `FALSE`.
`biters`	The number of bootstrap iterations to use. The default is 1000, and this is only applicable if `bstrap=TRUE`.
`clustervars`	A vector of variable names to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as `idname`, which allows for clustering at the individual level. By default, we cluster at the individual level (when `bstrap=TRUE`).
`est_method`	The method used to compute group-time average treatment effects. The default is "dr", which uses the doubly robust approach in the `DRDID` package. Other built-in methods include "ipw" for inverse probability weighting and "reg" for first-step regression estimators. The user can also pass their own function for estimating group-time average treatment effects. This should be a function `f(Y1,Y0,treat,covariates)` where `Y1` is an `n` x `1` vector of outcomes in the post-treatment period, `Y0` is an `n` x `1` vector of pre-treatment outcomes, `treat` is a vector indicating whether or not an individual participates in the treatment, and `covariates` is an `n` x `k` matrix of covariates. The function should return a list that includes `ATT` (an estimated average treatment effect) and `inf.func` (an `n` x `1` influence function). The function can return other things as well, but these are the only two that are required. `est_method` is only used if covariates are included.
`base_period`	Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each of those periods by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period.
`print_details`	Whether or not to show details/progress of computations. The default in the signature above is `TRUE`.
`faster_mode`	This option enables a faster version of `did`, optimizing computation time for large datasets by improving data management within the package. The default is set to `FALSE`. While the difference is minimal for small datasets, it is recommended for use with large datasets.
`pl`	Whether or not to use parallel processing
`cores`	The number of cores to use for parallel processing
`call`	Function call to att_gt

Value

a DIDparams object
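
The difference between the two `control_group` options can be illustrated with a small base-R sketch. The data frame, the helper `control_units`, and the selection rule are hypothetical simplifications for post-treatment periods; the package's own selection logic may handle additional cases.

```r
# Toy units (assumption): first.treat is 0 for never-treated units
units <- data.frame(id = 1:6,
                    first.treat = c(0, 0, 2004, 2006, 2007, 2007))

control_units <- function(units, g, t,
                          control_group = "nevertreated",
                          anticipation = 0) {
  never <- units$first.treat == 0
  if (control_group == "nevertreated") {
    keep <- never
  } else {
    # Not yet treated by period t (allowing for anticipation),
    # excluding the treated group g itself
    keep <- never |
      (units$first.treat > t + anticipation & units$first.treat != g)
  }
  units$id[keep]
}

nt  <- control_units(units, g = 2004, t = 2005)
nyt <- control_units(units, g = 2004, t = 2005,
                     control_group = "notyettreated")
nt   # 1 2
nyt  # 1 2 4 5 6
```

The never-treated set is fixed across periods, while the not-yet-treated set shrinks as `t` grows and more units become treated.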

Process `did` Function Arguments

Description

Function to process arguments passed to the main methods in the did package. It also runs some checks to make sure the data is in the proper format and tries to throw helpful error messages.

Usage

pre_process_did2(
  yname,
  tname,
  idname,
  gname,
  xformla = NULL,
  data,
  panel = TRUE,
  allow_unbalanced_panel,
  control_group = c("nevertreated", "notyettreated"),
  anticipation = 0,
  weightsname = NULL,
  alp = 0.05,
  bstrap = FALSE,
  cband = FALSE,
  biters = 1000,
  clustervars = NULL,
  est_method = "dr",
  base_period = "varying",
  print_details = TRUE,
  faster_mode = FALSE,
  pl = FALSE,
  cores = 1,
  call = NULL
)

Arguments

`yname`	The name of the outcome variable
`tname`	The name of the column containing the time periods
`idname`	The individual (cross-sectional unit) id name
`gname`	The name of the variable in `data` that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It should be 0 for units in the untreated group.
`xformla`	A formula for the covariates to include in the model. It should be of the form `~ X1 + X2`. Default is NULL which is equivalent to `xformla=~1`. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in `est_method`.
`data`	The name of the data.frame that contains the data
`panel`	Whether or not the data is a panel dataset. The panel dataset should be provided in long format, that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When using a panel dataset, the variable `idname` must be set. When `panel=FALSE`, the data is treated as repeated cross sections.
`allow_unbalanced_panel`	Whether or not the function should "balance" the panel with respect to time and id. The default value is `FALSE`, which means that `att_gt()` will drop all units where data is not observed in all periods. The advantage of this is that the computations are faster (sometimes substantially).
`control_group`	Which units to use as the control group. The default is "nevertreated", which sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option is to set `control_group="notyettreated"`. In this case, the control group is set to the group of units that have not yet participated in the treatment in that time period. This includes all never-treated units, plus additional units that eventually participate in the treatment but have not participated yet.
`anticipation`	The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes
`weightsname`	The name of the column containing the sampling weights. If not set, all observations have same weight.
`alp`	the significance level, default is 0.05
`bstrap`	Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set `bstrap=TRUE`. The default in the signature above is `FALSE`. If `bstrap` is `FALSE`, then analytical standard errors are reported.
`cband`	Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability `1-alp`. In order to compute uniform confidence bands, `bstrap` must also be set to `TRUE`. The default in the signature above is `FALSE`.
`biters`	The number of bootstrap iterations to use. The default is 1000, and this is only applicable if `bstrap=TRUE`.
`clustervars`	A vector of variable names to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as `idname`, which allows for clustering at the individual level. By default, we cluster at the individual level (when `bstrap=TRUE`).
`est_method`	The method used to compute group-time average treatment effects. The default is "dr", which uses the doubly robust approach in the `DRDID` package. Other built-in methods include "ipw" for inverse probability weighting and "reg" for first-step regression estimators. The user can also pass their own function for estimating group-time average treatment effects. This should be a function `f(Y1,Y0,treat,covariates)` where `Y1` is an `n` x `1` vector of outcomes in the post-treatment period, `Y0` is an `n` x `1` vector of pre-treatment outcomes, `treat` is a vector indicating whether or not an individual participates in the treatment, and `covariates` is an `n` x `k` matrix of covariates. The function should return a list that includes `ATT` (an estimated average treatment effect) and `inf.func` (an `n` x `1` influence function). The function can return other things as well, but these are the only two that are required. `est_method` is only used if covariates are included.
`base_period`	Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each of those periods by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period.
`print_details`	Whether or not to show details/progress of computations. The default in the signature above is `TRUE`.
`faster_mode`	This option enables a faster version of `did`, optimizing computation time for large datasets by improving data management within the package. The default is set to `FALSE`. While the difference is minimal for small datasets, it is recommended for use with large datasets.
`pl`	Whether or not to use parallel processing
`cores`	The number of cores to use for parallel processing
`call`	Function call to att_gt

Value

a DIDparams object
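
The `base_period` rule described in the arguments can be written out as a small base-R sketch. The helper name `base_period_for` is hypothetical; it only encodes the rule stated in the argument description (period t-1 for a varying base in pre-treatment periods, and g-anticipation-1 otherwise).

```r
# Which period is the outcome in period t compared against?
base_period_for <- function(g, t, anticipation = 0, base_period = "varying") {
  ref <- g - anticipation - 1       # last period before treatment or anticipation
  post <- t >= g - anticipation     # is t a post-treatment period for group g?
  if (base_period == "universal" || post) ref else t - 1
}

# Pre-treatment period for the group first treated in 2007
base_period_for(2007, 2005)                             # 2004
base_period_for(2007, 2005, base_period = "universal")  # 2006

# Post-treatment period: both settings agree
base_period_for(2006, 2007)                             # 2005
base_period_for(2006, 2007, base_period = "universal")  # 2005
```

Under the universal setting, the period `g - anticipation - 1` is compared with itself, which is why its estimate is normalized to 0.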

print.AGGTEobj

Description

prints value of an AGGTEobj object

Usage

## S3 method for class 'AGGTEobj'
print(x, ...)

Arguments

`x`	an `AGGTEobj` object
`...`	extra arguments

print.MP

Description

prints value of an MP object

Usage

## S3 method for class 'MP'
print(x, ...)

Arguments

`x`	an `MP` object
`...`	extra arguments

Process Results from `compute.att_gt()`

Description

Process Results from compute.att_gt()

Usage

process_attgt(attgt.list)

Arguments

`attgt.list`	list of results from `compute.att_gt()`

Value

list with elements:

`group`	which group a set of results belongs to
`tt`	which time period a set of results belongs to
`att`	the group time average treatment effect

reset.sim

Description

A function to create a "reasonable" set of parameters for simulating panel data that obeys a parallel trends assumption. In particular, it provides parameters where the effect of participating in the treatment is equal to one in all post-treatment time periods.

After calling this function, the user can change particular values of the parameters in order to generate dynamics, heterogeneous effects across groups, etc.

Usage

reset.sim(time.periods = 4, n = 5000, ipw = TRUE, reg = TRUE)

Arguments

`time.periods`	The number of time periods to include
`n`	The total number of observations
`ipw`	If TRUE, sets parameters so that DGP is compatible with recovering ATT(g,t)'s using IPW (i.e., where logit that just includes a linear term in X works). If FALSE, sets parameters that will be incompatible with IPW. Either way, these parameters can be specified by the user if so desired.
`reg`	If TRUE, sets parameters so that DGP is compatible with recovering ATT(g,t)'s using regressions on untreated potential outcomes. If FALSE, sets parameters that will be incompatible with using regressions (i.e., regressions that include only a linear term in X). Either way, these parameters can be specified by the user if so desired.

Value

list of simulation parameters
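
Examples

A minimal sketch of the workflow described above: create the default parameters, change one of them, and simulate a data set. It assumes that `build_sim_dataset()` (documented elsewhere in this manual) accepts the parameter list directly, and that the list contains an element named `te.e` governing dynamic effects. That element name is an assumption, so check `names(sp)` before relying on it.

```r
library(did)

set.seed(1814)  # arbitrary seed, only for reproducibility

# Default parameters: 4 periods, treatment effect equal to 1 after treatment
sp <- reset.sim(time.periods = 4, n = 1000)

# List the parameters that can be changed before simulating
print(names(sp))

# Assumed element name: let the effect vary with length of exposure
sp$te.e <- 1:4

# Simulate a panel data set from the modified parameters
dta <- build_sim_dataset(sp)
head(dta)
```

Printing `names(sp)` first is a cheap way to confirm which parameters exist in the installed version before overriding any of them.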

sim

Description

An internal function that builds simulated data, computes ATT(g,t)'s and some aggregations. It is useful for testing the inference procedures in the did function.

Usage

sim(
  sp_list,
  ret = NULL,
  bstrap = TRUE,
  cband = TRUE,
  control_group = "nevertreated",
  xformla = ~X,
  est_method = "dr",
  clustervars = NULL,
  panel = TRUE
)

Arguments

`sp_list`	A list of simulation parameters. See `reset.sim` to generate some default values for parameters
`ret`	Which type of results to return. The options are `Wpval` (returns 1 if the p-value from a Wald test that all pre-treatment ATT(g,t)'s are equal to 0 is less than .05), `cband` (returns 1 if a uniform confidence band covers 0 for groups and times), `simple` (returns 1 if the simple treatment effect aggregation rejects that the aggregated treatment effect parameter is equal to 0), and `dynamic` (returns 1 if the uniform confidence band from the dynamic treatment effect aggregation covers 0 in all pre- and post-treatment periods). The default value is NULL, in which case the function just returns the results from the call to `att_gt`.
`bstrap`	whether or not to use the bootstrap to conduct inference (default is TRUE)
`cband`	whether or not to compute uniform confidence bands in the call to `att_gt` (the default is TRUE)
`control_group`	Whether to use the "nevertreated" group (the default) or the "notyettreated" group as the comparison group
`xformla`	Formula for covariates in `att_gt` (default is `~X`)
`est_method`	Which estimation method to use in `att_gt` (default is "dr")
`clustervars`	Any additional variables which should be clustered on
`panel`	whether to simulate panel data (the default) or repeated cross sections data

Value

When ret=NULL, returns the results of the call to att_gt, otherwise returns 1 if the specified test rejects or 0 if not.
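
Examples

A hedged sketch of a single simulation draw, assuming `sim()` is exported by the installed version of the package (it is described above as internal, so it may need to be reached with `did:::sim` instead). With `ret` left at NULL, the call should return the `att_gt` results for one simulated data set. The bootstrap is switched off here only to keep the run short.

```r
library(did)

set.seed(1814)  # arbitrary seed, only for reproducibility

# Default simulation parameters with a smaller sample than the default
sp <- reset.sim(time.periods = 4, n = 1000)

# ret = NULL (the default): return the att_gt results themselves
res <- sim(sp_list = sp, bstrap = FALSE, cband = FALSE)

summary(res)
```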

Summary Aggregate Treatment Effect Parameter Objects

Description

A function to summarize aggregated treatment effect parameters.

Usage

## S3 method for class 'AGGTEobj'
summary(object, ...)

Arguments

`object`	an `AGGTEobj` object
`...`	other arguments
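
Examples

A short sketch of how this method is typically reached, using the `mpdta` county teen employment data shipped with the package. The choice of a "dynamic" aggregation and of analytical (non-bootstrap) standard errors is only for illustration.

```r
library(did)

data(mpdta)

# Group-time average treatment effects, without covariates
out <- att_gt(
  yname = "lemp",
  tname = "year",
  idname = "countyreal",
  gname = "first.treat",
  xformla = ~1,
  data = mpdta,
  bstrap = FALSE,  # analytical standard errors, to keep the example fast
  cband = FALSE
)

# Aggregate into an event study and summarize the AGGTEobj
es <- aggte(out, type = "dynamic")
summary(es)
```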

summary.MP

Description

Prints a summary of an `MP` object.

Usage

## S3 method for class 'MP'
summary(object, ...)

Arguments

`object`	an `MP` object
`...`	extra arguments

summary.MP.TEST

Description

print a summary of test results

Usage

## S3 method for class 'MP.TEST'
summary(object, ...)

Arguments

`object`	an `MP.TEST` object
`...`	other arguments

Multiplier Bootstrap for Conditional Moment Test

Description

A slightly modified multiplier bootstrap procedure for the pre-test of the conditional parallel trends assumption

Usage

test.mboot(inf.func, DIDparams, cores = 1)

Arguments

`inf.func`	an influence function
`DIDparams`	DIDparams object
`cores`	The number of cores to use to bootstrap the test statistic in parallel. Default is `cores=1` which corresponds to not running parallel.

Value

list

`bres`	CvM test statistics for each bootstrap iteration
`crit.val`	critical value for CvM test statistic

tidy results from AGGTEobj objects

Description

tidy results from AGGTEobj objects

Usage

## S3 method for class 'AGGTEobj'
tidy(x, ...)

Arguments

`x`	a model of class AGGTEobj produced by the `aggte()` function
`...`	Additional arguments to tidying method.

tidy results from MP objects

Description

tidy results from MP objects

Usage

## S3 method for class 'MP'
tidy(x, ...)

Arguments

`x`	a model of class MP produced by the `att_gt()` function
`...`	Additional arguments to tidying method.
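
Examples

A minimal sketch of tidying group-time estimates into a data frame, using the `mpdta` data shipped with the package. The generic is called as `generics::tidy()` on the assumption that the `generics` package is installed alongside `did`; the exact columns of the result are not listed here, so inspect them with `names()`.

```r
library(did)

data(mpdta)

# Group-time average treatment effects, without covariates
out <- att_gt(
  yname = "lemp",
  tname = "year",
  idname = "countyreal",
  gname = "first.treat",
  xformla = ~1,
  data = mpdta,
  bstrap = FALSE,  # analytical standard errors, to keep the example fast
  cband = FALSE
)

# Dispatches to the tidy method for MP objects
td <- generics::tidy(out)

names(td)
head(td)
```

The same call on the result of `aggte()` would dispatch to the `AGGTEobj` method instead.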

trimmer

Description

A utility function to find observations that appear to violate support conditions. This function is not called anywhere in the package code; it is provided only for debugging some common issues that users run into.

Usage

trimmer(
  g,
  tname,
  idname,
  gname,
  xformla,
  data,
  control_group = "notyettreated",
  threshold = 0.999
)

Arguments

`g`	A particular group, identified by the first period in which it is treated (for example, 2009)
`tname`	The name of the column containing the time periods
`idname`	The individual (cross-sectional unit) id name
`gname`	The name of the variable in `data` that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It should be 0 for units in the untreated group.
`xformla`	A formula for the covariates to include in the model. It should be of the form `~ X1 + X2`. Default is NULL which is equivalent to `xformla=~1`. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in `est_method`.
`data`	The name of the data.frame that contains the data
`control_group`	Which units to use as the control group. Setting `control_group="nevertreated"` uses the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option, `control_group="notyettreated"` (the default shown in the usage above), uses the group of units that have not yet participated in the treatment in that time period. This includes all never treated units, plus units that eventually participate in the treatment but have not participated yet.
`threshold`	the cutoff for which observations are flagged as likely violators of the support condition.

Value

list of ids of observations that likely violate support conditions
