Title: | Treatment Effects with Multiple Periods and Groups |
---|---|
Description: | The standard Difference-in-Differences (DID) setup involves two periods and two groups: a treated group and an untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in DID setups with more than two periods and with variation in treatment timing, using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects, which are the average treatment effects for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package handles cases with selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the DID assumption and for plotting group-time average treatment effects. |
Authors: | Brantly Callaway [aut, cre], Pedro H. C. Sant'Anna [aut] |
Maintainer: | Brantly Callaway |
License: | GPL-2 |
Version: | 2.2.1.910 |
Built: | 2025-02-11 05:45:43 UTC |
Source: | https://github.com/bcallaway11/did |
A function to take group-time average treatment effects and aggregate them into a smaller number of parameters. There are several possible aggregations, including "simple", "dynamic", "group", and "calendar".
aggte( MP, type = "group", balance_e = NULL, min_e = -Inf, max_e = Inf, na.rm = FALSE, bstrap = NULL, biters = NULL, cband = NULL, alp = NULL, clustervars = NULL )
MP |
an MP object (i.e., the results of the |
type |
Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period). |
balance_e |
If set (and if one computes dynamic effects), it balances
the sample with respect to event time. For example, if |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
na.rm |
Logical; whether to remove missing values from the analysis. The default is FALSE. |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
biters |
The number of bootstrap iterations to use. The default is the value set in the MP object,
and this is only applicable if |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
alp |
The significance level; the default is the value set in the MP object. |
clustervars |
A vector of variables to cluster on. At most, there can be two variables (otherwise an error is thrown), and one of these must be the same as idname, which allows for clustering at the individual level. The default is the variables set in the MP object. |
An AGGTEobj object that holds the results from the aggregation
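The event-time options (min_e, max_e, balance_e) act on the set of ATT(g,t) cells before they are averaged. The base-R sketch below illustrates that filtering; it is not the package's code. It assumes a toy table of post-treatment ATT(g,t) values (the rounded estimates printed by summary(out1) in the att_gt examples), assumes cohort sizes of 20, 40 and 131 units as weights, and assumes that balance_e = b keeps only cohorts observed for at least b periods after treatment and then reports event times 0 through b. The helper dynamic_effects() is hypothetical.

```r
# Toy post-treatment ATT(g,t) cells (rounded estimates from summary(out1)).
gt <- data.frame(
  g   = c(2004, 2004, 2004, 2004, 2006, 2006, 2007),
  t   = c(2004, 2005, 2006, 2007, 2006, 2007, 2007),
  att = c(-0.0105, -0.0704, -0.1373, -0.1008, -0.0046, -0.0412, -0.0261)
)
# ASSUMPTION: cohort sizes, used as weights proportional to group size.
gt$w <- unname(c("2004" = 20, "2006" = 40, "2007" = 131)[as.character(gt$g)])
gt$e <- gt$t - gt$g  # event time = length of exposure

# Hypothetical helper: size-weighted average of ATT(g, g + e) at each event
# time e, after applying the event-time window and optional balancing.
dynamic_effects <- function(gt, min_e = -Inf, max_e = Inf, balance_e = NULL) {
  if (!is.null(balance_e)) {
    last_t <- max(gt$t)
    # Keep cohorts observed for at least balance_e periods after treatment,
    # and only event times up to balance_e, so the cohort mix stays fixed.
    gt <- gt[last_t - gt$g >= balance_e & gt$e <= balance_e, ]
  }
  gt <- gt[gt$e >= min_e & gt$e <= max_e, ]
  es <- sort(unique(gt$e))
  att <- vapply(es, function(e) {
    cells <- gt[gt$e == e, ]
    sum(cells$w * cells$att) / sum(cells$w)
  }, numeric(1))
  data.frame(e = es, att = att)
}

dyn_all <- dynamic_effects(gt)                 # event times 0 to 3
dyn_win <- dynamic_effects(gt, max_e = 1)      # event times 0 and 1 only
dyn_bal <- dynamic_effects(gt, balance_e = 1)  # drops the 2007 cohort
```

Without balancing, the e = 0 estimate mixes all three cohorts while e = 3 comes from the 2004 cohort alone; with balance_e = 1, both reported event times are built from the same two cohorts (2004 and 2006).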
Initial ATT(g,t) estimates from att_gt()
library(did)
data(mpdta)
set.seed(09152024)
out <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=NULL, data=mpdta)
You can aggregate the ATT(g,t) in many ways.
Overall ATT:
aggte(out, type = "simple")
#>
#> Call:
#> aggte(MP = out, type = "simple")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#>
#>    ATT Std. Error [ 95% Conf. Int.]
#>  -0.04     0.0123  -0.064   -0.0159 *
#>
#>
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
Dynamic ATT (Event-Study):
aggte(out, type = "dynamic")
#>
#> Call:
#> aggte(MP = out, type = "dynamic")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#>
#> Overall summary of ATT's based on event-study/dynamic aggregation:
#>      ATT Std. Error [ 95% Conf. Int.]
#>  -0.0772     0.0209  -0.1181  -0.0363 *
#>
#>
#> Dynamic Effects:
#>  Event time Estimate Std. Error [95% Simult. Conf. Band]
#>          -3   0.0305     0.0158     -0.0103      0.0713
#>          -2  -0.0006     0.0134     -0.0351      0.0340
#>          -1  -0.0245     0.0145     -0.0617      0.0128
#>           0  -0.0199     0.0125     -0.0521      0.0122
#>           1  -0.0510     0.0161     -0.0926     -0.0093 *
#>           2  -0.1373     0.0386     -0.2368     -0.0377 *
#>           3  -0.1008     0.0345     -0.1899     -0.0117 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
ATT for each group:
aggte(out, type = "group")
#>
#> Call:
#> aggte(MP = out, type = "group")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#>
#> Overall summary of ATT's based on group/cohort aggregation:
#>     ATT Std. Error [ 95% Conf. Int.]
#>  -0.031     0.0126  -0.0558  -0.0062 *
#>
#>
#> Group Effects:
#>  Group Estimate Std. Error [95% Simult. Conf. Band]
#>   2004  -0.0797     0.0281     -0.1407     -0.0188 *
#>   2006  -0.0229     0.0156     -0.0568      0.0110
#>   2007  -0.0261     0.0172     -0.0634      0.0113
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
ATT for each calendar year:
aggte(out, type = "calendar")
#>
#> Call:
#> aggte(MP = out, type = "calendar")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#>
#> Overall summary of ATT's based on calendar time aggregation:
#>      ATT Std. Error [ 95% Conf. Int.]
#>  -0.0417     0.0172  -0.0755  -0.0079 *
#>
#>
#> Time Effects:
#>  Time Estimate Std. Error [95% Simult. Conf. Band]
#>  2004  -0.0105     0.0251     -0.0701      0.0490
#>  2005  -0.0704     0.0320     -0.1464      0.0056
#>  2006  -0.0488     0.0213     -0.0994      0.0018
#>  2007  -0.0371     0.0139     -0.0700     -0.0041 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
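Because the overall point estimates above are weighted averages of ATT(g,t), they can be reproduced by hand, which makes the four aggregation schemes concrete. The base-R sketch below is an illustration only (the package additionally propagates influence functions to obtain standard errors and bands). It reuses the rounded post-treatment ATT(g,t) estimates printed by summary(out1) in the att_gt examples and assumes cohort sizes of 20, 40 and 131 units for the 2004, 2006 and 2007 groups; with these assumed weights the results match the printed overall estimates up to rounding.

```r
# Toy post-treatment ATT(g,t) cells (rounded estimates from summary(out1)).
gt <- data.frame(
  g   = c(2004, 2004, 2004, 2004, 2006, 2006, 2007),
  t   = c(2004, 2005, 2006, 2007, 2006, 2007, 2007),
  att = c(-0.0105, -0.0704, -0.1373, -0.1008, -0.0046, -0.0412, -0.0261)
)
# ASSUMPTION: cohort sizes, used as weights proportional to group size.
gt$w <- unname(c("2004" = 20, "2006" = 40, "2007" = 131)[as.character(gt$g)])
gt$e <- gt$t - gt$g  # event time

wmean <- function(x, w) sum(w * x) / sum(w)

# "simple": one size-weighted average over every post-treatment cell.
simple <- wmean(gt$att, gt$w)

# "group": average over periods within each cohort, then weight cohorts by size.
by_group <- tapply(gt$att, gt$g, mean)
size_g   <- tapply(gt$w, gt$g, max)
group_overall <- wmean(by_group, size_g)

# "dynamic": size-weighted average across cohorts at each event time e,
# then an equally weighted average over e >= 0.
by_e <- sapply(split(gt, gt$e), function(d) wmean(d$att, d$w))
dynamic_overall <- mean(by_e)

# "calendar": size-weighted average across cohorts already treated in t,
# then an equally weighted average over calendar periods.
by_t <- sapply(split(gt, gt$t), function(d) wmean(d$att, d$w))
calendar_overall <- mean(by_t)

round(c(simple = simple, group = group_overall,
        dynamic = dynamic_overall, calendar = calendar_overall), 4)
```

The weighting differs across types: "simple" weights each post-treatment cell by its cohort's size, so the long-exposed 2004 cohort contributes four cells, whereas "dynamic" and "calendar" first average within an event time or calendar period and then weight those averages equally.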
Objects of this class hold results on aggregated group-time average treatment effects
An object for holding aggregated treatment effect parameters.
AGGTEobj( overall.att = NULL, overall.se = NULL, type = "simple", egt = NULL, att.egt = NULL, se.egt = NULL, crit.val.egt = NULL, inf.function = NULL, min_e = NULL, max_e = NULL, balance_e = NULL, call = NULL, DIDparams = NULL )
overall.att |
The estimated overall ATT |
overall.se |
Standard error for overall ATT |
type |
Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period). |
egt |
Holds the length of exposure (for dynamic effects), the group (for selective treatment timing), or the time period (for calendar time effects) |
att.egt |
The ATT specific to egt |
se.egt |
The standard error specific to egt |
crit.val.egt |
A critical value for computing uniform confidence bands for dynamic effects, selective treatment timing, or time period effects. |
inf.function |
The influence function of the chosen aggregated parameters |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
balance_e |
If set (and if one computes dynamic effects), it balances
the sample with respect to event time. For example, if |
call |
The function call to aggte |
DIDparams |
A DIDparams object |
an AGGTEobj
att_gt
computes group-time average treatment effects in DID
setups where there are more than two periods of data, allowing for
treatment to occur at different points in time and for
treatment effect heterogeneity and dynamics.
See Callaway and Sant'Anna (2021) for a detailed description.
att_gt( yname, tname, idname = NULL, gname, xformla = NULL, data, panel = TRUE, allow_unbalanced_panel = FALSE, control_group = c("nevertreated", "notyettreated"), anticipation = 0, weightsname = NULL, alp = 0.05, bstrap = TRUE, cband = TRUE, biters = 1000, clustervars = NULL, est_method = "dr", base_period = "varying", faster_mode = FALSE, print_details = FALSE, pl = FALSE, cores = 1 )
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not the function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment during which units can anticipate participating in the treatment, so that anticipation can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have the same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most, there
can be two variables (otherwise an error is thrown), and one of these
must be the same as idname, which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each pre-treatment period by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, repeatedly changing the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period. |
faster_mode |
This option enables a faster version of |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
an MP
object containing all the results for group-time average
treatment effects
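The difference between the two base_period choices can be seen with a covariate-free two-by-two comparison of cohort mean outcomes. The base-R sketch below uses hypothetical mean outcome paths for a cohort first treated in 2006 and a never-treated comparison group; the helpers base_period_for() and att_2x2() are hypothetical and follow the (g-anticipation-1) rule described under base_period, not the package's internal code.

```r
# Hypothetical helper: base period used for ATT(g,t).
base_period_for <- function(g, t, base_period = c("varying", "universal"),
                            anticipation = 0) {
  base_period <- match.arg(base_period)
  last_untreated <- g - anticipation - 1
  # Post-treatment periods (and all periods under "universal") use
  # g - anticipation - 1; pre-treatment periods under "varying" use t - 1.
  if (base_period == "universal" || t > last_untreated) last_untreated else t - 1
}

# Covariate-free 2x2 comparison of mean outcome changes from `base` to `t`.
att_2x2 <- function(yT, yC, t, base) {
  tt <- as.character(t)
  bb <- as.character(base)
  (yT[[tt]] - yT[[bb]]) - (yC[[tt]] - yC[[bb]])
}

periods <- 2003:2007
g <- 2006
yT <- setNames(c(1.00, 1.20, 1.10, 0.90, 0.80), periods)  # hypothetical cohort g = 2006
yC <- setNames(c(1.00, 1.10, 1.15, 1.20, 1.25), periods)  # hypothetical never treated

est <- sapply(c("varying", "universal"), function(bp) {
  sapply(periods[-1], function(t) att_2x2(yT, yC, t, base_period_for(g, t, bp)))
})
rownames(est) <- periods[-1]
round(est, 3)
```

In this toy example both columns agree in 2006 and 2007, the universal column is exactly 0 in 2005 (the normalized period), and the two columns differ in 2004 (0.10 versus 0.15).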
Basic att_gt() call:
# Example data
library(did)
data(mpdta)
set.seed(09152024)
out1 <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=NULL, data=mpdta)
summary(out1)
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#>     gname = "first.treat", xformla = NULL, data = mpdta)
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#>   2004 2004  -0.0105     0.0246     -0.0755      0.0545
#>   2004 2005  -0.0704     0.0346     -0.1621      0.0212
#>   2004 2006  -0.1373     0.0397     -0.2422     -0.0323 *
#>   2004 2007  -0.1008     0.0366     -0.1976     -0.0040 *
#>   2006 2004   0.0065     0.0226     -0.0532      0.0663
#>   2006 2005  -0.0028     0.0193     -0.0538      0.0483
#>   2006 2006  -0.0046     0.0185     -0.0536      0.0444
#>   2006 2007  -0.0412     0.0208     -0.0962      0.0137
#>   2007 2004   0.0305     0.0146     -0.0081      0.0692
#>   2007 2005  -0.0027     0.0162     -0.0457      0.0402
#>   2007 2006  -0.0311     0.0182     -0.0793      0.0172
#>   2007 2007  -0.0261     0.0174     -0.0722      0.0201
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.16812
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
Using covariates:
out2 <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=~lpop, data=mpdta)
summary(out2)
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#>     gname = "first.treat", xformla = ~lpop, data = mpdta)
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#>   2004 2004  -0.0145     0.0249     -0.0817      0.0527
#>   2004 2005  -0.0764     0.0307     -0.1592      0.0064
#>   2004 2006  -0.1404     0.0370     -0.2403     -0.0406 *
#>   2004 2007  -0.1069     0.0331     -0.1962     -0.0176 *
#>   2006 2004  -0.0005     0.0215     -0.0583      0.0574
#>   2006 2005  -0.0062     0.0181     -0.0549      0.0425
#>   2006 2006   0.0010     0.0190     -0.0502      0.0521
#>   2006 2007  -0.0413     0.0207     -0.0971      0.0145
#>   2007 2004   0.0267     0.0143     -0.0117      0.0652
#>   2007 2005  -0.0046     0.0153     -0.0459      0.0368
#>   2007 2006  -0.0284     0.0197     -0.0816      0.0247
#>   2007 2007  -0.0288     0.0157     -0.0712      0.0136
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.23267
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
Specify comparison units:
out3 <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=~lpop, control_group = "notyettreated", data=mpdta)
summary(out3)
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#>     gname = "first.treat", xformla = ~lpop, data = mpdta, control_group = "notyettreated")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#>   2004 2004  -0.0212     0.0219     -0.0797      0.0374
#>   2004 2005  -0.0816     0.0299     -0.1617     -0.0015 *
#>   2004 2006  -0.1382     0.0375     -0.2387     -0.0376 *
#>   2004 2007  -0.1069     0.0354     -0.2016     -0.0122 *
#>   2006 2004  -0.0075     0.0216     -0.0653      0.0504
#>   2006 2005  -0.0046     0.0184     -0.0539      0.0448
#>   2006 2006   0.0087     0.0167     -0.0362      0.0535
#>   2006 2007  -0.0413     0.0192     -0.0927      0.0101
#>   2007 2004   0.0269     0.0146     -0.0122      0.0661
#>   2007 2005  -0.0042     0.0160     -0.0470      0.0386
#>   2007 2006  -0.0284     0.0182     -0.0773      0.0204
#>   2007 2007  -0.0288     0.0176     -0.0759      0.0184
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.23326
#> Control Group: Not Yet Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
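The control_group argument changes which units enter the comparison for each ATT(g,t). A simplified base-R sketch of that selection rule for post-treatment periods (t >= g) follows; it assumes first-treatment periods are coded as in first.treat, with 0 for never-treated units, and comparison_units() is a hypothetical helper rather than part of the package. Under this rule, the only units in mpdta still untreated in 2007 are the never-treated ones, which is consistent with out2 and out3 reporting identical ATT(g,t) estimates for 2007.

```r
# Hypothetical helper: which units serve as comparisons for ATT(g,t).
# G: first treatment period for each unit, with 0 meaning never treated.
# Simplification: only post-treatment periods (t >= g) are handled.
comparison_units <- function(G, g, t,
                             control_group = c("nevertreated", "notyettreated"),
                             anticipation = 0) {
  control_group <- match.arg(control_group)
  stopifnot(t >= g)
  never <- G == 0
  if (control_group == "nevertreated") return(never)
  # Not yet treated: never-treated units plus units whose treatment (shifted
  # earlier by the anticipation window) has not started by period t.
  never | (G > t + anticipation)
}

G <- c(0, 0, 2004, 2006, 2007, 2007)  # hypothetical first-treatment periods
comparison_units(G, g = 2004, t = 2005)                                   # units 1-2
comparison_units(G, g = 2004, t = 2005, control_group = "notyettreated")  # units 1-2, 4-6
comparison_units(G, g = 2006, t = 2007, control_group = "notyettreated")  # units 1-2 only
```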
Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. doi:10.1016/j.jeconom.2020.12.001, https://arxiv.org/abs/1803.09015
A function for building simulated data
build_sim_dataset(sp_list, panel = TRUE)
build_sim_dataset(sp_list, panel = TRUE)
sp_list |
A list of simulation parameters. See |
panel |
whether to construct panel data (the default) or repeated cross sections data |
a data.frame with the following columns
G observation's group
X value of covariate
id observation's id
cluster observation's cluster (by construction there is no within-cluster correlation)
period time period for current observation
Y outcome
treat whether or not this unit is ever treated
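The layout of such a dataset can be illustrated with a base-R simulator that returns the same columns in panel form. The data-generating process below (a common time trend, a time-invariant covariate, a unit effect, and a constant treatment effect te from the first treated period on) is an assumption for illustration; it is not the process behind build_sim_dataset(), whose parameters come from sp_list, and simulate_staggered() is a hypothetical name. Repeated cross sections (panel = FALSE) are not covered.

```r
# Hypothetical DGP for illustration; not the DGP behind build_sim_dataset().
simulate_staggered <- function(n = 200, periods = 1:4, groups = c(0, 2, 3),
                               te = 1, n_clusters = 50, seed = 1) {
  set.seed(seed)
  G       <- sample(groups, n, replace = TRUE)  # first treated period, 0 = never
  X       <- rnorm(n)                           # time-invariant covariate
  alpha   <- rnorm(n)                           # unit fixed effect
  cluster <- sample(seq_len(n_clusters), n, replace = TRUE)
  dat <- expand.grid(id = seq_len(n), period = periods)
  dat$G       <- G[dat$id]
  dat$X       <- X[dat$id]
  dat$cluster <- cluster[dat$id]
  post <- dat$G > 0 & dat$period >= dat$G       # unit is treated in this period
  # Common trend + covariate + unit effect + treatment effect + noise,
  # so parallel trends holds by construction.
  dat$Y     <- dat$period + dat$X + alpha[dat$id] + te * post + rnorm(nrow(dat))
  dat$treat <- as.integer(dat$G > 0)            # ever treated
  dat <- dat[order(dat$id, dat$period),
             c("G", "X", "id", "cluster", "period", "Y", "treat")]
  rownames(dat) <- NULL
  dat
}

sim <- simulate_staggered(n = 2000)

# 2x2 DID for cohort G = 2 between periods 1 and 2, never treated as controls.
mean_change <- function(d, grp, from, to) {
  mean(d$Y[d$G == grp & d$period == to] - d$Y[d$G == grp & d$period == from])
}
did_hat <- mean_change(sim, 2, 1, 2) - mean_change(sim, 0, 1, 2)
did_hat  # should be close to te = 1
```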
An integrated moments test for the conditional parallel trends assumption holding in all pre-treatment time periods for all groups
conditional_did_pretest( yname, tname, idname = NULL, gname, xformla = NULL, data, panel = TRUE, allow_unbalanced_panel = FALSE, control_group = c("nevertreated", "notyettreated"), weightsname = NULL, alp = 0.05, bstrap = TRUE, cband = TRUE, biters = 1000, clustervars = NULL, est_method = "ipw", print_details = FALSE, pl = FALSE, cores = 1 )
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not the function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
weightsname |
The name of the column containing the sampling weights. If not set, all observations have the same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most, there
can be two variables (otherwise an error is thrown), and one of these
must be the same as idname, which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The usage above sets est_method = "ipw" for this function; in att_gt the default is "dr", which uses the doubly robust
approach in the |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
an MP.TEST
object
Callaway, Brantly and Sant'Anna, Pedro H. C. "Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and Employment." Working Paper https://arxiv.org/abs/1803.09015v2 (2018).
## Not run:
library(did)
data(mpdta)
pre.test <- conditional_did_pretest(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=~lpop, data=mpdta)
summary(pre.test)
## End(Not run)
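The integrated-moments idea behind this test (and behind the CvM and KS fields of MP.TEST objects) can be sketched generically in base R. This is an illustration under stated assumptions, not the package's test: it takes as given a vector psi of moment contributions that should have mean zero conditional on the covariates under the null (building psi from pre-treatment outcomes is omitted), forms the indicator weights 1{X_i <= u} at every sample point u, and computes Cramér-von Mises and Kolmogorov-Smirnov statistics with a naive Rademacher multiplier bootstrap that ignores estimation effects. All function names are hypothetical, and normalization conventions may differ from the package.

```r
# W[i, j] = 1 if unit i's covariates are all <= those of unit j, i.e. the
# indicator weight 1{X_i <= u} evaluated at u = X_j.
indicator_weights <- function(X) {
  X <- as.matrix(X)
  sapply(seq_len(nrow(X)), function(j) {
    as.numeric(rowSums(sweep(X, 2, X[j, ], "<=")) == ncol(X))
  })
}

# psi: moment contributions with conditional mean zero under the null.
icm_stats <- function(psi, W) {
  J <- colSums(psi * W) / sqrt(length(psi))  # empirical process at each u = X_j
  c(CvM = mean(J^2), KS = max(abs(J)))
}

# Naive multiplier bootstrap p-values (ignores estimated nuisance parts).
icm_pvalues <- function(psi, W, B = 499, seed = 1) {
  set.seed(seed)
  stat <- icm_stats(psi, W)
  boot <- replicate(B, icm_stats(psi * sample(c(-1, 1), length(psi), replace = TRUE), W))
  c(CvM = mean(boot["CvM", ] >= stat[["CvM"]]),
    KS  = mean(boot["KS", ]  >= stat[["KS"]]))
}

set.seed(2)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = runif(n))
W <- indicator_weights(X)
psi_null <- rnorm(n)                 # satisfies the conditional moment
psi_alt  <- rnorm(n) + 0.5 * X[, 1]  # violates it: mean depends on x1
icm_stats(psi_null, W)
icm_pvalues(psi_null, W)
icm_pvalues(psi_alt, W)
```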
Object to hold did parameters that are passed across functions
DIDparams( yname, tname, idname = NULL, gname, xformla = NULL, data, control_group, anticipation = 0, weightsname = NULL, alp = 0.05, bstrap = TRUE, biters = 1000, clustervars = NULL, cband = TRUE, print_details = TRUE, faster_mode = FALSE, pl = FALSE, cores = 1, est_method = "dr", base_period = "varying", panel = TRUE, true_repeated_cross_sections, n = NULL, nG = NULL, nT = NULL, tlist = NULL, glist = NULL, call = NULL )
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment during which units can anticipate participating in the treatment, so that anticipation can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have the same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most, there
can be two variables (otherwise an error is thrown), and one of these
must be the same as idname, which allows for clustering at the individual
level. By default, we cluster at individual level (when |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
print_details |
Whether or not to show details/progress of computations.
Default is |
faster_mode |
This option enables a faster version of |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each pre-treatment period by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, repeatedly changing the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period. |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
using a panel dataset, the variable |
true_repeated_cross_sections |
Whether or not the data really are repeated cross sections. (This is included because the unbalanced panel code runs through the repeated cross sections code.) |
n |
The number of observations. This is equal to the number of units (which may be different from the number of rows in a panel dataset). |
nG |
The number of groups |
nT |
The number of time periods |
tlist |
a vector containing each time period |
glist |
a vector containing each group |
call |
Function call to att_gt |
Plot did objects using ggplot2
Function to plot objects from the did package
ggdid(object, ...)
object |
either a |
... |
other arguments |
Plot AGGTEobj objects
A function to plot AGGTEobj objects
## S3 method for class 'AGGTEobj' ggdid( object, ylim = NULL, xlab = NULL, ylab = NULL, title = "", xgap = 1, legend = TRUE, ref_line = 0, theming = TRUE, ... )
object |
either a |
ylim |
optional y limits for the plot; setting here makes the y limits the same across different plots |
xlab |
optional x-axis label |
ylab |
optional y-axis label |
title |
optional plot title |
xgap |
optional gap between the labels on the x-axis. For example,
|
legend |
Whether or not to include a legend (which will indicate color
of pre- and post-treatment estimates). Default is |
ref_line |
A reference line at this value, usually to compare confidence intervals to 0. Set to NULL to omit. |
theming |
Set to FALSE to skip all theming so you can do it yourself. |
... |
other arguments |
Plot MP objects using ggplot2
A function to plot MP objects
## S3 method for class 'MP' ggdid( object, ylim = NULL, xlab = NULL, ylab = NULL, title = "Group", xgap = 1, ncol = 1, legend = TRUE, group = NULL, ref_line = 0, theming = TRUE, grtitle = "Group", ... )
object |
either a |
ylim |
optional y limits for the plot; setting here makes the y limits the same across different plots |
xlab |
optional x-axis label |
ylab |
optional y-axis label |
title |
optional plot title |
xgap |
optional gap between the labels on the x-axis. For example,
|
ncol |
The number of columns to include in the resulting plot. The default is 1. |
legend |
Whether or not to include a legend (which will indicate color
of pre- and post-treatment estimates). Default is |
group |
Vector for which groups to include in the plots of ATT(g,t).
Default is NULL, and, in this case, plots for all groups will be included ( |
ref_line |
A reference line at this value, usually to compare confidence intervals to 0. Set to NULL to omit. |
theming |
Set to FALSE to skip all theming so you can do it yourself. |
grtitle |
Title to append before each group name ( |
... |
other arguments |
glance model characteristics from AGGTEobj objects
## S3 method for class 'AGGTEobj' glance(x, ...)
x |
a model of class AGGTEobj produced by the |
... |
other arguments passed to methods |
glance model characteristics from MP objects
## S3 method for class 'MP' glance(x, ...)
x |
a model of class MP produced by the |
... |
other arguments passed to methods |
indicator weighting function
indicator(X, u)
X |
matrix of X's from the data |
u |
a particular value to compare X's to |
numeric vector
library(did)
data(mpdta)
dta <- subset(mpdta, year==2007)
X <- model.matrix(~lpop, data=dta)
X <- indicator(X, X[1,])
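Judging from its description and example, indicator(X, u) flags the rows of X whose entries are all no larger than the corresponding entries of u. A base-R version under that assumption follows; indicator_sketch() is a hypothetical name, and the package's function may differ in details such as the return type or the handling of the intercept column produced by model.matrix.

```r
indicator_sketch <- function(X, u) {
  X <- as.matrix(X)
  stopifnot(length(u) == ncol(X))
  # Compare each column of X with the matching entry of u; a row is flagged
  # only if every comparison is TRUE.
  as.numeric(rowSums(sweep(X, 2, u, "<=")) == ncol(X))
}

X <- cbind(`(Intercept)` = 1, lpop = c(2.1, 3.5, 1.7))  # toy design matrix
ind <- indicator_sketch(X, X[1, ])  # returns c(1, 0, 1)
```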
A function to take an influence function and use the multiplier bootstrap to compute standard errors and critical values for uniform confidence bands.
mboot(inf.func, DIDparams, pl = FALSE, cores = 1)
inf.func |
an influence function |
DIDparams |
DIDparams object |
pl |
whether or not to use parallel processing in the multiplier bootstrap, default=FALSE |
cores |
the number of cores to use with parallel processing, default=1 |
list with elements
bres |
results from each bootstrap iteration |
V |
variance matrix |
se |
standard errors |
crit.val |
a critical value for computing uniform confidence bands |
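The returned elements (`bres`, `V`, `se`, `crit.val`) can be illustrated with a minimal multiplier-bootstrap sketch in base R. It assumes an n x k influence-function matrix, Rademacher multipliers, an interquartile-range scale estimate, and a sup-t critical value for the uniform band; these are common choices but are not claimed to be exactly what `mboot` does. `mboot_sketch` is a hypothetical name.

```r
# Hypothetical multiplier-bootstrap sketch for k estimators sharing an
# n x k influence-function matrix `psi` (rows = units, columns = estimators).
mboot_sketch <- function(psi, biters = 1000, alp = 0.05) {
  psi <- as.matrix(psi)
  n <- nrow(psi)
  # Each draw: Rademacher multipliers v_i in {-1, 1}, then
  # sqrt(n) * mean(v_i * psi_i) for every column.
  draws <- replicate(biters, {
    v <- sample(c(-1, 1), n, replace = TRUE)
    sqrt(n) * colMeans(v * psi)
  })
  # replicate() gives k x biters (or a plain vector when k = 1);
  # reshape so that rows are bootstrap iterations.
  bres <- matrix(draws, nrow = biters, byrow = TRUE)
  # Robust scale of sqrt(n) * (estimate - truth) from the interquartile range.
  q <- apply(bres, 2, quantile, probs = c(0.25, 0.75))
  bSigma <- (q[2, ] - q[1, ]) / (qnorm(0.75) - qnorm(0.25))
  # Sup-t statistic across estimators yields a uniform critical value.
  sup_t <- apply(abs(sweep(bres, 2, bSigma, "/")), 1, max)
  list(
    bres = bres,
    V = cov(bres),              # variance of sqrt(n) * (estimate - truth)
    se = bSigma / sqrt(n),      # standard errors of the estimators
    crit.val = unname(quantile(sup_t, 1 - alp))
  )
}

set.seed(1)
psi <- matrix(rnorm(500 * 3), ncol = 3)   # made-up influence functions
out <- mboot_sketch(psi, biters = 500)
```

With three roughly independent estimators, the uniform critical value lands above the pointwise 1.96, which is why uniform bands are wider than pointwise intervals.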
Multi-period objects that hold results for group-time average treatment effects
MP( group, t, att, V_analytical, se, c, inffunc, n = NULL, W = NULL, Wpval = NULL, aggte = NULL, alp = 0.05, DIDparams = NULL )
group |
which group (defined by period first treated) a group-time average treatment effect is for |
t |
which time period a group-time average treatment effect is for |
att |
the group-average treatment effect for group |
V_analytical |
Analytical estimator for the asymptotic variance-covariance matrix for group-time average treatment effects |
se |
standard errors for group-time average treatment effects. If bootstrap is set to TRUE, this provides bootstrap-based se. |
c |
simultaneous critical value when one is obtaining simultaneous confidence bands; otherwise, the critical value based on a pointwise normal approximation. |
inffunc |
the influence function for estimating group-time average treatment effects |
n |
the number of unique cross-sectional units (unique values of idname) |
W |
the Wald statistic for pre-testing the common trends assumption |
Wpval |
the p-value of the Wald statistic for pre-testing the common trends assumption |
aggte |
an aggregate treatment effects object |
alp |
the significance level, default is 0.05 |
DIDparams |
a |
MP object
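The `W` and `Wpval` fields refer to a Wald pre-test. A minimal sketch of such a statistic, assuming `V` is the asymptotic variance of sqrt(n) times the pre-treatment estimates and that the statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of pre-treatment estimates, is shown below. The function name and the numbers are hypothetical, and this is not presented as the package's exact computation.

```r
# Hypothetical sketch of a Wald pre-test of parallel trends.
# pre_att: pre-treatment ATT(g,t) estimates (all zero under the null)
# V:       asymptotic variance matrix of sqrt(n) * (pre_att_hat - pre_att)
# n:       number of cross-sectional units
wald_pretest_sketch <- function(pre_att, V, n) {
  pre_att <- as.numeric(pre_att)
  V <- as.matrix(V)
  W <- n * drop(t(pre_att) %*% solve(V, pre_att))
  # Under the null, W is approximately chi-squared with length(pre_att) df.
  Wpval <- pchisq(W, df = length(pre_att), lower.tail = FALSE)
  list(W = W, Wpval = Wpval)
}

# Made-up inputs: two pre-periods with independent estimates.
res <- wald_pretest_sketch(pre_att = c(0.01, -0.02), V = diag(c(0.5, 0.5)), n = 500)
# W = 0.5, Wpval = exp(-0.25), about 0.779
```

A large p-value, as here, means the pre-treatment estimates give no evidence against the parallel trends assumption; it does not prove the assumption holds.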
An object that holds results from computing the pre-test of the conditional parallel trends assumption
MP.TEST( CvM = NULL, CvMb = NULL, CvMcval = NULL, CvMpval = NULL, KS = NULL, KSb = NULL, KScval = NULL, KSpval = NULL, clustervars = NULL, xformla = NULL )
CvM |
Cramer-von Mises test statistic |
CvMb |
a vector of bootstrapped Cramer-von Mises test statistics |
CvMcval |
CvM critical value |
CvMpval |
p-value for CvM test |
KS |
Kolmogorov-Smirnov test statistic |
KSb |
a vector of bootstrapped KS test statistics |
KScval |
KS critical value |
KSpval |
p-value for KS test |
clustervars |
vector of which variables were clustered on for the test |
xformla |
formula for the X variables used in the test |
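The fields of an MP.TEST object fit together as statistic, bootstrap draws, critical value, and p-value. The sketch below assumes an empirical process `G` evaluated at the sample points and a matrix `Gb` of its bootstrap draws (one draw per row); it uses the mean of squares for the CvM statistic and the maximum absolute value for the KS statistic. The scaling used by the package may differ, and `cvm_ks_sketch` is a hypothetical name.

```r
# Hypothetical sketch: CvM and KS statistics, critical values, and p-values
# from an empirical process G and a biters x length(G) matrix of draws Gb.
cvm_ks_sketch <- function(G, Gb, alp = 0.05) {
  CvM  <- mean(G^2)               # Cramer-von Mises: average squared deviation
  KS   <- max(abs(G))             # Kolmogorov-Smirnov: largest absolute deviation
  CvMb <- rowMeans(Gb^2)          # the same statistics for each bootstrap draw
  KSb  <- apply(abs(Gb), 1, max)
  list(
    CvM = CvM, CvMb = CvMb,
    CvMcval = unname(quantile(CvMb, 1 - alp)),
    CvMpval = mean(CvMb >= CvM),  # share of draws at least as extreme
    KS = KS, KSb = KSb,
    KScval = unname(quantile(KSb, 1 - alp)),
    KSpval = mean(KSb >= KS)
  )
}

# Tiny made-up inputs so the arithmetic is easy to follow.
G  <- c(0.5, -1, 0.2)
Gb <- rbind(c(0, 0, 0), c(2, 2, 2))
tst <- cvm_ks_sketch(G, Gb)
```

In practice `Gb` would have hundreds or thousands of rows; two rows are used here only to keep the p-values easy to verify by hand.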
A dataset containing (the log of) teen employment in 500 counties in the U.S. from 2004 to 2007. This is a subset of the dataset used in Callaway and Sant'Anna (2021). See that paper for additional descriptions.
mpdta
A data frame with 2000 rows and 6 variables:
the year of the observation
a unique identifier for a particular county
the log of 1000s of population for the county
the log of teen employment in the county
the year that the state where the county is located raised its minimum wage; it is set equal to 0 for counties whose minimum wage equals the federal minimum wage over the entire period.
whether or not a particular county is treated in that year
Callaway and Sant'Anna (2021)
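Because never-treated counties have a first-treatment year of 0, a "treated in this year" indicator can be derived from the first-treatment year and the calendar year. The sketch below uses a made-up three-county panel with hypothetical column names `year`, `id`, and `g` (not necessarily the names used in `mpdta`), and also checks balance; 500 counties observed in 4 years gives the 2000 rows stated above.

```r
# Made-up long-format panel: 3 counties observed 2004-2007.
# g = year of first treatment, 0 = never treated (hypothetical column names).
toy <- expand.grid(year = 2004:2007, id = 1:3)
toy$g <- c(0, 2006, 2007)[toy$id]
# Treated in a given year once the first-treatment year has been reached.
toy$treat_now <- as.integer(toy$g > 0 & toy$year >= toy$g)
# A balanced panel has every id observed in every year.
balanced <- all(table(toy$id) == length(unique(toy$year)))
```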
did Function Arguments
Function to process arguments passed to the main methods in the did package, as well as conducting some tests to make sure the data is in the proper format and trying to throw helpful error messages.
pre_process_did( yname, tname, idname, gname, xformla = NULL, data, panel = TRUE, allow_unbalanced_panel, control_group = c("nevertreated", "notyettreated"), anticipation = 0, weightsname = NULL, alp = 0.05, bstrap = FALSE, cband = FALSE, biters = 1000, clustervars = NULL, est_method = "dr", base_period = "varying", print_details = TRUE, faster_mode = FALSE, pl = FALSE, cores = 1, call = NULL )
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not the function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment during which units can anticipate participating in the treatment, so that anticipation can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have the same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each pre-treatment period by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, repeatedly for each value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes between period (g-anticipation-1) and period t for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but produces one extra estimate in an earlier period. |
print_details |
Whether or not to show details/progress of computations.
Default is |
faster_mode |
This option enables a faster version of |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
call |
Function call to att_gt |
a DIDparams object
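The difference between the two `base_period` choices is pure arithmetic once group and comparison-group means are in hand. The sketch below assumes no covariates, no anticipation (so the reference period is g - 1), and consecutive integer periods; `d` holds made-up differences between the group-g mean and the comparison-group mean in each period. `base_period_sketch` is a hypothetical name, not the package's code.

```r
# Hypothetical sketch of the two base-period conventions for one group g.
# d[k]: made-up (group g mean - comparison mean) in period periods[k].
base_period_sketch <- function(d, periods, g,
                               base_period = c("varying", "universal")) {
  base_period <- match.arg(base_period)
  ref <- g - 1                               # reference period when anticipation = 0
  d_at <- function(p) d[match(p, periods)]
  sapply(periods, function(t) {
    if (t >= g || base_period == "universal") {
      d_at(t) - d_at(ref)                    # long difference relative to g - 1
    } else if (t == min(periods)) {
      NA_real_                               # varying: no earlier period available
    } else {
      d_at(t) - d_at(t - 1)                  # varying: consecutive-period change
    }
  })
}

periods <- 2004:2008
d <- c(0.10, 0.12, 0.11, 0.15, 0.30)
varying   <- base_period_sketch(d, periods, g = 2007, "varying")
universal <- base_period_sketch(d, periods, g = 2007, "universal")
# varying:   NA,  0.02, -0.01, 0.04, 0.19
# universal: -0.01, 0.01, 0.00, 0.04, 0.19
```

The post-treatment entries agree, the universal version is exactly 0 in period g - 1, and it reports one extra estimate in the earliest period, matching the description of `base_period`.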
did Function Arguments
Function to process arguments passed to the main methods in the did package, as well as conducting some tests to make sure the data is in the proper format and trying to throw helpful error messages.
pre_process_did2( yname, tname, idname, gname, xformla = NULL, data, panel = TRUE, allow_unbalanced_panel, control_group = c("nevertreated", "notyettreated"), anticipation = 0, weightsname = NULL, alp = 0.05, bstrap = FALSE, cband = FALSE, biters = 1000, clustervars = NULL, est_method = "dr", base_period = "varying", print_details = TRUE, faster_mode = FALSE, pl = FALSE, cores = 1, call = NULL )
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not the function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment during which units can anticipate participating in the treatment, so that anticipation can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have the same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each pre-treatment period by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, repeatedly for each value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes between period (g-anticipation-1) and period t for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but produces one extra estimate in an earlier period. |
print_details |
Whether or not to show details/progress of computations.
Default is |
faster_mode |
This option enables a faster version of |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
call |
Function call to att_gt |
a DIDparams object
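The `control_group` argument determines which units serve as the comparison for a given ATT(g,t). The sketch below assumes a simplified rule: "nevertreated" keeps units whose first-treatment year is 0, and "notyettreated" additionally keeps units whose first-treatment year exceeds t + anticipation, excluding group g itself. The package's own rule also accounts for the base period and may differ in detail; `control_units_sketch` is a hypothetical name.

```r
# Hypothetical sketch of comparison-group membership for ATT(g, t).
# G: first-treatment year for each unit (0 = never treated).
control_units_sketch <- function(G, g, t,
                                 control_group = c("nevertreated", "notyettreated"),
                                 anticipation = 0) {
  control_group <- match.arg(control_group)
  never <- G == 0
  if (control_group == "nevertreated") return(never)
  # Not yet treated: still untreated (and not anticipating) in period t,
  # excluding group g itself.
  never | (G > t + anticipation & G != g)
}

G <- c(0, 0, 2006, 2007, 2008)
nt  <- control_units_sketch(G, g = 2006, t = 2006, "nevertreated")
nyt <- control_units_sketch(G, g = 2006, t = 2006, "notyettreated")
```

Unlike the never-treated group, the not-yet-treated set shrinks as t increases, because units drop out once they start (or begin anticipating) treatment.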
prints the value of an AGGTEobj object
## S3 method for class 'AGGTEobj' print(x, ...)
x |
a |
... |
extra arguments |
prints the value of an MP object
## S3 method for class 'MP' print(x, ...)
x |
a |
... |
extra arguments |
Process Results from compute.att_gt()
process_attgt(attgt.list)
attgt.list |
list of results from |
list with elements:
group |
which group a set of results belongs to |
tt |
which time period a set of results belongs to |
att |
the group-time average treatment effect |
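Collapsing a list of per-(g,t) results into the parallel vectors `group`, `tt`, and `att` can be sketched with `vapply`. The field names `group`, `year`, and `att` inside each list element are assumptions made for illustration, as is the name `process_attgt_sketch`.

```r
# Hypothetical sketch: turn a list of per-(g, t) results into parallel vectors.
# Assumption: each element is a list with fields `group`, `year`, and `att`.
process_attgt_sketch <- function(attgt.list) {
  list(
    group = vapply(attgt.list, function(r) r$group, numeric(1)),
    tt    = vapply(attgt.list, function(r) r$year,  numeric(1)),
    att   = vapply(attgt.list, function(r) r$att,   numeric(1))
  )
}

res <- process_attgt_sketch(list(
  list(group = 2006, year = 2005, att = 0.01),
  list(group = 2006, year = 2006, att = -0.04)
))
```

`vapply` is used instead of `sapply` so that a malformed element (for example, a missing field) raises an error rather than silently changing the output type.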
a function to create a "reasonable" set of parameters for simulating panel data that obey a parallel trends assumption. In particular, it provides parameters under which the effect of participating in the treatment equals one in all post-treatment time periods.
After calling this function, the user can change particular values of the parameters in order to generate dynamics, heterogeneous effects across groups, etc.
reset.sim(time.periods = 4, n = 5000, ipw = TRUE, reg = TRUE)
time.periods |
The number of time periods to include |
n |
The total number of observations |
ipw |
If TRUE, sets parameters so that the DGP is compatible with recovering ATT(g,t)'s using IPW (i.e., where a logit model that includes only a linear term in X is correctly specified). If FALSE, sets parameters that are incompatible with IPW. Either way, these parameters can be specified by the user if desired. |
reg |
If TRUE, sets parameters so that the DGP is compatible with recovering ATT(g,t)'s using regressions on untreated potential outcomes. If FALSE, sets parameters that are incompatible with using regressions (i.e., regressions that include only a linear term in X). Either way, these parameters can be specified by the user if desired. |
list of simulation parameters
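What "parallel trends with an effect of one in every post-treatment period" means can be shown with a standalone base-R data-generating process. This is a hypothetical sketch, not the parameter list returned by `reset.sim`: it omits covariates (so the IPW/regression compatibility options do not apply), allows treated and untreated units to differ in levels, and gives both groups the same time trend.

```r
# Hypothetical DGP sketch: parallel trends hold and the effect of
# participating is exactly 1 in every post-treatment period.
simulate_panel_sketch <- function(n = 5000, time.periods = 4, g_treat = 3) {
  id <- rep(seq_len(n), each = time.periods)
  period <- rep(seq_len(time.periods), times = n)
  treated_unit <- rep(runif(n) < 0.5, each = time.periods)  # about half treated
  G <- ifelse(treated_unit, g_treat, 0)                      # 0 = never treated
  theta <- period                                            # common time trend
  eta <- rep(rnorm(n), each = time.periods) + 0.5 * treated_unit  # level gap allowed
  D <- as.integer(G > 0 & period >= G)
  Y <- theta + eta + 1 * D + rnorm(n * time.periods)
  data.frame(id, period, G, D, Y)
}

set.seed(123)
dta <- simulate_panel_sketch()
# 2x2 DID between the last pre-period (2) and the first post-period (3).
m <- with(dta, tapply(Y, list(G, period), mean))
did_hat <- (m["3", "3"] - m["3", "2"]) - (m["0", "3"] - m["0", "2"])
```

The level gap (0.5) cancels in the difference-in-differences, so `did_hat` should be close to the true effect of 1.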
An internal function that builds simulated data and computes ATT(g,t)'s and some aggregations. It is useful for testing the inference procedures in the did package.
sim( sp_list, ret = NULL, bstrap = TRUE, cband = TRUE, control_group = "nevertreated", xformla = ~X, est_method = "dr", clustervars = NULL, panel = TRUE )
sp_list |
A list of simulation parameters. See |
ret |
which type of results to return. The options are |
bstrap |
whether or not to use the bootstrap to conduct inference (default is TRUE) |
cband |
whether or not to compute uniform confidence bands in the call to |
control_group |
Whether to use the "nevertreated" comparison group (the default) or the "notyettreated" comparison group |
xformla |
Formula for covariates in |
est_method |
Which estimation method to use in |
clustervars |
Any additional variables which should be clustered on |
panel |
whether to simulate panel data (the default) or repeated cross-section data |
When ret=NULL, returns the results of the call to att_gt; otherwise returns 1 if the specified test rejects and 0 if not.
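Returning 1 or 0 per simulated dataset makes Monte Carlo size and power calculations a simple average. The sketch below uses a plain t-test for a mean as a stand-in for the tests run by `sim`; `one_draw_sketch` is a hypothetical name.

```r
# Hypothetical sketch: one simulation draw returns 1 if a test rejects, 0 if not.
one_draw_sketch <- function(n = 200, true_mean = 0, alp = 0.05) {
  x <- rnorm(n, mean = true_mean)
  t_stat <- mean(x) / (sd(x) / sqrt(n))
  as.integer(abs(t_stat) > qnorm(1 - alp / 2))
}

set.seed(42)
rej <- replicate(2000, one_draw_sketch())
size_hat <- mean(rej)   # empirical size; should be near alp = 0.05 under the null
power_hat <- mean(replicate(500, one_draw_sketch(true_mean = 0.3)))
```

An empirical size far from the nominal level across many draws would suggest a problem with the inference procedure being tested.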
A function to summarize aggregated treatment effect parameters.
## S3 method for class 'AGGTEobj' summary(object, ...)
object |
an |
... |
other arguments |
prints a summary of an MP object
## S3 method for class 'MP' summary(object, ...)
object |
an |
... |
extra arguments |
print a summary of test results
## S3 method for class 'MP.TEST' summary(object, ...)
object |
an MP.TEST object |
... |
other variables |
A slightly modified multiplier bootstrap procedure for the pre-test of the conditional parallel trends assumption
test.mboot(inf.func, DIDparams, cores = 1)
inf.func |
an influence function |
DIDparams |
DIDparams object |
cores |
The number of cores to use to bootstrap the test
statistic in parallel. Default is |
list
bres |
CvM test statistics for each bootstrap iteration |
crit.val |
critical value for CvM test statistic |
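A CvM-type bootstrap critical value can be sketched by perturbing an influence-function matrix with multipliers and recomputing the statistic for each draw. The sketch assumes an n x m matrix `psi` (m evaluation points), Rademacher multipliers, and a mean-of-squares CvM statistic; it runs sequentially (the `cores` argument of `test.mboot` has no counterpart here), and `test_mboot_sketch` is a hypothetical name.

```r
# Hypothetical sketch: bootstrap a CvM-type statistic of an empirical process
# whose influence function is the n x m matrix `psi` (m = evaluation points).
test_mboot_sketch <- function(psi, biters = 1000, alp = 0.05) {
  psi <- as.matrix(psi)
  n <- nrow(psi)
  bres <- replicate(biters, {
    v <- sample(c(-1, 1), n, replace = TRUE)   # Rademacher multipliers
    Gb <- colSums(v * psi) / sqrt(n)           # bootstrap empirical process
    mean(Gb^2)                                 # CvM-type statistic for this draw
  })
  list(bres = bres, crit.val = unname(quantile(bres, 1 - alp)))
}

set.seed(7)
psi <- matrix(rnorm(300 * 5), ncol = 5)   # made-up influence functions
out <- test_mboot_sketch(psi, biters = 500)
```

The observed statistic would then be compared with `out$crit.val`, rejecting conditional parallel trends when it is larger.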
tidy results from AGGTEobj objects
## S3 method for class 'AGGTEobj' tidy(x, ...)
x |
a model of class AGGTEobj produced by the |
... |
Additional arguments to tidying method. |
tidy results from MP objects
## S3 method for class 'MP' tidy(x, ...)
x |
a model of class MP produced by the |
... |
Additional arguments to tidying method. |
A utility function to find observations that appear to violate support conditions. This function is not called anywhere in the code, but it is useful for debugging some common issues that users run into.
trimmer( g, tname, idname, gname, xformla, data, control_group = "notyettreated", threshold = 0.999 )
g |
a particular group (for example, 2009) |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
control_group |
Which units to use as the control group.
For this function the default is "notyettreated" (see usage).
The "nevertreated" option sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
threshold |
the cutoff for which observations are flagged as likely violators of the support condition. |
list of ids of observations that likely violate support conditions
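Flagging likely support violations usually comes down to estimated propensity scores near 1. The sketch below assumes the sample has already been restricted to group-g units and their comparison units, fits a logit of group membership on covariates with `glm`, and returns the ids whose fitted score exceeds `threshold`. The data, column names, and `flag_support_sketch` are hypothetical, and the package's own procedure may differ.

```r
# Hypothetical sketch: flag units whose estimated propensity score of being in
# group g exceeds `threshold`, a sign that overlap (support) may fail.
flag_support_sketch <- function(dat, formula, threshold = 0.999) {
  # Near-separation is exactly the situation of interest, so the usual
  # "fitted probabilities numerically 0 or 1" warning is expected.
  fit <- suppressWarnings(glm(formula, family = binomial(), data = dat))
  pscore <- fitted(fit)
  dat$id[pscore > threshold]
}

set.seed(3)
# Made-up sample: 190 typical units plus 10 units with an extreme covariate.
dat <- data.frame(id = 1:200, x = c(rnorm(190), rep(8, 10)))
dat$in_g <- as.integer(dat$x + rnorm(200, sd = 0.5) > 1.5)
flagged <- flag_support_sketch(dat, in_g ~ x)
```

Units with x = 8 are almost certainly in group g and have no comparable comparison units, so they are the ones that should be flagged.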