| Title: | Panel Treatment Effects Tools |
|---|---|
| Description: | Generic code for estimating treatment effects with panel data. The idea is to break into separate steps organizing the data, looping over groups and time periods, computing group-time average treatment effects, and aggregating group-time average treatment effects. Often, one is able to implement a new identification/estimation procedure by simply replacing the step on estimating group-time average treatment effects. See several different examples of this approach in the package documentation. |
| Authors: | Brantly Callaway [aut, cre] |
| Maintainer: | Brantly Callaway <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.1 |
| Built: | 2026-05-25 22:15:05 UTC |
| Source: | https://github.com/bcallaway11/ptetools |
Objects of this class hold results on aggregated
group-time average treatment effects. This is derived from the AGGTEobj
class in the did package.
An object for holding aggregated treatment effect parameters.
aggte_obj( overall.att = NULL, overall.se = NULL, type = "simple", egt = NULL, att.egt = NULL, se.egt = NULL, crit.val.egt = NULL, inf.function = NULL, min_e = NULL, max_e = NULL, balance_e = NULL, DIDparams = NULL )
overall.att |
The estimated overall ATT |
overall.se |
Standard error for overall ATT |
type |
The type of aggregation to be done. Default is "simple". |
egt |
Holds the length of exposure (for dynamic effects), the group (for selective treatment timing), or the time period (for calendar time effects) |
att.egt |
The ATT specific to egt |
se.egt |
The standard error specific to egt |
crit.val.egt |
A critical value for computing uniform confidence bands for dynamic effects, selective treatment timing, or time period effects. |
inf.function |
The influence function of the chosen aggregated parameters |
min_e |
The minimum event time computed in the event study results. This is useful when there are a huge number of pre-treatment periods. |
max_e |
The maximum event time computed in the event study results. This is useful when there are a huge number of post-treatment periods. |
balance_e |
Drops groups that do not have at least |
DIDparams |
A DIDparams object |
an aggte_obj
Class for holding group-time average treatment effects along with their influence function
attgt_if(attgt, inf_func, extra_gt_returns = NULL)
attgt |
group-time average treatment effect |
inf_func |
influence function |
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates, regressions of untreated potential outcomes on covariates. For ife, this could be something like the first step regression 2sls estimates. This argument is also potentially useful for debugging. |
attgt_if object
Class for holding returns from group-time specific estimates in settings when an influence function is not returned
attgt_noif(attgt, extra_gt_returns = NULL)
attgt |
group-time average treatment effect |
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates, regressions of untreated potential outcomes on covariates. For ife, this could be something like the first step regression 2sls estimates. This argument is also potentially useful for debugging. |
an attgt_noif object
Aggregate group-time average treatment effects into overall, group, and dynamic effects. This function is only used for (i) computing standard errors using the empirical bootstrap, and (ii) combining distributions at the (g,t) level
attgt_pte_aggregations(attgt.list, ptep)
attgt.list |
list of attgt results from |
ptep |
|
pte_emp_boot object
Plot dose-specific results for a continuous treatment.
## S3 method for class 'dose_obj'
autoplot(object, type = "att", ...)
object |
a |
type |
whether to plot |
... |
unused |
a ggplot object
Event-study plot for a pte_emp_boot object returned by
empirical-bootstrap estimators (e.g., cic(), qdid(),
mdid()). Pre- and post-treatment periods are distinguished by color.
## S3 method for class 'pte_emp_boot'
autoplot(object, ...)
object |
a |
... |
unused |
a ggplot object
Plot a pte_qtt object.
For type = "overall": QTT curve with quantile on the x-axis.
For type = "dynamic": event-study plot with event time on the x-axis.
Each selected quantile is a separate colored line. CIs are shown by default
when a single quantile is plotted, and suppressed by default when multiple
quantiles are plotted.
## S3 method for class 'pte_qtt'
autoplot( object, type = "overall", cband = TRUE, plot_probs = 0.5, plot_ci = NULL, ... )
object |
a |
type |
which aggregation to plot: |
cband |
logical; if |
plot_probs |
numeric vector of quantile levels to show in the dynamic
plot. Defaults to |
plot_ci |
logical or |
... |
unused |
a ggplot object
Event-study plot for a pte_results object. Pre- and
post-treatment periods are distinguished by color.
## S3 method for class 'pte_results'
autoplot(object, ...)
object |
a |
... |
unused |
a ggplot object
Function that actually computes panel treatment effects.
The difference relative to compute.pte is that this function
loops over time periods first (instead of groups) and tries to
estimate model for untreated potential outcomes jointly for all groups.
compute.pte(ptep, subset_fun, attgt_fun, ...)
ptep |
|
subset_fun |
This is a function that should take in |
attgt_fun |
This is a function that should work in the case where there is a single group and the "right" number of time periods to recover an estimate of the ATT. For example, in the context of difference in differences, it would need to work for a single group, find the appropriate comparison group (untreated units), find the right time periods (pre- and post-treatment), and then recover an estimate of ATT for that group. It will be called over and over separately by groups and by time periods to compute ATT(g,t)'s. The function needs to work in a very specific way. It should take in the
arguments: If |
... |
extra arguments that can be passed to create the correct subsets
of the data (depending on |
a list containing the following elements:
attgt.list: list of ATT(g,t) estimates
inffunc: influence function matrix
extra_gt_returns: list of extra returns from gt-specific calculations
Computes a group-time average treatment effect and influence function using an unconfoundedness-type identification strategy. This estimator is appropriate when parallel trends is implausible but a selection-on-observables assumption holds in levels (rather than differences) — e.g., during the early COVID-19 pandemic.
Originally from Callaway and Li (2021). Moved into ptetools from
the ppe package.
covid_attgt(gt_data, xformla, d_outcome = FALSE, d_covs_formula = ~-1, ...)
gt_data |
data that is "local" to a particular group-time average
treatment effect, structured as a |
xformla |
one-sided formula for covariates used in the propensity score and outcome regression models |
d_outcome |
logical; if |
d_covs_formula |
one-sided formula for covariates to include as
changes (differences). Default is |
... |
extra arguments; not used |
attgt_if object
Callaway, B. and Li, T. (2021). Policy Evaluation during a Pandemic. https://arxiv.org/abs/2105.06927
A panel dataset containing Covid-19 related data for 46 states. This data comes from Callaway and Li (2021). See the paper for additional descriptions.
covid_data
A data frame with 1656 rows and 9 variables:
The cumulative number of cases per million individuals in a particular state by a particular time period.
Time period
The group that a state belongs to. It is based on the time period when they enacted the shelter-in-place order.
State abbreviation
The total number of Covid-19 tests run per million individuals in a particular state by a particular time period.
Numeric state identifier
Census region for particular state
The percentage change in retail and recreational travel from pre-Covid baseline. This is from Google's Mobility report (see paper for details).
The current number of cases per million individuals in a
particular state by a particular time period. This variable is
constructed from positive (see paper for details).
Callaway and Li (2021)
A function to perform sanity checks and possibly adjust a critical value to form a uniform confidence band
crit_val_checks(crit_val, alp = 0.05)
crit_val |
the critical value |
alp |
the significance level |
a (possibly adjusted) critical value
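The kind of check described here can be sketched in base R. The helper name `toy_crit_val_checks` and the specific rules (fall back to the pointwise normal critical value when the supplied value is missing, infinite, or smaller than the pointwise value) are assumptions for illustration; the checks actually performed by crit_val_checks may differ.

```r
# Hypothetical sketch of sanity checks on a uniform-band critical value;
# the rules below are assumptions, not the package's implementation.
toy_crit_val_checks <- function(crit_val, alp = 0.05) {
  pointwise <- qnorm(1 - alp / 2)
  if (is.na(crit_val) || is.infinite(crit_val)) {
    warning("critical value is NA or infinite; using pointwise value")
    return(pointwise)
  }
  if (crit_val < pointwise) {
    # a uniform band should be at least as wide as a pointwise interval
    warning("critical value is below the pointwise value; using pointwise value")
    return(pointwise)
  }
  crit_val
}

toy_crit_val_checks(2.8)                    # returned unchanged
suppressWarnings(toy_crit_val_checks(NA))   # falls back to qnorm(0.975)
suppressWarnings(toy_crit_val_checks(1.2))  # falls back to qnorm(0.975)
```

The fallback reflects the idea that a uniform critical value below the pointwise one usually signals a degenerate bootstrap rather than a genuinely tighter band.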
Takes a data.frame for a particular group g and time period t and computes an estimate of a group-time average treatment effect and a corresponding influence function using a difference-in-differences approach.
The code relies on gt_data having certain variables defined.
In particular, there should be an id column (individual identifier),
D (treated group identifier), period (time period), name
(equal to "pre" for pre-treatment periods and equal to "post" for post
treatment periods), Y (outcome).
In our case, we call two_by_two_subset which sets up the
data to have this format before the call to did_attgt.
did_attgt(gt_data, xformula = ~1, ...)
gt_data |
data that is "local" to a particular group-time average treatment effect |
xformula |
one-sided formula for covariates used in the propensity score and outcome regression models |
... |
extra function arguments; not used here |
attgt_if
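To make the data layout above concrete, here is a base-R sketch of an unconditional two-period difference-in-differences estimate together with its influence function. The helper `toy_did_attgt` and the simulated data are hypothetical; this is not the implementation of did_attgt, which also handles covariates through xformula.

```r
# Hypothetical sketch, not the package's did_attgt(): unconditional 2x2 DID
# on data with the columns id, D, period, name, Y described above.
toy_did_attgt <- function(gt_data) {
  pre  <- gt_data[gt_data$name == "pre",  c("id", "D", "Y")]
  post <- gt_data[gt_data$name == "post", c("id", "Y")]
  wide <- merge(pre, post, by = "id", suffixes = c("_pre", "_post"))
  dY <- wide$Y_post - wide$Y_pre   # outcome change for each unit
  D  <- wide$D
  p  <- mean(D)                    # share of treated units
  m1 <- mean(dY[D == 1])           # mean change, treated units
  m0 <- mean(dY[D == 0])           # mean change, comparison units
  # influence function of a difference in group means
  inf_func <- D * (dY - m1) / p - (1 - D) * (dY - m0) / (1 - p)
  list(attgt = m1 - m0, inf_func = inf_func)
}

set.seed(1)
n  <- 200
D  <- rep(c(0, 1), each = n / 2)
y0 <- rnorm(n)
y1 <- y0 + 0.5 + 2 * D + rnorm(n, sd = 0.1)  # simulated effect of 2
gt_data <- data.frame(
  id     = rep(seq_len(n), 2),
  D      = rep(D, 2),
  period = rep(c(1, 2), each = n),
  name   = rep(c("pre", "post"), each = n),
  Y      = c(y0, y1)
)
res <- toy_did_attgt(gt_data)
res$attgt                        # close to the simulated effect
sqrt(mean(res$inf_func^2) / n)   # standard error implied by the influence function
```

Returning the estimate and the influence function together is what allows later steps to aggregate across groups and periods and to bootstrap without re-estimating.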
Takes a local repeated cross sections data set and computes an estimate of a group-time average treatment effect and corresponding influence function using a repeated cross sections DID approach.
did_rcs_attgt(gt_data, xformula = ~1, est_method = "dr", ...)
gt_data |
data that is "local" to a particular group-time average treatment effect |
xformula |
one-sided formula for covariates used in the propensity score and outcome regression models |
est_method |
Which type of estimation method to use. Default is "dr" for doubly robust. The other option is "reg" for regression adjustment. |
... |
extra function arguments; not used here |
attgt_if
Holds results from computing dose-specific treatment effects with a continuous treatment
dose_obj( dose, overall_att = NULL, overall_att_se = NULL, overall_att_inffunc = NULL, overall_acrt = NULL, overall_acrt_se = NULL, overall_acrt_inffunc = NULL, att.d = NULL, att.d_se = NULL, att.d_crit.val = NULL, att.d_inffunc = NULL, acrt.d = NULL, acrt.d_se = NULL, acrt.d_crit.val = NULL, acrt.d_inffunc = NULL, pte_params = NULL )
dose |
vector containing the values of the dose used in estimation |
overall_att |
estimate of the overall ATT, the mean of ATT(D) given D > 0 |
overall_att_se |
the standard error of the estimate of overall_att |
overall_att_inffunc |
the influence function for estimating overall_att |
overall_acrt |
estimate of the overall ACRT, the mean of ACRT(D|D) given D > 0 |
overall_acrt_se |
the standard error for the estimate of overall_acrt |
overall_acrt_inffunc |
the influence function for estimating overall_acrt |
att.d |
estimates of ATT(d) for each value of |
att.d_se |
standard error of ATT(d) for each value of |
att.d_crit.val |
critical value to produce pointwise or uniform confidence interval for ATT(d) |
att.d_inffunc |
matrix containing the influence function from estimating ATT(d) |
acrt.d |
estimates of ACRT(d) for each value of |
acrt.d_se |
standard error of ACRT(d) for each value of |
acrt.d_crit.val |
critical value to produce pointwise or uniform confidence interval for ACRT(d) |
acrt.d_inffunc |
matrix containing the influence function from estimating ACRT(d) |
pte_params |
a pte_params object containing other parameters passed to the function |
a dose_obj object
Deprecated. Use autoplot() on the pte_results
object instead.
ggpte(pte_results)
pte_results |
a |
a ggplot object
Deprecated. Use autoplot() on the dose_obj
instead.
ggpte_cont(dose_obj, type = "att")
dose_obj |
a |
type |
whether to plot |
a ggplot object
Class that holds causal effect parameter estimates across timing groups and time periods
group_time_att( group, time.period, att, V_analytical, se, crit_val, inf_func, n, W, Wpval, cband, alp, ptep, extra_gt_returns )
group |
numeric vector of groups for ATT(g,t) |
time.period |
numeric vector of time periods for ATT(g,t) |
att |
numeric vector containing the value of ATT(g,t) for corresponding group and time period |
V_analytical |
analytical asymptotic variance matrix for ATT(g,t)'s |
se |
numeric vector of standard errors |
crit_val |
critical value (usually a critical value for conducting uniform inference) |
inf_func |
matrix of influence function |
n |
number of unique individuals |
W |
Wald statistic for ATT(g,t) version of pre-test of parallel trends assumption |
Wpval |
p-value for Wald pre-test of ATT(g,t) version of parallel trends assumption |
cband |
logical indicating whether or not to report a confidence band |
alp |
significance level |
ptep |
|
extra_gt_returns |
list containing extra returns at the group-time level |
object of class group_time_att
Checks and converts data to satisfy criteria to be used in internal
ptetools functions. In particular,
the function takes in a data.frame, checks if it has the right
columns to be used to calculate a group-time average treatment effect,
and sets the class of the data.frame to include gt_data_frame
gt_data_frame(data)
data |
data that will be checked to see if has right format for computing group-time average treatment effects |
gt_data_frame object
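A validation step of this kind might look like the following base-R sketch. The required column names (id, D, period, name, Y) are taken from the did_attgt description; treating exactly these as required, and the class name `toy_gt_data_frame`, are assumptions made for illustration rather than the package's actual checks.

```r
# Hypothetical sketch of a column check plus class tagging; the required
# columns are an assumption based on the did_attgt description.
toy_gt_data_frame <- function(data) {
  required <- c("id", "D", "period", "name", "Y")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!all(data$name %in% c("pre", "post"))) {
    stop("column `name` must contain only \"pre\" or \"post\"")
  }
  # prepend the new class so ordinary data.frame methods keep working
  class(data) <- c("toy_gt_data_frame", class(data))
  data
}

ok <- toy_gt_data_frame(
  data.frame(id = 1:2, D = c(0, 1), period = 1, name = "pre", Y = c(0.1, 0.2))
)
class(ok)
bad <- try(toy_gt_data_frame(data.frame(id = 1)), silent = TRUE)
inherits(bad, "try-error")
```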
A function that takes an original data set and keeps all
data for all groups that are not-yet-treated by period tp as well
as for group g.
In particular, this keeps more data than functions like two_by_two_subset
that use a fixed base period.
A main use case for this function is the interactive fixed effects approach proposed in Callaway and Tsyawo (2023).
keep_all_pretreatment_subset(data, g, tp, ...)
data |
the full dataset |
g |
the current group |
tp |
the current time period |
... |
additional arguments |
list that contains the following elements:
gt_data: a gt_data_frame object that contains the
correct subset of data
n1: the number of observations in this subset
disidx: a vector of the correct ids for this subset
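The subsetting logic described above can be sketched in base R on a toy panel. The column names G, period, and id, the convention that G == 0 marks never-treated units, and the restriction to periods up to tp are all assumptions of this sketch; the helper `toy_keep_subset` is hypothetical and only mimics the shape of the returned list.

```r
# Hypothetical sketch; G == 0 is assumed to mark never-treated units and
# dropping periods after tp is an assumption of this example.
toy_keep_subset <- function(data, g, tp) {
  not_yet_treated <- data$G == 0 | data$G > tp
  keep <- (not_yet_treated | data$G == g) & data$period <= tp
  gt_data <- data[keep, ]
  gt_data$D <- as.numeric(gt_data$G == g)   # treated-group indicator
  ids <- unique(gt_data$id)
  list(gt_data = gt_data, n1 = length(ids), disidx = ids)
}

toy <- expand.grid(id = 1:6, period = 1:4)
toy$G <- c(0, 0, 2, 2, 3, 4)[toy$id]
sub <- toy_keep_subset(toy, g = 2, tp = 3)
sort(unique(sub$gt_data$G))   # 0 2 4: group 3 is already treated by period 3
sub$n1                        # 5 units remain
```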
A function that takes an original data set and keeps all pre-treatment data for all groups. For group g, it also includes data for the current period.
Also, note that if tp is still a pre-treatment period for group g,
then periods after tp will also be dropped for group g. This is a
design choice and is useful especially for estimating placebo
group-time average treatment effects in pre-treatment periods.
A main use case for this function is to compute ATT(g,t)'s using a global estimation strategy such as imputation in Gardner (2022).
keep_all_untreated_subset(data, g, tp, ...)
data |
the full dataset |
g |
the current group |
tp |
the current time period |
... |
extra arguments to get the subset correct |
list that contains the following elements:
gt_data: a gt_data_frame object that contains the
correct subset of data
n1: the number of observations in this subset
disidx: a vector of the correct ids for this subset
Function for using multiplier bootstrap to conduct inference
mboot2(inffunc, biters = 1000, alp = 0.05)
inffunc |
influence function matrix |
biters |
number of bootstrap iterations; default is 1000 |
alp |
significance level; default is 0.05 |
list with the following elements:
boot_se: bootstrap standard errors
crit_val: critical value for uniform confidence bands
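For intuition, a multiplier bootstrap on an influence function matrix can be sketched in base R as follows. The helper `toy_mboot`, the use of Rademacher weights, and the interquartile-range scale estimate are assumptions of this sketch; mboot2 may differ in its details.

```r
# Hypothetical sketch of a multiplier bootstrap; not the package's mboot2().
toy_mboot <- function(inffunc, biters = 1000, alp = 0.05) {
  inffunc <- as.matrix(inffunc)
  n <- nrow(inffunc)
  # each draw perturbs the influence function with Rademacher weights
  draws <- t(replicate(biters, {
    v <- sample(c(-1, 1), n, replace = TRUE)
    sqrt(n) * colMeans(v * inffunc)
  }))
  if (ncol(inffunc) == 1) draws <- matrix(draws, ncol = 1)
  # robust scale estimate from the interquartile range of the draws
  sigma <- apply(draws, 2, function(b) {
    (quantile(b, 0.75) - quantile(b, 0.25)) / (qnorm(0.75) - qnorm(0.25))
  })
  sigma <- as.numeric(sigma)
  # the sup-t statistic across parameters gives the uniform critical value
  sup_t <- apply(draws, 1, function(b) max(abs(b / sigma)))
  list(
    boot_se  = sigma / sqrt(n),
    crit_val = as.numeric(quantile(sup_t, 1 - alp))
  )
}

set.seed(42)
inffunc <- matrix(rnorm(500 * 3), nrow = 500, ncol = 3)
mb <- toy_mboot(inffunc, biters = 500)
mb$boot_se    # each roughly 1 / sqrt(500)
mb$crit_val   # larger than the pointwise value qnorm(0.975)
```

Because only the weights are redrawn, no model is re-estimated inside the loop, which is why this approach is typically much faster than resampling the data.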
A function that returns weights on (g,t)'s to deliver overall (averaged across groups and time periods) treatment effect parameters
overall_weights(attgt, balance_e = NULL, min_e = -Inf, max_e = Inf, ...)
attgt |
A group_time_att object to be aggregated |
balance_e |
Drops groups that do not have at least |
min_e |
The minimum event time computed in the event study results. This is useful when there are a huge number of pre-treatment periods. |
max_e |
The maximum event time computed in the event study results. This is useful when there are a huge number of post-treatment periods. |
... |
extra arguments |
a data.frame containing columns:
group: the group
time.period: the time period
overall_weight: the weight
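One way such weights could be formed is sketched below in base R: each post-treatment (g,t) cell gets its group's share of treated units divided by the number of post-treatment periods for that group, and pre-treatment cells get zero. This weighting scheme and the helper `toy_overall_weights` are assumptions for illustration, not necessarily what overall_weights computes.

```r
# Hypothetical sketch; the weighting scheme is an assumption of this example.
toy_overall_weights <- function(group, time.period, group_size) {
  post <- time.period >= group
  # number of post-treatment periods for each cell's group
  n_post <- ave(as.numeric(post), group, FUN = sum)
  # share of treated units that belong to each cell's group
  sizes <- tapply(group_size, group, function(s) s[1])
  p_g <- group_size / sum(sizes)
  w <- ifelse(post, p_g / n_post, 0)
  data.frame(group = group, time.period = time.period, overall_weight = w)
}

gt <- expand.grid(time.period = 2:4, group = c(2, 3))
gt$n <- ifelse(gt$group == 2, 30, 10)
w <- toy_overall_weights(gt$group, gt$time.period, gt$n)
w
sum(w$overall_weight)   # the weights sum to 1
```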
Computes empirical bootstrap pointwise standard errors
panel_empirical_bootstrap( attgt.list, ptep, setup_pte_fun, subset_fun, attgt_fun, extra_gt_returns, aggregation_fun = NULL, ... )
attgt.list |
list of attgt results from |
ptep |
|
setup_pte_fun |
This is a function that should take in This function also provides a good place for error handling related to the types of data that can be handled. The |
subset_fun |
This is a function that should take in |
attgt_fun |
This is a function that should work in the case where there is a single group and the "right" number of time periods to recover an estimate of the ATT. For example, in the context of difference in differences, it would need to work for a single group, find the appropriate comparison group (untreated units), find the right time periods (pre- and post-treatment), and then recover an estimate of ATT for that group. It will be called over and over separately by groups and by time periods to compute ATT(g,t)'s. The function needs to work in a very specific way. It should take in the
arguments: If |
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates, regressions of untreated potential outcomes on covariates. For ife, this could be something like the first step regression 2sls estimates. This argument is also potentially useful for debugging. |
aggregation_fun |
An optional function for aggregating group-time
treatment effects. When |
... |
extra arguments that can be passed to create the correct subsets
of the data (depending on |
pte_emp_boot object
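The idea behind an empirical bootstrap with panel data is to resample whole units, keeping all of a unit's time periods together, and to re-run the estimator on each resample. The base-R sketch below illustrates that idea only; `toy_panel_bootstrap`, `did_means`, and the simulated data are hypothetical and much simpler than panel_empirical_bootstrap.

```r
# Hypothetical sketch of a unit-level (block) bootstrap for panel data.
toy_panel_bootstrap <- function(data, idname, estimator, biters = 100) {
  ids <- unique(data[[idname]])
  rows_by_id <- split(seq_len(nrow(data)), data[[idname]])
  boot_est <- replicate(biters, {
    # draw units with replacement and keep all rows of each drawn unit
    draw <- sample(ids, length(ids), replace = TRUE)
    idx <- unlist(rows_by_id[as.character(draw)], use.names = FALSE)
    estimator(data[idx, , drop = FALSE])
  })
  sd(boot_est)   # pointwise bootstrap standard error
}

set.seed(2)
n <- 100
D <- rep(c(0, 1), each = n / 2)
panel <- data.frame(
  id   = rep(seq_len(n), 2),
  D    = rep(D, 2),
  post = rep(c(0, 1), each = n),
  Y    = c(rnorm(n), rnorm(n) + 2 * D)
)
# difference in differences of group means; does not rely on unique ids
did_means <- function(d) {
  m <- function(dd, pp) mean(d$Y[d$D == dd & d$post == pp])
  (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))
}
boot_se <- toy_panel_bootstrap(panel, "id", did_means, biters = 200)
boot_se
```

An estimator that merges on the unit identifier would need fresh ids assigned to repeated draws; the toy estimator above avoids that complication by using only group means.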
Convenience wrapper around autoplot.dose_obj.
## S3 method for class 'dose_obj'
plot(x, ...)
x |
a |
... |
passed to |
invisibly returns the ggplot object
Convenience wrapper around autoplot.pte_emp_boot.
## S3 method for class 'pte_emp_boot'
plot(x, ...)
x |
a |
... |
passed to |
invisibly returns the ggplot object
Convenience wrapper around autoplot.pte_qtt.
## S3 method for class 'pte_qtt'
plot(x, type = "overall", cband = TRUE, plot_probs = 0.5, plot_ci = NULL, ...)
x |
a |
type |
which aggregation to plot. See |
cband |
logical; if |
plot_probs |
numeric vector of quantile levels to show. See |
plot_ci |
logical or |
... |
passed to |
invisibly returns the ggplot object
Convenience wrapper around autoplot.pte_results.
## S3 method for class 'pte_results'
plot(x, ...)
x |
a |
... |
passed to |
invisibly returns the ggplot object
Process ATT(g,t) results when influence function is available
process_att_gt(att_gt_results, ptep)
att_gt_results |
ATT(g,t)'s |
ptep |
|
group_time_att object
After computing results for each group and time period,
process_dose_gt combines/averages them into overall effects and/or
dose-specific effects. This is generic code that can be used
regardless of how causal effects across different
timing groups and periods were estimated in the previous step.
process_dose_gt(gt_results, ptep, ...)
gt_results |
list of group-time specific results |
ptep |
|
... |
extra arguments |
a dose_obj object
Tools for estimating treatment effects with panel data.
Main function for computing panel treatment effects
pte( yname, gname, tname, idname = NULL, data, setup_pte_fun, subset_fun, attgt_fun, aggregation_fun = NULL, panel = TRUE, cband = TRUE, alp = 0.05, boot_type = "multiplier", weightsname = NULL, gt_type = "att", ret_quantile = NULL, global_fun = FALSE, time_period_fun = FALSE, group_fun = FALSE, process_dtt_gt_fun = process_dtt_gt, process_dose_gt_fun = process_dose_gt, probs = NULL, biters = 100, cl = 1, call = NULL, ... )
yname |
Name of outcome in |
gname |
Name of group in |
tname |
Name of time period in |
idname |
Name of id in |
data |
balanced panel or repeated cross sections data |
setup_pte_fun |
This is a function that should take in This function also provides a good place for error handling related to the types of data that can be handled. The |
subset_fun |
This is a function that should take in |
attgt_fun |
This is a function that should work in the case where there is a single group and the "right" number of time periods to recover an estimate of the ATT. For example, in the context of difference in differences, it would need to work for a single group, find the appropriate comparison group (untreated units), find the right time periods (pre- and post-treatment), and then recover an estimate of ATT for that group. It will be called over and over separately by groups and by time periods to compute ATT(g,t)'s. The function needs to work in a very specific way. It should take in the
arguments: If |
aggregation_fun |
An optional function for aggregating group-time
treatment effects in the empirical bootstrap path. When |
panel |
Whether the data are panel data. The default is TRUE. Set to FALSE for repeated cross sections. |
cband |
whether or not to report a uniform (instead of pointwise) confidence band (default is TRUE) |
alp |
significance level; default is 0.05 |
boot_type |
should be one of "multiplier" (the default) or "empirical".
The multiplier bootstrap is generally much faster, but |
weightsname |
The name of the column that contains sampling weights. The default is NULL, in which case no sampling weights are used. |
gt_type |
which type of group-time effects are computed.
The default is "att". Different estimation strategies can implement
their own choices for |
ret_quantile |
For functions that compute quantile treatment effects,
this is a specific quantile at which to report results, e.g.,
|
global_fun |
Logical indicating whether or not untreated potential outcomes can be estimated in one shot, i.e., for all groups and time periods. Main use case would be for one-shot imputation estimators. Not supported yet. |
time_period_fun |
Logical indicating whether or not untreated potential outcomes can be estimated for all groups in the same time period. Not supported yet. |
group_fun |
Logical indicating whether or not untreated potential outcomes can be estimated for all time periods for a single group. Not supported yet. These functions aim at reducing or eliminating running the same code multiple times. |
process_dtt_gt_fun |
An optional function to customize results when
the gt-specific function returns the distribution of treated and untreated
potential outcomes. The default is |
process_dose_gt_fun |
An optional function to customize results when the gt-specific
function returns treatment effects that depend on dose (i.e., amount of the
treatment). The default is |
probs |
For |
biters |
number of bootstrap iterations; default is 100 |
cl |
number of clusters to be used when bootstrapping; default is 1 |
call |
keeps track of through the |
... |
extra arguments that can be passed to create the correct subsets
of the data (depending on |
pte_results object
Maintainer: Brantly Callaway [email protected]
Authors:
Brantly Callaway [email protected]
Useful links:
Report bugs at https://github.com/bcallaway11/ptetools/issues
    # example using minimum wage data
    # and difference-in-differences identification strategy
    library(did)
    data(mpdta)
    did_res <- pte(
      yname = "lemp",
      gname = "first.treat",
      tname = "year",
      idname = "countyreal",
      data = mpdta,
      setup_pte_fun = setup_pte,
      subset_fun = two_by_two_subset,
      attgt_fun = did_attgt,
      xformula = ~lpop
    )
    summary(did_res)
    ggplot2::autoplot(did_res)
This is a slight edit of the aggte function from the did package.
Currently, it only provides aggregations for "overall" treatment effects
and event studies. It also provides the weights directly, which are
currently used for constructing aggregations based on distributions.
The other difference is that pte_aggte provides inference results
where the only randomness is coming from the outcomes (not from the group
assignment nor from the covariates).
pte_aggte( attgt, type = "overall", balance_e = NULL, min_e = -Inf, max_e = Inf, ... )
attgt |
A group_time_att object to be aggregated |
type |
The type of aggregation to be done. Default is "overall". |
balance_e |
Drops groups that do not have at least |
min_e |
The minimum event time computed in the event study results. This is useful when there are a huge number of pre-treatment periods. |
max_e |
The maximum event time computed in the event study results. This is useful when there are a huge number of post-treatment periods. |
... |
extra arguments |
an aggte_obj
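An event-study aggregation of this kind averages ATT(g,t) over cells that share the same event time e = t - g. The base-R sketch below uses group sizes as weights; the helper `toy_event_study`, the weighting choice, and the numbers are hypothetical and only illustrate the idea, not the internals of pte_aggte.

```r
# Hypothetical sketch of a dynamic (event-study) aggregation of ATT(g,t).
toy_event_study <- function(group, time.period, att, group_size) {
  e <- time.period - group           # event time of each (g,t) cell
  e_vals <- sort(unique(e))
  att_e <- sapply(e_vals, function(ee) {
    idx <- e == ee
    # average cells with the same event time, weighting by group size
    weighted.mean(att[idx], w = group_size[idx])
  })
  data.frame(e = e_vals, att.e = att_e)
}

gt <- data.frame(
  group       = c(2, 2, 2, 3, 3, 3),
  time.period = c(2, 3, 4, 2, 3, 4),
  att         = c(1, 2, 3, 0, 1.4, 2),
  n           = c(30, 30, 30, 10, 10, 10)
)
es <- toy_event_study(gt$group, gt$time.period, gt$att, gt$n)
es   # att.e is 0, 1.1, 2, 3 for e = -1, 0, 1, 2
```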
pte_attgt takes a "local" data.frame and computes
an estimate of a group time average treatment effect
and a corresponding influence function. This function generalizes
a number of existing methods and underlies the pte_default function.
The code relies on gt_data having certain variables defined.
In particular, there should be an id column (individual identifier),
G (group identifier), period (time period), name
(equal to "pre" for pre-treatment periods and equal to "post" for post
treatment periods), Y (outcome).
In our case, we call two_by_two_subset which sets up the
data to have this format before the call to pte_attgt
pte_attgt( gt_data, xformula, d_outcome = FALSE, d_covs_formula = ~-1, lagged_outcome_cov = FALSE, est_method = "dr", ... )
gt_data |
data that is "local" to a particular group-time average treatment effect |
xformula |
one-sided formula for covariates used in the propensity score and outcome regression models |
d_outcome |
Whether or not to take the first difference of the outcome. The default is FALSE. To use difference-in-differences, set this to be TRUE. |
d_covs_formula |
A formula for time varying covariates to enter the first estimation step models. The default is not to include any, and, hence, to only include pre-treatment covariates. |
lagged_outcome_cov |
Whether to include the lagged outcome as a covariate. Default is FALSE. |
est_method |
Which type of estimation method to use. Default is "dr" for doubly robust. The other option is "reg" for regression adjustment. |
... |
extra function arguments; not used here |
attgt_if
This is a generic/example wrapper for a call to the pte function.
This function provides access to difference-in-differences and unconfoundedness based identification/estimation strategies given (i) panel data and (ii) staggered treatment adoption
pte_default( yname, gname, tname, idname = NULL, data, panel = TRUE, xformula = ~1, d_outcome = FALSE, d_covs_formula = ~-1, lagged_outcome_cov = FALSE, est_method = "dr", anticipation = 0, base_period = "varying", control_group = "notyettreated", weightsname = NULL, cband = TRUE, alp = 0.05, boot_type = "multiplier", biters = 100, cl = 1, ... )
yname |
Name of outcome in |
gname |
Name of group in |
tname |
Name of time period in |
idname |
Name of id in |
data |
balanced panel or repeated cross sections data |
panel |
Whether the data are panel data. The default is TRUE. Set to FALSE for repeated cross sections. |
xformula |
one-sided formula for covariates used in the propensity score and outcome regression models |
d_outcome |
Whether or not to take the first difference of the outcome. The default is FALSE. To use difference-in-differences, set this to be TRUE. |
d_covs_formula |
A formula for time varying covariates to enter the first estimation step models. The default is not to include any, and, hence, to only include pre-treatment covariates. |
lagged_outcome_cov |
Whether to include the lagged outcome as a covariate. Default is FALSE. |
est_method |
Which type of estimation method to use. Default is "dr" for doubly robust. The other option is "reg" for regression adjustment. |
anticipation |
how many periods before the treatment actually takes place that it can have an effect on outcomes |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
control_group |
Which group is used as the comparison group. The default choice is "notyettreated", but different estimation strategies can implement their own choices for the control group |
weightsname |
The name of the column that contains sampling weights. The default is NULL, in which case no sampling weights are used. |
cband |
whether or not to report a uniform (instead of pointwise) confidence band (default is TRUE) |
alp |
significance level; default is 0.05 |
boot_type |
should be one of "multiplier" (the default) or "empirical".
The multiplier bootstrap is generally much faster, but |
biters |
number of bootstrap iterations; default is 100 |
cl |
number of clusters to be used when bootstrapping; default is 1 |
... |
additional arguments passed to |
pte_results object
# example using minimum wage data
# and a lagged outcome unconfoundedness strategy
library(ptetools)
library(did)
data(mpdta)
lou_res <- pte_default(
  yname = "lemp",
  gname = "first.treat",
  tname = "year",
  idname = "countyreal",
  data = mpdta,
  xformula = ~lpop,
  d_outcome = FALSE,
  d_covs_formula = ~lpop,
  lagged_outcome_cov = TRUE
)
summary(lou_res)
ggplot2::autoplot(lou_res)
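The d_outcome argument description notes that difference-in-differences is obtained by first-differencing the outcome. As a hedged sketch, the call below reuses the minimum wage example with d_outcome = TRUE and without the lagged outcome covariate. The names can_run and did_res are illustrative, and the estimation step only runs when both ptetools and did are installed.

```r
# Check that both packages are available before attempting the estimation.
can_run <- requireNamespace("ptetools", quietly = TRUE) &&
  requireNamespace("did", quietly = TRUE)

if (can_run) {
  # Minimum wage data shipped with the did package.
  data("mpdta", package = "did")

  # Difference-in-differences variant: first-difference the outcome and
  # do not condition on the lagged outcome.
  did_res <- ptetools::pte_default(
    yname = "lemp",
    gname = "first.treat",
    tname = "year",
    idname = "countyreal",
    data = mpdta,
    xformula = ~lpop,
    d_outcome = TRUE,
    lagged_outcome_cov = FALSE
  )
  print(summary(did_res))
}
```

All other arguments are left at the defaults listed in the usage line, so the comparison group is "notyettreated" and the multiplier bootstrap is used.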
Class for holding results with a continuous treatment
pte_dose_results(att_gt, dose, att_d = NULL, acrt_d = NULL, ptep)
att_gt |
attgt results |
dose |
vector of doses |
att_d |
ATT(d) for each value of |
acrt_d |
ACRT(d) for each value of |
ptep |
a |
a pte_dose_results object
Class for holding ptetools empirical bootstrap results
pte_emp_boot( attgt_results, overall_results, group_results, dyn_results, overall_weights = NULL, dyn_weights = NULL, group_weights = NULL, extra_gt_returns = NULL, ptep = NULL )
attgt_results |
|
overall_results |
|
group_results |
|
dyn_results |
|
overall_weights |
vector containing weights on underlying ATT(g,t) for overall treatment effect parameter |
dyn_weights |
list containing weights on underlying ATT(g,t)
for each value of |
group_weights |
list containing weights on the underlying ATT(g,t) used to deliver averaged group-specific treatment effects |
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates or regressions of untreated potential outcomes on covariates. For ife, this could be something like the first-step 2SLS regression estimates. This argument is also potentially useful for debugging. |
ptep |
|
a pte_emp_boot object
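The pte_emp_boot class stores results produced by the empirical bootstrap. As a minimal base-R sketch of that idea, the code below resamples whole units with replacement and re-estimates a simple difference in mean outcome changes. The simulated data and the names unit_data, att_hat, and boot_se are hypothetical, and this is not the package's internal code.

```r
set.seed(1)

# Hypothetical unit-level data: one row per unit with the outcome change
# (post minus pre) and a treatment indicator.
n <- 200
unit_data <- data.frame(
  id = seq_len(n),
  treated = rep(c(1, 0), each = n / 2)
)
unit_data$dy <- 0.5 * unit_data$treated + rnorm(n)

# Estimator: difference in mean outcome changes (a simple 2x2 comparison).
att_hat <- function(df) {
  mean(df$dy[df$treated == 1]) - mean(df$dy[df$treated == 0])
}

# Empirical bootstrap: resample whole units with replacement, re-estimate.
biters <- 200
boot_draws <- replicate(biters, {
  idx <- sample(seq_len(nrow(unit_data)), replace = TRUE)
  att_hat(unit_data[idx, ])
})

point_est <- att_hat(unit_data)
boot_se <- sd(boot_draws)

# Pointwise 95% interval based on the bootstrap standard error.
ci <- point_est + c(-1, 1) * qnorm(0.975) * boot_se
```

Resampling at the unit level keeps each unit's full time series together, which is why the sketch draws row indices of a one-row-per-unit data frame.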
Class that contains pte parameters
pte_params( yname, gname, tname, idname = NULL, data, panel = TRUE, glist, tlist, cband, alp, boot_type, anticipation = NULL, base_period = NULL, weightsname = NULL, control_group = "notyettreated", gt_type = "att", ret_quantile = 0.5, probs = NULL, global_fun = FALSE, time_period_fun = FALSE, group_fun = FALSE, biters, cl, call = NULL )
yname |
Name of outcome in |
gname |
Name of group in |
tname |
Name of time period in |
idname |
Name of id in |
data |
balanced panel or repeated cross sections data |
panel |
Whether the data are panel data. The default is TRUE. Set to FALSE for repeated cross sections. |
glist |
list of groups to create group-time average treatment effects for |
tlist |
list of time periods to create group-time average treatment effects for |
cband |
whether or not to report a uniform (instead of pointwise) confidence band (default is TRUE) |
alp |
significance level; default is 0.05 |
boot_type |
which type of bootstrap to use |
anticipation |
the number of periods before the treatment actually takes place in which it can already affect outcomes |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
weightsname |
The name of the column that contains sampling weights. The default is NULL, in which case no sampling weights are used. |
control_group |
Which group is used as the comparison group. The default choice is "notyettreated", but different estimation strategies can implement their own choices for the control group |
gt_type |
which type of group-time effects are computed.
The default is "att". Different estimation strategies can implement
their own choices for |
ret_quantile |
For functions that compute quantile treatment effects,
this is a specific quantile at which to report results, e.g.,
|
probs |
For |
global_fun |
Logical indicating whether or not untreated potential outcomes can be estimated in one shot, i.e., for all groups and time periods. Main use case would be for one-shot imputation estimators. Not supported yet. |
time_period_fun |
Logical indicating whether or not untreated potential outcomes can be estimated for all groups in the same time period. Not supported yet. |
group_fun |
Logical indicating whether or not untreated potential outcomes can be estimated for all time periods for a single group. Not supported yet. The global_fun, time_period_fun, and group_fun options aim to reduce or eliminate running the same code multiple times. |
biters |
number of bootstrap iterations; default is 100 |
cl |
number of clusters to be used when bootstrapping; default is 1 |
call |
keeps track of through the |
pte_params object
Holds the full quantile treatment effect (QTT) curve at the
overall, group-specific, and dynamic (event-study) aggregation levels.
Each aggregation contains estimates at all quantile levels in probs
together with bootstrap standard errors and pointwise confidence intervals.
pte_qtt(overall, dynamic, group, F0_overall = NULL, F1_overall = NULL, ptep)
overall |
data.frame with columns |
dynamic |
data.frame with columns |
group |
data.frame with columns |
F0_overall |
mixture CDF of untreated potential outcomes |
F1_overall |
mixture CDF of treated potential outcomes |
ptep |
|
a pte_qtt object
Class for holding overall results with a staggered treatment, including an overall ATT and an event study
pte_results(att_gt, overall_att, event_study, ptep)
att_gt |
attgt results |
overall_att |
overall_att results |
event_study |
event_study results |
ptep |
|
a pte_results object
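To make the relationship between group-time effects, the event study, and the overall ATT concrete, here is a hedged base-R sketch of one possible aggregation that uses group sizes as weights. The numbers and the column names (group, time, att, n_group) are invented for illustration, and the package's own aggregation code may weight things differently.

```r
# Hypothetical ATT(g,t) estimates for two groups over three periods.
attgt <- data.frame(
  group   = c(2, 2, 2, 3, 3, 3),
  time    = c(2, 3, 4, 2, 3, 4),
  att     = c(1.0, 1.5, 2.0, 0.1, 0.8, 1.2),
  n_group = c(60, 60, 60, 40, 40, 40)  # number of units in each group
)

# Event time: periods since treatment started (negative = pre-treatment).
attgt$e <- attgt$time - attgt$group

# Event study: weighted average of ATT(g,t) within each event time.
es_split <- split(attgt, attgt$e)
event_study <- data.frame(
  e   = as.numeric(names(es_split)),
  att = unname(sapply(es_split, function(d) weighted.mean(d$att, d$n_group)))
)

# Overall ATT: average post-treatment effects within each group, then
# average across groups using group sizes as weights.
post <- attgt[attgt$e >= 0, ]
by_group <- split(post, post$group)
group_att <- sapply(by_group, function(d) mean(d$att))
group_n   <- sapply(by_group, function(d) d$n_group[1])
overall_att <- weighted.mean(group_att, group_n)
```

In this toy example the event study has four event times (-1 through 2), and the pre-treatment entry at e = -1 comes only from the later-treated group.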
Aggregate group-time distribution of the treatment effect into overall, group, and dynamic effects.
qott_pte_aggregations(attgt.list, ptep, extra_gt_returns)
attgt.list |
list of attgt results from |
ptep |
|
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates or regressions of untreated potential outcomes on covariates. For ife, this could be something like the first-step 2SLS regression estimates. This argument is also potentially useful for debugging. |
pte_emp_boot object
Runs the empirical bootstrap for the full QTT curve case
(gt_type = "qtt"). Called automatically by
panel_empirical_bootstrap when gt_type == "qtt".
qtt_empirical_bootstrap( attgt.list, ptep, setup_pte_fun, subset_fun, attgt_fun, extra_gt_returns, aggregation_fun = NULL, ... )
attgt.list |
list of attgt results from |
ptep |
|
setup_pte_fun |
This is a function that should take in This function also provides a good place for error handling related to the types of data that can be handled. The |
subset_fun |
This is a function that should take in |
attgt_fun |
This is a function that should work in the case where there is a single group and the "right" number of time periods to recover an estimate of the ATT. For example, in the context of difference-in-differences, it would need to work for a single group, find the appropriate comparison group (untreated units), find the right time periods (pre- and post-treatment), and then recover an estimate of the ATT for that group. It will be called repeatedly, separately by group and by time period, to compute the ATT(g,t)'s. The function needs to work in a very specific way. It should take in the
arguments: If |
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates or regressions of untreated potential outcomes on covariates. For ife, this could be something like the first-step 2SLS regression estimates. This argument is also potentially useful for debugging. |
aggregation_fun |
An optional function for aggregating group-time
treatment effects. When |
... |
extra arguments that can be passed to create the correct subsets
of the data (depending on |
pte_qtt object
Aggregate group-time F0/F1 distributions into QTT curves at
the overall, group, and dynamic level. CDFs are mixed first using
BMisc::combine_ecdfs and then inverted at all quantile levels in
probs, avoiding the bias from averaging scalar QTTs.
qtt_pte_aggregations(attgt.list, ptep, extra_gt_returns)
attgt.list |
list of attgt results from |
ptep |
|
extra_gt_returns |
A place to return anything extra from particular group-time average treatment effect calculations. For DID, this might be something like propensity score estimates or regressions of untreated potential outcomes on covariates. For ife, this could be something like the first-step 2SLS regression estimates. This argument is also potentially useful for debugging. |
named list with elements overall_results,
dyn_results, group_results, F0_overall,
F1_overall
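The description of qtt_pte_aggregations says that CDFs are mixed first and then inverted, rather than averaging scalar quantiles. A small base-R sketch can show why the two differ. It assumes two hypothetical group-time outcome samples with equal mixture weights, works with a single distribution rather than the F0/F1 pair, and does not call BMisc::combine_ecdfs; the helper mix_quantile is invented here.

```r
set.seed(42)

# Two hypothetical group-time samples of outcomes with different locations.
y_a <- rnorm(5000, mean = 0, sd = 1)
y_b <- rnorm(5000, mean = 4, sd = 1)
w   <- c(0.5, 0.5)  # assumed mixture weights

F_a <- ecdf(y_a)
F_b <- ecdf(y_b)

# Mixture CDF: weighted average of the group-time CDFs.
F_mix <- function(y) w[1] * F_a(y) + w[2] * F_b(y)

# Invert the mixture CDF on a grid: smallest y with F_mix(y) >= tau.
y_grid <- sort(c(y_a, y_b))
mix_quantile <- function(tau) y_grid[which(F_mix(y_grid) >= tau)[1]]

tau <- 0.25
# Quantile of the mixture distribution (mix first, then invert).
q_mixed <- mix_quantile(tau)
# Weighted average of the two group-time quantiles (invert first, then mix).
q_averaged <- unname(w[1] * quantile(y_a, tau) + w[2] * quantile(y_b, tau))
```

Here the 0.25 quantile of the mixture sits near the center of the lower component, while the averaged quantile lands between the two components, where the mixture has little mass. A QTT curve would apply the same inversion to both the treated and untreated mixture CDFs and take the difference.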
This function sets up
the data to be used in the ptetools package.
The setup_pte function builds on setup_pte_basic and
attempts to provide a general purpose function (with error handling)
to arrange the data in a way that can be processed by subset_fun
and attgt_fun in the next steps.
setup_pte( yname, gname, tname, idname = NULL, data, panel = TRUE, required_pre_periods = 1, anticipation = 0, base_period = "varying", cband = TRUE, alp = 0.05, boot_type = "multiplier", weightsname = NULL, gt_type = "att", ret_quantile = 0.5, probs = NULL, biters = 100, cl = 1, call = NULL, ... )
yname |
Name of outcome in |
gname |
Name of group in |
tname |
Name of time period in |
idname |
Name of id in |
data |
balanced panel or repeated cross sections data |
panel |
Whether the data are panel data. The default is TRUE. Set to FALSE for repeated cross sections. |
required_pre_periods |
The number of required pre-treatment periods to implement the estimation strategy. Default is 1. |
anticipation |
the number of periods before the treatment actually takes place in which it can already affect outcomes |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
cband |
whether or not to report a uniform (instead of pointwise) confidence band (default is TRUE) |
alp |
significance level; default is 0.05 |
boot_type |
which type of bootstrap to use |
weightsname |
The name of the column that contains sampling weights. The default is NULL, in which case no sampling weights are used. |
gt_type |
which type of group-time effects are computed.
The default is "att". Different estimation strategies can implement
their own choices for |
ret_quantile |
For functions that compute quantile treatment effects,
this is a specific quantile at which to report results, e.g.,
|
probs |
For |
biters |
number of bootstrap iterations; default is 100 |
cl |
number of clusters to be used when bootstrapping; default is 1 |
call |
keeps track of through the |
... |
additional arguments |
pte_params object
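The cband and alp arguments determine whether a uniform or a pointwise confidence band is reported and at what significance level. As a hedged illustration, the base-R sketch below contrasts the pointwise normal critical value with a uniform (sup-t) critical value computed from bootstrap draws. The simulated draws and all object names are hypothetical stand-ins, not output from ptetools.

```r
set.seed(123)

alp <- 0.05
biters <- 1000
k <- 6  # number of parameters in the band, e.g. event-study coefficients

# Hypothetical bootstrap draws, one column per parameter, centered at zero.
se_true <- seq(0.1, 0.6, length.out = k)
boot_draws <- sapply(se_true, function(s) rnorm(biters, sd = s))

# Bootstrap standard errors, parameter by parameter.
boot_se <- apply(boot_draws, 2, sd)

# Pointwise critical value: the usual normal quantile.
crit_pointwise <- qnorm(1 - alp / 2)

# Uniform critical value: (1 - alp) quantile of the maximum absolute
# t-statistic across all parameters within each bootstrap draw.
t_stats <- sweep(abs(boot_draws), 2, boot_se, "/")
crit_uniform <- unname(quantile(apply(t_stats, 1, max), 1 - alp))
```

The uniform critical value is larger than the pointwise one, so a uniform band is wider; it is meant to cover all k parameters simultaneously with probability 1 - alp.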
This is a lightweight (example) function showing how to set up
the data to be used in the ptetools package.
setup_pte_basic takes in information about the structure of data
and returns a pte_params object. The key piece of information
that is computed by this function is the list of groups and list of
time periods where ATT(g,t) should be computed. In particular, this function
omits the never-treated group but includes all other groups and drops the first
time period. This setup is basically geared towards the 2x2 case —
i.e., where ATT could be identified with two periods, a treated and
untreated group, and the first period being pre-treatment for both groups.
This is the relevant case for DID, but is also relevant for other cases as well.
However, for example, if more pre-treatment periods were needed, then this
function should be replaced by something else.
For code intended to be easy for other researchers to use, this is a good place for error handling and for checking that the data are in the correct format.
setup_pte_basic( yname, gname, tname, idname = NULL, data, panel = TRUE, cband = TRUE, alp = 0.05, boot_type = "multiplier", gt_type = "att", ret_quantile = 0.5, probs = NULL, biters = 100, cl = 1, call = NULL, ... )
yname |
Name of outcome in |
gname |
Name of group in |
tname |
Name of time period in |
idname |
Name of id in |
data |
balanced panel or repeated cross sections data |
panel |
Whether the data are panel data. The default is TRUE. Set to FALSE for repeated cross sections. |
cband |
whether or not to report a uniform (instead of pointwise) confidence band (default is TRUE) |
alp |
significance level; default is 0.05 |
boot_type |
which type of bootstrap to use |
gt_type |
which type of group-time effects are computed.
The default is "att". Different estimation strategies can implement
their own choices for |
ret_quantile |
For functions that compute quantile treatment effects,
this is a specific quantile at which to report results, e.g.,
|
probs |
For |
biters |
number of bootstrap iterations; default is 100 |
cl |
number of clusters to be used when bootstrapping; default is 1 |
call |
keeps track of through the |
... |
additional arguments |
pte_params object
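The rule described for setup_pte_basic (drop the never-treated group from the list of groups and drop the first period from the list of time periods) can be sketched in base R. The sketch assumes never-treated units are coded with a group value of 0, as in the mpdta data from the did package; that coding and the toy data are assumptions made for illustration, not something this page specifies.

```r
# Hypothetical long-format panel: 4 units observed in periods 1 to 4.
# Assumption: G == 0 marks never-treated units.
toy <- data.frame(
  id     = rep(1:4, each = 4),
  period = rep(1:4, times = 4),
  G      = rep(c(0, 2, 3, 3), each = 4)
)

all_groups  <- sort(unique(toy$G))
all_periods <- sort(unique(toy$period))

# Groups: every treatment-timing group except the never-treated one.
glist <- all_groups[all_groups != 0]

# Time periods: everything after the first period, which is only used as
# a pre-treatment period.
tlist <- all_periods[-1]
```

With these lists, ATT(g,t) would be computed for every combination of a group in glist and a period in tlist. An estimation strategy that needs more than one pre-treatment period would have to drop more periods, which is the case handled by required_pre_periods in setup_pte.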
A function for computing a 2x2 subset for repeated cross
sections data. This is analogous to two_by_two_subset, but indexes
observations by rows rather than by panel ids.
two_by_two_rcs_subset( data, g, tp, control_group = "notyettreated", anticipation = 0, base_period = "varying", ... )
data |
the full dataset |
g |
the current group |
tp |
the current time period |
control_group |
whether to use "notyettreated" (default) or "nevertreated" |
anticipation |
the number of periods of anticipation (i.e., number of periods before the treatment happens where the treatment can "already" affect the outcome) |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
... |
extra arguments to get the subset correct |
list that contains the following elements:
gt_data: a gt_data_frame object that contains the
correct subset of data
n1: the number of observations in this subset
disidx: a vector of the correct rows for this subset
A function for computing a 2x2 subset of the original data. The subset contains post-treatment observations for the treated group and for the comparison group, together with pre-treatment observations from the period immediately before the treated group became treated.
two_by_two_subset( data, g, tp, control_group = "notyettreated", anticipation = 0, base_period = "varying", ... )
data |
the full dataset |
g |
the current group |
tp |
the current time period |
control_group |
whether to use "notyettreated" (default) or "nevertreated" |
anticipation |
the number of periods of anticipation (i.e., number of periods before the treatment happens where the treatment can "already" affect the outcome) |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
... |
extra arguments to get the subset correct |
list that contains the following elements:
gt_data: a gt_data_frame object that contains the
correct subset of data
n1: the number of observations in this subset
disidx: a vector of the correct ids for this subset
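As a rough illustration of the subsetting logic described for two_by_two_subset, the base-R sketch below picks the base period and the comparison units for one (g, tp) pair in a balanced panel. It assumes never-treated units are coded G = 0 and that the columns are named id, period, and G. The function two_by_two_sketch and the names in its return value are hypothetical; it is a simplification, not the package implementation.

```r
two_by_two_sketch <- function(data, g, tp,
                              control_group = "notyettreated",
                              anticipation = 0,
                              base_period = "varying") {
  # Last period in which group g is assumed to be unaffected by treatment.
  last_pre <- g - anticipation - 1

  # Post-treatment periods always compare against last_pre. In pre-treatment
  # periods, "varying" backs up one period and "universal" stays at last_pre.
  # (The degenerate case tp == base under "universal" is not handled here.)
  if (tp >= g - anticipation || base_period == "universal") {
    base <- last_pre
  } else {
    base <- tp - 1
  }

  # Comparison units: never treated (G == 0), optionally plus units that are
  # still untreated in the later of the two periods.
  is_treated <- data$G == g
  is_control <- data$G == 0
  if (control_group == "notyettreated") {
    is_control <- is_control | data$G > max(tp, base) + anticipation
  }

  keep <- (is_treated | is_control) & data$period %in% c(base, tp)
  list(gt_data = data[keep, ], base = base, ids = unique(data$id[keep]))
}

# Toy balanced panel: unit 1 never treated, units 2 to 4 treated in 2, 3, 4.
toy <- data.frame(
  id     = rep(1:4, each = 4),
  period = rep(1:4, times = 4),
  G      = rep(c(0, 2, 3, 4), each = 4)
)

post_sub      <- two_by_two_sketch(toy, g = 3, tp = 3)
pre_varying   <- two_by_two_sketch(toy, g = 4, tp = 2)
pre_universal <- two_by_two_sketch(toy, g = 4, tp = 2, base_period = "universal")
```

For the post-treatment pair (g = 3, tp = 3), the sketch keeps periods 2 and 3 and drops the unit already treated in period 2. For the pre-treatment pair (g = 4, tp = 2), the base period is 1 under "varying" and 3 under "universal", which matches the description of base_period above.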