---
title: "Staggered Treatment Adoption with the qte Package"
author: "Brantly Callaway"
date: today
format: html
bibliography: refs.bib
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    fig.width: 6
    fig.height: 4
vignette: >
  %\VignetteIndexEntry{Staggered Treatment Adoption with the qte Package}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

## Introduction

In many empirical settings, units adopt treatment at different points in time
— a structure called *staggered treatment adoption*. For example, states may
adopt a policy in different years, or firms may receive an intervention at
different times during a study period.

All estimators in `qte` support staggered adoption via the same interface:
specify the outcome (`yname`), the first-treatment-period cohort variable
(`gname`, with 0 for never-treated units), calendar time (`tname`), and a unit
identifier (`idname`). The package follows the approach of
@callaway-santanna-2021, computing group-time effects $QTT(g, t)$ for every
cohort $g$ and period $t$, then aggregating these into overall, group-specific,
and event-study (dynamic) summaries.

The primary output of interest is the **QTT curve** — the full distribution of
treatment effects across quantile levels $\tau$. Each estimator can also be run
with `gt_type = "att"` to recover the scalar ATT.

This vignette demonstrates the full applied workflow using the `mpdta` dataset.
For conceptual background on the identification assumptions behind each
estimator, see `vignette("panel-estimators")`.

```{r}
#| label: setup
#| message: false
library(qte)
library(ggplot2)
set.seed(42)
data(mpdta, package = "did")
```

`mpdta` is a balanced panel of 500 US counties observed annually from 2003 to
2007. The outcome is `lemp` (log county-level employment). Counties are grouped
by `first.treat`, the year they first adopted a minimum wage increase (2004,
2006, or 2007); counties with `first.treat == 0` never adopt.

```{r}
#| label: data-summary
table(mpdta$first.treat)
```

---

## QTT curve estimation

We begin with `cic()` using `gt_type = "qtt"` to estimate the full QTT curve.
CDFs are mixed across group-time cells first (using overall-ATT weights), then
inverted at the requested `probs` grid — this avoids the bias that would arise
from averaging scalar quantiles across cells.

```{r}
#| label: qtt-estimate
res_qtt <- cic(
  yname   = "lemp",
  gname   = "first.treat",
  tname   = "year",
  idname  = "countyreal",
  data    = mpdta,
  biters  = 200,
  gt_type = "qtt",
  probs   = seq(0.1, 0.9, by = 0.1)
)
```

### Overall QTT curve

`summary()` prints the overall QTT curve and confidence bands.

```{r}
#| label: qtt-summary
summary(res_qtt)
```

`autoplot()` plots the overall curve with uniform confidence bands:

```{r}
#| label: qtt-plot
#| fig-alt: "Overall QTT curve from cic()"
autoplot(res_qtt)
```

### Dynamic QTT — event-study by quantile

`autoplot()` with `type = "dynamic"` plots the QTT at selected quantile(s)
across event times. The `plot_probs` argument selects which quantiles to show.

```{r}
#| label: qtt-dynamic-single
#| fig-alt: "Event-study plot for the median QTT"
autoplot(res_qtt, type = "dynamic", plot_probs = 0.5)
```

Multiple quantiles can be overlaid by passing a vector to `plot_probs` (values
must be in the estimated `probs` grid):

```{r}
#| label: qtt-dynamic-multi
#| fig-alt: "Event-study plot for multiple quantiles of the QTT"
autoplot(res_qtt, type = "dynamic", plot_probs = c(0.1, 0.5, 0.9))
```

Pre-treatment estimates near zero support the identifying assumption. The
post-treatment pattern shows the estimated QTT in each period after adoption.

---

## ATT and event-study summary

Setting `gt_type = "att"` recovers the scalar ATT at each group-time cell,
which aggregates into an overall ATT and an event-study (dynamic) table.
Event times $e < 0$ are pre-treatment periods; $e \geq 0$ are post-treatment.

```{r}
#| label: att-estimate
res_att <- cic(
  yname   = "lemp",
  gname   = "first.treat",
  tname   = "year",
  idname  = "countyreal",
  data    = mpdta,
  biters  = 50,
  gt_type = "att"
)
summary(res_att)
```

```{r}
#| label: att-event-study
#| fig-alt: "Event-study plot of ATT estimates by event time"
autoplot(res_att, type = "dynamic")
```

If the QTT curve from `res_qtt` is roughly flat, treatment shifts the
distribution uniformly and the ATT captures the full story. Deviations across
quantiles indicate distributional heterogeneity that the ATT misses.

---

## Comparing estimators

All six estimators share the same interface. Here we run each with
`gt_type = "qtt"` and `probs = 0.5` to compare median QTT estimates. The
estimators are listed in the same order as in `vignette("panel-estimators")`.

```{r}
#| label: compare-estimators

res_cic <- cic(
  yname = "lemp", gname = "first.treat", tname = "year",
  idname = "countyreal", data = mpdta,
  gt_type = "qtt", probs = 0.5, biters = 50
)

res_qdid <- qdid(
  yname = "lemp", gname = "first.treat", tname = "year",
  idname = "countyreal", data = mpdta,
  gt_type = "qtt", probs = 0.5, biters = 50
)

res_pqtt <- panel_qtt(
  yname = "lemp", gname = "first.treat", tname = "year",
  idname = "countyreal", data = mpdta,
  gt_type = "qtt", probs = 0.5, biters = 50
)

res_ddid <- ddid(
  yname = "lemp", gname = "first.treat", tname = "year",
  idname = "countyreal", data = mpdta,
  gt_type = "qtt", probs = 0.5, biters = 50
)

res_mdid <- mdid(
  yname = "lemp", gname = "first.treat", tname = "year",
  idname = "countyreal", data = mpdta,
  gt_type = "qtt", probs = 0.5, biters = 50
)

res_lou <- lou_qtt(
  yname = "lemp", gname = "first.treat", tname = "year",
  idname = "countyreal", data = mpdta,
  gt_type = "qtt", probs = 0.5, biters = 50
)
```

```{r}
#| label: compare-table
median_qtt <- function(res, label) {
  row <- res$overall[res$overall$probs == 0.5, ]
  data.frame(estimator = label, qtt = round(row$qtt, 4), se = round(row$se, 4))
}

do.call(rbind, list(
  median_qtt(res_cic,  "cic()"),
  median_qtt(res_qdid, "qdid()"),
  median_qtt(res_pqtt, "panel_qtt()"),
  median_qtt(res_ddid, "ddid()"),
  median_qtt(res_mdid, "mdid()"),
  median_qtt(res_lou,  "lou_qtt()")
))
```

Estimates will generally differ because each estimator relies on a different
identifying assumption. Similarity across methods strengthens confidence in
the findings; divergence warrants investigation of which assumption is more
plausible for the application.

---

## Repeated cross sections

If the data are repeated cross sections rather than a panel, set `panel = FALSE`
and omit `idname`. `cic()`, `qdid()`, and `mdid()` support repeated cross
sections; `ddid()`, `panel_qtt()`, and `lou_qtt()` require panel data.

```{r}
#| label: rcs-example
#| eval: false
res_rcs <- cic(
  yname  = "lemp",
  gname  = "first.treat",
  tname  = "year",
  data   = mpdta,   # no idname
  panel  = FALSE,
  biters = 50
)
```

---

## Inference notes

Standard errors are computed via the empirical bootstrap, resampling units
(panel) or observations (repeated cross sections) with replacement. The
`biters` argument controls the number of iterations (default 100). Parallel
computation is available via the `cl` argument (`cl = 4` uses 4 cores).

The `cband = TRUE` default in `summary()` and `autoplot()` produces uniform
confidence bands (simultaneous coverage over all quantiles or event times).
Set `cband = FALSE` for pointwise intervals.

## References