Package 'BMisc'

Title: Miscellaneous Functions for Panel Data, Quantiles, and Printing Results
Description: These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making a panel data set balanced (that is, dropping individuals that have missing observations in any time period), converting id numbers to row numbers, and treating repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make distribution functions from a set of data points (this is particularly useful when a distribution function is created in several steps), to combine distribution functions based on some external weights, and to invert distribution functions. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; for generating summary statistics and their differences for two groups; and for adding or dropping covariates from formulas.
Authors: Brantly Callaway [aut, cre]
Maintainer: Brantly Callaway <[email protected]>
License: GPL-2
Version: 1.4.7
Built: 2024-11-11 06:31:43 UTC
Source: https://github.com/bcallaway11/bmisc


Add a Covariate to a Formula

Description

addCovToFormla adds some covariates to a formula; covs should be a list of variable names

Usage

addCovToFormla(covs, formla)

Arguments

covs

should be a list of variable names

formla

which formula to add covariates to

Value

formula

Examples

formla <- y ~ x
addCovToFormla(list("w", "z"), formla)

formla <- ~x
addCovToFormla("z", formla)

Block Bootstrap

Description

Makes draws of all observations with the same id in a panel data context. This is useful for bootstrapping with panel data.

Usage

blockBootSample(data, idname)

Arguments

data

data.frame from which you want to bootstrap

idname

column in data which contains an individual identifier

Value

data.frame bootstrapped from the original dataset; this data.frame will contain new ids

Examples

data("LaborSupply", package = "plm")
bbs <- blockBootSample(LaborSupply, "id")
nrow(bbs)
head(bbs$id)
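
The idea of drawing whole units can be sketched in base R. This is only an illustration under assumptions: block_boot_sketch is a hypothetical helper, not the package's implementation, and it simply numbers the drawn units 1, 2, ... to create the new ids.

```r
# Hypothetical sketch of a block bootstrap draw; not BMisc's implementation.
block_boot_sketch <- function(data, idname) {
  ids <- unique(data[[idname]])
  # draw units (not rows) with replacement
  draw <- sample(ids, length(ids), replace = TRUE)
  pieces <- lapply(seq_along(draw), function(j) {
    block <- data[data[[idname]] == draw[j], , drop = FALSE]
    block[[idname]] <- j  # new id, so a unit drawn twice stays distinct
    block
  })
  do.call(rbind, pieces)
}

set.seed(1)
panel <- data.frame(id = rep(c(11, 22, 33), each = 2),
                    period = rep(1:2, 3),
                    y = 1:6)
bbs_sketch <- block_boot_sketch(panel, "id")
nrow(bbs_sketch)  # same number of rows as panel, because the panel is balanced
```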

check_staggered

Description

A function to check if treatment is staggered in a panel data set.

Usage

check_staggered(df, idname, treatname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

treatname

name of column with the treatment indicator

Value

a logical indicating whether treatment is staggered
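
One possible reading of "staggered" is that a unit never leaves treatment once it becomes treated. The sketch below checks that condition in base R. It is an assumption-laden illustration: is_staggered_sketch is a hypothetical helper, and unlike check_staggered it takes a time column (tname) so that rows can be sorted within unit.

```r
# Hypothetical sketch; not BMisc's implementation.
is_staggered_sketch <- function(df, idname, tname, treatname) {
  df <- df[order(df[[idname]], df[[tname]]), ]
  # within each unit, the treatment indicator must never decrease over time
  ok <- tapply(df[[treatname]], df[[idname]], function(d) all(diff(d) >= 0))
  all(ok)
}

staggered <- data.frame(id = rep(1:2, each = 3),
                        period = rep(1:3, 2),
                        treat = c(0, 1, 1, 0, 0, 0))
on_off <- transform(staggered, treat = c(0, 1, 0, 0, 0, 0))

is_staggered_sketch(staggered, "id", "period", "treat")  # TRUE
is_staggered_sketch(on_off, "id", "period", "treat")     # FALSE
```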


Check Function

Description

The check function used for optimizing to get quantiles

Usage

checkfun(a, tau)

Arguments

a

vector to compute quantiles for

tau

a value between 0 and 1; e.g., 0.5 gives the median

Value

numeric value

Examples

x <- rnorm(100)
x[which.min(checkfun(x, 0.5))] ## should be around 0
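
For intuition, the standard check function from quantile regression is a * (tau - 1{a <= 0}), and a sample quantile minimizes the sum of the check function over the data. The sketch below assumes that standard definition (check_sketch is a hypothetical stand-in, not necessarily identical to checkfun) and recovers the median by searching over the sample points.

```r
# Standard check function; assumed definition, for illustration only.
check_sketch <- function(a, tau) a * (tau - (a <= 0))

set.seed(1)
x_chk <- rnorm(101)  # odd sample size, so the sample median is a data point
# objective: sum of the check function evaluated at each candidate q
obj <- sapply(x_chk, function(q) sum(check_sketch(x_chk - q, 0.5)))
q_hat <- x_chk[which.min(obj)]
c(q_hat, median(x_chk))  # the two values coincide
```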

Combine Two Distribution Functions

Description

Combines two distribution functions with given weights by pstrat

Usage

combineDfs(y.seq, dflist, pstrat = NULL, ...)

Arguments

y.seq

sequence of possible y values

dflist

list of distribution functions to combine

pstrat

a vector of weights to put on each distribution function; if weights are not provided then equal weight is given to each distribution function

...

additional arguments that can be passed to BMisc::makeDist

Value

ecdf

Examples

x <- rnorm(100)
y <- rnorm(100, 1, 1)
Fx <- ecdf(x)
Fy <- ecdf(y)
both <- combineDfs(seq(-2, 3, 0.1), list(Fx, Fy))
plot(Fx, col = "green")
plot(Fy, col = "blue", add = TRUE)
plot(both, add = TRUE)

Compare Variables across Groups

Description

compareBinary takes in a variable (e.g., union) and runs a bivariate regression of x on treatment (for summary statistics)

Usage

compareBinary(
  x,
  on,
  dta,
  w = rep(1, nrow(dta)),
  report = c("diff", "levels", "both")
)

Arguments

x

variables to run regression on

on

binary variable

dta

the data to use

w

weights

report

which type of report to make; diff is the difference between the two variables by group

Value

matrix of results
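
The regression behind this kind of comparison can be sketched with lm. The data below are simulated and the column names (treat, union) are made up for illustration; the point is only that the slope from a bivariate regression on a binary variable equals the difference in group means.

```r
set.seed(1)
# simulated data; column names are hypothetical
dta_sketch <- data.frame(treat = rep(0:1, each = 50))
dta_sketch$union <- rbinom(100, 1, 0.3 + 0.2 * dta_sketch$treat)

# bivariate regression of the variable on the binary indicator, unit weights
fit <- lm(union ~ treat, data = dta_sketch, weights = rep(1, nrow(dta_sketch)))
slope <- unname(coef(fit)["treat"])

# the slope is the difference in means between the two groups
diff_means <- mean(dta_sketch$union[dta_sketch$treat == 1]) -
  mean(dta_sketch$union[dta_sketch$treat == 0])
c(slope, diff_means)
```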


Cross Section to Panel

Description

Turn repeated cross sections data into panel data by imposing rank invariance; does not require that the inputs have the same length

Usage

cs2panel(cs1, cs2, yname)

Arguments

cs1

data frame, the first cross section

cs2

data frame, the second cross section

yname

the name of the variable to calculate difference for (should be the same in each dataset)

Value

the change in outcomes over time
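
Under rank invariance, a unit keeps its rank in the outcome distribution across periods, so changes can be built by matching quantiles of the two cross sections. The sketch below is one way to do this on a common grid of ranks when the inputs differ in length. It works on plain vectors and uses a hypothetical helper name, so it should be read as an illustration rather than as the package's algorithm.

```r
# Hypothetical sketch of rank-invariant matching; not BMisc's implementation.
rank_invariant_change <- function(y1, y2) {
  n <- min(length(y1), length(y2))
  ps <- seq(0, 1, length.out = n)  # common grid of ranks
  q1 <- quantile(y1, probs = ps, type = 1, names = FALSE)
  q2 <- quantile(y2, probs = ps, type = 1, names = FALSE)
  q2 - q1  # change in outcomes at each rank
}

y_first <- c(3, 1, 2, 5, 4)
y_second <- c(6, 3, 7, 4, 5)             # first-period values shifted by 2, shuffled
rank_invariant_change(y_first, y_second)  # every change equals 2
```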


drop_collinear

Description

A function to check for multicollinearity and drop collinear terms from a matrix

Usage

drop_collinear(matrix)

Arguments

matrix

a matrix for which the function will remove collinear columns

Value

a matrix with collinear columns removed
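
One common way to detect collinear columns is a pivoted QR decomposition: columns beyond the numerical rank are linear combinations of earlier ones. The sketch below uses base R's qr for this. drop_collinear_sketch is a hypothetical helper, and nothing here is meant to describe how drop_collinear works internally.

```r
# Hypothetical sketch based on a pivoted QR; not BMisc's implementation.
drop_collinear_sketch <- function(X) {
  qrX <- qr(X)
  # the first `rank` pivot entries index a set of linearly independent columns
  keep <- sort(qrX$pivot[seq_len(qrX$rank)])
  X[, keep, drop = FALSE]
}

X_col <- cbind(a = 1, b = c(1, 2, 3, 4), c = c(2, 4, 6, 8))  # c is 2 * b
X_reduced <- drop_collinear_sketch(X_col)
colnames(X_reduced)
```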


Drop a Covariate from a Formula

Description

dropCovFromFormla drops some covariates from a formula; covs should be a list of variable names

Usage

dropCovFromFormla(covs, formla)

Arguments

covs

should be a list of variable names

formla

which formula to drop covariates from

Value

formula

Examples

formla <- y ~ x + w + z
dropCovFromFormla(list("w", "z"), formla)

dropCovFromFormla("z", formla)

element_wise_mult

Description

This function takes in two matrices of dimension nxB and nxk and returns a Bxk matrix that comes from element-wise multiplication of every column in the first matrix times the entire second matrix and then averaging over the n-dimension. It is equivalent to (but faster than) the following R code: 'sapply(1:biters, function(b) sqrt(n)*colMeans(Umat[,b]*inf.func))'. This function is particularly useful for fast computations using the multiplier bootstrap.

Usage

element_wise_mult(U, inf_func)

Arguments

U

nxB matrix (e.g., this could be a matrix of Rademacher weights for B bootstrap iterations using the multiplier bootstrap)

inf_func

nxk matrix (e.g., this could be a matrix containing the influence function for different parameter estimates)

Value

a Bxk matrix
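
The computation described above can be written as a single matrix product, since sqrt(n) * colMeans(U[, b] * inf_func) is row b of t(U) %*% inf_func divided by sqrt(n). The sketch below checks this in base R on simulated inputs. The crossprod formulation is an illustration and is not claimed to be the package's implementation; note also that sapply returns a kxB matrix, so the loop result is transposed to get Bxk.

```r
set.seed(1)
n <- 50; B <- 20; k <- 3
U <- matrix(sample(c(-1, 1), n * B, replace = TRUE), nrow = n)  # Rademacher weights
inf_func <- matrix(rnorm(n * k), nrow = n)                     # simulated influence function

# loop version following the description; transposed to be B x k
slow <- t(sapply(seq_len(B), function(b) sqrt(n) * colMeans(U[, b] * inf_func)))

# single matrix product giving the same B x k result
fast <- crossprod(U, inf_func) / sqrt(n)

dim(fast)
all.equal(slow, fast)
```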


get_first_difference

Description

A function that calculates the first difference in a panel data setting. If the data.frame that is passed in has nxT rows, the resulting vector will also have nxT elements with one element for each unit set to be NA.

Usage

get_first_difference(df, idname, yname, tname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate the first difference

tname

name of column that holds the time period
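
A within-unit first difference can be sketched in base R with ave. The data and column names below are made up, and the data are sorted by unit and period first; this illustrates the shape of the result (one NA per unit) rather than the package's code.

```r
fd_df <- data.frame(id = rep(1:2, each = 3),
                    period = rep(1:3, 2),
                    y = c(1, 3, 6, 10, 10, 7))
fd_df <- fd_df[order(fd_df$id, fd_df$period), ]

# difference within unit; the first period of each unit has no predecessor
fd_df$dy <- ave(fd_df$y, fd_df$id, FUN = function(y) c(NA, diff(y)))
fd_df$dy  # NA 2 3 NA 0 -3
```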


get_group

Description

A function to calculate a unit's group in a panel data setting with a binary treatment and staggered treatment adoption and where there is a column in the data indicating whether or not a unit is treated

Usage

get_group(df, idname, tname, treatname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

tname

name of column that holds the time period

treatname

name of column with the treatment indicator
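
With staggered adoption, a unit's group is usually taken to be the first period in which it is treated. The sketch below computes that in base R and, as an assumption consistent with the group==0 convention used elsewhere in this manual, assigns 0 to never-treated units. The data and column names are hypothetical.

```r
grp_df <- data.frame(id = rep(1:3, each = 3),
                     period = rep(2001:2003, 3),
                     treat = c(0, 1, 1,  0, 0, 1,  0, 0, 0))

# first treated period per unit; 0 if the unit is never treated (assumed convention)
g_by_id <- sapply(split(grp_df, grp_df$id), function(d) {
  if (any(d$treat == 1)) min(d$period[d$treat == 1]) else 0
})

# repeat the unit-level group on every row of the panel
grp_df$G <- unname(g_by_id[as.character(grp_df$id)])
grp_df$G  # 2002 2002 2002 2003 2003 2003 0 0 0
```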


get_lagYi

Description

A function that calculates lagged outcomes in a panel data setting. If the data.frame that is passed in has nxT rows, the resulting vector will also have nxT elements with one element for each unit set to be NA

Usage

get_lagYi(df, idname, yname, tname, nlags = 1)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate the lag

tname

name of column that holds the time period

nlags

The number of periods to lag. The default is 1, which computes the lag from the previous period.
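
A within-unit lag can be sketched in the same spirit: shift each unit's series down by nlags and pad with NA. The example uses made-up data sorted by unit and period and is only an illustration of the idea.

```r
lag_df <- data.frame(id = rep(1:2, each = 3),
                     period = rep(1:3, 2),
                     y = c(1, 3, 6, 10, 10, 7))
lag_df <- lag_df[order(lag_df$id, lag_df$period), ]

nlags <- 1
# pad with NA at the start of each unit, then truncate to the original length
lag_df$lag_y <- ave(lag_df$y, lag_df$id,
                    FUN = function(y) c(rep(NA, nlags), y)[seq_along(y)])
lag_df$lag_y  # NA 1 3 NA 10 10
```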


get_principal_components

Description

A function to calculate unit-specific principal components, given panel data

Usage

get_principal_components(
  xformula,
  data,
  idname,
  tname,
  n_components = NULL,
  ret_wide = FALSE,
  ret_id = FALSE
)

Arguments

xformula

a formula specifying the variables to use in the principal component analysis

data

a data.frame containing the panel data

idname

the name of the column containing the unit id

tname

the name of the column containing the time period

n_components

the number of principal components to retain, the default is NULL which will result in all principal components being retained

ret_wide

whether to return the data in wide format (where the number of rows is equal to n = length(unique(data[[idname]])) or long format (where the number of rows is equal to nT = nrow(data)). The default is FALSE, so that long data is returned by default.

ret_id

whether to return the id column in the output data.frame. The default is FALSE.

Value

a data.frame containing the original data with the principal components appended


get_Yi1

Description

A function to calculate outcomes for units in the first time period that is available in a panel data setting (this function can also be used to recover covariates, etc. in the first period).

Usage

get_Yi1(df, idname, yname, tname, gname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to recover the value in the first available period

tname

name of column that holds the time period

gname

name of column containing the unit's group


get_Yibar

Description

A function to calculate the average outcome across all time periods separately for each unit in a panel data setting (this function can also be used to recover covariates, etc.).

Usage

get_Yibar(df, idname, yname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate the unit-specific average


get_Yibar_pre

Description

A function to calculate average outcomes for units in their pre-treatment periods (this function can also be used to recover pre-treatment averages of covariates, etc.). For units that do not participate in the treatment (and therefore have group==0), the function calculates their overall average outcome.

Usage

get_Yibar_pre(df, idname, yname, tname, gname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate the pre-treatment average

tname

name of column that holds the time period

gname

name of column containing the unit's group


get_YiGmin1

Description

A function to calculate outcomes for units in the period right before they become treated (this function can also be used to recover covariates, etc. in the period right before a unit becomes treated). For units that do not participate in the treatment (and therefore have group==0), they are assigned their outcome in the last period.

Usage

get_YiGmin1(df, idname, yname, tname, gname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group
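
The rule described above can be sketched in base R. The example assumes time periods spaced one apart, so that the period right before treatment is G - 1, and uses made-up data and column names; never-treated units (G equal to 0) get their last-period outcome.

```r
gm_df <- data.frame(id = rep(1:2, each = 3),
                    period = rep(1:3, 2),
                    G = rep(c(3, 0), each = 3),
                    y = c(5, 6, 9, 1, 2, 4))

base_by_id <- sapply(split(gm_df, gm_df$id), function(d) {
  g <- d$G[1]
  # assumed: periods are spaced by 1, so the pre-treatment period is g - 1
  base_period <- if (g == 0) max(d$period) else g - 1
  d$y[d$period == base_period]
})

# repeat the unit-level value on every row of the panel
gm_df$y_gmin1 <- unname(base_by_id[as.character(gm_df$id)])
gm_df$y_gmin1  # 6 6 6 4 4 4
```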


get_Yit

Description

A function to calculate outcomes for units in a particular time period 'tp' in a panel data setting (this function can also be used to recover covariates, etc. in that period).

Usage

get_Yit(df, tp, idname, yname, tname)

Arguments

df

the data.frame used in the function

tp

The time period for which to get the outcome

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to recover the value in period tp

tname

name of column that holds the time period

Value

a vector of outcomes in period tp; the vector will have length nT (i.e., the value is returned for each row of the panel, not only for rows from period tp)


Return Particular Element from Each Element in a List

Description

a function to take a list and get a particular part out of each element in the list

Usage

getListElement(listolists, whichone = 1)

Arguments

listolists

a list

whichone

which item to get out of each list (can be numeric or name)

Value

list of all the elements 'whichone' from each list

Examples

len <- 100 # number elements in list
lis <- lapply(1:len, function(l) list(x = (-l), y = l^2)) # create list
getListElement(lis, "x")[1] # should be equal to -1
getListElement(lis, 1)[1] # should be equal to -1

Weighted Distribution Function

Description

Get a distribution function from a vector of values after applying some weights

Usage

getWeightedDf(y, y.seq = NULL, weights = NULL, norm = TRUE)

Arguments

y

a vector to compute the distribution function for

y.seq

an optional vector of values to compute the distribution function for; the default is to use all unique values of y

weights

the vector of weights; if NULL, the unweighted distribution function is returned

norm

normalize the weights so that they have mean of 1, default is to normalize

Value

ecdf
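
A weighted distribution function evaluates, at each point v, the weighted share of observations with y <= v. The sketch below computes those values in base R with weights normalized to have mean 1. weighted_df_sketch is a hypothetical helper that returns the function values at y.seq rather than an ecdf object, so it is only an illustration of the calculation.

```r
# Hypothetical sketch; returns values of the weighted distribution function.
weighted_df_sketch <- function(y, y.seq = NULL, weights = NULL) {
  if (is.null(y.seq)) y.seq <- sort(unique(y))
  if (is.null(weights)) weights <- rep(1, length(y))
  weights <- weights / mean(weights)  # normalize weights to have mean 1
  sapply(y.seq, function(v) mean(weights * (y <= v)))
}

weighted_df_sketch(c(1, 2, 3), weights = c(1, 1, 2))  # 0.25 0.50 1.00
```

Without weights, the values agree with ecdf evaluated at the same points.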


Weighted Mean

Description

Get the mean applying some weights

Usage

getWeightedMean(y, weights = NULL, norm = TRUE)

Arguments

y

a vector to compute the mean for

weights

the vector of weights; if NULL, the unweighted mean is returned

norm

normalize the weights so that they have mean of 1, default is to normalize

Value

the weighted mean
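
The role of the norm argument can be illustrated with a small sketch: once the weights are rescaled to have mean 1, the weighted mean is just the mean of weights * y. weighted_mean_sketch is a hypothetical helper written for this illustration, not the package's code.

```r
# Hypothetical sketch; not BMisc's implementation.
weighted_mean_sketch <- function(y, weights = NULL, norm = TRUE) {
  if (is.null(weights)) return(mean(y))
  if (norm) weights <- weights / mean(weights)  # weights now have mean 1
  mean(weights * y)
}

weighted_mean_sketch(c(1, 2, 3), weights = c(1, 1, 2))  # 2.25
weighted.mean(c(1, 2, 3), c(1, 1, 2))                   # 2.25, base R equivalent
```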


Get Weighted Quantiles

Description

Finds multiple quantiles by repeatedly calling getWeightedQuantile

Usage

getWeightedQuantiles(tau, cvec, weights = NULL, norm = TRUE)

Arguments

tau

a vector of values between 0 and 1

cvec

a vector to compute quantiles for

weights

the weights, weighted.checkfun normalizes the weights to sum to 1.

norm

normalize the weights so that they have mean of 1, default is to normalize

Value

vector of quantiles


Convert Vector of ids into Vector of Row Numbers

Description

ids2rownum takes a vector of ids and converts it to the right row number in the dataset; ids should be unique in the dataset, that is, don't pass the function panel data with multiple rows for the same id

Usage

ids2rownum(ids, data, idname)

Arguments

ids

vector of ids

data

data frame

idname

unique id

Value

vector of row numbers

Examples

ids <- seq(1, 1000, length.out = 100)
ids <- ids[order(runif(100))]
df <- data.frame(id = ids)
ids2rownum(df$id, df, "id")

Invert Ecdf

Description

take an ecdf object and invert it to get a step-quantile function

Usage

invertEcdf(df)

Arguments

df

an ecdf object

Value

stepfun object that contains the quantiles of the df
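
Inverting an ecdf means building the step function Q(tau) = inf{y : F(y) >= tau}. The sketch below does this with knots and stepfun from base R. invert_ecdf_sketch is a hypothetical helper that assumes at least two distinct data values; its behavior exactly at the jump points may differ from invertEcdf.

```r
# Hypothetical sketch of an ecdf inverse; not BMisc's implementation.
invert_ecdf_sketch <- function(Fn) {
  q <- knots(Fn)  # sorted unique data values
  p <- Fn(q)      # ecdf values at those points
  # right-closed steps: Q(tau) = q[i] for p[i - 1] < tau <= p[i]
  stepfun(p[-length(p)], q, right = TRUE)
}

Fn <- ecdf(c(1, 2, 3, 4))
Q <- invert_ecdf_sketch(Fn)
Q(c(0.1, 0.3, 0.6, 0.9))  # 1 2 3 4
```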


Left-hand Side Variables

Description

Take a formula and return a vector of the variables on the left hand side; it will return NULL for a one sided formula

Usage

lhs.vars(formla)

Arguments

formla

a formula

Value

vector of variable names

Examples

ff <- yvar ~ x1 + x2
lhs.vars(ff)

Balance a Panel Data Set

Description

This function drops observations from a data.frame that are not part of a balanced panel data set.

Usage

makeBalancedPanel(data, idname, tname, return_data.table = FALSE)

Arguments

data

data.frame used in function

idname

unique id

tname

time period name

return_data.table

if TRUE, makeBalancedPanel will return a data.table rather than a data.frame. Default is FALSE.

Value

data.frame that is a balanced panel

Examples

id <- rep(seq(1, 100), each = 2) # individual ids for setting up a two period panel
t <- rep(seq(1, 2), 100) # time periods
y <- rnorm(200) # outcomes
dta <- data.frame(id = id, t = t, y = y) # make into data frame
dta <- dta[-7, ] # drop the 7th row from the dataset (which creates an unbalanced panel)
dta <- makeBalancedPanel(dta, idname = "id", tname = "t")

Make a Distribution Function

Description

Turn a vector of values and a vector of their distribution function values into an ecdf. The vectors should be the same length and both increasing.

Usage

makeDist(
  x,
  Fx,
  sorted = FALSE,
  rearrange = FALSE,
  force01 = FALSE,
  method = "constant"
)

Arguments

x

vector of values

Fx

vector of the distribution function values

sorted

boolean indicating whether or not x is already sorted; computation is somewhat faster if already sorted

rearrange

boolean indicating whether or not to rearrange (monotonize) the distribution function

force01

boolean indicating whether or not to force the values of the distribution function (i.e. Fx) to be between 0 and 1

method

which method to pass to approxfun to approximate the distribution function. Default is "constant"; other possible choice is "linear". "constant" returns a step function, just like an empirical cdf; "linear" linearly interpolates between neighboring points.

Value

ecdf

Examples

y <- rnorm(100)
y <- y[order(y)]
u <- runif(100)
u <- u[order(u)]
F <- makeDist(y, u)

multiplier_bootstrap

Description

A function that takes in an influence function (an nxk matrix) and the number of bootstrap iterations and returns a Bxk matrix of bootstrap results. This function uses Rademacher weights.

Usage

multiplier_bootstrap(inf_func, biters)

Arguments

inf_func

nxk matrix (e.g., this could be a matrix containing the influence function for different parameter estimates)

biters

the number of bootstrap iterations

Value

a Bxk matrix
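
A multiplier bootstrap with Rademacher weights can be sketched in a few lines of base R: draw weights equal to -1 or 1 with equal probability, and for each iteration compute sqrt(n) times the column means of the weighted influence function. The helper name and the sqrt(n) scaling are assumptions made for this illustration.

```r
# Hypothetical sketch; not BMisc's implementation.
multiplier_bootstrap_sketch <- function(inf_func, biters) {
  n <- nrow(inf_func)
  # n x biters matrix of Rademacher weights
  U <- matrix(sample(c(-1, 1), n * biters, replace = TRUE), nrow = n)
  # row b equals sqrt(n) * colMeans(U[, b] * inf_func)
  crossprod(U, inf_func) / sqrt(n)
}

set.seed(1)
inf_sim <- matrix(rnorm(200 * 2), nrow = 200)  # simulated influence function
boot_res <- multiplier_bootstrap_sketch(inf_sim, biters = 500)
dim(boot_res)            # 500 x 2
apply(boot_res, 2, sd)   # bootstrap spread for each parameter
```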


Matrix-Vector Multiplication

Description

This function multiplies a matrix by a vector and returns a numeric vector.

Usage

mv_mult(A, v)

Arguments

A

an nxk matrix.

v

a vector (can be stored as numeric or as a kx1 matrix)

Value

A numeric vector resulting from the multiplication of the matrix by the vector.

Examples

A <- matrix(1:9, nrow = 3, ncol = 3)
v <- c(2, 4, 6)
mv_mult(A, v)

orig2t

Description

A helper function to switch from original time periods to "new" time periods (which are just time periods going from 1 to total number of available periods). This allows for periods not being exactly spaced apart by 1.

Usage

orig2t(orig, original_time.periods)

Arguments

orig

a vector of original time periods to convert to new time periods.

original_time.periods

vector containing all original time periods.

Value

new time period converted from original time period
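
The mapping between original and "new" time periods amounts to a lookup in the sorted vector of original periods. The sketch below shows both directions with hypothetical helpers (orig2t_sketch and t2orig_sketch); it assumes the new periods simply index the sorted unique original periods.

```r
original_periods <- c(2001, 2003, 2007)  # unevenly spaced original periods

# original period -> position in the sorted original periods
orig2t_sketch <- function(orig, otp) match(orig, sort(unique(otp)))
# position -> original period
t2orig_sketch <- function(t, otp) sort(unique(otp))[t]

orig2t_sketch(c(2003, 2007), original_periods)  # 2 3
t2orig_sketch(c(1, 3), original_periods)        # 2001 2007
```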


Panel Data to Repeated Cross Sections

Description

panel2cs takes a 2 period dataset and turns it into a cross sectional dataset. The data includes the change in time varying variables between the time periods. The default functionality is to keep all the variables from period 1 and add all the variables listed by name in timevars from period 2 to those.

Usage

panel2cs(data, timevars, idname, tname)

Arguments

data

data.frame used in function

timevars

vector of names of variables to keep

idname

unique id

tname

time period name

Value

data.frame


Panel Data to Repeated Cross Sections

Description

panel2cs2 takes a 2 period dataset and turns it into a cross sectional dataset; i.e., long to wide. This function considers a particular case where there is some outcome whose value can change over time. It returns the dataset from the first period with the outcome in the second period and the change in outcomes over time appended to it

Usage

panel2cs2(data, yname, idname, tname, balance_panel = TRUE)

Arguments

data

data.frame used in function

yname

name of outcome variable that can change over time

idname

unique id

tname

time period name

balance_panel

whether to ensure that panel is balanced. Default is TRUE, but code runs somewhat faster if this is set to be FALSE.

Value

data from first period with .y0 (outcome in first period), .y1 (outcome in second period), and .dy (change in outcomes over time) appended to it
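
The long-to-wide step can be sketched in base R for a balanced two-period panel. The data and column names below are made up; the sketch keeps the first-period rows and appends .y0, .y1, and .dy as described, without claiming to match the package's implementation.

```r
p2_df <- data.frame(id = rep(1:3, each = 2),
                    period = rep(1:2, 3),
                    y = c(1, 4, 2, 2, 5, 3),
                    x = rep(c(10, 20, 30), each = 2))

first <- p2_df[p2_df$period == min(p2_df$period), ]
second <- p2_df[p2_df$period == max(p2_df$period), ]

first$.y0 <- first$y
first$.y1 <- second$y[match(first$id, second$id)]  # align second period by id
first$.dy <- first$.y1 - first$.y0
first$.dy  # 3 0 -2
```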


Right-hand Side of Formula

Description

Take a formula and return the right hand side of the formula

Usage

rhs(formla)

Arguments

formla

a formula

Value

a one sided formula

Examples

ff <- yvar ~ x1 + x2
rhs(ff)

Right-hand Side Variables

Description

Take a formula and return a vector of the variables on the right hand side

Usage

rhs.vars(formla)

Arguments

formla

a formula

Value

vector of variable names

Examples

ff <- yvar ~ x1 + x2
rhs.vars(ff)

ff <- y ~ x1 + I(x1^2)
rhs.vars(ff)

source_all

Description

Source all the files in a folder

Usage

source_all(fldr)

Arguments

fldr

path to a folder


Subsample of Observations from Panel Data

Description

Returns a subsample of a panel data set; in particular, drops all observations that are not in keepids. If keepids is not set, randomly keeps nkeep ids.

Usage

subsample(dta, idname, tname, keepids = NULL, nkeep = NULL)

Arguments

dta

a data.frame which is a balanced panel

idname

the name of the id variable

tname

the name of the time variable

keepids

which ids to keep

nkeep

how many ids to keep (only used if keepids is not set); the default is the number of unique ids

Value

a data.frame that contains a subsample of dta

Examples

data("LaborSupply", package = "plm")
nrow(LaborSupply)
unique(LaborSupply$year)
ss <- subsample(LaborSupply, "id", "year", nkeep = 100)
nrow(ss)

t2orig

Description

A helper function to switch from "new" t values to original t values. This allows for periods not being exactly spaced apart by 1.

Usage

t2orig(t, original_time.periods)

Arguments

t

a vector of time periods to convert back to original time periods.

original_time.periods

vector containing all original time periods.

Value

original time period converted from new time period


time_invariant_to_panel

Description

This function takes a time-invariant variable and repeats it for each period in a panel data set.

Usage

time_invariant_to_panel(x, df, idname, balanced_panel = TRUE)

Arguments

x

a vector of length equal to the number of unique ids in df.

df

the data.frame used in the function

idname

name of column that holds the unit id

balanced_panel

a logical indicating whether the panel is balanced. If TRUE, the function will optimize the repetition process. Default is TRUE.

Value

a vector of length equal to the number of rows in df.
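
Expanding a unit-level vector to the panel can be sketched with match. The example assumes x is ordered like the sorted unique ids, which is an assumption of this illustration; for a balanced panel sorted by id, the same result comes from rep with each equal to the number of periods.

```r
tip_df <- data.frame(id = rep(c(10, 20, 30), each = 2),
                     period = rep(1:2, 3))
x_unit <- c(0.5, 1.5, 2.5)  # one value per unit, ordered like sort(unique(id)) (assumed)

# general version: look up each row's unit
x_long <- x_unit[match(tip_df$id, sort(unique(tip_df$id)))]

# shortcut that is valid for a balanced panel sorted by id
x_long_balanced <- rep(x_unit, each = 2)

x_long  # 0.5 0.5 1.5 1.5 2.5 2.5
```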


Variable Names to Formula

Description

take a name for a y variable and a vector of names for x variables and turn them into a formula

Usage

toformula(yname, xnames)

Arguments

yname

the name of the y variable

xnames

vector of names for x variables

Value

a formula

Examples

toformula("yvar", c("x1", "x2"))

## should return yvar ~ 1
toformula("yvar", rhs.vars(~1))

TorF

Description

A function to replace NA's with FALSE in vector of logicals

Usage

TorF(cond, use_isTRUE = FALSE)

Arguments

cond

a vector of conditions to check

use_isTRUE

whether or not to use a vectorized version of isTRUE. This is generally slower but covers more cases.

Value

logical vector
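
The behavior described above can be sketched in base R. torf_sketch is a hypothetical helper: by default it overwrites NA entries with FALSE, and with use_isTRUE = TRUE it applies isTRUE element by element.

```r
# Hypothetical sketch; not BMisc's implementation.
torf_sketch <- function(cond, use_isTRUE = FALSE) {
  if (use_isTRUE) return(vapply(cond, isTRUE, logical(1)))
  cond[is.na(cond)] <- FALSE
  cond
}

torf_sketch(c(TRUE, NA, FALSE))                     # TRUE FALSE FALSE
torf_sketch(c(TRUE, NA, FALSE), use_isTRUE = TRUE)  # TRUE FALSE FALSE
```

This is handy when subsetting, since indexing with NA in a logical vector returns NA rows.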


Weighted Check Function

Description

Weights the check function

Usage

weighted.checkfun(q, cvec, tau, weights)

Arguments

q

the value to check

cvec

vector of data to compute quantiles for

tau

a value between 0 and 1; e.g., 0.5 gives the median

weights

the weights, weighted.checkfun normalizes the weights to sum to 1.

Value

numeric
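
A weighted quantile can be found by minimizing a weighted check function over candidate values. The sketch below assumes the standard check function u * (tau - 1{u <= 0}) with weights normalized to sum to 1, and searches over the data points. Both helpers are hypothetical and written only to illustrate the idea.

```r
# Hypothetical sketch; assumed form of the weighted check function.
weighted_check_sketch <- function(q, cvec, tau, weights) {
  w <- weights / sum(weights)  # normalize weights to sum to 1
  u <- cvec - q
  sum(w * u * (tau - (u <= 0)))
}

# search over the data points for the minimizer
weighted_quantile_sketch <- function(tau, cvec, weights) {
  obj <- sapply(cvec, function(q) weighted_check_sketch(q, cvec, tau, weights))
  cvec[which.min(obj)]
}

v <- c(1, 2, 3, 4, 5)
weighted_quantile_sketch(0.5, v, rep(1, 5))          # 3
weighted_quantile_sketch(0.5, v, c(1, 1, 1, 1, 10))  # 5
```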