README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```


# wildrwolf `r emo::ji("wolf")`

<!-- badges: start -->
[![R-CMD-check](https://github.com/s3alfisc/wildrwolf/workflows/R-CMD-check/badge.svg)](https://github.com/s3alfisc/wildrwolf/actions)
[![](http://cranlogs.r-pkg.org/badges/last-month/wildrwolf)](https://cran.r-project.org/package=wildrwolf)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![](https://www.r-pkg.org/badges/version/wildrwolf)](https://cran.r-project.org/package=wildrwolf)
![runiverse-package](https://s3alfisc.r-universe.dev/badges/wildrwolf)
[![Codecov test coverage](https://codecov.io/gh/s3alfisc/wildrwolf/branch/main/graph/badge.svg)](https://app.codecov.io/gh/s3alfisc/wildrwolf?branch=main)
<!-- badges: end -->

The `wildrwolf` package implements Romano-Wolf multiple-hypothesis-adjusted p-values for objects of type `fixest` and `fixest_multi` from the `fixest` package via a wild (cluster) bootstrap. 

Because the bootstrap-resampling is based on the [fwildclusterboot](https://github.com/s3alfisc/fwildclusterboot) package, `wildrwolf` is usually really fast. 

The package is complementary to [wildwyoung](https://github.com/s3alfisc/wildwyoung) (still work in progress), which implements the multiple hypothesis adjustment method following Westfall and Young (1993).

Adding support for multi-way clustering is work in progress.

## Installation

You can install the package from CRAN and the development version from [GitHub](https://github.com/) with:

``` r
install.packages("wildrwolf")

# install.packages("devtools")
devtools::install_github("s3alfisc/wildrwolf")

# from r-universe (windows & mac, compiled R > 4.0 required)
install.packages('wildrwolf', repos ='https://s3alfisc.r-universe.dev')
```

## Example I

<!-- As you can see in the example, there seems to be a bug in `rwolf()` for the pairs bootstrap. -->

```{r, warning=FALSE, message = FALSE}
library(wildrwolf)
library(fixest)

set.seed(1412)

N <- 1000
X1 <- rnorm(N)
X2 <- rnorm(N)
rho <- 0.5
sigma <- matrix(rho, 4, 4); diag(sigma) <- 1
u <- MASS::mvrnorm(n = N, mu = rep(0, 4), Sigma = sigma)
Y1 <- 1 + 1 * X1 + X2 
Y2 <- 1 + 0.01 * X1 + X2 
Y3 <- 1 + 0.4 * X1 + X2
Y4 <- 1 + -0.02 * X1 + X2 
for(x in 1:4){
  var_char <- paste0("Y", x)
  assign(var_char, get(var_char) + u[,x])
}

data <- data.frame(Y1 = Y1,
                   Y2 = Y2,
                   Y3 = Y3,
                   Y4 = Y4,
                   X1 = X1,
                   X2 = X2,
                   #group_id = group_id,
                   splitvar = sample(1:2, N, TRUE))

fit <- feols(c(Y1, Y2, Y3, Y4) ~ csw(X1,X2),
             data = data,
             se = "hetero",
             ssc = ssc(cluster.adj = TRUE))

# clean workspace except for res & data
rm(list= ls()[!(ls() %in% c('fit','data'))])

res_rwolf1 <- wildrwolf::rwolf(
  models = fit,
  param = "X1", 
  B = 9999
)

pvals <- lapply(fit, function(x) pvalue(x)["X1"]) |> unlist()

# Romano-Wolf Corrected P-values
res_rwolf1

```

## Example II

```{r, warning = FALSE, message = FALSE}
fit1 <- feols(Y1 ~ X1 , data = data)
fit2 <- feols(Y1 ~ X1 + X2, data = data)
fit3 <- feols(Y2 ~ X1, data = data)
fit4 <- feols(Y2 ~ X1 + X2, data = data)

res_rwolf2 <- rwolf(
  models = list(fit1, fit2, fit3, fit4), 
  param = "X1",  
  B = 9999
)
res_rwolf2
```

## Performance

The above procedure with `S=8` hypotheses, `N=1000` observations and `k %in% (1,2)` parameters finishes in around 5 seconds.

```{r, warning = FALSE, message = FALSE}
if(requireNamespace("microbenchmark")){
  
  microbenchmark::microbenchmark(
    "Romano-Wolf" = wildrwolf::rwolf(
      models = fit,
      param = "X1", 
      B = 9999 
    ), 
    times = 1
  )
 
}
```


## But does it work? Monte Carlo Experiments

We test $S=6$ hypotheses and generate data as 

$$Y_{i,s,g} = \beta_{0} + \beta_{1,s} D_{i} + u_{i,g} + \epsilon_{i,s} $$
where $D_i = 1(U_i > 0.5)$ and $U_i$ is drawn from a uniform distribution, $u_{i,g}$ is a cluster level shock with intra-cluster correlation $0.5$, and the idiosyncratic error term is drawn from a multivariate random normal distribution with mean $0_S$ and covariance matrix 

```{r}
S <- 6
rho <- 0.5
Sigma <- matrix(rho, 6, 6)
diag(Sigma) <- 1
Sigma
```


with $\rho \geq 0$. We assume that $\beta_{1,s}= 0$ for all $s$. 

This experiment imposes a data generating process as in equation (9) in [Clarke, Romano and Wolf](https://docs.iza.org/dp12845.pdf), with an additional error term $u_g$ for $G=20$ clusters and intra-cluster correlation 0.5 and $N=1000$ observations. 

You can run the simulations via the `run_fwer_sim()` function attached in the package. 


```{r, message = FALSE, results = "hide"}
# note that this will take some time
res <- run_fwer_sim(
  seed = 76,
  n_sims = 1000,
  B = 499,
  N = 1000,
  s = 6, 
  rho = 0.5 #correlation between hypotheses, not intra-cluster!
)
```

Both Holm's method and `wildrwolf` control the family wise error rates, at both the 5 and 10% significance level. 

```{r}
res
```


## Comparison with Stata's rwolf package

```{r, eval = FALSE}
library(RStata)
# initiate RStata
    options("RStata.StataPath" = "\"C:\\Program Files\\Stata17\\StataBE-64\"")
    options("RStata.StataVersion" = 17)
# save the data set so it can be loaded into STATA
write.csv(data, "c:/Users/alexa/Dropbox/rwolf/inst/extdata/readme.csv")

# estimate with stata via Rstata
stata_program <- "
clear
set more off
import delimited c:/Users/alexa/Dropbox/rwolf/inst/data/readme.csv
set seed 1
rwolf y1 y2 y3 y4, indepvar(x1) controls(x2) reps(9999)
"
RStata::stata(stata_program, data.out = TRUE)


# Romano-Wolf step-down adjusted p-values
# 
# 
# Independent variable:  x1
# Outcome variables:   y1 y2 y3 y4
# Number of resamples: 9999
# 
# 
# ------------------------------------------------------------------------------
#    Outcome Variable | Model p-value    Resample p-value    Romano-Wolf p-value
# --------------------+---------------------------------------------------------
#                  y1 |    0.0000             0.0001              0.0001
#                  y2 |    0.3904             0.3755              0.6070
#                  y3 |    0.0000             0.0001              0.0001
#                  y4 |    0.9586             0.9596              0.9596
# ------------------------------------------------------------------------------


```

For comparison,  `wildrwolf` produces the following output:

```{r, warning = FALSE, message = FALSE, eval = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2 
                 , data = data, se = "hetero")
```

```{r, include = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2 
                 , data = data, se = "hetero")
```

```{r}
rwolf(models, param = "X1", B = 9999)

```