-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
241 lines (179 loc) · 6.88 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# wildrwolf `r emo::ji("wolf")`
<!-- badges: start -->
[![R-CMD-check](https://github.com/s3alfisc/wildrwolf/workflows/R-CMD-check/badge.svg)](https://github.com/s3alfisc/wildrwolf/actions)
[![](http://cranlogs.r-pkg.org/badges/last-month/wildrwolf)](https://cran.r-project.org/package=wildrwolf)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![](https://www.r-pkg.org/badges/version/wildrwolf)](https://cran.r-project.org/package=wildrwolf)
![runiverse-package](https://s3alfisc.r-universe.dev/badges/wildrwolf)
[![Codecov test coverage](https://codecov.io/gh/s3alfisc/wildrwolf/branch/main/graph/badge.svg)](https://app.codecov.io/gh/s3alfisc/wildrwolf?branch=main)
<!-- badges: end -->
The `wildrwolf` package implements Romano-Wolf multiple-hypothesis-adjusted p-values for objects of type `fixest` and `fixest_multi` from the `fixest` package via a wild (cluster) bootstrap.
Because the bootstrap-resampling is based on the [fwildclusterboot](https://github.com/s3alfisc/fwildclusterboot) package, `wildrwolf` is usually really fast.
The package is complementary to [wildwyoung](https://github.com/s3alfisc/wildwyoung) (still work in progress), which implements the multiple hypothesis adjustment method following Westfall and Young (1993).
Adding support for multi-way clustering is work in progress.
## Installation
You can install the package from CRAN and the development version from [GitHub](https://github.com/) with:
``` r
install.packages("wildrwolf")
# install.packages("devtools")
devtools::install_github("s3alfisc/wildrwolf")
# from r-universe (windows & mac, compiled R > 4.0 required)
install.packages('wildrwolf', repos ='https://s3alfisc.r-universe.dev')
```
## Example I
<!-- As you can see in the example, there seems to be a bug in `rwolf()` for the pairs bootstrap. -->
```{r, warning=FALSE, message = FALSE}
library(wildrwolf)
library(fixest)
set.seed(1412)
N <- 1000
X1 <- rnorm(N)
X2 <- rnorm(N)
rho <- 0.5
sigma <- matrix(rho, 4, 4); diag(sigma) <- 1
u <- MASS::mvrnorm(n = N, mu = rep(0, 4), Sigma = sigma)
Y1 <- 1 + 1 * X1 + X2
Y2 <- 1 + 0.01 * X1 + X2
Y3 <- 1 + 0.4 * X1 + X2
Y4 <- 1 + -0.02 * X1 + X2
for(x in 1:4){
var_char <- paste0("Y", x)
assign(var_char, get(var_char) + u[,x])
}
data <- data.frame(Y1 = Y1,
Y2 = Y2,
Y3 = Y3,
Y4 = Y4,
X1 = X1,
X2 = X2,
#group_id = group_id,
splitvar = sample(1:2, N, TRUE))
fit <- feols(c(Y1, Y2, Y3, Y4) ~ csw(X1,X2),
data = data,
se = "hetero",
ssc = ssc(cluster.adj = TRUE))
# clean workspace except for res & data
rm(list= ls()[!(ls() %in% c('fit','data'))])
res_rwolf1 <- wildrwolf::rwolf(
models = fit,
param = "X1",
B = 9999
)
pvals <- lapply(fit, function(x) pvalue(x)["X1"]) |> unlist()
# Romano-Wolf Corrected P-values
res_rwolf1
```
## Example II
```{r, warning = FALSE, message = FALSE}
fit1 <- feols(Y1 ~ X1 , data = data)
fit2 <- feols(Y1 ~ X1 + X2, data = data)
fit3 <- feols(Y2 ~ X1, data = data)
fit4 <- feols(Y2 ~ X1 + X2, data = data)
res_rwolf2 <- rwolf(
models = list(fit1, fit2, fit3, fit4),
param = "X1",
B = 9999
)
res_rwolf2
```
## Performance
The above procedure with `S=8` hypotheses, `N=1000` observations and `k %in% (1,2)` parameters finishes in around 5 seconds.
```{r, warning = FALSE, message = FALSE}
if(requireNamespace("microbenchmark")){
microbenchmark::microbenchmark(
"Romano-Wolf" = wildrwolf::rwolf(
models = fit,
param = "X1",
B = 9999
),
times = 1
)
}
```
## But does it work? Monte Carlo Experiments
We test $S=6$ hypotheses and generate data as
$$Y_{i,s,g} = \beta_{0} + \beta_{1,s} D_{i} + u_{i,g} + \epsilon_{i,s} $$
where $D_i = 1(U_i > 0.5)$ and $U_i$ is drawn from a uniform distribution, $u_{i,g}$ is a cluster level shock with intra-cluster correlation $0.5$, and the idiosyncratic error term is drawn from a multivariate random normal distribution with mean $0_S$ and covariance matrix
```{r}
S <- 6
rho <- 0.5
Sigma <- matrix(rho, 6, 6)
diag(Sigma) <- 1
Sigma
```
with $\rho \geq 0$. We assume that $\beta_{1,s}= 0$ for all $s$.
This experiment imposes a data generating process as in equation (9) in [Clarke, Romano and Wolf](https://docs.iza.org/dp12845.pdf), with an additional error term $u_g$ for $G=20$ clusters and intra-cluster correlation 0.5 and $N=1000$ observations.
You can run the simulations via the `run_fwer_sim()` function attached in the package.
```{r, message = FALSE, results = "hide"}
# note that this will take some time
res <- run_fwer_sim(
seed = 76,
n_sims = 1000,
B = 499,
N = 1000,
s = 6,
rho = 0.5 #correlation between hypotheses, not intra-cluster!
)
```
Both Holm's method and `wildrwolf` control the family wise error rates, at both the 5 and 10% significance level.
```{r}
res
```
## Comparison with Stata's rwolf package
```{r, eval = FALSE}
library(RStata)
# initiate RStata
options("RStata.StataPath" = "\"C:\\Program Files\\Stata17\\StataBE-64\"")
options("RStata.StataVersion" = 17)
# save the data set so it can be loaded into STATA
write.csv(data, "c:/Users/alexa/Dropbox/rwolf/inst/extdata/readme.csv")
# estimate with stata via Rstata
stata_program <- "
clear
set more off
import delimited c:/Users/alexa/Dropbox/rwolf/inst/data/readme.csv
set seed 1
rwolf y1 y2 y3 y4, indepvar(x1) controls(x2) reps(9999)
"
RStata::stata(stata_program, data.out = TRUE)
# Romano-Wolf step-down adjusted p-values
#
#
# Independent variable: x1
# Outcome variables: y1 y2 y3 y4
# Number of resamples: 9999
#
#
# ------------------------------------------------------------------------------
# Outcome Variable | Model p-value Resample p-value Romano-Wolf p-value
# --------------------+---------------------------------------------------------
# y1 | 0.0000 0.0001 0.0001
# y2 | 0.3904 0.3755 0.6070
# y3 | 0.0000 0.0001 0.0001
# y4 | 0.9586 0.9596 0.9596
# ------------------------------------------------------------------------------
```
For comparison, `wildrwolf` produces the following output:
```{r, warning = FALSE, message = FALSE, eval = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2
, data = data, se = "hetero")
```
```{r, include = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2
, data = data, se = "hetero")
```
```{r}
rwolf(models, param = "X1", B = 9999)
```