---
title: "Fitting Random Forests for WhoseEgg Shiny App"
author: "Katherine Goode <br>"
date: 'Last Updated: `r format(Sys.time(), "%B %d, %Y")`'
output: rmarkdown::github_document
#output: pdf_document
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```
This document contains the code that fits the three random forest models used in the app. In all three models, the invasive carp species are grouped into a single class, while all other species are classified at the family, genus, or species level (one model per taxonomic level). The training data are the data used in Goode et al. (2021) to train the augmented models, and the same seed (808) is used, so the fitted models should agree with those in that paper.
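To make "grouped into a single class" concrete, the sketch below collapses a few species labels so that every invasive carp species shares one label while other species keep their own. The species vector and the membership of `invasive_carp` are illustrative assumptions; the real `*_IC` response columns are already built into `eggdata_for_app.csv` and are not recomputed in this document.

```r
# Hypothetical species labels, for illustration only
species <- c("Silver Carp", "Bighead Carp", "Grass Carp",
             "Freshwater Drum", "Gizzard Shad")

# Assumed set of species treated as "invasive carp"
invasive_carp <- c("Bighead Carp", "Grass Carp", "Silver Carp")

# Replace any invasive carp label with a single shared class label
species_ic <- ifelse(species %in% invasive_carp, "Invasive Carp", species)
species_ic
```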
Load packages:
```{r}
library(dplyr)
library(randomForest)
library(purrr)
```
Make a vector of the response variables:
```{r}
vars_resp <- c(
"Family_IC",
"Genus_IC",
"Common_Name_IC"
)
```
Make a vector of the predictor variables:
```{r}
vars_pred <- c(
"Month",
"Julian_Day",
"Temperature",
"Conductivity",
"Larval_Length",
"Membrane_Ave",
"Membrane_SD",
"Membrane_CV",
"Embryo_to_Membrane_Ratio",
"Embryo_Ave",
"Embryo_SD",
"Embryo_CV",
"Egg_Stage",
"Compact_Diffuse",
"Pigment",
"Sticky_Debris",
"Deflated"
)
```
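For reading purposes, each model relates one response to the predictors in `vars_pred`, i.e. something equivalent in terms of variables to `response ~ predictor_1 + predictor_2 + ...`. The sketch below uses base R's `reformulate()` to build such a formula for a short subset of the predictors (chosen only for brevity). It is an illustration, not the call the fitting function later in this document uses, which constructs its formula differently.

```r
# A subset of the predictors, for brevity
preds <- c("Month", "Temperature", "Egg_Stage")

# Build "Family_IC ~ Month + Temperature + Egg_Stage"
f <- reformulate(termlabels = preds, response = "Family_IC")
print(f)

# Variables in order of appearance: response first, then predictors
all.vars(f)
```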
Load the prepared egg data and convert the categorical predictors and the response variables to factors:
```{r}
eggdata_for_app <-
  read.csv("../data/eggdata_for_app.csv") %>%
  # Convert the categorical predictors and all response variables to factors
  mutate(across(
    .cols = all_of(c(
      "Egg_Stage",
      "Compact_Diffuse",
      "Pigment",
      "Sticky_Debris",
      "Deflated",
      vars_resp
    )),
    .fns = factor
  ))
str(eggdata_for_app)
```
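The factor conversion matters because `randomForest()` fits a classification forest when the response is a factor (and a regression forest when it is numeric), and it treats factor predictors as categorical. The base-R sketch below performs the same kind of conversion on a small made-up data frame; the column values are hypothetical.

```r
# Made-up data with two categorical columns stored as character
toy <- data.frame(
  Egg_Stage = c("1", "2", "2"),
  Pigment = c("Yes", "No", "Yes"),
  Temperature = c(20.1, 21.5, 19.8),
  stringsAsFactors = FALSE
)

# Convert only the categorical columns to factors
factor_cols <- c("Egg_Stage", "Pigment")
toy[factor_cols] <- lapply(toy[factor_cols], factor)

# Which columns are now factors?
sapply(toy, is.factor)
```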
Define a function that fits a random forest given a response variable, a vector of predictor variables, and a dataset. It sets the same seed (808) used to fit the random forests in Camacho et al. (2019) and Goode et al. (2021):
```{r}
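fit_rf <- function(resp, preds, data) {
  # Use the same seed as Camacho et al. (2019) and Goode et al. (2021)
  set.seed(808)
  # Fit the random forest: the left-hand side pulls the response column from
  # the full data (found in the function environment), while `.` expands to
  # every column of the predictor-only data frame
  rf <- randomForest(
    data %>% pull(resp) ~ .,
    data = data %>% select(all_of(preds)),
    importance = TRUE,
    ntree = 1000
  )
  # Return the model in a list named by the response variable
  rf_list <- list(rf)
  names(rf_list) <- resp
  return(rf_list)
}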
```
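The claim that the models "should agree" with the published ones rests on seeding the random number generator before each fit, as `fit_rf()` does. The sketch below checks that idea on the built-in `iris` data (a stand-in for the egg data, with a small `ntree` for speed): two forests fitted after `set.seed(808)` produce identical out-of-bag predictions and error rates.

```r
library(randomForest)

# Fit a small classification forest after fixing the seed
fit_with_seed <- function(seed) {
  set.seed(seed)
  randomForest(Species ~ ., data = iris, ntree = 50)
}

rf_a <- fit_with_seed(808)
rf_b <- fit_with_seed(808)

# Same seed, data, and settings: identical out-of-bag results
identical(rf_a$predicted, rf_b$predicted)
identical(rf_a$err.rate, rf_b$err.rate)
```

Agreement across machines can still depend on the R and package versions in use, which is one reason to record the session information at the end of this document.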
Fit one random forest per response variable and combine the results into a single list named by response:
```{r}
rfs_for_app <-
map(
.x = vars_resp,
.f = fit_rf,
preds = vars_pred,
data = eggdata_for_app
) %>%
flatten()
```
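Because `fit_rf()` returns a one-element named list, `map()` yields a list of three such lists, and `flatten()` removes one level of nesting. The base-R sketch below reproduces that structure with placeholder objects standing in for fitted models (the `fake_fit()` helper is hypothetical), using `do.call(c, ...)` to do the flattening.

```r
resp_names <- c("Family_IC", "Genus_IC", "Common_Name_IC")

# Hypothetical stand-in for fit_rf(): a one-element list named by the response
fake_fit <- function(resp) {
  setNames(list(list(response = resp)), resp)
}

pieces <- lapply(resp_names, fake_fit)  # list of three one-element lists
combined <- do.call(c, pieces)          # one flat list with three named entries
names(combined)
```

Recent purrr releases mark `flatten()` as superseded in favour of `list_flatten()`, so the base-R form may be a useful fallback if the purrr call changes.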
Save the list of random forests as an RDS file for use in the app:
```{r}
saveRDS(
object = rfs_for_app,
file = "../data/rfs_for_app.rds"
)
```
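An RDS file stores a single R object, so the app can restore the whole named list with `readRDS()`. The sketch below shows the round trip with a small stand-in list written to a temporary file rather than the real models.

```r
# Stand-in for rfs_for_app
models <- list(Family_IC = "model A", Genus_IC = "model B")

# Write to a temporary file and read it back
path <- tempfile(fileext = ".rds")
saveRDS(models, file = path)
models_back <- readRDS(path)

identical(models, models_back)
```

With the real file, a model could then be used along the lines of `predict(models_back$Family_IC, newdata = new_eggs, type = "prob")`, where `new_eggs` is a hypothetical data frame that would typically need the `vars_pred` columns with factor levels matching the training data.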
# Session Info
```{r}
sessionInfo()
```