Repository to store the code associated with the WhoseEgg R Shiny app for classifying invasive carp eggs using a random forest model.
- App location: https://whoseegg.stat.iastate.edu/
- Creators: Katherine Goode, Michael Weber, and Philip Dixon
Folders
- data: Folder that contains the datasets used by WhoseEgg (or used to prepare the data used by WhoseEgg)
  - 2016 eggs with incorrect data_MW.csv: Contains the problematic eggs with corrections by Mike Weber
  - eggdata_for_app.csv: The data used to train the random forest models used by WhoseEgg
  - problematic-eggs.csv: The dataset with identified problematic eggs
  - rfs_for_app.rds: The random forests used by WhoseEgg to make predictions
  - template.xlsx: The template for WhoseEgg users that is downloadable via the app
- prep: Folder that contains the code used to prepare the data and models for WhoseEgg. The files are numbered in the order in which they should be run. Both the .md and .pdf files are generated from the corresponding .Rmd file; view the .md files for easy reading on GitHub. The folders corresponding to the code files contain the figures generated by the .Rmd files and used by the .md files on GitHub.
  - 01-data-for-app.md/.Rmd/.pdf: Prepares the training data
  - 02-rfs-for-app.md/.Rmd/.pdf: Trains the random forests used by WhoseEgg
  - 03-animation-for-app.md/.Rmd/.pdf: Creates an animation of a swimming fish
  - 04-mds-for-app.md/.Rmd/.pdf: Investigates the use of MDS to identify new observations that fall outside the training data (for future work)
  - 99-testing-app-functions.md/.Rmd/.pdf: Tests the functions used by WhoseEgg
- text: Folder with R Markdown files that WhoseEgg reads to include text in the app. The files are numbered to correspond with the app page on which they appear:
  - 01: home page
  - 02: input page
  - 03: predictions page
  - 04: downloads page
  - 05: help page
  - 06: references page
- www: Contains the figures that appear in the WhoseEgg app
Files
- app.R: Main R script that contains the server and UI for the app
- helper-functions.R: R script that contains the helper functions used by WhoseEgg
- matomo.txt: Text file that Matomo needs in order to collect user information from the WhoseEgg server (DO NOT REMOVE)
- r-requirements.txt: Text file listing the R packages used by WhoseEgg; the server needs it to work correctly (DO NOT REMOVE)
- README.md and README.Rmd: README files for the WhoseEgg GitHub repository with lots of helpful information
- references.bib: File with the BibTeX citations used in WhoseEgg
- Clone the WhoseEgg repository into RStudio: See this book chapter for help if this is new to you.
- Edit code and/or files as needed: If the only changes needed are updates to the training data and random forests, you only need to adjust the code in 01-data-for-app.Rmd and 02-rfs-for-app.Rmd and save the new versions of the data and random forests. The code in app.R automatically uses whichever versions of eggdata_for_app.csv and rfs_for_app.rds are in the repository's data folder.
- Commit and push the updates to the main branch of the repository: Alternatively, create a new branch while working on the updates and merge it into the main branch when finished. The WhoseEgg server uses the files in the main branch.
- Wait for the server to update: This may take a while.
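Before committing regenerated data or models, it may help to confirm that the new files can actually be read. The sketch below is a minimal check in base R; the helper name check_app_files() is hypothetical (it is not part of WhoseEgg), and it assumes it is run from the repository root so the files sit in data/. It only confirms that the files exist and load; it does not check that the random forests are compatible with app.R.

```r
# Hypothetical helper (not part of WhoseEgg): confirm that the training data
# and random forests used by app.R exist and can be read.
check_app_files <- function(data_dir = "data") {
  csv_path <- file.path(data_dir, "eggdata_for_app.csv")
  rds_path <- file.path(data_dir, "rfs_for_app.rds")

  # Stop early with a clear message if either file is absent
  paths <- c(csv_path, rds_path)
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing file(s): ", paste(absent, collapse = ", "))
  }

  eggdata <- read.csv(csv_path)
  rfs <- readRDS(rds_path)

  # Report basic dimensions so unexpected changes are easy to spot
  message("eggdata_for_app.csv: ", nrow(eggdata), " rows, ",
          ncol(eggdata), " columns")
  message("rfs_for_app.rds: object of class ",
          paste(class(rfs), collapse = "/"))

  invisible(list(eggdata = eggdata, rfs = rfs))
}
```

For example, check_app_files("data") run from the repository root prints the row and column counts of the training data and the class of the saved random forest object.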
If any new R packages are added to the app, their names must be added to the r-requirements.txt file.
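One way to catch a forgotten entry is to compare the library() calls in app.R with r-requirements.txt. The helper below, find_missing_requirements(), is hypothetical and assumes that r-requirements.txt lists one package name per line; check the actual file format before relying on it. It only inspects library() calls with a bare package name, so packages loaded another way (for example, with requireNamespace() or in helper-functions.R) are not detected.

```r
# Hypothetical helper (not part of WhoseEgg): return packages attached with
# library() in app.R that are not listed in r-requirements.txt.
# Assumption: r-requirements.txt contains one package name per line.
find_missing_requirements <- function(app_lines, requirement_lines) {
  # Drop commented-out lines such as "# library(oldpkg)"
  app_lines <- app_lines[!grepl("^[[:space:]]*#", app_lines)]

  # Capture the package name inside library(...) calls
  pattern <- "library\\(([A-Za-z0-9.]+)\\)"
  calls <- regmatches(app_lines, regexpr(pattern, app_lines))
  used <- unique(sub(pattern, "\\1", calls))

  # Ignore blank lines and surrounding whitespace in the requirements file
  listed <- trimws(requirement_lines)
  listed <- listed[listed != ""]

  setdiff(used, listed)
}
```

For example, find_missing_requirements(readLines("app.R"), readLines("r-requirements.txt")) run from the repository root returns character(0) when every package attached in app.R is listed.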
To access the Matomo user data associated with WhoseEgg, log in at https://trends.ent.iastate.edu/. These data are useful for tracking the number of visits the WhoseEgg URL receives and may be used when applying for additional funding. Currently, Katherine Goode, Mike Weber, and Philip Dixon have access. To be added as a user who can access the Matomo data, contact [email protected]; approval from Mike Weber and Philip Dixon is also required.
If there is an error with the WhoseEgg server, you can check the log by following these steps (appropriate access is required):
- Go here and log in with Okta (you must be connected to the ISU VPN).
- Make sure ‘Developer’ rather than ‘Administrator’ is selected in the dropdown in the top left.
- If you see a dropdown near the top, choose ‘rit-pdixon-lab-carp’.
- Click ‘Topology’ on the left.
- Click the big circle.
- Under the ‘Pods’ heading, click ‘View Logs’.
If you need access, contact [email protected]. Currently, only Katherine Goode and Philip Dixon have access, and approval from Philip Dixon is also required.
Data Input
- Add option for manual input of one observation (could try using a Google form embedding in the app)
- Look into research on data input in apps
Methods
- Add more visualizations:
  - Comparing input data to training data
    - Create an MDS plot with training and new data
  - Make plots interactive
  - Add plots comparing predictions with egg characteristics
- Interpretations of random forest probabilities
  - Add a nice interpretation
  - Add comments on the interpretation of probabilities near 0.5-0.6
  - Look into how to compute empirical probabilities and try out the model calibration technique
  - Read papers on model calibration:
- Add random forest prediction intervals
- Switch to using random forests with reduced features
- Try using weighting in random forests to account for imbalance in classes
Format
- Figure out a better way to fix the header so it doesn’t cover material when the screen size changes
- Fix the color of the WhoseEgg text when it is hovered over
- Add a download image button
Other
- Add a check for when a new dataset is uploaded
- Add formal tests
- Connect with GitHub Actions
- Look into the process the fish bioenergetics Shiny app uses to update frequently with more data and fish species
- Update functions and code to make them more efficient
The code below loads the R packages used by WhoseEgg (in app.R) and prints the session information, including the version number of each package.
# Packages used by the WhoseEgg code (in the app.R file)
library(dplyr)
library(DT)
library(forcats)
library(ggplot2)
library(markdown)
library(plotly)
library(purrr)
library(randomForest)
library(shiny)
library(shinythemes)
library(stringr)
library(tidyr)
# Print the session information
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.1.3 stringr_1.4.0 shinythemes_1.2.0
## [4] shiny_1.6.0 randomForest_4.6-14 purrr_0.3.4
## [7] plotly_4.9.3 markdown_1.1 ggplot2_3.3.3
## [10] forcats_0.5.1 DT_0.17 dplyr_1.0.6
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 later_1.2.0 pillar_1.6.1 compiler_4.0.4
## [5] tools_4.0.4 digest_0.6.27 viridisLite_0.4.0 jsonlite_1.7.2
## [9] evaluate_0.14 lifecycle_1.0.0 tibble_3.1.2 gtable_0.3.0
## [13] pkgconfig_2.0.3 rlang_0.4.11 DBI_1.1.1 yaml_2.2.1
## [17] xfun_0.23 fastmap_1.1.0 httr_1.4.2 withr_2.4.2
## [21] knitr_1.33 generics_0.1.0 vctrs_0.3.8 htmlwidgets_1.5.3
## [25] grid_4.0.4 tidyselect_1.1.1 data.table_1.14.0 glue_1.4.2
## [29] R6_2.5.0 fansi_0.5.0 rmarkdown_2.9 magrittr_2.0.1
## [33] promises_1.2.0.1 scales_1.1.1 ellipsis_0.3.2 htmltools_0.5.1.1
## [37] assertthat_0.2.1 xtable_1.8-4 mime_0.10 colorspace_2.0-1
## [41] httpuv_1.6.1 utf8_1.2.1 stringi_1.6.2 lazyeval_0.2.2
## [45] munsell_0.5.0 crayon_1.4.1
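When only the packages loaded by app.R matter (for example, when comparing against r-requirements.txt), a shorter listing than sessionInfo() may be convenient. This is a sketch: package_versions() is a hypothetical helper, and the package list is copied from the library() calls above. Packages that are not installed are reported as NA rather than raising an error, and the versions reflect whatever machine runs the code, so they will not necessarily match the output above.

```r
# Packages attached by app.R (copied from the library() calls above)
app_packages <- c("dplyr", "DT", "forcats", "ggplot2", "markdown", "plotly",
                  "purrr", "randomForest", "shiny", "shinythemes", "stringr",
                  "tidyr")

# Hypothetical helper: installed version of each package, NA if not installed.
# system.file(package = p) returns "" when the package is not installed,
# so packageVersion() is only called for installed packages.
package_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

package_versions(app_packages)
```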