
revise blog name and refresh docs
vbaliga committed Jul 30, 2024
1 parent 0301b7e commit 798c6e4
Showing 21 changed files with 1,296 additions and 279 deletions.
7 changes: 7 additions & 0 deletions .Rproj.user/shared/notebooks/paths
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/about.qmd="E572E06C"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/blog.qmd="FF4B1115"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/index.qmd="BB5822EE"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2019-04-22-set-max-DLLs-in-r/index.qmd="D80F5546"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2019-04-28-verify-that-r-packages-are-installed-and-loaded/index.qmd="9238F6CC"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2019-05-03-parallel-processing-for-mcmcglmm-in-r-windows-friendly/index.qmd="C7171F00"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2019-05-05-variance-vs-sample-size/index.qmd="A1FDB294"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2019-05-12_variance-vs-sample-size-2/index.qmd="6AD102B5"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2020-06-13-genbank-searches-tips-and-tricks/index.qmd="979764BF"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/posts/2020-06-27-replace-text-in-specific-column/index.qmd="C827C05D"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/press.qmd="BC1FDBAC"
/Users/mimir/Library/CloudStorage/OneDrive-UBC/github_repos/vbaliga.github.io/publications.qmd="87BB8EC7"
@@ -1,6 +1,7 @@
{
"hash": "4e65c1ac6a8e7efece8d4ca0d3b48517",
"hash": "445cd25e79b72a26064936d2cd32da2e",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Set max DLLs in R (Windows)\"\nsubtitle: \"If you need to adjust the max number of .dll files that R can handle, here is code that works if you are using Windows.\"\nauthor: \"Vikram B. Baliga\"\ncategories:\n - R\n - DLLs \n - Windows\ndate: 2019-04-22\ntoc: true\nimage: \"set-max-DLLs-r.png\"\n---\n\n\nOn occasion, you may need to adjust the max number of .dll files that R can handle.\nI first encountered this need when using a high number of packages together.\n\nI've had trouble finding this info in the past, so I decided to create this post\nfor others. This works if you are using Windows.\n\nThe following is machine-specific, so you will need to do this on each computer\nwhere you run R.\n\n## Find the `.Renviron` file\n\n\n::: {.cell}\n\n```{.r .cell-code}\nuser_renviron <- \n path.expand(file.path(\"~\", \".Renviron\"))\n# check to see if the file already exists\n# typically under: \"C:/Users/YOURUSERNAME/Documents/.Renviron\"\nif(!file.exists(user_renviron)) \n file.create(user_renviron)\nfile.edit(user_renviron) \n```\n:::\n\n\nIf `file.edit(user_renviron)` fails to work, just open the file itself (located\nwherever `user_renviron` is pointing) with a text editor.\n\n## Edit max DLLs\n\nOnce you have the file open, edit or add the following line, save, and restart\nR:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nR_MAX_NUM_DLLS=500\n```\n:::\n\n\n🐢\n",
"supporting": [
"index_files"
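The cached post above edits `.Renviron` by hand via `file.edit()`. As a hedged sketch (not from the post itself), the same setting can also be appended programmatically; the value 500 simply mirrors the post's example:

```r
# Sketch (not from the post): append R_MAX_NUM_DLLS to the
# user-level .Renviron instead of editing it by hand.
# Restart R afterwards for the setting to take effect.
user_renviron <- path.expand(file.path("~", ".Renviron"))
if (!file.exists(user_renviron)) file.create(user_renviron)

lines <- readLines(user_renviron)
if (!any(grepl("^R_MAX_NUM_DLLS=", lines))) {
  # 500 mirrors the post's example value
  writeLines(c(lines, "R_MAX_NUM_DLLS=500"), user_renviron)
}
```

Guarding on the `grepl()` check avoids appending a duplicate entry if the setting is already present.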
@@ -1,8 +1,11 @@
{
"hash": "4311eec8d851fbd2c13a42efd6bd1dde",
"hash": "25bcf59b949d90ac9de2b458ceefeffb",
"result": {
"markdown": "---\ntitle: \"Check if packages are installed (and install if not) in R\"\nsubtitle: \"Here’s some code that provides an easy way to check whether specific packages are in the default Library. If they are, they’re simply loaded via library(). If any packages are missing, they’re installed (with dependencies) into the default Library and are then loaded.\"\nauthor: \"Vikram B. Baliga\"\ncategories:\n - R\n - packages \n - package-installation\n - package-loading\ndate: 2019-04-28\ntoc: true\nimage: \"verify-that-r-package.png\"\n---\n\n\nSay you have an R script that you share with others. You may not be sure that each user has installed all the packages the script will require. Using `install.packages()` would be unnecessary for users who already have the packages and simply need to load them.\n\nHere's some code that provides an easy way to check whether specific packages are in the default Library. If they are, they're simply loaded via `library()`. If any packages are missing, they're installed (with dependencies) into the default Library and are then loaded.\n\n## Load \\| install & load packages\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## If a package is installed, it will be loaded. If any \n## are not, the missing package(s) will be installed \n## from CRAN and then loaded.\n\n## First specify the packages of interest\npackages = c(\"MASS\", \"nlme\")\n\n## Now load or install&load all\npackage.check <- lapply(\n packages,\n FUN = function(x) {\n if (!require(x, character.only = TRUE)) {\n install.packages(x, dependencies = TRUE)\n library(x, character.only = TRUE)\n }\n }\n)\n```\n:::\n\n\nThe logic of the `package.check` code basically goes:\n\n- Using `lapply()` over the list of `packages`,\n\n- If a package is not installed, install it.\n\n- Otherwise, load it.\n\nYou can then use `search()` to determine whether all the packages have loaded.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsearch()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] \".GlobalEnv\" \"package:nlme\" \"package:MASS\" \n [4] \"tools:quarto\" \"package:stats\" \"package:graphics\" \n [7] \"package:grDevices\" \"package:datasets\" \"renv:shims\" \n[10] \"package:utils\" \"package:methods\" \"Autoloads\" \n[13] \"package:base\" \n```\n:::\n:::\n\n\nThat's all!\\\n🐢\n",
"supporting": [],
"engine": "knitr",
"markdown": "---\ntitle: \"Check if packages are installed (and install if not) in R\"\nsubtitle: \"Here’s some code that provides an easy way to check whether specific packages are in the default Library. If they are, they’re simply loaded via library(). If any packages are missing, they’re installed (with dependencies) into the default Library and are then loaded.\"\nauthor: \"Vikram B. Baliga\"\ncategories:\n - R\n - packages \n - package-installation\n - package-loading\ndate: 2019-04-28\ntoc: true\nimage: \"verify-that-r-package.png\"\n---\n\n\nSay you have an R script that you share with others. You may not be sure that each user has installed all the packages the script will require. Using `install.packages()` would be unnecessary for users who already have the packages and simply need to load them.\n\nHere's some code that provides an easy way to check whether specific packages are in the default Library. If they are, they're simply loaded via `library()`. If any packages are missing, they're installed (with dependencies) into the default Library and are then loaded.\n\n## Load \\| install & load packages\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## If a package is installed, it will be loaded. If any \n## are not, the missing package(s) will be installed \n## from CRAN and then loaded.\n\n## First specify the packages of interest\npackages = c(\"MASS\", \"nlme\")\n\n## Now load or install&load all\npackage.check <- lapply(\n packages,\n FUN = function(x) {\n if (!require(x, character.only = TRUE)) {\n install.packages(x, dependencies = TRUE)\n library(x, character.only = TRUE)\n }\n }\n)\n```\n:::\n\n\nThe logic of the `package.check` code basically goes:\n\n- Using `lapply()` over the list of `packages`,\n\n- If a package is not installed, install it.\n\n- Otherwise, load it.\n\nYou can then use `search()` to determine whether all the packages have loaded.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsearch()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [1] \".GlobalEnv\" \"package:nlme\" \"package:MASS\" \n [4] \"package:stats\" \"package:graphics\" \"package:grDevices\"\n [7] \"package:datasets\" \"renv:shims\" \"package:utils\" \n[10] \"package:methods\" \"Autoloads\" \"package:base\" \n```\n\n\n:::\n:::\n\n\nThat's all!\\\n🐢\n",
"supporting": [
"index_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
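The install-or-load pattern embedded in the diff above can be wrapped as a reusable helper; a minimal sketch, where the function name `load_or_install` is my own label, not the post's:

```r
# Sketch of the post's install-then-load pattern as a helper.
# The name load_or_install is illustrative, not from the post.
load_or_install <- function(packages) {
  invisible(lapply(packages, function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }))
}

# Usage, mirroring the post's example (commented out so that
# nothing is installed unintentionally):
# load_or_install(c("MASS", "nlme"))
```

Wrapping the `lapply()` call in `invisible()` suppresses the list of `NULL`s that the loop would otherwise print.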
@@ -1,8 +1,11 @@
{
"hash": "80b76a5fd7cdfa1c87bec668b20c07ca",
"hash": "2cac23f3c60edb68ce625d5b50291dd1",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Parallel processing for MCMCglmm in R (Windows-friendly)\"\nsubtitle: \"I set up a virtual cluster and then use the parallel::parLapply() function to run iterations of MCMCglmm() in parallel for computers running Windows.\"\nauthor: \"Vikram B. Baliga\"\ncategories:\n - R\n - MCMCglmm \n - parallel\n - parallel-processing\n - Windows\ndate: 2019-05-03\ntoc: true\nimage: \"mcmcglmm-parallel.png\"\n---\n\n\nLately, I have been using the [MCMCglmm](https://cran.r-project.org/web/packages/MCMCglmm/index.html) package to run linear mixed-models in a Bayesian framework. The documentation is generally very good but there seems to be relatively little support for using parallel processing (here: using multiple cores on your machine) to efficiently run large volumes of MCMC runs. This is especially true for Windows users, who cannot use functions like `parallel::mclapply()`.\n\nI'm happy to share that I have worked out a solution using the [parallel](https://www.rdocumentation.org/packages/parallel/versions/3.5.1) package. Basically, I set up a virtual cluster and then use the `parallel::parLapply()` function to run iterations of `MCMCglmm()` in parallel.\n\n## Data\n\nI'll use \"Example 2\" from the [MCMCglmm() function help](https://www.rdocumentation.org/packages/MCMCglmm/versions/2.26/topics/MCMCglmm). You can skip ahead to the next section if instead you'd like to tailor this to your own data & analysis.\n\nFirst load (or install&load) the `MCMCglmm` and `parallel` packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## If a package is installed, it will be loaded. If any \n## are not, the missing package(s) will be installed \n## from CRAN and then loaded.\n\n## First specify the packages of interest\npackages = c(\"MCMCglmm\", \"parallel\")\n\n## Now load or install&load all\npackage.check <- lapply(\n packages,\n FUN = function(x) {\n if (!require(x, character.only = TRUE)) {\n install.packages(x, dependencies = TRUE)\n library(x, character.only = TRUE)\n }\n }\n)\n```\n:::\n\n\nWith the packages loaded, we'll prep our data set. Lifting this directly from the `MCMCglmm()` help page:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(bird.families)\nphylo.effect <- rbv(bird.families, 1, nodes = \"TIPS\")\nphenotype <- phylo.effect + rnorm(dim(phylo.effect)[1], 0, 1)\n\n# simulate phylogenetic and residual effects\n# with unit variance\ntest.data <- data.frame(phenotype = phenotype,\n taxon = row.names(phenotype))\nAinv <- inverseA(bird.families)$Ainv\n\n# inverse matrix of shared phylogenetic history\nprior <- list(R = list(V = 1, nu = 0.002), \n G = list(G1 = list(V = 1, nu = 0.002)))\n\nmodel2 <- MCMCglmm(\n phenotype ~ 1,\n random = ~ taxon,\n ginverse = list(taxon = Ainv),\n data = test.data,\n prior = prior,\n verbose = FALSE,\n nitt = 1300,\n burnin = 300,\n thin = 1\n)\nsummary(model2)\n```\n:::\n\n\n Iterations = 301:1300\n Thinning interval = 1\n Sample size = 1000 \n\n DIC: 375.0159 \n\n G-structure: ~taxon\n\n\n R-structure: ~units\n\n\n Location effects: phenotype ~ 1 \n\n post.mean l-95% CI u-95% CI eff.samp pMCMC\n (Intercept) 0.1630 -0.5899 0.8938 1000 0.654\n\nOf course, the example provided sets `nitt` to only 1300, yielding an ESS of only \\~800 for the fixed effect. I am guessing this is intended to make sure the example is quick to run.\n\nBoosting this to `nitt = 100000`, `burnin = 10000`, and `thin = 10` gives a healthier ESS \\> 8000. But please note that this will take a lot longer to finish (I'll leave it up to you to use the `Sys.time()` function to time it yourself).\n\n## Run MCMC chains in parallel\n\nWhenever conducting MCMC-based analyses, it's advisable to conduct multiple runs (different chains) and then assess convergence. I'll leave the convergence assessments for another day (but here's [a good StackExchange post](https://stats.stackexchange.com/questions/507/what-is-the-best-method-for-checking-convergence-in-mcmc)). For now we'll just conduct 10 runs of this model, each with `nitt = 100000`, using parallel processing.\n\n***PLEASE NOTE**: I am setting this up to use only 80% of your machine's total logical processors. You can certainly harness all of your CPUs if you'd like, although I advise against doing so if any of your MCMC runs take more than a few minutes. It also doesn't make sense to set the number of logical processors to be greater than the number of runs (chains), but more on that later. Anyway, treat your silicon well!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# use detectCores() by itself if you want all CPUs\nsetCores <- round(detectCores() * 0.8)\n\n# make the cluster\ncl <- makeCluster(getOption(\"cl.cores\", setCores))\n # EDIT ON 2020-07-27: I have been informed that Mac users \n # may have better luck using:\n # cl <- parallel::makeCluster(getOption(\"cl.cores\", setCores), \n # setup_strategy = \"sequential\")\n # This is due to an apparent issue in RStudio. \n # See this stackoverflow page for details:\n # https://stackoverflow.com/questions/61700586/r-makecluster-command-used-to-work-but-now-fails-in-rstudio\n\n# load the MCMCglmm package within the cluster\ncl.pkg <- clusterEvalQ(cl, library(MCMCglmm))\n\n# import each object that's necessary to run the function\nclusterExport(cl, \"prior\")\nclusterExport(cl, \"test.data\")\nclusterExport(cl, \"Ainv\")\n\n# use parLapply() to execute 10 runs of MCMCglmm()\n# each with nitt=100000\nmodel2_10runs <- parLapply(cl = cl, 1:10, function(i) {\n MCMCglmm(\n phenotype ~ 1,\n random = ~ taxon,\n ginverse = list(taxon = Ainv),\n data = test.data,\n prior = prior,\n verbose = FALSE,\n nitt = 100000,\n burnin = 10000,\n thin = 10\n )\n})\n\n# once it's finished, use stopCluster() to stop running\n# the parallel cluster\nstopCluster(cl)\n```\n:::\n\n\nThe `model2_10runs` object is a list that contains each of the 10 MCMC models. You can perform all the usual summarization, plotting, etc., but just be sure to specify models within the list, e.g.: `summary(model2_10runs[[3]])` to summarize the third model out of the 10.\n\n Iterations = 10001:99991\n Thinning interval = 10\n Sample size = 9000 \n\n DIC: 109.7491 \n\n G-structure: ~taxon\n\n post.mean l-95% CI u-95% CI eff.samp\n taxon 1.782 0.3085 2.989 178.6\n\n R-structure: ~units\n\n post.mean l-95% CI u-95% CI eff.samp\n units 0.4437 0.0001843 1.224 181.1\n\n Location effects: phenotype ~ 1 \n\n post.mean l-95% CI u-95% CI eff.samp pMCMC\n (Intercept) 0.1697 -0.5989 0.9841 9000 0.666\n\nAs I mentioned above, we'll leave convergence and other fun topics like autocorrelation for another day.\n\nThat's all!\\\n🐢\n",
"supporting": [],
"supporting": [
"index_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
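The cluster workflow in the cached post above (make the cluster, export objects, `parLapply()`, stop the cluster) can be sketched with a cheap toy task standing in for `MCMCglmm()`; the object names below are illustrative, not from the post:

```r
# Toy sketch of the post's cluster workflow: same structure,
# but a trivial computation stands in for MCMCglmm().
library(parallel)

n_cores <- max(1, round(detectCores() * 0.8))  # 80% of cores, as in the post
cl <- makeCluster(n_cores)

shared_constant <- 10  # stands in for prior, test.data, and Ainv
clusterExport(cl, "shared_constant")

# ten "chains", analogous to the post's ten MCMCglmm() runs
results <- parLapply(cl, 1:10, function(i) i * shared_constant)

stopCluster(cl)
```

The key point the sketch preserves is that workers start with empty global environments, so every object the worker function needs must be exported (or loaded via `clusterEvalQ()`) before `parLapply()` is called.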
