diff --git a/help.html b/help.html
index 6beb95800..b73edd105 100644
--- a/help.html
+++ b/help.html
@@ -347,14 +347,14 @@
Here we are creating a new object from an existing one:
new_rivers <- sample(rivers, 5)
new_rivers
-## [1] 310 1100 2348 380 500
+## [1] 360 255 600 377 720
Using just this will only print the result and not actually change new_rivers:
new_rivers + 1
-## [1] 311 1101 2349 381 501
+## [1] 361 256 601 378 721
If we want to modify new_rivers and save that modified version, then we need to reassign new_rivers like so:
new_rivers <- new_rivers + 1
new_rivers
-## [1] 311 1101 2349 381 501
+## [1] 361 256 601 378 721
If we forget to reassign, subsequent steps may not work as expected because we will not be working with the modified data.
Make sure you run something like this, with the <- operator:
rivers2 <- new_rivers + 1
rivers2
-## [1] 312 1102 2350 382 502
+## [1] 362 257 602 379 722
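A minimal sketch of the difference between printing and reassigning, using only base R. The `set.seed()` call and the `original` helper variable are assumptions added for illustration and are not part of the slides.

```r
# `rivers` ships with base R (datasets package).
set.seed(1)                          # assumed seed, only to make the sample repeatable
new_rivers <- sample(rivers, 5)
original <- new_rivers               # hypothetical copy kept for comparison

new_rivers + 1                       # prints the result but does not store it
identical(new_rivers, original)      # TRUE: new_rivers is unchanged

new_rivers <- new_rivers + 1         # reassign to keep the modified values
identical(new_rivers, original + 1)  # TRUE: new_rivers now holds the update
```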
It’s super nifty!
-We can use the CO heat-related ER visits dataset. This dataset contains information about the number and rate of visits for heat-related illness to ERs in Colorado from 2011-2022, adjusted for age.
+We can use the CO heat-related ER visits dataset. This dataset contains information about the number and rate of visits for heat-related illness to Emergency rooms in Colorado from 2011-2022, adjusted for age.
er <- read_csv("https://jhudatascience.org/intro_to_r/data/CO_ER_heat_visits.csv")
@@ -299,7 +299,7 @@
-As a comparison, let’s also load a wide version of this dataset.
+As a comparison, let’s also load a wide version of this dataset. {.codesmall}
wide_er <- read_csv(file = "https://jhudatascience.org/intro_to_r/data/CO_heat_er_visits_DenverBoulder_wide.csv")
@@ -313,7 +313,7 @@
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-
head(long_er)
@@ -362,7 +362,12 @@
esquisser() function on a dataset
library(esquisse)
esquisser() function on a dataset
viewer = "browser" argument to launch in your browser.
Image by Gerd Altmann from Pixabay
install.packages("esquisse")
install.packages("ggplot2")
library(esquisse)
-library(ggplot2)
+library(tidyverse)
+FALSE ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+FALSE ✔ dplyr 1.1.4 ✔ readr 2.1.5
+FALSE ✔ forcats 1.0.0 ✔ stringr 1.5.1
+FALSE ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
+FALSE ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
+FALSE ✔ purrr 1.0.2
+FALSE ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+FALSE ✖ dplyr::filter() masks stats::filter()
+FALSE ✖ dplyr::lag() masks stats::lag()
+FALSE ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Try creating a plot using the Orange data that automatically comes with R using the esquisse package.
ggplot(Orange) +
  aes(x = age, y = circumference, colour = Tree) +
  geom_point(shape = "circle", size = 1.5) +
  scale_color_hue(direction = 1) +
  theme_minimal()
-tidyr package
+tidyr package (part of tidyverse)
tidyr allows you to “tidy” your data. We will be talking about:
C. Reshape data
-pivot_longer() - puts column data into rows (tidyr package)
Newly created column names (“Month” and “Rate”) are enclosed in quotation marks. It helps us be more specific than “name” and “value”.
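As a rough sketch of why quoted names help, the example below compares the default column names with custom ones. The `wide_vacc` table and its values are made up for illustration and are not the course dataset; it assumes the tidyr and tibble packages are installed.

```r
library(tidyr)
library(tibble)

# Toy wide data; values are illustrative only, not from the course dataset.
wide_vacc <- tibble(
  State = c("Alabama", "Alaska"),
  May_vacc_rate = c(0.5, 0.6),
  April_vacc_rate = c(0.4, 0.55)
)

# Without names_to / values_to, the new columns get the defaults "name" and "value".
default_long <- pivot_longer(wide_vacc, cols = !State)
names(default_long)

# Quoted names are used because these columns do not exist yet.
long_vacc <- pivot_longer(
  wide_vacc,
  cols = !State,
  names_to = "Month",
  values_to = "Rate"
)
long_vacc
```

The names are quoted strings because they refer to columns that are being created, whereas `State` is unquoted because it already exists in the data.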
@@ -373,9 +373,10 @@
5 Alaska May_vacc_rate 0.626
6 Alaska April_vacc_rate 0.623
-circ <- read_csv("http://jhudatascience.org/intro_to_r/data/Charm_City_Circulator_Ridership.csv")
+circ <-
+  read_csv("http://jhudatascience.org/intro_to_r/data/Charm_City_Circulator_Ridership.csv")
head(circ, 5)
# A tibble: 5 × 15
@@ -428,7 +429,23 @@
Filter by Boardings only..
-long <- long %>% filter(str_detect(name, "Boardings"))
+long <- long %>% filter(str_detect(name, "Boardings"))
+long
+
+# A tibble: 4,584 × 5
+   day       date       daily name            value
+   <chr>     <chr>      <dbl> <chr>           <dbl>
+ 1 Monday    01/11/2010  952  orangeBoardings   877
+ 2 Monday    01/11/2010  952  purpleBoardings    NA
+ 3 Monday    01/11/2010  952  greenBoardings     NA
+ 4 Monday    01/11/2010  952  bannerBoardings    NA
+ 5 Tuesday   01/12/2010  796  orangeBoardings   777
+ 6 Tuesday   01/12/2010  796  purpleBoardings    NA
+ 7 Tuesday   01/12/2010  796  greenBoardings     NA
+ 8 Tuesday   01/12/2010  796  bannerBoardings    NA
+ 9 Wednesday 01/13/2010 1212. orangeBoardings  1203
+10 Wednesday 01/13/2010 1212. purpleBoardings    NA
+# ℹ 4,574 more rows
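The filtering step can be tried on a small stand-in table. This is only a sketch: `toy_long` and its values are hypothetical, and it assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical long-format data; values are illustrative only.
toy_long <- tibble(
  day = rep("Monday", 4),
  name = c("orangeBoardings", "orangeAlightings", "orangeAverage", "purpleBoardings"),
  value = c(877, 1027, 952, NA)
)

# Keep only the rows whose `name` contains the text "Boardings".
boardings_only <- toy_long %>% filter(str_detect(name, "Boardings"))
boardings_only
```

Rows with an `NA` in `value` are kept, because the filter looks only at the `name` column.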
-tidyr package helps us convert between wide and long data
+tidyr package (part of tidyverse) helps us convert between wide and long data
pivot_longer() goes from wide -> long
💻 Lab
+~
+ -Image by Gerd Altmann from Pixabay
diff --git a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd
index 77b28a003..6c349a110 100644
--- a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd
+++ b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd
@@ -12,9 +12,7 @@ knitr::opts_chunk$set(echo = TRUE)
Data in this lab comes from the CDC (https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total - snapshot from January 12, 2022) and the Bureau of Economic Analysis (https://www.bea.gov/data/income-saving/personal-income-by-state).

```{r message=FALSE}
-library(readr)
-library(dplyr)
-library(tidyr)
+library(tidyverse)
```

# Part 1
diff --git a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.html b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.html
index 6d58a081c..01513f1cc 100644
--- a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.html
+++ b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.html
@@ -166,9 +166,7 @@
Data in this lab comes from the CDC (https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total - snapshot from January 12, 2022) and the Bureau of Economic Analysis (https://www.bea.gov/data/income-saving/personal-income-by-state).
-library(readr)
-library(dplyr)
-library(tidyr)
+library(tidyverse)