diff --git a/help.html b/help.html
index eb9e4940b..cd7fe92b3 100644
--- a/help.html
+++ b/help.html
@@ -347,14 +347,14 @@
Here we are creating a new object from an existing one:
new_rivers <- sample(rivers, 5)
new_rivers
-## [1] 1243 1100 268 1270 332
+## [1] 1054 135 265 246 890
Using just this will only print the result and not actually change new_rivers:
new_rivers + 1
-## [1] 1244 1101 269 1271 333
+## [1] 1055 136 266 247 891
If we want to modify new_rivers and save that modified version, then we need to reassign new_rivers like so:
new_rivers <- new_rivers + 1
new_rivers
-## [1] 1244 1101 269 1271 333
+## [1] 1055 136 266 247 891
If we forget to reassign, subsequent steps may not work as expected, because we will not be working with the modified data.
Make sure you run something like this, with the <- operator:
rivers2 <- new_rivers + 1
rivers2
-## [1] 1245 1102 270 1272 334
+## [1] 1056 137 267 248 892
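The sampled values differ between the two renders in this diff because sample() is random. Below is a minimal sketch of the print-versus-reassign distinction, assuming a fixed seed. The seed value and the `original` helper variable are illustrative and not part of the page.

```r
# `rivers` ships with base R (datasets package).
# The seed value is an arbitrary choice (an assumption, not from the page).
set.seed(1)
new_rivers <- sample(rivers, 5)
original <- new_rivers                 # hypothetical copy, kept only for comparison

new_rivers + 1                         # prints the sum; new_rivers is not changed
unchanged <- identical(new_rivers, original)

new_rivers <- new_rivers + 1           # reassignment stores the modified vector
updated <- identical(new_rivers, original + 1)

unchanged                              # TRUE
updated                                # TRUE
```

With a seed set before sample(), re-rendering the page would give the same five values each time, so output lines like the ones above would stop changing between builds.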
Feel free to submit typos/errors/etc via the GitHub repository associated with the class: https://github.com/jhudsl/intro_to_r
-This page was last updated on 2025-01-13.
+This page was last updated on 2025-01-14.
diff --git a/modules/Factors/Factors.html b/modules/Factors/Factors.html
index 740fa3fa9..982d644d0 100644
--- a/modules/Factors/Factors.html
+++ b/modules/Factors/Factors.html
@@ -202,7 +202,7 @@
Note that levels are, by default, in alphanumerical order.
-Extract the levels of a factor vector using levels():
## [1] "blue" "red" "yellow"
-forcats package
A package called forcats is really helpful for working with factors.
factor() vs as_factor()
factor() is from base R and as_factor() is from forcats
Both can change a variable to be of class factor.
-
-factor() will order alphabetically unless told otherwise.
as_factor() will order by first appearance unless told otherwise.
If you are assigning your levels manually either function is fine!
-
-as_factor() function
x <- c("yellow", "red", "red", "blue", "yellow", "blue")
-x_fact_2 <- as_factor(x)
-x_fact_2
-
-## [1] yellow red red blue yellow blue
-## Levels: yellow red blue
-
-# Compare to factor() method:
-x_fact
-
-## [1] yellow red red blue yellow blue
-## Levels: blue red yellow
-
-
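The two level orders printed above can be checked side by side. This is a small sketch, assuming the forcats package is installed. The manual level order in the last call is an arbitrary illustration and not taken from the slides.

```r
library(forcats)

x <- c("yellow", "red", "red", "blue", "yellow", "blue")

# factor() from base R sorts the levels alphabetically by default.
alpha_levels <- levels(factor(x))

# as_factor() from forcats keeps the order of first appearance.
appearance_levels <- levels(as_factor(x))

# With levels given explicitly, the chosen order is used.
# This particular order is a made-up example.
manual_levels <- levels(factor(x, levels = c("red", "yellow", "blue")))

alpha_levels       # "blue" "red" "yellow"
appearance_levels  # "yellow" "red" "blue"
manual_levels      # "red" "yellow" "blue"
```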
We will use data on student dropouts from the State of California during the 2016-2017 school year. More on this data can be found here: https://www.cde.ca.gov/ds/ad/filesdropouts.asp
@@ -284,7 +250,7 @@
Aggregate (sum) across ethnicity and gender:
+Group by each school and aggregate (sum):
dropouts <- dropouts %>%
@@ -362,23 +328,35 @@
Use set.seed() to take the same random sample each time.
set.seed(123) # same random sample each time
+dropouts_subset <- slice_sample(dropouts, n = 32)
+dropouts_subset
-set.seed(123)
-dropouts_subset <- slice_sample(dropouts, n = 32)
+## # A tibble: 32 × 3
+##    CDS_CODE       grade     n_dropouts
+##    <chr>          <chr>          <dbl>
+##  1 45699716050231 Junior             0
+##  2 45700036050330 Junior             0
+##  3 12630401230069 Sophomore          0
+##  4 09100900930131 Sophomore          1
+##  5 15633216009179 Junior             0
+##  6 33670330113647 Sophomore          0
+##  7 19643941931823 Freshman           1
+##  8 19647331933381 Sophomore          7
+##  9 38684786062053 Senior             0
+## 10 11626616007611 Freshman           0
+## # ℹ 22 more rows
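A quick way to see what set.seed() buys us is to draw the sample twice. The sketch below assumes dplyr is installed and uses the built-in mtcars data as a stand-in for dropouts, which is not available here. The sample size of 5 is arbitrary.

```r
library(dplyr)

# mtcars is a stand-in dataset (assumption); the slides use `dropouts`.
set.seed(123)
first_draw <- slice_sample(mtcars, n = 5)

set.seed(123)                      # resetting the seed replays the same draw
second_draw <- slice_sample(mtcars, n = 5)

same_sample <- identical(first_draw, second_draw)
same_sample                        # TRUE
```

Without the second set.seed(123) call, the two draws would usually contain different rows.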
Let’s make a plot first.
+Let’s make a plot first. We might not like the ordering on the x-axis, though.
dropouts_subset %>%
-  ggplot(mapping = aes(x = grade, y = n_dropouts)) +
+  ggplot(aes(x = grade, y = n_dropouts)) +
  geom_boxplot() +
  theme_bw(base_size = 16) # make all labels size 16
-
-
-
OK this is very useful, but it is a bit difficult to read. We expect the values to be plotted by the order that we know, not by alphabetical order.
+Now let’s make our plot again:
dropouts_fct %>%
-  ggplot(mapping = aes(x = grade, y = n_dropouts)) +
+  ggplot(aes(x = grade, y = n_dropouts)) +
  geom_boxplot() +
  theme_bw(base_size = 16)
-
+
-
Now that’s more like it! Notice how the data is automatically plotted in the order we would like.
+The factor data is automatically plotted in the order we would like.
-arrange() the data by grade ?
arrange() the data by grade ?
Character data is arranged alphabetically.
@@ -447,7 +425,7 @@
Notice that the order is not what we would hope for!
-Factor data is arranged by level.
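The difference between arranging character data and arranging factor data can be sketched with a tiny table. This assumes dplyr is installed. The four-row `grades` tibble is made up for illustration and is not the dropouts data.

```r
library(dplyr)

# Hypothetical toy data, one row per grade.
grades <- tibble(grade = c("Sophomore", "Senior", "Freshman", "Junior"))

# Character column: arrange() sorts alphabetically.
chr_order <- arrange(grades, grade)$grade

# Factor column: arrange() sorts by the order of the levels.
grades_fct <- grades %>%
  mutate(grade = factor(
    grade,
    levels = c("Freshman", "Sophomore", "Junior", "Senior")
  ))
fct_order <- as.character(arrange(grades_fct, grade)$grade)

chr_order  # "Freshman" "Junior" "Senior" "Sophomore"
fct_order  # "Freshman" "Sophomore" "Junior" "Senior"
```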
@@ -503,67 +481,67 @@
## 3 Junior 2
## 4 Senior 13
-forcats for ordering
A. Meaningful ordering of text data
+
+B. Automatic ordering of numeric data
+
+C. More precise values
+
+forcats package
A package called forcats is really helpful for working with factors.
forcats for ordering
What if we wanted to order grade by increasing n_dropouts?
library(forcats)
dropouts_fct %>%
-  ggplot(mapping = aes(x = grade, y = n_dropouts)) +
+  ggplot(aes(x = grade, y = n_dropouts)) +
  geom_boxplot() +
  theme_bw(base_size = 16)
-
+
This would be useful for identifying easily which grade to focus on.
-We can order a factor by another variable by using the fct_reorder() function of the forcats package.
fct_reorder({column getting changed}, {guiding column}, {summarizing function})
We can order a factor by another variable by using the fct_reorder() function of the forcats package.
library(forcats)
dropouts_fct %>%
-  ggplot(mapping = aes(x = fct_reorder(grade, n_dropouts, mean), y = n_dropouts)) +
+  ggplot(aes(x = fct_reorder(grade, n_dropouts, mean), y = n_dropouts)) +
  geom_boxplot() +
  labs(x = "Student Grade") +
  theme_bw(base_size = 16)
-
+
-
.desc = argument
.desc = argument
library(forcats)
dropouts_fct %>%
-  ggplot(mapping = aes(x = fct_reorder(grade, n_dropouts, mean, .desc = TRUE), y = n_dropouts)) +
+  ggplot(aes(x = fct_reorder(grade, n_dropouts, mean, .desc = TRUE), y = n_dropouts)) +
  geom_boxplot() +
  labs(x = "Student Grade") +
  theme_bw(base_size = 16)
-
-
-dropouts_fct %>% pull(grade) %>% levels() # By year order
-
-## [1] "Freshman" "Sophomore" "Junior" "Senior"
-
-dropouts_fct <- dropouts_fct %>%
-  mutate(
-    grade = fct_reorder(grade, n_dropouts, mean)
-  )
-dropouts_fct %>% pull(grade) %>% levels() # by increasing mean dropouts
-
-## [1] "Junior" "Freshman" "Sophomore" "Senior"
+
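To see what fct_reorder() does to the levels without drawing a plot, here is a small sketch that assumes forcats is installed. The `grade` and `n` vectors are invented toy values, chosen so the group means are easy to check by hand (Freshman 6, Sophomore 2, Senior 11).

```r
library(forcats)

# Hypothetical toy data, not the dropouts data from the slides.
grade <- factor(
  c("Freshman", "Freshman", "Sophomore", "Sophomore", "Senior", "Senior"),
  levels = c("Freshman", "Sophomore", "Senior")
)
n <- c(5, 7, 1, 3, 10, 12)

# Reorder the levels of `grade` by the mean of `n` within each level.
increasing <- levels(fct_reorder(grade, n, mean))

# .desc = TRUE reverses the direction (largest mean first).
decreasing <- levels(fct_reorder(grade, n, mean, .desc = TRUE))

increasing  # "Sophomore" "Freshman" "Senior"
decreasing  # "Senior" "Freshman" "Sophomore"
```

Calling fct_reorder() inside aes(), as the slides do, changes the order only for that plot. Calling it inside mutate() and reassigning would change the stored factor.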
fct_count()
## # A tibble: 4 × 3
##   f             n     p
##   <fct>     <int> <dbl>
-## 1 Junior       12 0.375
-## 2 Freshman      7 0.219
-## 3 Sophomore     7 0.219
+## 1 Freshman      7 0.219
+## 2 Sophomore     7 0.219
+## 3 Junior       12 0.375
## 4 Senior        6 0.188
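The call that produced this table is not shown in the diff. The p column suggests that fct_count() was called with prop = TRUE, but that is an assumption. A minimal sketch on invented toy data, assuming forcats is installed:

```r
library(forcats)

# Hypothetical toy factor, not the dropouts data.
f <- factor(
  c("Freshman", "Freshman", "Junior", "Senior"),
  levels = c("Freshman", "Junior", "Senior")
)

# prop = TRUE adds a column p with each level's share of the total.
counts <- fct_count(f, prop = TRUE)
counts
```

The rows come back in level order, which would explain why the row order in the table above changed once the levels were no longer reordered by mean dropouts.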
mutate and a factor creating function like factor() or as_factor
as_factor() is from the forcats package (first appearance order by default)
factor() base R function (alphabetical order by default)
mutate and a factor creating function like factor() (alphabetical order by default)
factor() we can specify the levels with the levels argument if we want a specific order
fct_reorder({variable_to_reorder}, {variable_to_order_by}, {summary function}) helps us reorder a variable by the values of another variable
Load all the libraries we will use in this lab.
helps us reorder a variable by the values of another variableLoad all the libraries we will use in this lab.
-library(dplyr)
-library(ggplot2)
+library(tidyverse)
Load the Youth Tobacco Survey data from http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv. select “Sample_Size”, “Education”, and “LocationAbbr”. Name this data “yts”.
Create a boxplot showing the difference in “Sample_Size” between Middle School and High School “Education”. Hint: Use aes(x = Education, y = Sample_Size) and geom_boxplot().
yts %>%
- ggplot(mapping = aes(x = Education, y = Sample_Size)) +
+ ggplot(aes(x = Education, y = Sample_Size)) +
geom_boxplot()
## Warning: Removed 425 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
-
+
Repeat question 1.1 and 1.2 using the “yts_fct” data. You should see different ordering in the plot and count table.
yts_fct %>%
- ggplot(mapping = aes(x = Education, y = Sample_Size)) +
+ ggplot(aes(x = Education, y = Sample_Size)) +
geom_boxplot()
## Warning: Removed 425 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
-
+
yts_fct %>%
count(Education)
## # A tibble: 2 × 2
@@ -263,7 +262,7 @@ P.3
yts_fct_plot <- yts_fct %>%
drop_na() %>%
- ggplot(mapping = aes(
+ ggplot(aes(
x = fct_reorder(
LocationAbbr, med_sample_size
),
diff --git a/modules/Factors/lab/yts_fct.png b/modules/Factors/lab/yts_fct.png
index 56c20c7aa..068447de7 100644
Binary files a/modules/Factors/lab/yts_fct.png and b/modules/Factors/lab/yts_fct.png differ