From d0cbebf55774a6644422e01a7eedad0924f98883 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Tue, 14 Jan 2025 18:41:04 +0000
Subject: [PATCH] Render site

---
 help.html                                |   8 +-
 index.html                               |   2 +-
 modules/Factors/Factors.html             | 136 ++++++++++-------------
 modules/Factors/lab/Factors_Lab.Rmd      |   3 +-
 modules/Factors/lab/Factors_Lab_Key.html |  13 +--
 modules/Factors/lab/yts_fct.png          | Bin 63579 -> 82316 bytes
 6 files changed, 68 insertions(+), 94 deletions(-)

diff --git a/help.html b/help.html
index eb9e4940b..cd7fe92b3 100644
--- a/help.html
+++ b/help.html
@@ -347,14 +347,14 @@

Why are my changes not taking effect? It’s making my results look

Here we are creating a new object from an existing one:

new_rivers <- sample(rivers, 5)
 new_rivers
-## [1] 1243 1100  268 1270  332
+## [1] 1054  135  265  246  890

Using just this will only print the result and not actually change new_rivers:

new_rivers + 1
-## [1] 1244 1101  269 1271  333
+## [1] 1055  136  266  247  891

If we want to modify new_rivers and save that modified version, then we need to reassign new_rivers like so:

new_rivers <- new_rivers + 1
 new_rivers
-## [1] 1244 1101  269 1271  333
+## [1] 1055  136  266  247  891

If we forget to reassign, later steps may not work as expected, because they will not be using the modified data.
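Here is a minimal sketch of the difference between printing and reassigning. It assumes an ordinary R session, where the built-in `rivers` dataset is available. The `set.seed(1)` call does not appear on the original page. It is included only so that the sample is repeatable.

```r
set.seed(1)                          # assumption: added only to make the sample repeatable
new_rivers <- sample(rivers, 5)
original <- new_rivers               # copy kept only for the comparisons below

new_rivers + 1                       # prints the shifted values but stores nothing
identical(new_rivers, original)      # TRUE: new_rivers was not modified

new_rivers <- new_rivers + 1         # reassigning with <- keeps the change
identical(new_rivers, original + 1)  # TRUE: the modified values are now saved
```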


@@ -403,7 +403,7 @@

Error: object ‘X’ not found

Make sure you run something like this, with the <- operator:

rivers2 <- new_rivers + 1
 rivers2
-## [1] 1245 1102  270 1272  334
+## [1] 1056  137  267  248  892

diff --git a/index.html b/index.html
index 678806c88..3bd6f27f0 100644
--- a/index.html
+++ b/index.html
@@ -333,7 +333,7 @@

Testimonials

Find an Error!?


Feel free to submit typos/errors/etc via the GitHub repository associated with the class: https://github.com/jhudsl/intro_to_r

-This page was last updated on 2025-01-13.
+This page was last updated on 2025-01-14.

Creative Commons License

diff --git a/modules/Factors/Factors.html b/modules/Factors/Factors.html
index 740fa3fa9..982d644d0 100644
--- a/modules/Factors/Factors.html
+++ b/modules/Factors/Factors.html
@@ -202,7 +202,7 @@

Note that levels are, by default, in alphanumerical order.

-Factors
+Factors - get the levels

Extract the levels of a factor vector using levels():

@@ -210,41 +210,7 @@

## [1] "blue"   "red"    "yellow"
-forcats package
-A package called forcats is really helpful for working with factors.
-Forcats hex sticker
-factor() vs as_factor()
-factor() is from base R and as_factor() is from forcats
-Both can change a variable to be of class factor.
-  • factor() will order alphabetically unless told otherwise.
-  • as_factor() will order by first appearance unless told otherwise.
-If you are assigning your levels manually either function is fine!
-as_factor() function
-x <- c("yellow", "red", "red", "blue", "yellow", "blue")
-x_fact_2 <- as_factor(x)
-x_fact_2
-## [1] yellow red    red    blue   yellow blue  
-## Levels: yellow red blue
-# Compare to factor() method:
-x_fact
-## [1] yellow red    red    blue   yellow blue  
-## Levels: blue red yellow
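The removed slides compare `factor()` and `as_factor()`. Below is a self-contained version of that comparison. It is a sketch that assumes the forcats package is installed. Both calls receive the same character vector. Only the order of the levels differs.

```r
library(forcats)  # provides as_factor()

x <- c("yellow", "red", "red", "blue", "yellow", "blue")

x_fact   <- factor(x)     # base R: levels sorted alphabetically
x_fact_2 <- as_factor(x)  # forcats: levels in order of first appearance

levels(x_fact)    # "blue"   "red"    "yellow"
levels(x_fact_2)  # "yellow" "red"    "blue"
```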
-A Factor Example
+A Factor Example

Preparing the data

-Aggregate (sum) across ethnicity and gender:
+Group by each school and aggregate (sum):

dropouts <-
   dropouts %>%
@@ -362,23 +328,35 @@ 

For the next steps, let’s take a subset of data.

-Use set.seed() to take the same random sample each time.
+set.seed(123) # same random sample each time
+dropouts_subset <- slice_sample(dropouts, n = 32)
+dropouts_subset
-set.seed(123)
-dropouts_subset <- slice_sample(dropouts, n = 32)
+## # A tibble: 32 × 3
+##    CDS_CODE       grade     n_dropouts
+##    <chr>          <chr>          <dbl>
+##  1 45699716050231 Junior             0
+##  2 45700036050330 Junior             0
+##  3 12630401230069 Sophomore          0
+##  4 09100900930131 Sophomore          1
+##  5 15633216009179 Junior             0
+##  6 33670330113647 Sophomore          0
+##  7 19643941931823 Freshman           1
+##  8 19647331933381 Sophomore          7
+##  9 38684786062053 Senior             0
+## 10 11626616007611 Freshman           0
+## # ℹ 22 more rows
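The following sketch shows why `set.seed()` makes the subset repeatable. It uses base R `sample()` instead of dplyr's `slice_sample()` to keep dependencies minimal. Both functions draw from R's random number generator, so resetting the seed repeats the draw. The `toy` data frame is hypothetical and stands in for the dropouts data.

```r
# Hypothetical stand-in for the dropouts data
toy <- data.frame(
  school     = paste0("S", 1:10),
  n_dropouts = c(0, 1, 7, 0, 2, 3, 0, 1, 4, 5)
)

set.seed(123)
sample_a <- toy[sample(nrow(toy), 4), ]  # pick 4 random rows

set.seed(123)                            # reset to the same seed
sample_b <- toy[sample(nrow(toy), 4), ]  # the same 4 rows, in the same order

identical(sample_a, sample_b)            # TRUE
```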

Plot the data

-Let’s make a plot first.
+Let’s make a plot first. We might not like the ordering on the x-axis, though.

dropouts_subset %>%
-  ggplot(mapping = aes(x = grade, y = n_dropouts)) +
+  ggplot(aes(x = grade, y = n_dropouts)) +
   geom_boxplot() +
   theme_bw(base_size = 16) # make all labels size 16
-

- -

OK this is very useful, but it is a bit difficult to read. We expect the values to be plotted by the order that we know, not by alphabetical order.

+

Change to factor

@@ -415,15 +393,15 @@

Now let’s make our plot again:

dropouts_fct %>%
-  ggplot(mapping = aes(x = grade, y = n_dropouts)) +
+  ggplot(aes(x = grade, y = n_dropouts)) +
   geom_boxplot() +
   theme_bw(base_size = 16)
-

+

-Now that’s more like it! Notice how the data is automatically plotted in the order we would like.
+The factor data is automatically plotted in the order we would like.

-What about if we arrange() the data by grade?
+What about if we arrange() the data by grade?

Character data is arranged alphabetically.

@@ -447,7 +425,7 @@

Notice that the order is not what we would hope for!

-Arranging Factors
+Arranging Factors

Factor data is arranged by level.
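Base R `sort()` shows the same contrast on a plain vector. A character vector sorts alphabetically, and a factor sorts by its levels. The grade values below are hypothetical. This is only a sketch of the ordering rule and does not reproduce the dropouts example.

```r
grade_chr <- c("Junior", "Senior", "Freshman", "Sophomore", "Junior")  # hypothetical values
grade_fct <- factor(grade_chr,
                    levels = c("Freshman", "Sophomore", "Junior", "Senior"))

sort(grade_chr)  # alphabetical: Freshman Junior Junior Senior Sophomore
sort(grade_fct)  # by level:     Freshman Sophomore Junior Junior Senior
```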

@@ -503,67 +481,67 @@

## 3 Junior 2
## 4 Senior 13
-forcats for ordering
+GUT CHECK: Why use factors?
+A. Meaningful ordering of text data
+B. Automatic ordering of numeric data
+C. More precise values
+forcats package
+A package called forcats is really helpful for working with factors.
+Forcats hex sticker
+forcats for ordering

What if we wanted to order grade by increasing n_dropouts?

library(forcats)
 
 dropouts_fct %>%
-  ggplot(mapping = aes(x = grade, y = n_dropouts)) +
+  ggplot(aes(x = grade, y = n_dropouts)) +
   geom_boxplot() +
   theme_bw(base_size = 16)
-

+

This would make it easy to identify which grade to focus on.

-forcats for ordering
+forcats for ordering

We can order a factor by another variable by using the fct_reorder() function of the forcats package.

fct_reorder({column getting changed}, {guiding column}, {summarizing function})
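This small sketch fills in the three slots of that template. It assumes forcats is installed. The grades and dropout counts are hypothetical and were chosen so that the group means are easy to check by hand. `fct_reorder()` leaves the values unchanged. It only reorders the levels, by the mean of `n_dropouts` within each grade.

```r
library(forcats)

# Hypothetical data: two schools per grade
grade <- factor(
  c("Freshman", "Freshman", "Sophomore", "Sophomore",
    "Junior", "Junior", "Senior", "Senior"),
  levels = c("Freshman", "Sophomore", "Junior", "Senior")
)
n_dropouts <- c(2, 4, 1, 3, 0, 1, 10, 6)
# Group means: Freshman 3, Sophomore 2, Junior 0.5, Senior 8

levels(fct_reorder(grade, n_dropouts, mean))
# "Junior" "Sophomore" "Freshman" "Senior"    (increasing mean)

levels(fct_reorder(grade, n_dropouts, mean, .desc = TRUE))
# "Senior" "Freshman" "Sophomore" "Junior"    (decreasing mean)
```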
-forcats for ordering
+forcats for ordering

We can order a factor by another variable by using the fct_reorder() function of the forcats package.

library(forcats)
 
 dropouts_fct %>%
-  ggplot(mapping = aes(x = fct_reorder(grade, n_dropouts, mean), y = n_dropouts)) +
+  ggplot(aes(x = fct_reorder(grade, n_dropouts, mean), y = n_dropouts)) +
   geom_boxplot() +
   labs(x = "Student Grade") +
   theme_bw(base_size = 16)
-

+

-forcats for ordering.. with .desc = argument
+forcats for ordering.. with .desc = argument

library(forcats)
 
 dropouts_fct %>%
-  ggplot(mapping = aes(x = fct_reorder(grade, n_dropouts, mean, .desc = TRUE), y = n_dropouts)) +
+  ggplot(aes(x = fct_reorder(grade, n_dropouts, mean, .desc = TRUE), y = n_dropouts)) +
   geom_boxplot() +
   labs(x = "Student Grade") +
   theme_bw(base_size = 16)
-

-forcats for ordering.. can be used to sort datasets
-dropouts_fct %>% pull(grade) %>% levels() # By year order
-## [1] "Freshman"  "Sophomore" "Junior"    "Senior"
-dropouts_fct <- dropouts_fct %>%
-  mutate(
-    grade = fct_reorder(grade, n_dropouts, mean)
-  )
-dropouts_fct %>% pull(grade) %>% levels() # by increasing mean dropouts
-## [1] "Junior"    "Freshman"  "Sophomore" "Senior"
+

Checking Proportions with fct_count()

@@ -576,18 +554,16 @@

## # A tibble: 4 × 3
 ##   f             n     p
 ##   <fct>     <int> <dbl>
-## 1 Junior       12 0.375
-## 2 Freshman      7 0.219
-## 3 Sophomore     7 0.219
+## 1 Freshman      7 0.219
+## 2 Sophomore     7 0.219
+## 3 Junior       12 0.375
 ## 4 Senior        6 0.188
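Here is a standalone sketch of `fct_count()`, assuming forcats is installed and using hypothetical grades. With `prop = TRUE` the result gains a `p` column. By default the rows follow the factor's level order. Passing `sort = TRUE` orders the rows by decreasing count instead.

```r
library(forcats)

# Hypothetical grades, with levels set in school-year order
grade <- factor(
  c("Junior", "Freshman", "Junior", "Senior", "Sophomore", "Junior"),
  levels = c("Freshman", "Sophomore", "Junior", "Senior")
)

counts <- fct_count(grade, prop = TRUE)  # tibble with columns f, n, p
counts                                   # rows follow the level order

fct_count(grade, sort = TRUE)            # rows ordered by decreasing n
```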

Summary