Merge branch 'main' of https://github.com/fhdsl/Intermediate_R into main
jhudsl-robot committed May 22, 2024
2 parents acf08d4 + d046763 commit 0fe80ee
Showing 41 changed files with 2,326 additions and 2,742 deletions.
10 changes: 7 additions & 3 deletions docs/01-Fundamentals.md
@@ -4,11 +4,11 @@

## Goals of this course

- Continue building *programming fundamentals*: how to make use of complex data structures, use custom functions built by other R users, and create your own functions. How to iterate repeated tasks in a way that scales naturally.
- Continue building *programming fundamentals*: how to use complex data structures, use and create custom functions, and iterate repeated tasks.

- Continue exploration of *data science fundamentals*: how to clean messy data to a Tidy form for analysis.

- Outcome: Conduct a full analysis in the data science workflow (minus model).
- At the end of the course, you will be able to: conduct a full analysis in the data science workflow (minus model).

![](https://r4ds.hadley.nz/diagrams/data-science/base.png){width="450"}

@@ -428,7 +428,7 @@ l1$score

Therefore, `l1$score` is the same as `l1[[4]]` and is the same as `l1[["score"]]`.

A dataframe is just a named list of vectors of the same length with **attributes** of (column) `names` and `row.names`.
A dataframe is just a named list of vectors of the same length with additional **attributes** of (column) `names` and `row.names`.
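
The equivalences above can be checked directly. This is a minimal sketch, assuming a hypothetical list `l1` whose fourth element is named `score` (the course's actual `l1` is not shown in this excerpt) and a small made-up data frame `df`:

```r
# Hypothetical list: element names and values are assumptions for illustration.
l1 = list(name = "June", age = 25, major = "Biology", score = c(90, 85))

# Three equivalent ways to extract the fourth element.
identical(l1$score, l1[[4]])
identical(l1$score, l1[["score"]])

# A data frame is stored as a list of equal-length vectors plus attributes.
df = data.frame(id = 1:3, grade = c(90, 72, 58))
is.list(df)              # the underlying storage is a list
names(attributes(df))    # includes "names" and "row.names"
df[["grade"]]            # list-style access returns the column vector
```

Because a data frame is a list underneath, the same `$` and `[[ ]]` accessors work on its columns.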

## Matrix

@@ -475,3 +475,7 @@ my_matrix[2, 3]
```
## [1] 6
```

## Exercises

You can find [exercises and solutions on Posit Cloud](https://posit.cloud/content/8236252), or on [GitHub](https://github.com/fhdsl/Intermediate_R_Exercises).
16 changes: 11 additions & 5 deletions docs/02-Data_cleaning_1.md
@@ -158,7 +158,7 @@ grade2 = if_else(grade > 60, TRUE, FALSE)

3. If-else_if-else

```
```
grade3 = case_when(grade >= 90 ~ "A",
grade >= 80 ~ "B",
grade >= 70 ~ "C",
@@ -199,7 +199,7 @@ simple_df2 = mutate(simple_df, grade = ifelse(grade > 60, TRUE, FALSE))

3. If-else_if-else

```
```
simple_df3 = simple_df
simple_df3$grade = case_when(simple_df3$grade >= 90 ~ "A",
@@ -211,8 +211,10 @@ simple_df3$grade = case_when(simple_df3$grade >= 90 ~ "A",

or

```
simple_df3 = mutate(simple_df, grade = case_when(grade >= 90 ~ "A",
```
simple_df3 = simple_df
simple_df3 = mutate(simple_df3, grade = case_when(grade >= 90 ~ "A",
grade >= 80 ~ "B",
grade >= 70 ~ "C",
grade >= 60 ~ "D",
@@ -244,7 +246,7 @@ if(expression_is_TRUE) {
3. If-else_if-else:

```
if(expression_A_is_TRUE)
if(expression_A_is_TRUE) {
#code goes here
}else if(expression_B_is_TRUE) {
#other code goes here
@@ -299,3 +301,7 @@ result
```
## [1] 5
```

## Exercises

You can find [exercises and solutions on Posit Cloud](https://posit.cloud/content/8236252), or on [GitHub](https://github.com/fhdsl/Intermediate_R_Exercises).
11 changes: 4 additions & 7 deletions docs/03-Data_cleaning_2.md
@@ -1,14 +1,13 @@
# Data Cleaning, Part 2



```r
library(tidyverse)
```

## Tidy Data

It is important to have a standard for organizing data, as it facilitates a consistent way of thinking about data organization and building tools (functions) that make use of that standard. The principles of **Tidy data**, developed by Hadley Wickham, are:
It is important to have a standard for organizing data, as it facilitates a consistent way of thinking about data organization and building tools (functions) that make use of that standard. The [principles of **Tidy data**](https://vita.had.co.nz/papers/tidy-data.html), developed by Hadley Wickham, are:

1. Each variable must have its own column.

@@ -221,7 +220,7 @@ ggplot(df) + aes(x = Q1_Sales, y = Q2_Sales, color = Store) + geom_point()

## Subjectivity in Tidy Data

We have looked at clear cases of when a dataset is Tidy. In reality, the Tidy state depends on what we call variables and observations.
We have looked at clear cases of when a dataset is Tidy. In reality, the Tidy state depends on what we call variables and observations. Consider this example, inspired by the following [blog post](https://kiwidamien.github.io/what-is-tidy-data.html) by Damien Martin.


```r
@@ -316,8 +315,6 @@ ggplot(kidney_long_still) + aes(x = treatment, y = recovery_rate, fill = stone_s

![](03-Data_cleaning_2_files/figure-docx/unnamed-chunk-16-1.png)<!-- -->

## References

https://vita.had.co.nz/papers/tidy-data.html
## Exercises

https://kiwidamien.github.io/what-is-tidy-data.html
You can find [exercises and solutions on Posit Cloud](https://posit.cloud/content/8236252), or on [GitHub](https://github.com/fhdsl/Intermediate_R_Exercises).
25 changes: 19 additions & 6 deletions docs/04-Functions.md
@@ -14,7 +14,7 @@ Some advice on writing functions:

- A function should do only one, well-defined task.

### Anatomy of a function definition
## Anatomy of a function definition

*A function definition consists of assigning a **function name** to a "function" statement that has a comma-separated list of named **function arguments**, and a **return expression**. The function name is stored as a variable in the global environment.*

@@ -34,13 +34,13 @@ With function definitions, not all code runs from top to bottom. The first four

When the function is called in line 5, the variables passed as arguments are reassigned to the function arguments for use within the function, which keeps the function modular. We need to introduce the concept of local and global environments to distinguish variables used only within a function from variables used across the entire program.

### Local and global environments
## Local and global environments

*{ } represents variable scoping: within each { }, if variables are defined, they are stored in a **local environment** and are only accessible within { }. All function arguments are stored in the local environment. The overall environment of the program is called the **global environment** and can also be accessed within { }.*

The reason for having this "privacy" in the local environment is to make functions modular: they are independent little tools that should not interact with the rest of the global environment. Imagine someone writing a tool that they want to give to someone else, but the tool depends on variables in the writer's environment, or vice versa.
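
As a rough illustration of these scoping rules, here is a small sketch. The names `scale_value`, `local_multiplier`, `global_offset`, and `add_offset` are hypothetical and do not come from the course materials:

```r
# Hypothetical function used only to illustrate scoping.
scale_value = function(x) {
  local_multiplier = 10          # local: exists only while the function runs
  return(x * local_multiplier)
}

result = scale_value(2)          # result is stored in the global environment

exists("result")                 # the global variable is visible here
exists("local_multiplier")       # the local variable is gone after the call

# The global environment can be read from inside { }, although relying on
# this makes a function less modular.
global_offset = 1
add_offset = function(x) {
  return(x + global_offset)
}
add_offset(1)
```

The local variable disappears once the call finishes, which is what keeps one function's working variables from colliding with another's.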

### A step-by-step example
## A step-by-step example

Using the `addFunction` function, let's see step-by-step how the R interpreter understands our code:

@@ -52,7 +52,7 @@ Using the `addFunction` function, let's see step-by-step how the R interpreter u

![We run the second line of code in the function body to return a value. The return value from the function is assigned to the variable z in the global environment. All local variables for the function are erased now that the function call is over.](images/func4.png)

### Function arguments create modularity
## Function arguments create modularity

First-time writers of functions might ask: why are the variables we pass as arguments *reassigned* to function arguments in the local environment? Here is an example in which that process is skipped. What are the consequences?

@@ -81,7 +81,7 @@ Here is the execution for `w`:

The function did not work as expected because we used hard-coded variables from the global environment rather than the function's own argument variables!
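
The contrast can be sketched side by side. The two functions below are hypothetical stand-ins written for this illustration, not the course's own example:

```r
x = 3
y = 4

# Fragile: ignores its arguments and reads x and y from the global environment.
add_hardcoded = function(num1, num2) {
  return(x + y)
}

# Modular: depends only on the values passed in as arguments.
add_modular = function(num1, num2) {
  return(num1 + num2)
}

add_hardcoded(10, 20)   # returns 7, whatever arguments are passed
add_modular(10, 20)     # returns 30
```

The hard-coded version gives the same answer for every call until `x` or `y` changes in the global environment, which is exactly the hidden dependency that function arguments are meant to remove.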

### Exercises
## Examples

- Create a function, called `add_and_raise_power` in which the function takes in 3 numeric arguments. The function computes the following: the first two arguments are added together and raised to a power determined by the 3rd argument. The function returns the resulting value. Here is a use case: `add_and_raise_power(1, 2, 3) = 27` because the function will return this expression: `(1 + 2) ^ 3`. Another use case: `add_and_raise_power(3, 1, 2) = 16` because of the expression `(3 + 1) ^ 2`. Confirm that these use cases work. Can this function be used for numeric vectors?

@@ -114,7 +114,16 @@ The function did not work as expected because we used hard-coded variables from
## [1] 344 8
```

- Create a function, called `medicaid_eligible` in which the function takes in one argument: a numeric vector called `age`. The function returns a numeric vector with the same length as `age`, in which elements are `0` for indices that are less than 65 in `age`, and `1` for indices 65 or higher in `age`. Use cases: `medicaid_eligible(c(30, 70)) = c(0, 1)`
- Create a function, called `num_na` in which the function takes in any vector, and then returns a single numeric value. This numeric value is the number of `NA`s in the vector. Use cases: `num_na(c(NA, 2, 3, 4, NA, 5)) = 2` and `num_na(c(2, 3, 4, 5)) = 0`. Hint 1: Use the `is.na()` function. Hint 2: Given a logical vector, you can count the number of `TRUE` values by using `sum()`, such as `sum(c(TRUE, TRUE, FALSE)) = 2`.


```r
num_na = function(x) {
  # is.na(x) marks each NA as TRUE; sum() counts the TRUE values
  return(sum(is.na(x)))
}
```

- Create a function, called `medicaid_eligible` in which the function takes in one argument: a numeric vector called `age`. The function returns a numeric vector with the same length as `age`, in which elements are `0` for indices that are less than 65 in `age`, and `1` for indices 65 or higher in `age`. (Hint: This is a data recoding problem!) Use cases: `medicaid_eligible(c(30, 70)) = c(0, 1)`


```r
@@ -130,3 +139,7 @@ The function did not work as expected because we used hard-coded variables from
```
## [1] 0 1
```

## Exercises

You can find [exercises and solutions on Posit Cloud](https://posit.cloud/content/8236252), or on [GitHub](https://github.com/fhdsl/Intermediate_R_Exercises).