forked from UofTCoders/rcourse
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathassignment-02.Rmd
122 lines (92 loc) · 5.01 KB
/
assignment-02.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
---
title: 'Assignment 2: Base R (8 marks)'
output:
html_document:
toc: false
---
*To submit this assignment, upload the full document on blackboard,
including the original questions, your code, and the output. Submit
you assignment as a knitted `.pdf` (preferred) or `.html` file.*
1. Variable assignment (1 mark)
a. Assign the value `5` to the variable/object `a`. Display `a`.
(0.25 marks)
b. Assign the result of `10/3` to the variable `b`. Display `b`.
(0.25 marks)
c. Write a function that adds two numbers and returns their sum.
Use it to assign the sum of `a` and `b` to `result`. Display `result`.
(In practice, there is already a more sophisticated built-in
function for this: `result <- sum(a, b)`) (0.25 marks)
d. Write a function that multiplies two numbers and returns their product.
Use it to assign the product of `a` and `b` to `product`. Display `product`.
(In practice, there is already a more sophisticated built-in
function for this: `product <- prod(a, b)`) (0.25 marks)
2. Vectors (1 mark)
a. Create a vector `v` with all integers 0-30, and a vector `w`
with every third integer in the same range. (0.25 marks)
b. What is the difference in lengths of the vectors `v` and `w`?
(0.25 marks)
c. Create a new vector, `v_square`, with the square of elements at indices
3, 6, 7, 10, 15, 22, 23, 24, and 30 from the variable `v`. *Hint:
Use indexing rather than a for loop.* (0.25 marks)
d. Calculate the mean and median of the first five values from
`v_square`. (0.25 marks)
3. Boolean indexing (1 mark)
a. Create a boolean vector `v_bool`, indicating which vector `v`
elements are bigger than 20. How many values are over 20? *Hint:
In R, TRUE = 1, and FALSE = 0, so you can use simple arithmetic
to find this out.* (0.5 marks)
b. Display the output of `v[TRUE]`. Explain why you think R outputs this.
(0.25 marks) _(Note: this is not really something you would ever need
to do in practice!)_
b. Use the variable `v_bool` as an index to extract the elements
from `v` that are bigger than 20. What are the min and max
values of this new vector? (0.25 marks)
4. Data frames (2 marks)
a. There are many built-in data frames in R, which you can find
[more details about
online](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html).
What are the column names of the built-in dataframe `beaver1`?
How many observations (rows) and variables (columns) are there?
(0.5 marks)
b. Display both the first 6 and last 6 rows of this data frame.
Show how to do so with both indexing as well as specialized functions. (0.5 marks)
c. What is the min, mean, and max body temperature in this data set?
*Hint: Remember that each column in a data frame is a vector, so
you can use the same functions as in the previous question on
vectors.* (0.5 marks)
d. Use the `summary` function to display an overview of the `temp`
column. (0.25 marks)
e. Use a single instance of the `summary` function to display an overview
of the `time` and `temp` columns. (0.25 marks)
5. Data frames with dplyr (3 marks)
a. Say we're attempting to calculate mean temperature in the `beaver1` dataset.
What is wrong with the following chain of dplyr commands? (0.5 marks)
```
beaver1 %>%
filter(is.na(temp)) %>%
summarise(mean_temp = mean(temp))
```
b. Use dplyr to randomly sample 20 rows from `beaver1`. Calculate
mean temperature from this subsetted dataset. (0.5 marks)
_Hint: you may want to refer to the dplyr cheatsheet for this_
b. Using the full `beaver1` dataset, calculate the mean temperature
for day 346. (0.25 marks)
_Note: use the full dataset for parts c-f below as well._
c. Rather than using `filter()` to calculate the mean for each day
separately, the more convenient `group_by()` can be used to
aggregate measurements by a categorical value (such as the `day`
column in `beaver`). Use this approach to calculate the mean
temperature and activity level for each of the days in the
dataset. (0.5 marks)
d. Express in writing what the average activity level from the
above calculation means. *Hint: Remember that you can [read a
description of the columns
online](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html).*
(0.25 marks)
e. How many observations are there per day in this dataset? (0.25
marks)
f. How many observations are there per day when the beaver is
active outside the retreat? (0.25 marks)
g. Grouping by activity level *and* the day of the observation.
Which variable seems to be more related to high body
temperature: activity level or day of measurement? (0.5 marks)