-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathULLA_Intro_to_R_lecture.Rmd
350 lines (253 loc) · 8.61 KB
/
ULLA_Intro_to_R_lecture.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
---
title: "ULLA: Introduction to R, RStudio and making nice figures with ggplot2"
author: "Karin Steffen"
date: "July 7 2022"
output:
revealjs::revealjs_presentation:
theme: simple
highlight: default
center: false
transition: slide
css: new_reveal_0.css
editor_options:
chunk_output_type: inline
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, tidy=TRUE, message = FALSE, warning = FALSE)
library(ggplot2)
library(tidyverse)
```
## Content
> * General intro to R and RStudio
> * Data handling
> + importing and accessing data
> + manipulation and analysis
> + some "stats"
> * Visualisation
> + basics of generating plots
> + making plots look pretty
> * Exercises
> * Open lab
> * Challenge
# About R and RStudio
<img src="figures/R_RS_logo.png" class="plain">
- __R__: free software environment for statistical computing and graphics (and more)
- __RStudio__: GUI/ integrated development environment (IDE) for R
## Advantages of using R
> * free
> * powerful, flexible/customisable, easy to develop complex analyses
> * reproducible science: scripts and markdowns
## Data handling: collecting and organisation
__Conventions about data format__
- Observations are entered in __rows__
- Variables are entered in __columns__
- A column of data should contain only one data type
## Data handling: collecting and organisation
__Best practice__
> * Store a copy of data in nonproprietary formats
> * Leave an uncorrected file when doing analyses
> * Maintain effective metadata about the data
> * Create folders for relevant task to keep the overview over your analyses
## Error messages
- Warnings can be ignored.
- Specific hint? e.g.
- wrong data format, characters/factors instead of numbers
- library not loaded, package not installed
- Try running example code, and check the example input
- Google is your friend
## R help & the comunity
- Stackoverflow
- `man` pages and examples
- cheat sheets
<img src="figures/R_help.png" class="plain">
## Scripts and markdowns
Markup languages (HTML, LaTeX, markdown) are used to format text. Both exercises and presentations during this lab have been produced using the rmarkdown package. The code for these are available in the files Intro_to_R.Rmd and Intro_to_R_EXERCISE.Rmd. For longer-form writing, including reports and papers, check out the bookdown package.
<img src="figures/script_markdown.png" class="plain">
# Getting started
> * open RStudio
> * open a script or markdown document
> * start typing/ working
> ...
## How R works: Operators & Functions
- Arithmetic (basic math): `+`, `-`, `*`, `/`, `^`, ...
- Relational: `<`, `>`, `==`, `!=`
- Logical: `!`, `&`,`&&`, `|`, `||`, `TRUE`, `FALSE`
- Assignment: `<-`, `=`
- Pipe: `%>%`
- Specifying ranges (colon): `:`
- Access library functions: `::`
- Comment: `#`
- Functions: e.g. `c()`, `sum()`, `ggplot()`, ...
- Tilde operator (specify models, functions): `~`
## How R works: Operators
Examples: Arithmetic (basic math): +, -, *, /, ^, ...
```{r operators1, echo=TRUE}
12+13+14
(5+2)*2
12^2
```
## How R works: Operators & Functions
__Assignment (<-, =)__
```{r operators2, echo=TRUE}
x <- 12
y <- 13
z <- 14
x
x+y+z
```
## How R works: Operators & Functions
function(data, additional arguments)
```{r function1, echo=TRUE}
sum(12,13,14)
sum(x, y, z)
k <- c(12, 13, 14)
k
sum(k)
```
# How to access and modify parts of data structures
- How to access parts in **vectors** and tables
- How to work with tables (**data frame**/**data table**/**tibble**/**matrix**)
## How to access and modify parts of data structures: vectors
vector[element_number], length(vector)
```{r access_vectors3, echo=TRUE}
k
k[2]
length(k)
```
## How to access and modify parts of data structures: vectors
vector[element_number], length(vector)
```{r access_vectors, echo=TRUE}
(k[4] <- 15)
length(k)
```
## How to access and modify parts of data structures: vectors
vector[element], length(vector)
```{r access_vectors2, echo=TRUE}
animals <- c("platypus", "nudibranch", "potoo", "peacock mantis shrimp")
animals[2:4]
```
## How to access and modify parts of data structures: tables
table[rows, columns], table$column_name, dim(table)
```{r access_tables, echo=TRUE, eval=T}
df <- tibble(k, animals, colour=c("brown", "rainbow", "brown", "rainbow"))
df
df$colour
```
## How to access and modify parts of data structures: tables
table$column, table[rows, columns], dim(table)
```{r access_tables2, echo=TRUE, eval=T}
dim(df)
df[1:2,]
```
## How to access and modify parts of data structures: tables
table$column, table[rows, columns], dim(table)
```{r access_tables3, echo=TRUE, eval=T}
df[df$colour=="brown",] #select rows that match a condition
```
## Functions to start with: Importing data and inspecting it
```{r functions3, echo=TRUE, eval=FALSE}
# Working directory: setwd(), getwd()
setwd("~/Path/to/working_directory")
getwd()
# tidyverse, this requires loading the library
library(tidyverse)
data <- read_delim("file.csv")
data <- read_tsv("file.tsv")
data <- read_csv("file.csv")
# base R: read.csv()
data <- read.csv("~/Path/to/file.csv", header=TRUE, sep = ",")
# inspect the data structure
head(data)
tail(data)
dim(data)
str(data)
rownames(data)
colnames(data)
view(data)
```
# Working with tables (data frame, tibble, data table)
```{r tables, echo=TRUE}
#library(tidyverse)
mpg # data set included in ggplot2 package
```
## Working with tables
```{r filter_select, echo=TRUE, eval=T}
mpg %>%
filter(manufacturer=="lincoln")
mpg[mpg$manufacturer=="lincoln",]
```
## Working with tables
```{r filter_select2, echo=TRUE, eval=T}
mpg %>%
select(-c(displ,year,cyl,trans,drv,fl,class))%>%
head()
(mpg[, -c(3:8, 9,10)])
```
## Working with tables
```{r mutate, echo=TRUE, eval=TRUE}
mpg %>%
mutate(city_kmL=cty/2.352) %>%
select(c(manufacturer, model, displ, year, city_kmL)) %>%
head()
(mpg["city_kmL"] <- mpg$cty/2.352)
```
## Working with tables
```{r summarise, echo=T, eval=T}
# median(), sd(), IQR(), min(), max(), n()
summary <- mpg %>%
summarise(mean_fuel = mean(cty))
summary
mean(mpg$cty)
```
## Working with tables
```{r pivot, echo=T, eval=T}
mpg %>%
pivot_longer(c(cty, hwy), names_to="city_highway", values_to="mpg") %>%
select(manufacturer, model, year, trans, city_highway, mpg)
```
## Working with tables
```{r piv_sum, echo=T, eval=T}
mpg %>%
pivot_longer(c(cty, hwy), names_to="city_highway", values_to="mpg") %>%
summarise(mean_fuel = mean(mpg))
```
## Working with tables
```{r join, eval=TRUE, echo=TRUE}
car_origins <- tibble(
manufacturer=c("audi", "chevrolet", "dodge", "ford", "honda", "hyundai", "jeep", "land rover", "lincoln", "mercury", "nissan", "pontiac", "subaru", "toyota", "volkswagen"),
country_origin=c("Germany", "USA", "USA", "USA", "Japan", "South Korea", "USA", "USA", "USA", "USA", "Japan", "USA","Japan", "Japan", "Germany"))
head(car_origins, n=3)
mpg %>%
left_join(car_origins) %>%
select(c(manufacturer, model, year, trans, country_origin))
```
# Data visalisation with ggplot2
ggplot(table, aes(x=column1, y=column2))+geom_...()
```{r plot1, echo=TRUE, eval=TRUE}
ggplot(mpg, aes(x=manufacturer, y=cty))+
geom_point()
```
## Data visalisation with ggplot2
ggplot(table, aes(x=column1, y=column2))+geom_...()
```{r plot2, echo=TRUE, eval=TRUE}
ggplot(mpg, aes(x=manufacturer, y=cty))+
geom_bar(stat = "summary", fun = "mean")+
ggtitle("Mah plotttt")
```
## Data visualisation with ggplot2
*What kind of "geom" to use?*
Check out the [R graph gallery](https://r-graph-gallery.com/)!
*How about colours?*
Try out different colour palettes inspired from [Wes Anderson movies](https://github.com/karthik/wesanderson) or [Studio Ghibli](https://github.com/ewenme/ghibli).
Or find good colour combinations [here](https://medialab.github.io/iwanthue/) and specify them manually.
*Easy professional styles?*
Explore different styles from the ggthemes package last_plot()+theme_
*Feel silly?*
https://cran.r-project.org/web/packages/xkcd/vignettes/xkcd-intro.pdf
# Exercise material
Open the html file in your browser read and code along. Every chapter is a small story and you'll see how to use R and different visualization to guide your data analyses.
There are 'Hints', 'Answers' and 'Solutions'. But the content is hidden so that you first think and try yourself. Then you can click on the 'Code' button to reveal the content.
- **Bioassay**: data manipulation, plotting, using specific packages: dose-response curve calculation
- **Targeted metabolomics**: Multiple plots, reordering factors, correlation analysis, t-test, while loop
- **LCMS**: arranging and annotating plots
- **Multivariate data analysis**: PCA, interactive 3D plots