To view the material below as a presentation, open lecture.html.
Arvind R. Subramaniam
Assistant Member
Basic Sciences Division and Computational Biology Program
Fred Hutchinson Cancer Research Center
- Lecture 12 – Visualize data using R / =ggplot2=
- What you will learn over the next 3 lectures
- Example Datasets
- Raw Flow Cytometry Data
- Flow Cytometry Analysis Using =Tidyverse=
- =Tidyverse= Functions for Working with Tabular Data
- Use TSV and CSV file formats for tabular data
- Reading tabular data into R
- Read tabular data into a data frame (tibble)
- Plotting a point graph
- How do we show multiple experimental parameters?
- Plotting a point graph with color
- Plotting a line graph
- Plotting point and line graphs
- ‘Faceting’ – Plotting in multiple panels
- Play time!
Loading, Transforming, Visualizing Tabular Data using Tidyverse packages
Principles of Data Visualization (see book)
Plate Reader Assay
Flow Cytometry
FSC.A | SSC.A | FITC.A | PE.Texas.Red.A | Time |
---|---|---|---|---|
79033 | 69338 | 9173 | 18690 | 3.02 |
101336 | 87574 | 13184 | 29886 | 3.04 |
51737 | 56161 | 3083 | 18324 | 3.06 |
79904 | 45085 | 9957 | 18099 | 3.08 |
124491 | 97305 | 15739 | 28730 | 3.09 |
54359 | 45015 | 6175 | 11918 | 3.11 |
64615 | 88989 | 11907 | 32413 | 3.13 |
109592 | 64132 | 12561 | 18824 | 3.15 |
58503 | 116384 | 11591 | 27629 | 3.19 |
38634 | 51511 | 7200 | 21930 | 3.21 |
First 10 of 2,720,000 rows shown (5 columns).
| Import/Export | Visualize | Transform |
|---|---|---|
| read_tsv | geom_point | select |
| write_tsv | geom_line | filter |
| | facet_grid | arrange |
| | | mutate |
| | | left_join and other joins |
| | | group_by |
| | | summarize |
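As a quick illustration of the Transform column, the sketch below chains `select()`, `filter()`, `arrange()` and `mutate()` with the `%>%` pipe. It assumes four rows of the example dataset typed in by hand with `tribble()`; the names `kozak` and `result`, the cutoff of 1, and the derived `log2_ratio` column are illustrative choices, not part of the lecture.

```r
library(tidyverse)

# Four rows of the example dataset, typed in by hand (hypothetical name: kozak)
kozak <- tribble(
  ~strain,   ~mean_yfp, ~mean_rfp, ~mean_ratio, ~insert_sequence, ~kozak_region,
  "schp674", 1270,      20316,     0.561,       "10×AAG",         "CAAA",
  "schp675", 3687,      20438,     1.621,       "10×AAG",         "CCGC",
  "schp676", 2657,      20223,     1.177,       "10×AAG",         "CCAA",
  "schp677", 3967,      20604,     1.728,       "10×AAG",         "CCAC"
)

result <- kozak %>%
  select(strain, kozak_region, mean_ratio) %>%  # keep only these three columns
  filter(mean_ratio > 1) %>%                    # keep rows whose ratio exceeds 1
  arrange(desc(mean_ratio)) %>%                 # sort from highest to lowest ratio
  mutate(log2_ratio = log2(mean_ratio))         # add a derived column

print(result)
```

Each verb takes a tibble and returns a new one, which is what lets them be chained; `kozak` itself is never modified.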
Tab-Separated Values:
strain	mean_yfp	mean_rfp	mean_ratio	se_ratio	insert_sequence	kozak_region
schp674	1270	20316	0.561	0.004	10×AAG	CAAA
schp675	3687	20438	1.621	0.036	10×AAG	CCGC
schp676	2657	20223	1.177	0.048	10×AAG	CCAA
schp677	3967	20604	1.728	0.03	10×AAG	CCAC

Comma-Separated Values:
strain,mean_yfp,mean_rfp,mean_ratio,se_ratio,insert_sequence,kozak_region
schp674,1270,20316,0.561,0.004,10×AAG,CAAA
schp675,3687,20438,1.621,0.036,10×AAG,CCGC
schp676,2657,20223,1.177,0.048,10×AAG,CCAA
schp677,3967,20604,1.728,0.03,10×AAG,CCAC
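TSV and CSV hold the same table and differ only in the delimiter, so readr's `read_tsv()` and `read_csv()` should return the same columns. The sketch below is a minimal round trip under the assumption that writing to temporary files from `tempfile()` is acceptable; the table is the four rows above typed in with `tribble()`, and the object names are hypothetical.

```r
library(tidyverse)

# The four rows shown above, typed in by hand
kozak <- tribble(
  ~strain,   ~mean_yfp, ~mean_rfp, ~mean_ratio, ~se_ratio, ~insert_sequence, ~kozak_region,
  "schp674", 1270,      20316,     0.561,       0.004,     "10×AAG",         "CAAA",
  "schp675", 3687,      20438,     1.621,       0.036,     "10×AAG",         "CCGC",
  "schp676", 2657,      20223,     1.177,       0.048,     "10×AAG",         "CCAA",
  "schp677", 3967,      20604,     1.728,       0.030,     "10×AAG",         "CCAC"
)

# Write the same table in both formats to temporary files
tsv_path <- tempfile(fileext = ".tsv")
csv_path <- tempfile(fileext = ".csv")
write_tsv(kozak, tsv_path)
write_csv(kozak, csv_path)

# Read both back; show_col_types = FALSE silences the column-type message
from_tsv <- read_tsv(tsv_path, show_col_types = FALSE)
from_csv <- read_csv(csv_path, show_col_types = FALSE)

# Print the raw TSV text: fields are separated by tab characters
cat(read_lines(tsv_path), sep = "\n")
```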
```r
# library to work with tabular data
library(tidyverse)
# read the tsv file into a tibble and
# assign it to the 'data' variable
data <- read_tsv("data/example_dataset_1.tsv")
# display the contents of 'data'
print(data, n = 5)
```
```r
ggplot(data, aes(x = kozak_region,
                 y = mean_ratio)) +
  geom_point()
```
strain | mean_ratio | insert_sequence | kozak_region |
---|---|---|---|
schp688 | 0.755 | 10×AGA | A |
schp684 | 1.437 | 10×AGA | B |
schp690 | 1.541 | 10×AGA | C |
schp687 | 2.004 | 10×AGA | D |
schp686 | 2.121 | 10×AGA | E |
schp685 | 2.893 | 10×AGA | F |
schp683 | 3.522 | 10×AGA | G |
schp689 | 3.424 | 10×AGA | H |
schp679 | 1.149 | 10×AAG | A |
schp675 | 1.621 | 10×AAG | B |
schp681 | 1.645 | 10×AAG | C |
schp678 | 1.906 | 10×AAG | D |
schp677 | 1.728 | 10×AAG | E |
schp676 | 1.177 | 10×AAG | F |
schp674 | 0.561 | 10×AAG | G |
schp680 | 0.519 | 10×AAG | H |
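The table above is in long form: one row per strain, with `insert_sequence` and `kozak_region` as ordinary columns. That layout is what lets ggplot2 map columns to aesthetics, and it also lets `group_by()` and `summarize()` collapse the data per insert sequence. A sketch, assuming the 16 rows are typed in by hand (they are only the columns shown here, not the full TSV file); `by_insert`, `avg_ratio` and `n_strains` are hypothetical names.

```r
library(tidyverse)

# The 16 rows of the table above, entered by hand
data <- tibble(
  strain = c("schp688", "schp684", "schp690", "schp687",
             "schp686", "schp685", "schp683", "schp689",
             "schp679", "schp675", "schp681", "schp678",
             "schp677", "schp676", "schp674", "schp680"),
  mean_ratio = c(0.755, 1.437, 1.541, 2.004, 2.121, 2.893, 3.522, 3.424,
                 1.149, 1.621, 1.645, 1.906, 1.728, 1.177, 0.561, 0.519),
  insert_sequence = rep(c("10×AGA", "10×AAG"), each = 8),
  kozak_region = rep(LETTERS[1:8], times = 2)
)

# One row per insert sequence: average ratio and number of strains
by_insert <- data %>%
  group_by(insert_sequence) %>%
  summarize(avg_ratio = mean(mean_ratio), n_strains = n())

print(by_insert)
```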
```r
ggplot(data, aes(x = kozak_region,
                 y = mean_ratio,
                 color = insert_sequence)) +
  geom_point()
```
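Putting `color` inside `aes()` maps it to a column, so ggplot2 picks one colour per `insert_sequence` value and adds a legend; passing a colour outside `aes()` sets it for every point instead. The sketch below contrasts the two and uses `layer_data()` to inspect the colours ggplot2 computed. It assumes the same hand-typed 16 rows; `"blue"` and the object names are arbitrary.

```r
library(tidyverse)

# The 16 rows of the table above, entered by hand
data <- tibble(
  strain = c("schp688", "schp684", "schp690", "schp687",
             "schp686", "schp685", "schp683", "schp689",
             "schp679", "schp675", "schp681", "schp678",
             "schp677", "schp676", "schp674", "schp680"),
  mean_ratio = c(0.755, 1.437, 1.541, 2.004, 2.121, 2.893, 3.522, 3.424,
                 1.149, 1.621, 1.645, 1.906, 1.728, 1.177, 0.561, 0.519),
  insert_sequence = rep(c("10×AGA", "10×AAG"), each = 8),
  kozak_region = rep(LETTERS[1:8], times = 2)
)

# Mapping: colour is tied to a column, one colour per insert_sequence
mapped <- ggplot(data, aes(x = kozak_region, y = mean_ratio,
                           color = insert_sequence)) +
  geom_point()

# Setting: a fixed colour outside aes() applies to every point, no legend
fixed <- ggplot(data, aes(x = kozak_region, y = mean_ratio)) +
  geom_point(color = "blue")

# layer_data() returns the per-point values ggplot2 computed for a layer
n_mapped <- length(unique(layer_data(mapped)$colour))
n_fixed <- length(unique(layer_data(fixed)$colour))
```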
```r
ggplot(data, aes(x = kozak_region,
                 y = mean_ratio,
                 color = insert_sequence,
                 group = insert_sequence)) +
  geom_line()
```
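Why the line graph needs `group = insert_sequence`: when `x` is discrete, ggplot2 forms groups from every discrete variable in the mapping, here `kozak_region` and `insert_sequence` together. That yields 16 groups of one point each, and `geom_line()` only connects points within a group. The sketch below counts the groups with `layer_data()`, assuming the same hand-typed data; object names are hypothetical.

```r
library(tidyverse)

# The 16 rows of the table above, entered by hand
data <- tibble(
  strain = c("schp688", "schp684", "schp690", "schp687",
             "schp686", "schp685", "schp683", "schp689",
             "schp679", "schp675", "schp681", "schp678",
             "schp677", "schp676", "schp674", "schp680"),
  mean_ratio = c(0.755, 1.437, 1.541, 2.004, 2.121, 2.893, 3.522, 3.424,
                 1.149, 1.621, 1.645, 1.906, 1.728, 1.177, 0.561, 0.519),
  insert_sequence = rep(c("10×AGA", "10×AAG"), each = 8),
  kozak_region = rep(LETTERS[1:8], times = 2)
)

# No explicit group: groups come from kozak_region x insert_sequence,
# so every group holds a single point and no lines can be drawn
no_group <- ggplot(data, aes(x = kozak_region, y = mean_ratio,
                             color = insert_sequence)) +
  geom_line()

# Explicit group: one group, and therefore one line, per insert sequence
with_group <- ggplot(data, aes(x = kozak_region, y = mean_ratio,
                               color = insert_sequence,
                               group = insert_sequence)) +
  geom_line()

n_groups_without <- length(unique(layer_data(no_group)$group))
n_groups_with <- length(unique(layer_data(with_group)$group))
```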
```r
ggplot(data, aes(x = kozak_region,
                 y = mean_ratio,
                 color = insert_sequence,
                 group = insert_sequence)) +
  geom_line() +
  geom_point()
```
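Aesthetics listed in `ggplot()` are inherited by every layer, while an `aes()` given inside a geom applies to that geom only; layers are also drawn in the order they are added, so points added after lines sit on top. A sketch that colours only the points, assuming the same hand-typed data (object names hypothetical):

```r
library(tidyverse)

# The 16 rows of the table above, entered by hand
data <- tibble(
  strain = c("schp688", "schp684", "schp690", "schp687",
             "schp686", "schp685", "schp683", "schp689",
             "schp679", "schp675", "schp681", "schp678",
             "schp677", "schp676", "schp674", "schp680"),
  mean_ratio = c(0.755, 1.437, 1.541, 2.004, 2.121, 2.893, 3.522, 3.424,
                 1.149, 1.621, 1.645, 1.906, 1.728, 1.177, 0.561, 0.519),
  insert_sequence = rep(c("10×AGA", "10×AAG"), each = 8),
  kozak_region = rep(LETTERS[1:8], times = 2)
)

# x, y and group are shared; color is mapped only in the point layer
p <- ggplot(data, aes(x = kozak_region, y = mean_ratio,
                      group = insert_sequence)) +
  geom_line() +                             # layer 1, drawn first
  geom_point(aes(color = insert_sequence))  # layer 2, drawn on top

# Lines keep the single default colour; points get one colour per group
line_colours <- unique(layer_data(p, 1)$colour)
point_colours <- unique(layer_data(p, 2)$colour)
```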
```r
ggplot(data, aes(x = kozak_region,
                 y = mean_ratio,
                 group = insert_sequence)) +
  geom_line() +
  geom_point() +
  facet_grid(~ insert_sequence)
```
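In the `facet_grid()` formula, the left side sets rows of panels and the right side sets columns, so `~ insert_sequence` places the two insert sequences side by side, while `insert_sequence ~ .` stacks them vertically. The sketch below checks the panel layout through `ggplot_build()`, assuming the same hand-typed data; object names are hypothetical.

```r
library(tidyverse)

# The 16 rows of the table above, entered by hand
data <- tibble(
  strain = c("schp688", "schp684", "schp690", "schp687",
             "schp686", "schp685", "schp683", "schp689",
             "schp679", "schp675", "schp681", "schp678",
             "schp677", "schp676", "schp674", "schp680"),
  mean_ratio = c(0.755, 1.437, 1.541, 2.004, 2.121, 2.893, 3.522, 3.424,
                 1.149, 1.621, 1.645, 1.906, 1.728, 1.177, 0.561, 0.519),
  insert_sequence = rep(c("10×AGA", "10×AAG"), each = 8),
  kozak_region = rep(LETTERS[1:8], times = 2)
)

base <- ggplot(data, aes(x = kozak_region, y = mean_ratio,
                         group = insert_sequence)) +
  geom_line() +
  geom_point()

# Right-hand side only: one column of panels per insert sequence
in_columns <- base + facet_grid(~ insert_sequence)

# Left-hand side only: one row of panels per insert sequence
in_rows <- base + facet_grid(insert_sequence ~ .)

# The built layout has one row per panel, with its ROW and COL position
cols_layout <- ggplot_build(in_columns)$layout$layout
rows_layout <- ggplot_build(in_rows)$layout$layout
```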
- Get used to the RStudio interface.
- Plot data and customize appearance.
- Learn how to “Knit” RMarkdown files.
- Learn more at https://ggplot2.tidyverse.org.