-
Notifications
You must be signed in to change notification settings - Fork 50
/
Copy path05_ggtree_annotation.Rmd
519 lines (368 loc) · 28.8 KB
/
05_ggtree_annotation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
\newpage
# Phylogenetic Tree Annotation {#chapter5}
```{r include=FALSE}
library("ape")
library("ggplot2")
library("cowplot")
library("treeio")
library("ggtree")
```
## Visualizing and Annotating Tree Using Grammar of Graphics
The `r Biocpkg("ggtree")` [@yu_ggtree:_2017] is designed for a more general-purpose or a specific type of tree visualization and annotation. It supports the grammar of graphics\index{grammar of graphics} implemented in `r CRANpkg("ggplot2")` and users can freely visualize/annotate a tree by combining several annotation layers.
(ref:ggtreeNHXscap) Annotating tree using grammar of graphics.
(ref:ggtreeNHXcap) **Annotating tree using the grammar of graphics.** The NHX tree was annotated using the grammar of graphic syntax by combining different layers using the `+` operator. Species information was labeled in the middle of the branches. Duplication events were shown on the most recent common ancestor and clade bootstrap values were displayed near to it.
```{r echo=F, message=F, warning=F, out.width='100%'}
library(ggtree)
treetext <- "(((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):
0.05 [&&NHX:S=primates:D=Y:B=100],ADHY:
0.1[&&NHX:S=nematode],ADHX:0.12 [&&NHX:S=insect]):
0.1[&&NHX:S=metazoa:D=N],(ADH4:0.09[&&NHX:S=yeast],
ADH3:0.13[&&NHX:S=yeast], ADH2:0.12[&&NHX:S=yeast],
ADH1:0.11[&&NHX:S=yeast]):0.1[&&NHX:S=Fungi])[&&NHX:D=N];"
tree <- read.nhx(textConnection(treetext))
p = ggtree(tree) + geom_tiplab() +
geom_label(aes(x=branch, label=S), fill='lightgreen') +
geom_label(aes(label=D), fill='steelblue') +
geom_text(aes(label=B), hjust=-.5)
p <- p + xlim(NA, 0.28)
```
```{r echo=T, eval=F}
library(ggtree)
treetext = "(((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):
0.05 [&&NHX:S=primates:D=Y:B=100],ADHY:
0.1[&&NHX:S=nematode],ADHX:0.12 [&&NHX:S=insect]):
0.1[&&NHX:S=metazoa:D=N],(ADH4:0.09[&&NHX:S=yeast],
ADH3:0.13[&&NHX:S=yeast], ADH2:0.12[&&NHX:S=yeast],
ADH1:0.11[&&NHX:S=yeast]):0.1[&&NHX:S=Fungi])[&&NHX:D=N];"
tree <- read.nhx(textConnection(treetext))
ggtree(tree) + geom_tiplab() +
geom_label(aes(x=branch, label=S), fill='lightgreen') +
geom_label(aes(label=D), fill='steelblue') +
geom_text(aes(label=B), hjust=-.5)
```
```{r ggtreeNHX, warning=FALSE, fig.cap="(ref:ggtreeNHXcap)", fig.scap="(ref:ggtreeNHXscap)", out.extra='', echo=F}
print(p)
```
Here, as an example, we visualized the tree with several layers to display annotation stored in NHX tags, including a layer of `geom_tiplab()` to display tip labels (gene name in this case), a layer using `geom_label()` to show species information (`S` tag) colored by light green, a layer of duplication event information (`D` tag) colored by steelblue and another layer using `geom_text()` to show bootstrap value (`B` tag).
Layers defined in `r CRANpkg("ggplot2")` can be applied to `r Biocpkg("ggtree")` directly as demonstrated in Figure \@ref(fig:ggtreeNHX) of using `geom_label()` and `geom_text()`. But `r CRANpkg("ggplot2")` does not provide graphic layers that are specifically designed for phylogenetic tree annotation. For instance, layers for tip labels, tree branch scale legend, highlight, or labeling clade are all unavailable. To make tree annotation more flexible, several layers have been implemented in `r Biocpkg("ggtree")` (Table \@ref(tab:geoms)), enabling different ways of annotation on various parts/components of a phylogenetic tree.
```{r geoms, echo=FALSE, message=FALSE}
geoms <- matrix(c(
"geom_balance", "Highlights the two direct descendant clades of an internal node",
"geom_cladelab", "Annotates a clade with bar and text label (or image)",
"geom_facet", "Plots associated data in a specific panel (facet) and aligns the plot with the tree",
"geom_hilight", "Highlights selected clade with rectangular or round shape",
"geom_inset", "Adds insets (subplots) to tree nodes",
"geom_label2", "The modified version of geom_label, with subset aesthetic supported",
"geom_nodepoint", "Annotates internal nodes with symbolic points",
"geom_point2", "The modified version of geom_point, with subset aesthetic supported",
"geom_range", "Bar layer to present uncertainty of evolutionary inference",
"geom_rootpoint", "Annotates root node with symbolic point",
"geom_rootedge", "Adds root-edge to a tree",
"geom_segment2", "The modified version of geom_segment, with subset aesthetic supported",
"geom_strip", "Annotates associated taxa with bar and (optional) text label",
"geom_taxalink", "Links related taxa",
"geom_text2", "The modified version of geom_text, with subset aesthetic supported",
"geom_tiplab", "The layer of tip labels",
"geom_tippoint", "Annotates external nodes with symbolic points",
"geom_tree", "Tree structure layer, with multiple layouts supported",
"geom_treescale", "Tree branch scale legend"
), ncol=2, byrow=TRUE)
geoms <- as.data.frame(geoms)
colnames(geoms) <- c("Layer", "Description")
knitr::kable(geoms, caption = "Geom layers defined in ggtree.", booktabs = T) %>%
kable_styling(latex_options = c("striped", "hold_position"), full_width = T)
```
## Layers for Tree Annotation
### Colored strips
The `r Biocpkg("ggtree")` [@yu_ggtree:_2017] implements `geom_cladelab()` layer to annotate a selected clade with a bar indicating the clade with a corresponding label\index{geom\textunderscore cladelab}.
The `geom_cladelab()` layer accepts a selected internal node number and labels the corresponding clade automatically (Figure \@ref(fig:cladelabel)A). To get the internal node number, please refer to [Chapter 2](#accesor-tidytree).
```{r eval=F}
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 8)
p + geom_cladelab(node=45, label="test label") +
geom_cladelab(node=34, label="another clade")
```
Users can set the parameter, `align = TRUE`, to align the clade label, `offset`, to adjust the position and color to set the color of the bar and label text, *etc.* (Figure \@ref(fig:cladelabel)B).
```{r eval=F}
p + geom_cladelab(node=45, label="test label", align=TRUE,
offset = .2, textcolor='red', barcolor='red') +
geom_cladelab(node=34, label="another clade", align=TRUE,
offset = .2, textcolor='blue', barcolor='blue')
```
Users can change the `angle` of the clade label text and relative position from text to bar via the parameter `offset.text`. The size of the bar and text can be changed via the parameters `barsize` and `fontsize`, respectively (Figure \@ref(fig:cladelabel)C).
```{r eval=F}
p + geom_cladelab(node=45, label="test label", align=TRUE, angle=270,
hjust='center', offset.text=.5, barsize=1.5, fontsize=8) +
geom_cladelab(node=34, label="another clade", align=TRUE, angle=45)
```
Users can also use `geom_label()` to label the text and can set the background color by `fill` parameter (Figure \@ref(fig:cladelabel)D).
```{r eval=F}
p + geom_cladelab(node=34, label="another clade", align=TRUE,
geom='label', fill='lightblue')
```
(ref:cladelabelscap) Labelling clades.
(ref:cladelabelcap) **Labeling clades.** Default (A); aligning and coloring clade bar and text (B); changing size and angle (C) and using `geom_label()` with background color in the text (D).
```{r cladelabel, echo=FALSE, fig.width=12, fig.height=7.6, fig.cap="(ref:cladelabelcap)", fig.scap="(ref:cladelabelscap)", out.width='100%'}
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 8)
p1 = p + geom_cladelab(node=45, label="test label") +
geom_cladelab(node=34, label="another clade")
p2 = p + geom_cladelab(node=45, label="test label", align=T, textcolor='red', barcolor='red') +
geom_cladelab(node=34, label="another clade", align=T, textcolor='blue', barcolor='blue')
p3 = p + geom_cladelab(node=45, label="test label", align=T, angle=270,
hjust='center', offset.text=.5, barsize=1.5, fontsize=8) +
geom_cladelab(node=34, label="another clade", align=T, angle=45)
p4 = p + geom_cladelab(node=34, label="another clade", align=T, geom='label', fill='lightblue')
plot_grid(p1, p2, p3, p4, ncol=2, labels = LETTERS[1:4])
```
In addition, `geom_cladelab()` allows users to use the image or phylopic to annotate the clades, and supports using aesthetic mapping to automatically annotate the clade with bar and text label or image (*e.g.*, mapping variable to color the clade labels) (Figure \@ref(fig:cladelabaes)).
```{r eval=F}
dat <- data.frame(node = c(45, 34),
name = c("test label", "another clade"))
# The node and label is required when geom="text"
## or geom="label" or geom="shadowtext".
p1 <- p + geom_cladelab(data = dat,
mapping = aes(node = node, label = name, color = name),
fontsize = 3)
dt <- data.frame(node = c(45, 34),
image = c("7fb9bea8-e758-4986-afb2-95a2c3bf983d",
"0174801d-15a6-4668-bfe0-4c421fbe51e8"),
name = c("specie A", "specie B"))
# when geom="phylopic" or geom="image", the image of aes is required.
p2 <- p + geom_cladelab(data = dt,
mapping = aes(node = node, label = name, image = image),
geom = "phylopic", imagecolor = "black",
offset=1, offset.text=0.5)
# The color or size of image also can be mapped.
p3 <- p + geom_cladelab(data = dt,
mapping = aes(node = node, label = name,
image = image, color = name),
geom = "phylopic", offset = 1, offset.text=0.5)
```
(ref:cladelabaesscap) Labeling clades using aesthetic mapping.
(ref:cladelabaescap) **Labeling clades using aesthetic mapping.** The geom_cladelab() layer allows users to use aesthetic mapping to annotate the clades (A); it supports using images or phylopic to annotate clades (B); mapping variable to change color or size of the text or image is also supported (C).
```{r cladelabaes, echo=FALSE, fig.width=12, fig.height=5, fig.cap="(ref:cladelabaescap)", fig.scap="(ref:cladelabaesscap)", out.width='100%'}
dat <- data.frame(node=c(45, 34), name=c("test label", "another clade"))
f5 <- p + geom_cladelab(data=dat, mapping=aes(node=node, label=name, color=name), fontsize = 3)
dt <- data.frame(node = c(45, 34),
image = c("7fb9bea8-e758-4986-afb2-95a2c3bf983d", "0174801d-15a6-4668-bfe0-4c421fbe51e8"),
name = c("specie A", "specie B"))
f6 <- p + geom_cladelab(data=dt, mapping = aes(node = node, label = name, image = image), geom = "phylopic", imagecolor = "black", offset=1, offset.text=0.5)
f7 <- p + geom_cladelab(data=dt, mapping = aes(node = node, label = name, image = image, color = name), geom = "phylopic", offset = 1, offset.text=0.5)
plot_list(f5, f6, f7, ncol=3, tag_levels='A')
```
The `geom_cladelab()` layer also supports unrooted tree layouts (Figure \@ref(fig:striplabel)A).
```{r fig.wdith=7, fig.height=7, fig.align='center', warning=FALSE, message=FALSE, eval=F}
ggtree(tree, layout="daylight") +
geom_cladelab(node=35, label="test label", angle=0,
fontsize=8, offset=.5, vjust=.5) +
geom_cladelab(node=55, label='another clade',
angle=-95, hjust=.5, fontsize=8)
```
The `geom_cladelab()` is designed for labeling Monophyletic (Clade) while there are related taxa that do not form a clade. In `r Biocpkg("ggtree")`, we provide another layer, `geom_strip()`, to add a strip/bar to indicate the association with an optional label for Polyphyletic or Paraphyletic (Figure \@ref(fig:striplabel)B).
```{r eval=F}
p + geom_tiplab() +
geom_strip('t10', 't30', barsize=2, color='red',
label="associated taxa", offset.text=.1) +
geom_strip('t1', 't18', barsize=2, color='blue',
label = "another label", offset.text=.1)
```
(ref:striplabelscap) Labeling associated taxa.
(ref:striplabelcap) **Labeling associated taxa.** The `geom_cladelab()` is designed for labeling Monophyletic and supports unrooted layouts (A). The `geom_strip()` is designed for labeling all types of associated taxa, including Monophyletic, Polyphyletic, and Paraphyletic (B).
```{r striplabel, fig.width=13.5, fig.height=6.5, echo=FALSE, warning=FALSE, fig.cap="(ref:striplabelcap)", fig.scap="(ref:striplabelscap)", out.width='100%'}
pg <- ggtree(tree, layout="daylight")
p5 <- pg + geom_cladelab(node=35, label="test label", angle=0, fontsize=8, offset=.5, vjust=.5) +
geom_cladelab(node=55, label='another clade', angle=-95, hjust=.5, fontsize=8)
p6 <- p + geom_tiplab() +
geom_strip('t10', 't30', barsize=2, color='red',
label="associated taxa", offset.text=.1) +
geom_strip('t1', 't18', barsize=2, color='blue',
label = "another label", offset.text=.1)
plot_grid(p5, p6, ncol=2, labels=LETTERS[1:2])
```
### Highlight clades
The `r Biocpkg("ggtree")` implements the `geom_hilight()` layer, which accepts an internal node number and adds a layer of a rectangle to highlight the selected clade (Figure \@ref(fig:hilight))^[If you want to plot the tree above the highlighting area, visit [FAQ](#faq-under-the-tree) for details.]\index{geom\textunderscore hilight}.
```{r eval=F, fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) +
geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=17, fill="darkgreen", alpha=.6)
ggtree(tree, layout="circular") +
geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=23, fill="darkgreen", alpha=.6)
```
The `geom_hilight` layer also supports highlighting clades for unrooted layout trees with round ('encircle') or rectangular ('rect') shape (Figure \@ref(fig:hilight)C).
```{r eval=FALSE, fig.width=5, fig.height=5, fig.align='center', warning=FALSE, message=FALSE}
## type can be 'encircle' or 'rect'
pg + geom_hilight(node=55, linetype = 3) +
geom_hilight(node=35, fill='darkgreen', type="rect")
```
Another way to highlight selected clades is by setting the clades with different colors and/or line types as demonstrated in Figure \@ref(fig:scaleClade).
In addition to `geom_hilight()`, `r Biocpkg("ggtree")` also implements `geom_balance()`
which is designed to highlight neighboring subclades of a given internal node (Figure \@ref(fig:hilight)D).
```{r fig.width=4, fig.height=5, fig.align='center', warning=FALSE, eval=F}
ggtree(tree) +
geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)
```
The `geom_hilight()` layer supports using aesthetic mapping to automatically highlight clades as demonstrated in Figures \@ref(fig:hilight)E-F. For plot in Cartesian coordinates (*e.g.*, rectangular layout), the rectangle can be rounded (Figure \@ref(fig:hilight)E) or filled with gradient colors (Figure \@ref(fig:hilight)F).
```{r eval=FALSE, fig.width=5, fig.height=5, fig.align='center', warning=FALSE, message=FALSE}
## using external data
d <- data.frame(node=c(17, 21), type=c("A", "B"))
ggtree(tree) + geom_hilight(data=d, aes(node=node, fill=type),
type = "roundrect")
## using data stored in the tree object
x <- read.nhx(system.file("extdata/NHX/ADH.nhx", package="treeio"))
ggtree(x) + geom_hilight(mapping=aes(subset = node %in% c(10, 12),
fill = S),
type = "gradient", gradient.direction = 'rt',
alpha = .8) +
scale_fill_manual(values=c("steelblue", "darkgreen"))
```
(ref:hilightscap) Highlight selected clades.
(ref:hilightcap) **Highlight selected clades.** Rectangular layout (A), circular/fan (B), and unrooted layouts. Highlight neighboring subclades simultaneously (D). Highlight selected clades using associated data (E and F).
```{r hilight, echo = FALSE, fig.width=12, fig.height=8, warning=FALSE, fig.cap="(ref:hilightcap)", fig.scap="(ref:hilightscap)", out.width='100%'}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p1= ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=17, fill="darkgreen", alpha=.6)
p2= ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=23, fill="darkgreen", alpha=.6)
p3 = pg + geom_hilight(node=55, linetype = 3) +
geom_hilight(node=35, fill='darkgreen', type='rect', linetype = 3)
p4 = ggtree(tree) +
geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)
d <- data.frame(node=c(17, 21), type=c("A", "B"))
p5 <- ggtree(tree) + geom_hilight(data=d, aes(node=node, fill=type), type = "roundrect")
## using data stored in tree object
x <- read.nhx(system.file("extdata/NHX/ADH.nhx", package="treeio"))
p6 <- ggtree(x) + geom_hilight(mapping=aes(subset = node %in% c(10, 12),
fill = S),
type = "gradient", graident.direction = 'rt',
alpha = .8) +
scale_fill_manual(values=c("steelblue", "darkgreen"))
plot_list(p1, p2, p3, p4, p5, p6, ncol=3, tag_levels='A')
```
### Taxa connection
Some evolutionary events (*e.g.*, reassortment, horizontal gene transfer) cannot be modeled by a simple tree. The `r Biocpkg("ggtree")` provides the `geom_taxalink()` layer that allows drawing straight or curved lines between any of two nodes in the tree, allowing it to represent evolutionary events by connecting taxa. It works with rectangular (Figure \@ref(fig:taxalink)A), circular (Figure \@ref(fig:taxalink)B), and inward circular (Figure \@ref(fig:taxalink)C) layouts. The `geom_taxalink()` is not only useful for presenting evolutionary events, but it can also be used to combine evolutionary trees to present relationships or interactions between species [@ggtreeExtra_2021]\index{geom\textunderscore taxalink}.
The `geom_taxalink()` layout supports aesthetic mapping, which requires a `data.frame` that stores association information with/without meta-data as input (Figure \@ref(fig:taxalink)D).
(ref:taxalinkscap) Linking related taxa.
(ref:taxalinkcap) **Linking related taxa.** This can be used to indicate evolutionary events or relationships between species. Rectangular layout (A), circular layout (B), and inward circular layout (C and D). It supports aesthetic mapping to map variables to set line sizes and colors (D).
```{r taxalink, fig.cap="(ref:taxalinkcap)", fig.scap="(ref:taxalinkscap)", fig.width=12, fig.height=10, warning=FALSE, out.width='100%'}
p1 <- ggtree(tree) + geom_tiplab() + geom_taxalink(taxa1='A', taxa2='E') +
geom_taxalink(taxa1='F', taxa2='K', color='red', linetype = 'dashed',
arrow=arrow(length=unit(0.02, "npc")))
p2 <- ggtree(tree, layout="circular") +
geom_taxalink(taxa1='A', taxa2='E', color="grey", alpha=0.5,
offset=0.05, arrow=arrow(length=unit(0.01, "npc"))) +
geom_taxalink(taxa1='F', taxa2='K', color='red',
linetype = 'dashed', alpha=0.5, offset=0.05,
arrow=arrow(length=unit(0.01, "npc"))) +
geom_taxalink(taxa1="L", taxa2="M", color="blue", alpha=0.5,
offset=0.05, hratio=0.8,
arrow=arrow(length=unit(0.01, "npc"))) +
geom_tiplab()
# when the tree was created using reverse x,
# we can set outward to FALSE, which will generate the inward curve lines.
p3 <- ggtree(tree, layout="inward_circular", xlim=150) +
geom_taxalink(taxa1='A', taxa2='E', color="grey", alpha=0.5,
offset=-0.2, outward=FALSE,
arrow=arrow(length=unit(0.01, "npc"))) +
geom_taxalink(taxa1='F', taxa2='K', color='red', linetype = 'dashed',
alpha=0.5, offset=-0.2, outward=FALSE,
arrow=arrow(length=unit(0.01, "npc"))) +
geom_taxalink(taxa1="L", taxa2="M", color="blue", alpha=0.5,
offset=-0.2, outward=FALSE,
arrow=arrow(length=unit(0.01, "npc"))) +
geom_tiplab(hjust=1)
dat <- data.frame(from=c("A", "F", "L"),
to=c("E", "K", "M"),
h=c(1, 1, 0.1),
type=c("t1", "t2", "t3"),
s=c(2, 1, 2))
p4 <- ggtree(tree, layout="inward_circular", xlim=c(150, 0)) +
geom_taxalink(data=dat,
mapping=aes(taxa1=from,
taxa2=to,
color=type,
size=s),
ncp=10,
offset=0.15) +
geom_tiplab(hjust=1) +
scale_size_continuous(range=c(1,3))
plot_list(p1, p2, p3, p4, ncol=2, tag_levels='A')
```
### Uncertainty of evolutionary inference
The `geom_range()` layer supports displaying interval (highest posterior density, confidence interval, range) as horizontal bars on tree nodes. The center of the interval will anchor to the corresponding node. The center by default is the mean value of the interval (Figure \@ref(fig:geomRange)A). We can set the `center` to the estimated mean or median value (Figure \@ref(fig:geomRange)B), or the observed value. As the tree branch and the interval may not be on the same scale, `r Biocpkg("ggtree")` provides `scale_x_range` to add a second *x*-axis for the range (Figure \@ref(fig:geomRange)C). Note that *x*-axis is disabled by the default theme, and we need to enable it if we want to display it (*e.g.*, using `theme_tree2()`)\index{geom\textunderscore range}.
(ref:geomRangescap) Displaying uncertainty of evolutionary inference.
(ref:geomRangecap) **Displaying uncertainty of evolutionary inference.** The center (mean value of the range (A) or estimated value (B)) is anchored to the tree nodes. A second *x*-axis was used for range scaling (C).
```{r eval=FALSE}
file <- system.file("extdata/MEGA7", "mtCDNA_timetree.nex", package = "treeio")
x <- read.mega(file)
p1 <- ggtree(x) + geom_range('reltime_0.95_CI', color='red', size=3, alpha=.3)
p2 <- ggtree(x) + geom_range('reltime_0.95_CI', color='red', size=3,
alpha=.3, center='reltime')
p3 <- p2 + scale_x_range() + theme_tree2()
```
```{r geomRange, fig.cap="(ref:geomRangecap)", fig.scap="(ref:geomRangescap)", fig.width=12, fig.height=4, echo=F, out.width='100%'}
file <- system.file("extdata/MEGA7", "mtCDNA_timetree.nex", package = "treeio")
x <- read.mega(file)
p1 <- ggtree(x) + geom_range('reltime_0.95_CI', color='red', size=3, alpha=.3)
p2 <- ggtree(x) + geom_range('reltime_0.95_CI', color='red', size=3, alpha=.3, center='reltime') + coord_cartesian(ylim = c(1, 7))
p3 <- p2 + scale_x_range() + theme_tree2()
plot_grid(p1, p2, p3, ncol=3, labels = LETTERS[1:3])
```
## Tree Annotation with Output from Evolution Software
### Tree annotation using data from evolutionary analysis software
[Chapter 1](#chapter) introduced using `r Biocpkg("treeio")` package [@wang_treeio_2020] to parse different tree formats and commonly used software outputs to obtain phylogeny-associated data. These imported data, as `S4` objects, can be visualized directly using `r Biocpkg("ggtree")`. Figure \@ref(fig:ggtreeNHX) demonstrates a tree annotated using the information (species classification, duplication event, and bootstrap value) stored in the NHX\index{NHX} file. `r pkg_phyldog` and `pkg_revbayes` output NHX files that can be parsed by `r Biocpkg("treeio")` and visualized by `r Biocpkg("ggtree")` with annotation using their inference data.
Furthermore, the evolutionary data from the inference of `r pkg_beast`, `r pkg_mrbayes`, and `r pkg_revbayes`, *d~N~/d~S~* values inferred by `r pkg_codeml`, ancestral sequences\index{ancestral sequences} inferred by `r pkg_hyphy`, `r pkg_codeml`, or `r pkg_baseml` and short read placement by `r pkg_epa` and `r pkg_pplacer` can be used to annotate the tree directly.
(ref:beastscap) Annotating `r pkg_beast` tree with _length\_95%\_HPD_ and posterior.
(ref:beastcap) **Annotating `r pkg_beast` tree with _length\_95%\_HPD_ and posterior.** Branch length credible intervals (95% HPD) were displayed as red horizontal bars and clade posterior values were shown on the middle of branches\index{BEAST}.
```{r beast, fig.cap="(ref:beastcap)", fig.scap="(ref:beastscap)", fig.width=7, out.extra='', out.width='100%'}
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast <- read.beast(file)
ggtree(beast, aes(color=rate)) +
geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2) +
geom_nodelab(aes(x=branch, label=round(posterior, 2)), vjust=-.5, size=3) +
scale_color_continuous(low="darkgreen", high="red") +
theme(legend.position=c(.1, .8))
```
In Figure \@ref(fig:beast), the tree was visualized and annotated with posterior >0.9 and demonstrated length uncertainty (95% Highest Posterior Density (HPD) interval).
Ancestral sequences inferred by `r pkg_hyphy` can be parsed using `r Biocpkg("treeio")`, whereas the substitutions along each tree branch were automatically computed and stored inside the phylogenetic tree object (*i.e.*, `S4` object). The `r Biocpkg("ggtree")` package can utilize this information stored in the object to annotate the tree, as demonstrated in Figure \@ref(fig:hyphy)\index{HyPhy}.
(ref:hyphyscap) Annotating tree with amino acid substitution determined by ancestral sequences inferred by `r pkg_hyphy`.
(ref:hyphycap) **Annotating tree with amino acid substitution determined by ancestral sequences inferred by `r pkg_hyphy`.** Amino acid substitutions were displayed in the middle of branches.
```{r hyphy, fig.width=7.8, fig.height=3.5, warning=FALSE, fig.cap="(ref:hyphycap)", fig.scap="(ref:hyphyscap)", out.extra='', out.width='100%'}
nwk <- system.file("extdata/HYPHY", "labelledtree.tree",
package="treeio")
ancseq <- system.file("extdata/HYPHY", "ancseq.nex",
package="treeio")
tipfas <- system.file("extdata", "pa.fas", package="treeio")
hy <- read.hyphy(nwk, ancseq, tipfas)
ggtree(hy) +
geom_text(aes(x=branch, label=AA_subs), size=2,
vjust=-.3, color="firebrick")
```
`r pkg_paml`'s `r pkg_baseml` and `r pkg_codeml` can also be used to infer ancestral sequences, whereas `r pkg_codeml`\index{CodeML} can infer selection pressure. After parsing this information using `r Biocpkg("treeio")`, `r Biocpkg("ggtree")` can integrate this information into the same tree structure and be used for annotation as illustrated in Figure \@ref(fig:codeml)\index{PAML}.
(ref:codemlscap) Annotating tree with amino acid substitution and *d~N~/d~S~* inferred by `r pkg_codeml`.
(ref:codemlcap) **Annotating tree with amino acid substitution and *d~N~/d~S~* inferred by `r pkg_codeml`.** Branches were rescaled and colored by *d~N~/d~S~* values, and amino acid substitutions were displayed on the middle of branches.
```{r codeml, fig.cap="(ref:codemlcap)", fig.scap="(ref:codemlscap)", warning=FALSE, out.extra='', fig.height=4, out.width='100%'}
rstfile <- system.file("extdata/PAML_Codeml", "rst",
package="treeio")
mlcfile <- system.file("extdata/PAML_Codeml", "mlc",
package="treeio")
ml <- read.codeml(rstfile, mlcfile)
ggtree(ml, aes(color=dN_vs_dS), branch.length='dN_vs_dS') +
scale_color_continuous(name='dN/dS', limits=c(0, 1.5),
oob=scales::squish,
low='darkgreen', high='red') +
geom_text(aes(x=branch, label=AA_subs),
vjust=-.5, color='steelblue', size=2) +
theme_tree2(legend.position=c(.9, .3))
```
Not only all the tree data parsed by `r Biocpkg("treeio")` can be used to visualize and annotate the phylogenetic tree using `r Biocpkg("ggtree")`, but also other trees and tree-like objects defined in the R community are supported. The `r Biocpkg("ggtree")` plays a unique role in the R ecosystem to facilitate phylogenetic analysis, and it can be easily integrated into other packages and pipelines. For more details of working with other tree-like structures, please refer to [Chapter 9](#chapter9). In addition to direct support of tree objects, `r Biocpkg("ggtree")` also allows users to plot a tree with different types of external data (see also [Chapter 7](#chapter7) and [@yu_two_2018]).
## Summary {#summary5}
The `r Biocpkg("ggtree")` package implements the grammar of graphics for annotating phylogenetic trees. Users can use the `r CRANpkg("ggplot2")` syntax to combine different annotation layers to produce complex tree annotation. If you are familiar with `r CRANpkg("ggplot2")`, tree annotation with a high level of customization can be intuitive and flexible using `r Biocpkg("ggtree")`. The `r Biocpkg("ggtree")` can collect information in the `treedata` object or link external data to the structure of the tree. This will enable us to use the phylogenetic tree for data integration analysis and comparative studies, and will greatly expand the application of the phylogenetic tree in different fields.