This package contains a set of functions that extend the ggtree package. The main function, ggcollapse, allows the user to collapse nodes so that all the collapsing triangles have the same height. The package also contains functions that help to find monophyletic groups of a certain taxonomic rank and the full set of sister clades of a given node.
To install ggcollapse, you just need to run the following command:
# Install devtools only if it is not available, then install ggcollapse from GitHub
if (!requireNamespace('devtools', quietly = TRUE)) {
install.packages('devtools')
}
devtools::install_github('moibernabeu/ggcollapse')
Although the functions are versatile, they are mainly designed around three data types: a phylo object for the tree, a data.frame for the data, and named vectors for the nodes to collapse (numeric) and for the colours of the collapsed clades (character).
In this tutorial we simulate our data. First, we simulate a tree:
library(treeio)
set.seed(2024-11-04)  # evaluated as 2024 - 11 - 4, i.e. set.seed(2009)
tree <- rtree(20)
To show the full potential of the package, we assign a species to each tip, using the tip label format tip_SPECIES:
# Tips 3 to 20 get the species B to S
tree$tip.label[3:20] <- paste(tree$tip.label[3:20], LETTERS[2:19], sep = '_')
# Adding a species duplication in the tree: tips 1 and 2 both get species A
tree$tip.label[1:2] <- paste(tree$tip.label[1:2], 'A', sep = '_')
The final tree, with the internal node numbers, looks like this:
library(ggtree)
ggtree(tree) +
geom_tiplab() +
geom_nodelab(aes(label = node), geom = 'label') +
xlim(0, 5.5)
To extract the species from each tip, we define a species-getting function, which we will later pass to the ggcollapse functions:
library(stringr)
get_sp <- function(tip.label) {
return(str_split(tip.label, '_', n = 2, simplify = TRUE)[, 2])
}
tree$tip.label[1]
## [1] "t14_A"
get_sp(tree$tip.label[1])
## [1] "A"
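If you prefer to avoid the stringr dependency, the same parsing can be done in base R. The sketch below (get_sp_base is a hypothetical name, not part of ggcollapse) removes everything up to and including the first underscore, so, like the str_split version with n = 2, it keeps the whole remainder of labels that contain several underscores.

```r
# Hypothetical base R alternative to get_sp (not part of ggcollapse):
# drop everything up to and including the first underscore.
get_sp_base <- function(tip.label) {
  sub('^[^_]*_', '', tip.label)
}

get_sp_base(c('t14_A', 't8_O'))  # "A" "O"
get_sp_base('t1_Homo_sapiens')   # "Homo_sapiens"
```

Note that a label without any underscore is returned unchanged by this version.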
A species-getting function also gives flexibility: instead of parsing the tip label, it can, for example, look up the species of each tip in a table or a named vector relating tip labels to species.
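As a sketch of the lookup approach, assuming tip labels that do not encode the species, the function below uses a hypothetical named vector (sp_lookup) mapping tip labels to species. The labels seq1 to seq3 are made up for illustration.

```r
# Hypothetical lookup table: names are tip labels, values are species.
# In practice it could be built from a file read with read.csv().
sp_lookup <- c(seq1 = 'A', seq2 = 'A', seq3 = 'B')

# Species-getting function based on the lookup table;
# tip labels missing from the table give NA.
get_sp_lookup <- function(tip.label) {
  unname(sp_lookup[tip.label])
}

get_sp_lookup(c('seq3', 'seq1'))  # "B" "A"
get_sp_lookup('seq9')             # NA
```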
Imagine that the species of your tree are distributed into groups (three in this tutorial). To incorporate this information into the tree, we can follow two strategies:
- generate a data.frame with one row per tip; in this case, the species-getting function must return the header (the tip label), or
- generate a data.frame relating each species with its information; in this case, the species-getting function must return the species.
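To make the difference between the two strategies concrete, here is a small base R sketch with made-up tip labels (not the tutorial tree). In the first strategy the linking function is simply the identity; in the second it extracts the species (get_sp_demo is a hypothetical stand-in for get_sp). Both lead to the same group for every tip.

```r
tips <- c('t1_A', 't2_A', 't3_B')

# Strategy 1: one row per tip; the linking function returns the tip label
tip_level <- data.frame(tip = tips, group = c('G1', 'G1', 'G2'),
                        stringsAsFactors = FALSE)
get_tip <- function(tip.label) tip.label

# Strategy 2: one row per species; the linking function returns the species
sp_level <- data.frame(species = c('A', 'B'), group = c('G1', 'G2'),
                       stringsAsFactors = FALSE)
get_sp_demo <- function(tip.label) sub('^[^_]*_', '', tip.label)

# Look up the group of each tip through the corresponding linking function
groups_1 <- tip_level$group[match(get_tip(tips), tip_level$tip)]
groups_2 <- sp_level$group[match(get_sp_demo(tips), sp_level$species)]
identical(groups_1, groups_2)  # TRUE
```

The second strategy needs one row per species instead of one per tip, which is why it is more compact when several tips share a species.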
In this tutorial, we use the second strategy to generate the data.frame, and we also define a colour palette for the groups:
groups <- c(rep('G1', 7), rep('G2', 3), rep('G3', 9))
tree_data <- data.frame(species = LETTERS[1:19],
group = groups)
head(tree_data, n = 4)
## species group
## 1 A G1
## 2 B G1
## 3 C G1
## 4 D G1
group_colours <- c('G1' = 'coral3',
'G2' = 'steelblue3',
'G3' = 'darkolivegreen4')
To incorporate the tree data (tree_data) into the tree, we can use the annotate_tree function. This function associates each tip label with the information in the tree data table through a linking function, in this case the species-getting function. The data_sp_column argument is mandatory: it indicates the column (by name, as a character, or by position, as a numeric) that holds the species code. The resulting tree is a treedata S4 object.
library(ggcollapse)
annot_tree <- annotate_tree(tree = tree,
tree_data = tree_data,
get_sp = get_sp,
data_sp_column = 'species')
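Since data_sp_column can be given either as a column name or as a column position, it may help to see that in base R both forms select the same column through [[. The helper below (species_column) is a hypothetical illustration, not code from ggcollapse.

```r
# Hypothetical helper: select the species column by name or by position
species_column <- function(tree_data, data_sp_column) {
  stopifnot(length(data_sp_column) == 1,
            is.character(data_sp_column) || is.numeric(data_sp_column))
  tree_data[[data_sp_column]]
}

tree_data_demo <- data.frame(species = c('A', 'B'), group = c('G1', 'G2'),
                             stringsAsFactors = FALSE)
species_column(tree_data_demo, 'species')  # "A" "B"
species_column(tree_data_demo, 1)          # "A" "B"
```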
Now we can plot data across the tree. For example, we can show the group of each tip using coloured points:
ggtree(annot_tree) +
geom_tiplab(offset = 0.035) +
geom_tippoint(aes(colour = group)) +
scale_colour_manual(values = group_colours) +
xlim(0, 5.5)
The get_monophyletics function returns the nodes whose descendants all belong to the same group, as defined by the column given in the group_column argument; the returned node numbers are named after their group. Here we use a tree that has already been annotated; however, if the tree is not annotated, the function can also take the tree_data, get_sp and data_sp_column arguments.
gr_mphy <- get_monophyletics(tree = annot_tree,
group_column = 'group')
gr_mphy
## G1 G1 G2 G3
## 22 26 30 32
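To illustrate the idea, the following is a minimal base R sketch, not the package's implementation. On a hand-built toy tree ((t1,t2),(t3,t4)) stored as an ape-style edge matrix (tips numbered 1 to 4, root 5), it finds the internal nodes whose descendant tips all share one group and keeps only the largest such clade in each lineage. Keeping only the largest clades and ignoring single tips are assumptions of this sketch.

```r
# Toy tree: tips 1-4, root 5, internal nodes 6 and 7
edge <- rbind(c(5, 6), c(6, 1), c(6, 2),
              c(5, 7), c(7, 3), c(7, 4))
n_tips <- 4
tip_group <- c('G1', 'G1', 'G1', 'G2')

# Recursively collect the tip numbers descending from a node
desc_tips <- function(node) {
  if (node <= n_tips) return(node)
  children <- edge[edge[, 1] == node, 2]
  unlist(lapply(children, desc_tips))
}

internal <- (n_tips + 1):max(edge)

# Group of each internal node if all its tips agree, NA otherwise
node_group <- vapply(internal, function(nd) {
  g <- unique(tip_group[desc_tips(nd)])
  if (length(g) == 1) g else NA_character_
}, character(1))
names(node_group) <- internal

# Keep a node only if its parent is not monophyletic for the same group
parent_of <- function(nd) edge[edge[, 2] == nd, 1]
keep <- vapply(internal, function(nd) {
  g <- node_group[[as.character(nd)]]
  if (is.na(g)) return(FALSE)
  p <- parent_of(nd)
  length(p) == 0 || !isTRUE(node_group[[as.character(p)]] == g)
}, logical(1))

mono <- internal[keep]
names(mono) <- node_group[keep]
mono  # node 6, named G1
```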
Another useful function is get_sisters, which returns the sister clades of a given tip or internal node as a vector of node numbers, sorted by increasing topological distance from that node. To get the sisters of a tip, we can specify the tree together with either the tip label or the tip number:
tip_sisters <- get_sisters(tree = tree,
node = 't8_O')
tip_sisters
## [1] 38 34 39 30 26 22
tip_num <- which(tree$tip.label == 't8_O')
tip_sisters_num <- get_sisters(tree = tree,
node = tip_num)
tip_sisters_num == tip_sisters
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
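One way to obtain sister clades in order of increasing topological distance is to walk from the node towards the root and, at each ancestor, collect the other child. The base R sketch below does this on a hand-built toy tree ((t1,t2),(t3,t4)) with an ape-style edge matrix; it only illustrates the concept and is not necessarily how get_sisters is implemented.

```r
# Toy tree: tips 1-4, root 5, internal nodes 6 and 7
edge <- rbind(c(5, 6), c(6, 1), c(6, 2),
              c(5, 7), c(7, 3), c(7, 4))

sisters_of <- function(node) {
  out <- numeric(0)
  repeat {
    parent <- edge[edge[, 2] == node, 1]
    if (length(parent) == 0) break  # the root has no parent
    siblings <- edge[edge[, 1] == parent, 2]
    out <- c(out, siblings[siblings != node])
    node <- parent  # move one level towards the root
  }
  out
}

sisters_of(1)  # 2 7: first tip t2, then the clade of node 7
sisters_of(7)  # 6
```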
To name each sister clade after the most abundant value of a column among its descendant tips, we pass the annotated tree, the node and the naming column (here, group). The function then returns the sister clade node numbers named with the most abundant value and its percentage (e.g., a clade with 3 tips from group G1 and 1 tip from group G2 is named G1 75%):
tip_sisters_nmd <- get_sisters(tree = annot_tree,
node = 't8_O',
naming_column = 'group')
tip_sisters_nmd
## G3 100% G3 100% G3 100% G2 100% G1 100% G1 100%
## 38 34 39 30 26 22
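The naming rule can be reproduced in base R for a vector of tip values. The helper below (majority_label) is a hypothetical illustration, not a ggcollapse function; how the package breaks ties or rounds percentages is not stated here, so this sketch simply takes the first most frequent value and rounds to the nearest integer.

```r
# Hypothetical helper: most abundant value and its percentage
majority_label <- function(values) {
  counts <- table(values)
  top <- names(counts)[which.max(counts)]
  pct <- round(100 * max(counts) / length(values))
  paste0(top, ' ', pct, '%')
}

majority_label(c('G1', 'G1', 'G1', 'G2'))  # "G1 75%"
majority_label(c('G3', 'G3'))              # "G3 100%"
```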
We can follow the same procedure for an internal node:
node_sisters <- get_sisters(tree = tree,
node = 26)
node_sisters
## [1] 29 22
node_sisters_nmd <- get_sisters(tree = annot_tree,
node = 26,
naming_column = 'group')
node_sisters_nmd
## G3 75% G1 100%
## 29 22
The main purpose of this package is to make collapsing clades easy. The ggcollapse function only requires a phylo tree object and a named vector with the nodes to collapse. Suppose that we now want to collapse the sister groups of t8_O computed previously:
# Naming the sisters
names(tip_sisters) <- paste0('S', seq_along(tip_sisters))
# Collapsing the sister nodes
ggcollapse(tree = tree,
nodes = tip_sisters) +
geom_tiplab() +
xlim(0, 1.1)
The ggcollapse function can also annotate the tree, in which case it returns a ggtree object that includes the data. In the following example, we plot the collapsed sisters and label the uncollapsed tips with their species rather than their tip label.
ggcollapse(tree = tree,
nodes = tip_sisters,
get_sp = get_sp,
tree_data = tree_data,
data_sp_column = 'species') +
geom_tiplab(aes(label = species)) +
xlim(0, 1.1)
Following the same logic, we can collapse the monophyletic groups and colour each collapsed clade with its group colour through the node_colours argument:
# Collapsing the monophyletic groups
ggcollapse(tree = tree,
nodes = gr_mphy,
node_colours = group_colours)
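The node vector and the palette share group names, which is what lets each collapsed clade get its colour. Assuming, as this example suggests, that node_colours is matched to the nodes by name, the lookup amounts to indexing the palette with the names of the node vector. The sketch below recreates both vectors by hand (pal is a hypothetical name) and shows that a group appearing twice, such as G1, gets the same colour twice.

```r
# Node numbers as returned by get_monophyletics above; names may repeat
nodes <- c(G1 = 22, G1 = 26, G2 = 30, G3 = 32)
pal <- c(G1 = 'coral3', G2 = 'steelblue3', G3 = 'darkolivegreen4')

# Every node name must have a colour in the palette
all(names(nodes) %in% names(pal))  # TRUE

# One colour per collapsed node, looked up by name
node_fill <- unname(pal[names(nodes)])
node_fill  # "coral3" "coral3" "steelblue3" "darkolivegreen4"
```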
We can also change how the collapsed triangles are drawn through the collapse_mode argument, which takes the same values as in ggtree (min, max and mixed). The default is mixed.
library(patchwork)  # combines ggplot objects with `+`
p_mixed <- ggcollapse(tree = tree,
nodes = gr_mphy,
collapse_mode = 'mixed',
node_colours = group_colours) +
xlim(0, 1.15)
p_max <- ggcollapse(tree = tree,
nodes = gr_mphy,
collapse_mode = 'max',
node_colours = group_colours) +
xlim(0, 1.15)
p_min <- ggcollapse(tree = tree,
nodes = gr_mphy,
collapse_mode = 'min',
node_colours = group_colours) +
xlim(0, 0.8)
# Side-by-side comparison of the three modes
p_mixed + p_max + p_min
As the ggcollapse function returns a ggtree object, the user can use all the functionalities of the ggtree package, such as adding further layers. For example, we can simulate support values for the internal nodes and show them on the branches:
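set.seed(09-08-1997)  # evaluated as 9 - 8 - 1997, i.e. set.seed(-1996)
# A rooted binary tree with n tips has n - 1 internal nodes:
# simulate supports between 25 and 100 for all of them except the root
supports <- round(runif(length(tree$tip.label) - 2, 25, 100))
tree$node.label <- c(NA, supports)  # the root gets no support value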
ggcollapse(tree = tree,
nodes = gr_mphy,
node_colours = group_colours) +
geom_nodelab(aes(x = branch), nudge_y = 0.15)