Revise News and build pkgdown site
mikemc committed Jun 30, 2020
1 parent 1859b10 commit 66c527e
Showing 55 changed files with 6,157 additions and 73 deletions.
3 changes: 3 additions & 0 deletions .Rbuildignore
@@ -1,3 +1,6 @@
^README\.Rmd$
^dev$
^\.travis\.yml$
^_pkgdown\.yml$
^docs$
^pkgdown$
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,13 +1,13 @@
Package: speedyseq
Title: Faster implementations of phyloseq functions
Version: 0.1.2.9008
Version: 0.2.0
Authors@R:
person(given = "Michael",
family = "McLaren",
role = c("aut", "cre"),
email = "[email protected]")
Description: Faster implementations of phyloseq functions.
URL: https://github.com/mikemc/speedyseq
URL: https://mikemc.github.io/speedyseq, https://github.com/mikemc/speedyseq
BugReports: https://github.com/mikemc/speedyseq/issues
License: AGPL-3 + file LICENSE
Encoding: UTF-8
@@ -27,7 +27,7 @@ Imports:
tidyr,
scales,
vegan
RoxygenNote: 7.1.0
RoxygenNote: 7.1.1
Suggests:
cowplot,
DECIPHER,
111 changes: 55 additions & 56 deletions NEWS.md
@@ -1,42 +1,28 @@
# speedyseq (development version)
# speedyseq 0.2.0

* Fixed bug that applied to taxonomic merge functions when an object named
`new_tax_mat` exists outside the function environment; described in [Issue
#31](https://github.com/mikemc/speedyseq/issues/31)

* Changed default ordering of new taxa output by taxonomic merging functions
and added `reorder` parameter to control this behavior. (Only applies to
phylogenetic objects without trees.)

* New `merge_taxa_vec()` function provides a vectorized version of
`phyloseq::merge_taxa()`

* New `tip_glom()` function provides a speedy version of
`phyloseq::tip_glom()` for indirect phylogenetic merging of taxa.
## Breaking changes

* New `tree_glom()` function performs direct phylogenetic merging of taxa. This
function is much faster and arguably more intuitive than `tip_glom()`.

* Merging / glom functions now work on relevant phyloseq components as well as
phyloseq objects

* Adds dependencies
[castor](https://cran.r-project.org/web/packages/castor/index.html) and
[purrr](https://purrr.tidyverse.org/)
* The default ordering of new taxa output by `tax_glom()` is different from
previous versions and from `phyloseq::tax_glom()` in phyloseq objects that do
not have phylogenetic trees. See "Minor improvements and fixes" for more
information.

## New features

### New general-purpose vectorized merging function

Phyloseq's `merge_taxa()` takes a phyloseq object or component object `x` and a
set of taxa `eqtaxa` and merges them into a single taxon. In place of the
`eqtaxa` argument, speedyseq's `merge_taxa_vec()` takes a vector `group` of
length `ntaxa(physeq)` that defines how all the taxa in `x` should be merged
into multiple new groups. Its syntax and behavior is patterned after that of
`base::rowsum()`, which it uses to do the merging in the OTU table. When aiming
to merge a large number of taxa into a smaller but still large number of
groups, it is much faster to do all the merging with one call to
`merge_taxa_vec()` than to loop through many calls to `merge_taxa()`.
### New general-purpose vectorized taxa-merging function

The new `merge_taxa_vec()` function provides a vectorized version of
`phyloseq::merge_taxa()` that can quickly merge arbitrary groups of taxa and
now forms the basis of all other merging functions. `phyloseq::merge_taxa()`
takes a phyloseq object or component object `x` and a set of taxa `eqtaxa` and
merges them into a single taxon. In place of the `eqtaxa` argument, speedyseq's
`merge_taxa_vec()` takes a vector `group` of length `ntaxa(x)` that
defines how all the taxa in `x` should be merged into multiple new groups. Its
syntax and behavior are patterned after those of `base::rowsum()`, which it uses
to do the merging in the OTU table. When aiming to merge a large number of taxa
into a smaller but still large number of groups, it is much faster to do all
the merging with one call to `merge_taxa_vec()` than to loop through many calls
to `merge_taxa()`.
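
The grouping-vector idea can be illustrated with `base::rowsum()` alone. The
following is a minimal sketch on a made-up count matrix (the object names and
values are hypothetical, and this is not speedyseq's actual code); it only
shows how one label per taxon lets a single call merge every group at once.

```r
# Toy OTU table with taxa as rows and samples as columns (made-up counts).
otu <- matrix(
  c(10, 0, 3,
     5, 2, 0,
     1, 1, 1,
     0, 4, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)

# One group label per taxon; taxa that share a label are merged.
group <- c("A", "B", "A", "B")

# A single vectorized call sums the rows within each group.
merged <- rowsum(otu, group)
merged
```

Here `taxon1` and `taxon3` are summed into row `A`, and `taxon2` and `taxon4`
into row `B`, without looping over the groups.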

A practical example is clustering amplicon sequence variants (ASVs) into OTUs
defined by a given similarity threshold. Suppose we have a phyloseq object `ps`
@@ -102,38 +88,33 @@ tax_table(ps2)[c(108, 136, 45),]
#> 185581 "OM60" NA NA
```

### Speedy `tip_glom()` for indirect phylogenetic merging

Phyloseq provides `tip_glom()` to perform a form of indirect phylogenetic
merging using the phylogenetic tree in `phy_tree(physeq)`. This function uses
the tree to create a distance matrix, performs hierarchical clustering on the
distance matrix, and then defines new taxonomic groups by cutting the
dendrogram produced by the clustering at a user defined height. Phyloseq's
version can be slow and memory intensive when the number of taxa is large.

Speedyseq's new `tip_glom()` function provides a faster and less
memory-intensive alternative to `phyloseq::tip_glom()` through the use of
vectorized merging (via `merge_taxa_vec()`) and faster and lower-memory
phylogenetic-distance computation (via `get_all_pairwise_distances()` from the
### Faster and lower-memory implementation of `phyloseq::tip_glom()`

The new `tip_glom()` function provides a speedy version of
`phyloseq::tip_glom()`. This function performs a form of indirect phylogenetic
merging of taxa using the phylogenetic tree in `phy_tree(physeq)` by 1) using
the tree to create a distance matrix, 2) performing hierarchical clustering on
the distance matrix, and 3) defining new taxonomic groups by cutting the
dendrogram at the height specified by the `h` parameter. Speedyseq's
`tip_glom()` provides a faster and less memory-intensive alternative to
`phyloseq::tip_glom()` through the use of vectorized merging (via
`merge_taxa_vec()`) and faster and lower-memory phylogenetic-distance
computation (via `get_all_pairwise_distances()` from the
[castor](https://cran.r-project.org/web/packages/castor/index.html) package).

Speedyseq's `tip_glom()` also has the new `tax_adjust` argument, which is
passed on to `merge_taxa_vec()`. It is set to `1` by default for phyloseq
compatibility and should give identical results to phyloseq in this case.

For phyloseq compatibility, the default clustering function is left as
`cluster::agnes`. However, equivalent but faster results can be obtained by
`cluster::agnes()`. However, equivalent but faster results can be obtained by
using the `hclust()` function from base R with the `method = "average"` option.

Speedyseq's `tip_glom()` currently only works on phyloseq objects and will give
an error if used on a phylo (tree) object.

### Direct phylogenetic merging with `tree_glom()`

It might be desirable in many cases to perform phylogenetic merging based
directly on the phylogenetic tree rather than (as in `tip_glom()`) a dendrogram
derived from it. Speedyseq's new `tree_glom()` function performs such direct
phylogenetic merging, which has several advantages.
The new `tree_glom()` function performs direct phylogenetic merging of taxa.
This function is much faster and arguably more intuitive than `tip_glom()`.
Advantages of direct merging over the indirect merging of `tip_glom()` are

1. A merged group of taxa corresponds to a clade in the original tree being
collapsed to a single taxon.
@@ -172,6 +153,24 @@ plot2 <- phy_tree(ps2) %>%
plot_grid(plot1, plot2)
```
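
For comparison, the indirect merging done by `tip_glom()` (distance matrix,
hierarchical clustering, cut at height `h`) can be sketched with base R alone.
The distances below are made up; in `tip_glom()` they would come from the tree,
and the resulting grouping is the kind of vector that `merge_taxa_vec()` takes.
This is an illustration under those assumptions, not the package's
implementation.

```r
# Hypothetical pairwise distances among five tips (made-up values).
d <- matrix(
  c(0.00, 0.02, 0.30, 0.32, 0.31,
    0.02, 0.00, 0.31, 0.33, 0.30,
    0.30, 0.31, 0.00, 0.04, 0.25,
    0.32, 0.33, 0.04, 0.00, 0.26,
    0.31, 0.30, 0.25, 0.26, 0.00),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), paste0("t", 1:5))
)

# Average-linkage hierarchical clustering on the distance matrix.
hc <- hclust(as.dist(d), method = "average")

# Cut the dendrogram at height h to define groups of tips.
h <- 0.1
group <- cutree(hc, h = h)
group
```

With these values, `t1` and `t2` fall in one group, `t3` and `t4` in a second,
and `t5` remains on its own, because it only joins another cluster above
`h = 0.1`.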

## Minor improvements and fixes

* Fixed bug that applied to taxonomic merge functions when an object named
`new_tax_mat` exists outside the function environment; described in
[Issue #31](https://github.com/mikemc/speedyseq/issues/31)

* Merging functions now maintain the original order of new taxa by default,
except in phyloseq objects with phylogenetic trees (for which order is and
has always been determined by how archetypes are ordered in
`phy_tree(ps)$tip.label`). This behavior can lead to different taxa orders
from past speedyseq versions and from `phyloseq::tax_glom()`.
However, it makes the resulting taxa order more predictable. New taxa can be
reordered according to `group` or taxonomy in `merge_taxa_vec()` and
`tax_glom()` by setting `reorder = TRUE`.

* Merging/glom functions now work on relevant phyloseq components as well as
phyloseq objects
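
The two orderings can be illustrated by analogy with the `reorder` argument of
`base::rowsum()`. This is a sketch on made-up data that shows only the base R
behavior; note that `rowsum()` defaults to `reorder = TRUE`, whereas the
merging functions described above keep the original order by default.

```r
# Toy table: three taxa (rows) in two samples (columns), made-up counts.
otu <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("t1", "t2", "t3"), c("s1", "s2"))
)
group <- c("z", "a", "z")

# Groups kept in the order in which they are first encountered.
original_order <- rownames(rowsum(otu, group, reorder = FALSE))

# Groups sorted by label.
sorted_order <- rownames(rowsum(otu, group, reorder = TRUE))

original_order
sorted_order
```

The first result lists `z` before `a` (order of first appearance); the second
lists `a` before `z` (sorted).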

# speedyseq 0.1.2

* `tax_glom()` has a new implementation using base R functions instead of
@@ -224,6 +223,6 @@ is `TRUE`.

### `tax_glom()`

Phyloseq's `tax_glom()` can be applied to `taxonomyTable` objects as well as
`phyloseq::tax_glom()` can be applied to `taxonomyTable` objects as well as
`phyloseq` objects, but speedyseq's `tax_glom()` currently only works on
`phyloseq` objects and gives an error on `taxonomyTable` objects.
8 changes: 4 additions & 4 deletions R/agglomeration.R
@@ -256,10 +256,10 @@ tip_glom <- function(physeq,
#' ntaxa(ps1)
#' ps2 <- tree_glom(ps1, 0.05)
#' ntaxa(ps2)
#'
#' library(dplyr)
#' library(ggtree)
#' library(cowplot)
#'
#' suppressPackageStartupMessages(library(dplyr))
#' suppressPackageStartupMessages(library(ggtree))
#' suppressPackageStartupMessages(library(cowplot))
#'
#' plot1 <- phy_tree(ps1) %>%
#' ggtree +
5 changes: 2 additions & 3 deletions README.Rmd
@@ -35,8 +35,7 @@ of phyloseq functions include
My general aim is for these functions to be drop-in replacements for phyloseq's
versions; however, there are small differences that should not affect most use
cases. In some functions, I have added optional arguments to allow modifying
the phyloseq behavior. See [NEWS.md](./NEWS.md) for information about these
differences and enhancements.
the original behavior.

New functions that provide additional types of taxonomic merging include

@@ -46,7 +45,7 @@ New functions that provide additional types of taxonomic merging include
taxa. This function provides an alternative to the indirect phylogenetic
merging done by `tip_glom()` that is much faster and arguably more intuitive.

See [NEWS.md](./NEWS.md) for details and examples.
See the [Changelog](news/index.html) for details and examples.

## Installation

6 changes: 2 additions & 4 deletions README.md
@@ -27,9 +27,7 @@ functions include
My general aim is for these functions to be drop-in replacements for
phyloseq’s versions; however, there are small differences that should
not affect most use cases. In some functions, I have added optional
arguments to allow modifying the phyloseq behavior. See
[NEWS.md](./NEWS.md) for information about these differences and
enhancements.
arguments to allow modifying the original behavior.

New functions that provide additional types of taxonomic merging include

@@ -40,7 +38,7 @@ New functions that provide additional types of taxonomic merging include
phylogenetic merging done by `tip_glom()` that is much faster and
arguably more intuitive.

See [NEWS.md](./NEWS.md) for details and examples.
See the [Changelog](news/index.html) for details and examples.

## Installation

12 changes: 12 additions & 0 deletions _pkgdown.yml
@@ -0,0 +1,12 @@
url: https://mikemc.github.io/speedyseq
reference:
- title: "Merging taxa"
- contents:
  - merge_taxa_vec
  - ends_with("glom")
- title: "Data manipulation"
- contents:
  - psmelt
- title: "Plotting"
- contents:
  - starts_with("plot")
148 changes: 148 additions & 0 deletions docs/404.html
