---
title: "Basic Ordination Methods"
subtitle: "Ordination for Ecological Methods"
author: "Jari Oksanen"
date: "Nov 12 to 21, 2018"
output:
  xaringan::moon_reader:
    css: ['default', './resources/gavins.css']
    lib_dir: libs
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
      ratio: '16:9'
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(comment=NA,
                      echo = FALSE, message = FALSE, warning = FALSE, cache = TRUE,
                      fig.align = 'center', fig.height = 5, fig.width = 5.5, dev = 'svg')
knitr::knit_hooks$set(crop.plot = knitr::hook_pdfcrop)
options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(rgl.newwindow = TRUE)
```
```{r packages, include = FALSE, cache = FALSE}
require('vegan')
if (packageVersion('vegan') < '2.5.0') # in github
   stop("needs vegan 2.5-0")
data('varespec', 'varechem', 'mite', 'mite.env', 'dune', 'dune.env', 'BCI', 'BCI.env', 'pyrifos')
data('dune.phylodis','dune.taxon')
require('natto') # in github, natto::raodist()
require('vegan3d')
if (packageVersion('vegan3d') < '1.1.1') # in github
   stop("needs vegan3d 1.1-1")
# require('labdsv')
data('bryceveg','brycesite', package = "labdsv")
require(analogue)
data(abernethy)
library('mgcv')
require('knitr')
require('viridis')
```
class: bottom

Source files of these slides are [in github](https://github.com/jarioksa/ordination101).

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

© Jari Oksanen 2017&ndash;2018

This version was built on `r date()`

<!-- ********** INTRO ************ -->
---
# This Is Ordination

.center[
```{r whatisthis}
par(mar=c(4,4,0,0)+ .1)
palette(viridis(8))
foo <- cca(varespec ~ Al + P + K, varechem)
plot(foo, type="n")
points(foo, dis="si", pch=21, col=2, bg=6)
ordilabel(foo, dis="sp", col=4, cex=.8, priority=colSums(varespec))
text(foo, dis="cn", col=2, cex=1.6, lwd=2)
```
]
---
class: inverse center middle

# Ordination: Why and What?
---
# Multivariate Data

- Ordination methods are multivariate graphical tools to display
  multivariate data

- *Multivariate data* have several variables and several sampling
  units (SU), and such data are best analysed with
  *multivariate methods*

- We do not have one obvious dependent variable in multivariate data,
  but we would still want to understand the major features of the data

  - Species abundances in sampling sites, OTUs of DNA sequence
    analysis in sampling units, measurements of chemical and physical
    conditions in sampling stations *etc.*

- If we have a clear hypothesis with one dependent variable, it is
  better to use univariate methods

- Sometimes we can summarize multiple variables into one indicator
  (*e.g.*, diversity), and then we can use univariate methods

- If all else fails, we *must* use multivariate methods

---
# Multivariate Data and Display

.pull-left[
.small[
```{r echo=FALSE}
varespec
```
]
]

.pull-right[
```{r mvexample, results=FALSE}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
m <- metaMDS(varespec)
m <- MDSrotate(m, varechem$Humdepth)
ef <- envfit(m, varechem)
plot(m, type="n")
text(m, dis="si", col=1, cex=0.8)
text(m, dis="sp", col=5, cex=0.8)
plot(ef, col=3, p.=0.05)
```
]
---
# Reading Ordination Plots

.pull-left[

- The ordination graph shows the main features of similarities and differences in the data

- If two SUs are close to each other, they have similar communities
- If two SUs are far away from each other, the communities differ

- If two species are close to each other, they have similar occurrence
  patterns, and occur most abundantly in SUs that are close to them

- It can be assumed that environmental conditions are behind these
  differences and we can identify the external variables that explain
  the ordination structure

]

.pull-right[
```{r mvexample2}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
# redraw the previous
plot(m, type="n")
text(m, dis="si", col=1, cex=0.8)
text(m, dis="sp", col=5, cex=0.8)
plot(ef, col=3, p.=0.05)
```
]

---
# Why Ordination?

- Ordination provides a graphical summary of main structure in the data

- Instead of looking at a huge number of variables with a huge number
  of observations, we compress multidimensional data into two (or
  three) dimensions that we can easily inspect and comprehend

- Ordination is **descriptive**, but it can help us to understand the
  main features of the data

- Often ordination is *private:* we familiarize ourselves with the data so that
  we can formulate research hypotheses, check the adequacy of our
  hypothesis, and communicate the results to other people

  - *Private:* we do it, but we do not tell anybody what we do in our privacy

  - Ordination need not be published or shown to anyone to be useful for you

- Sometimes we can build rigorous tests on ordination results: these
  summarize the major structure and allow using simpler test
  procedures

  - Some ordination methods allow testing within themselves

- Many people never need ordination, but some do: it all
  depends on your research questions

---
# The Purpose of These Lectures

- Provide information on ordination for those who may need it

- Even if you do not need ordination yourself, it is useful to understand
  how to interpret publications that use ordination

- These lectures are not about using ordination software, but they are
  about the **methods**

  - There are several alternative software implementations to perform
    these analyses, although the lecture slides were written using
    the **R** statistical language and mainly the **vegan** package

  - The **R** commands that produced the figures are not shown in
    these slides, but if you *want* to see them, look at the
    [source code](https://github.com/jarioksa/ordination101)
    of these lectures

- I avoid mathematics, but also over-simplification and
  superficiality: I try to *explain* how these methods work in terms
  of common statistical tools, for instance, regression analysis (and
  that is about how complicated it becomes)

- I would be really pleased if you understand what I say: Please
  interrupt me and ask as soon as you get lost

---
# The Structure of These Lectures I

In the first part of the lectures I describe four main basic ordination
methods:

- Principal Components Analysis (PCA) is just an extension of
  regression analysis, and a simple summary of the data. I also
  explain why it often fails with ecological community data.

- Correspondence Analysis (CA) is a variant of PCA, but that small
  difference makes it much better for community data.

- In Principal Coordinates Analysis (PCoA) the main focus is what we
  actually mean by things being similar, and how we can use
  ecological judgement in defining similarity.

- Nonmetric Multidimensional Scaling (NMDS) is presented as a flexible
  and robust tool that often is most reliable in finding interpretable
  results.

- I also explain how to interpret ordination results with the help of
  external information, such as measurements of chemical or physical
  parameters and other environmental conditions

---
# The Structure of These Lectures II

In the latter part of the lectures I describe how to use external
environmental variables to *constrain* ordination. These methods are
useful if we have a controlled experiment and we want to
analyse the statistical significance of multivariate responses. Even
with survey data, the use of external variables can make
interpretation much easier. Some of the main themes are:

- How constrained ordination is defined as an extension of regression
  analysis, and how it can be applied in the context of PCA, CA and
  PCoA.

- How we can test the statistical significance of external variables
  in a multivariate setting.

- How we can condition the analysis for some background variables and
  remove their effects before main ordination.

- How we can partition multivariate response into different sources
  of variation.

<!-- ********** PCA ************** -->
---
class: inverse center middle

# PCA: Principal Components Analysis

PCA is the basic ordination method and the one that Statisticians
know. It is poor for most uses, but we need to know PCA to understand
how other methods improve upon it. The main problem of PCA is that it
is linear, but community data are nonlinear.
---
# From Regression to Ordination

- What is the best possible predictor variable for a species? --- It
  is the species itself!

  - If you predict values of $y$ with $y$, you get a perfect fit and
    explain $y$ completely (but what about other species?)

- Some species can also be powerful in predicting abundances of
  other species

- The best possible predictor is such a (linear) combination of
  species that explains abundances of all species as well as possible

- That linear combination is the first PCA axis
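
This claim can be checked numerically. The following is a minimal sketch in base R with simulated abundances: the matrix `Y` and its three species are hypothetical and are not the lecture data. Each centred species is regressed on the first principal component, and the summed residual variance should equal the total variance minus the variance of PC1.

```r
# Simulated abundances of three hypothetical species in 30 sampling units
set.seed(1)
g <- rnorm(30)                       # a shared hidden factor (assumption)
Y <- cbind(sp1 = 5 * g + rnorm(30),
           sp2 = -3 * g + rnorm(30),
           sp3 = rnorm(30))
Y0 <- scale(Y, scale = FALSE)        # centre species to zero mean

pca <- prcomp(Y0)                    # PCA on covariances
PC1 <- pca$x[, 1]                    # SU scores on the first axis

# Regress every species on PC1 through the origin (data are centred)
res_var <- apply(Y0, 2, function(y)
    sum(resid(lm(y ~ PC1 - 1))^2) / (nrow(Y0) - 1))

total_var <- sum(apply(Y0, 2, var))  # sum of species variances
lambda1   <- pca$sdev[1]^2           # variance of PC1
all.equal(sum(res_var), total_var - lambda1)
```

No other single linear combination of the species leaves a smaller summed residual variance, which is the sense in which PC1 is the best predictor.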
---
# 3 Cryptogams in Reindeer Pastures
 
```{r bottpredict, fig.width=8, fig.height=5}
bott <- varespec[, c("Pleuschr","Cladrang","Cladstel")]
bott0 <- scale(bott, scale = FALSE)
par(mar = c(4,4,0,1)+.1)
palette(viridis(8))
panel.lm <- function(x, y, col.smooth = 1, tcex=1.5, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    points(x, y, ...)
    mod <- lm(y ~ x - 1)
    txt <- paste0("Resid.Var = ", round(summary(mod)$sigma^2, 1))
    abline(mod, col = col.smooth, lwd=2)
    text(mean(usr[1:2]), 0.9*usr[4], txt, cex=tcex)
}
pairs(bott0, pch=16, col = 4, ylim = range(bott0), panel = panel.lm)
```
---
# Best Predictor
```{r bottpc1, fig.width=9.5, fig.height=4}
mod <- rda(bott0)
koef <- mod$CA$v[,1]
xlab = paste(formatC(koef,digits=2, flag="+"), names(koef), sep="*",collapse="")
PC1 <- (bott0 %*% mod$CA$v)[,1]
df <- data.frame(bott0)
palette(viridis(8))
par(mar = c(4,4,0,1)+.1, mfrow=c(1,3))
ylim <- range(bott0)
plot(Pleuschr ~ PC1, df, type="n", ylim=ylim, xlab=xlab, cex.lab=1.2)
panel.lm(PC1, df$Pleuschr, pch=16, col=2, col.smooth=4)
plot(Cladrang ~ PC1, df, type="n", ylim=ylim, xlab=xlab, cex.lab=1.2)
panel.lm(PC1, df$Cladrang, pch=16, col=2, col.smooth=4)
plot(Cladstel ~ PC1, df, type="n", ylim=ylim, xlab=xlab, cex.lab=1.2)
panel.lm(PC1, df$Cladstel, pch=16, col=2, col.smooth=4)

```
---
# What was Explained?

- PCA explains the **variance** of the data: the squared
  differences between observed values $y$ and species means $\bar y$:
  $\mathrm{Var}(y_j) = \sum_{i=1}^N(y_{ij} - \bar y_j)^2/(N-1)$
  - Ordination always centres data so that species means $\bar y = 0$
  
  - The total variance (sum of species variances) is called **inertia**

- In the example, the variances of three species were
  `r round(apply(bott0, 2, var), 1)`, with sum
  `r round(mod$tot.chi, 1)`, and the sum of residual variances of the PC
  regression was `r round(mod$tot.chi - mod$CA$eig[1], 1)`

- The variance explained by the first PC is the **eigenvalue** of the
  axis, and it is the difference of total variance and residual
  variance $\lambda_1 =$ `r round(mod$CA$eig[1], 1)` which is
  `r round(100*mod$CA$eig[1]/mod$tot.chi, 1)`% of total variance

- The coefficients (`r round(mod$CA$v[,1], 3)`) of three species are
  the **species scores** used in ordination diagrams, and they are also
  regression coefficients that project centred species abundances to
  the PC
  - Ordination always centres data so that the intercept
  is zero

- The results of this projection are the sampling unit **(SU) scores**
  that are used in ordination diagrams; this set of scores is also
  called the **principal component**

---
# Second and Further Principal Components

- Second PC has similar components, but is **orthogonal** to the
  previous one:

- Species scores and SU scores are orthogonal to (uncorrelated with)
  corresponding scores in the previous axes

- The second PC explains the largest possible amount of remaining
  variance
  - In the example, its eigenvalue is $\lambda_2 =$ `r round(mod$CA$eig[2], 1)`
    which is  `r round(100*mod$CA$eig[2]/mod$tot.chi, 1)`% of total variance

- The sum of all $K$ eigenvalues is equal to the sum of variances of all
   species (inertia) $\sum_{k=1}^K \lambda_k = \sum_{j=1}^S \mathrm{Var}(y_j)$:
   The axes decompose variance into independent ordered components

- In modern software, we do not find second and further PCs one-by-one
  with orthogonalization, but the interpretation of the results is
  still similar
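
The decomposition can be illustrated with a small sketch in base R. The count matrix `Y` is simulated and hypothetical, and `eigen()` on the covariance matrix is used here only as one convenient way to obtain all axes at once, not necessarily the way any particular ordination software does it.

```r
set.seed(2)
# Hypothetical counts: 40 sampling units, 4 species
Y <- matrix(rpois(40 * 4, lambda = 5), nrow = 40,
            dimnames = list(NULL, paste0("sp", 1:4)))

e <- eigen(cov(Y))                   # eigenvalues and species scores
total_var <- sum(diag(cov(Y)))       # sum of species variances (inertia)

all.equal(sum(e$values), total_var)  # eigenvalues add up to inertia

scores <- scale(Y, scale = FALSE) %*% e$vectors  # SU scores on all axes
round(cor(scores), 10)               # SU scores of different axes are uncorrelated
round(crossprod(e$vectors), 10)      # species scores of different axes are orthogonal
100 * e$values / total_var           # percentage of variance per axis
```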

---
# Is This Circular?

- We explain species with species &mdash; and this sounds badly circular

- We perform no statistical testing, and what looks like regression
  is actually **rotation** of species data

- We only want to find a way of looking at the data so that the first few axes
  (say PC1 and PC2) show as much of the variance of the data as possible

- Then we can ignore the later axes and say that a **major part** of
  variation is displayed in the first dimensions that we
  can plot

- PCA is **not** a statistical method, but it is only a rotation of
  the data
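
That PCA is only a rotation can be seen in a short base R sketch. The data matrix is simulated and hypothetical. With all axes the distances between sampling units are unchanged, and with only the first two axes they can only become shorter.

```r
set.seed(3)
Y   <- matrix(rexp(25 * 5), nrow = 25)  # hypothetical data: 25 SUs, 5 species
Y0  <- scale(Y, scale = FALSE)          # centred data
pca <- prcomp(Y0)

R <- pca$rotation                        # orthonormal rotation matrix
all.equal(unname(Y0 %*% R), unname(pca$x))   # scores are the rotated data

d_full <- as.vector(dist(Y0))            # distances in species space
d_rot  <- as.vector(dist(pca$x))         # distances after rotation, all axes
d_2d   <- as.vector(dist(pca$x[, 1:2]))  # distances in the first two axes

all.equal(d_full, d_rot)                 # rotation does not change distances
all(d_2d <= d_full + 1e-12)              # two axes only approximate them
```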

---
# Species Scores in Multidimensions

.pull-left[

- We had species scores for a single axis, but in 2 and more dimensions
  we must look at all dimensions simultaneously

- The coefficients together define the **direction** in which the
  species abundance **increases** most steeply, and it
  **decreases** in the opposite direction

- The response is linear in 2D: the contours of linear trend surface are
  perpendicular to the arrow

- The estimated abundance of a species in each SU can be read by
  projecting the SU point onto the species arrow: PCA **approximates** data

- Species were centred: The zero contour goes through the origin

]

.pull-right[
```{r plot3sp}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
biplot(mod, type = c("text","points"))
tmp <- ordisurf(mod ~ Cladstel, df, add = TRUE, col=4, knots=1)
```
]

---
# Distance and Direction From the Origin

.pull-left[

- In the origin, all species occur at their average abundances

- The further away from the origin an SU is situated, the more strongly
  its species composition differs from the average

- For species, the angles between arrows indicate the similarity of
  change, and the length of the arrow the steepness of the change:
  a right angle means no relation, a zero angle identical responses, and
  a straight angle (180°) opposite responses

- Try it yourself: where are these species most abundant and where scarcest?
  Which species attain the highest abundances and where? Which species
  replace each other and which are independent of each other?
]

.pull-right[

```{r plot3sp2}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
biplot(mod, col=c(4,1))
```

]
---
# Rotation in Species Space

.pull-left[

- In **species space** every species is an *axis* (or a *dimension*)

- Each axis is at a right angle to all other axes

- The SUs are points, and the abundance of each species gives its location
  in species axes

- There can be a huge number of species axes, and we cannot see them all

- We **rotate** the species space so that we see as much as possible of the 
  locations of the points: This is the **PCA!**

- Originally we had a *species*-dimensional space, and we *reduce* it
  to a two- or three-dimensional space

]

.pull-right[
```{r specspace}
palette(viridis(8))
ordiplot3d(bott0, mar=c(4,4,0,2), angle=20, pch=16, col=2, ax.col=4)
```
[Please Click Me!](http://cc.oulu.fi/~jarioksa/opetus/metodi/3D/PCArotation/index.html)

]
---
# Complete Reindeer Pasture Data

.pull-left[

- Analysis of complete data is pretty similar to the analysis of these
  three species

- PCA analyses variances and shows absolute differences from the species mean
  $y - \bar y$: minor species change little in absolute units

- Actually only a few species influence the results, and the data are
  not very multivariate

]

.pull-right[
```{r pcavare}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
modf <- rda(varespec)
biplot(modf, col=c(4,1))
```
]

---
# Explaining Variance

.pull-left[

- Abundant species have high variance, and PCA explains variance

- It may be very easy to explain variance when only a few species
  account for most of it, because then data are not very
  multidimensional

- Minor species may be ecological indicators, but they have little
  effect on community ordination with PCA

- Three abundant cryptogams accounted for
  `r round(100*mod$tot.chi/modf$tot.chi, 1)`% of total variance in the complete
  data set, and are dominant in PCA

]

.pull-right[

```{r varabu}
par(mar=c(4,4,0,1)+.1)
v <- apply(varespec, 2, var)
plot(colMeans(varespec), v, xlab="Mean Abundance", ylab="Variance", type="n")
ordilabel(cbind(colMeans(varespec), v), priority = v, fill = "yellow", cex=1)
```

]
---
# Equal Weights to All Species

.pull-left[

- Use correlation instead of variance and minimize residuals $1-R^2$ in
  PC regression: This gives equal weights to all species

- Equalized data are more multidimensional, and scarce species will
  also influence the result, and the ordination changes from the previous one

- PCA implies that species should be shown by arrows, but this
  makes graphs very messy, and often it makes sense to drop arrows and
  just show the species scores at arrowheads

- They still should be interpreted like arrows: **Distance** and
  **direction** from the origin count

]

.pull-right[
```{r pcacorr}
mod1 <- rda(varespec, scale=TRUE)
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
plot(mod1, scaling="sites", type="n")
text(mod1, scaling="sites", dis="site", col = 1, cex=0.8)
text(mod1, scaling="sites", dis="species", col=4, cex=0.8)
```
]
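
The effect of equal weights can be sketched in base R with hypothetical data (one abundant and two scarce simulated species, not the lecture data). With variances, the first axis is almost the abundant species alone. With correlations, every species contributes one unit, so the total inertia equals the number of species.

```r
set.seed(4)
# Hypothetical data: one abundant and two scarce species
Y <- cbind(abundant = rnorm(30, mean = 50, sd = 20),
           scarce1  = rnorm(30, mean = 2,  sd = 0.5),
           scarce2  = rnorm(30, mean = 1,  sd = 0.2))

pca_cov <- prcomp(Y)                  # variances: unequal weights
pca_cor <- prcomp(Y, scale. = TRUE)   # correlations: equal weights

round(abs(pca_cov$rotation[, 1]), 3)  # PC1 loadings with variances
round(abs(pca_cor$rotation[, 1]), 3)  # PC1 loadings with correlations
sum(pca_cor$sdev^2)                   # total inertia with correlations
all.equal(pca_cor$sdev^2, eigen(cor(Y))$values)
```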
---
class:  inverse center middle

## PCA Trouble: It is Linear!
---
# Linear Method

- PCA builds upon the idea of *linear regression*

- PC axes are sometimes called latent variables: unknown and hidden
  real variables that we dig up from data

- Community response to environmental variables is non-linear: PC axes
  cannot find such variables

```{r mtfgrad, fig.width=9, fig.height=3}
source('data/mtfit.R')
palette(viridis(6))
par(mar=c(4,4,0.4,1)+.1)
matplot(altfit, mtfit, type="l", lty=1, xlab="Altitude (m)", ylab="Species Response")
```
Mt Field (Tasmania): Fitted Responses
---
# Horseshoe: Beware!

.pull-left[

- If species have nonlinear, unimodal responses to gradients, the
  gradient appears as a curve: **horseshoe artefact**

- People often want to interpret axes as latent variables, but even a
  single gradient is not an axis: **Beware**

- Even if you are aware of the risk of horseshoe, it can hide other
  shorter gradients: Using two dimensions for one gradient is wasteful
  and confusing

- In general, PCA **should not be used** for community data

- PCA is useful for linear data, or for deriving new combined
  variables for other analyses when PCA explains a large proportion of
  variance

]

.pull-right[
```{r horseshoe}
mod <- rda(mtfit)
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
plot(mod, dis="si", scaling="si", type="n")
points(mod, dis="si", scaling="si", pch=16, col=2)
```
PCA of the expected species responses of previous slide
]
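
The artefact is easy to reproduce. The following base R sketch simulates a hypothetical coenocline with Gaussian species responses along a single gradient; the gradient length, the spacing of optima and the tolerance are arbitrary assumptions. The first axis follows the gradient, but the second axis is a curved function of the same gradient instead of a new one.

```r
# Hypothetical coenocline: Gaussian responses along one gradient
x   <- seq(0, 100, by = 2)     # SU positions on the gradient
opt <- seq(0, 100, by = 5)     # species optima
tol <- 10                      # species tolerance (width of response)
Y   <- exp(-outer(x, opt, "-")^2 / (2 * tol^2))

pca <- prcomp(Y)               # PCA on covariances
PC1 <- pca$x[, 1]
PC2 <- pca$x[, 2]

abs(cor(PC1, x))               # linear relation of PC1 to the gradient
abs(cor(PC2, x))               # linear relation of PC2 to the gradient
abs(cor(PC2, (x - 50)^2))      # relation of PC2 to the squared gradient

# plot(PC1, PC2, type = "b")   # uncomment to draw the curve
```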
---
# Horseshoe is Common

```{r morehorses, fig.width=11}
palette(viridis(8))
par(mfrow=c(1,3), mar=c(4,4,2,1)+.1)
m1 <- rda(varespec)
m2 <- rda(bryceveg)
m3 <- rda(mite)
plot(m1, dis="si", type="n", main="Reindeer Pastures")
points(m1, dis="si", pch=16)
plot(m2, dis="si", type="n", main="Bryce Canyon")
points(m2, dis="si", pch=16)
plot(m3, dis="si", type="n", main="Oribatid Mites")
points(m3, dis="si", pch=16)
```
---
# When Would We Expect a Horseshoe?


- **Always** with community data

- Even when we cannot see the horseshoe, it may be hidden below the
  crumble of noisy analysis

- Horseshoe can hide underlying gradients and disperse their effect
  over several PCs so that they cannot be found and interpreted

- Horseshoe can be expected also in homogeneous data with short
  gradients

---
# Niche for PCA

.pull-left[

- New synthetic variables from correlated measurements to avoid
  problems of collinearity in statistical analysis

  - Using PCs as explanatory variables in regression

- For this PCA should explain a large amount of variance

- Use only the subset of correlated variables that may be assumed to
  describe the same underlying (latent) variable: **measurement
  model** in **Factor Analysis**

- Use correlations when variables are not measured in the same units

- Do not use axes directly: they may not be parallel to interpretable
  variables

]

.pull-right[

```{r latent}
mod <- rda(varechem, scale=TRUE)
palette(viridis(8))
par(mar=c(4,4,1,1)+.1)
biplot(mod, display="sp", col=1)
```
.small[
Soil data for reindeer pastures
]
]

<!-- ********* CA ************ -->
---
class: inverse center middle

# CA: Correspondence Analysis

Correspondence Analysis is an eigenvector method just like PCA, and it
can be seen as its variant. However, this small difference makes it
handle nonlinear species responses better than PCA. For community
data, CA is usually a much better alternative than PCA.

---
# Correspondence Analysis

- In PCA we explain the differences of species from their averages
  $y_{ij} - \bar y_j$ using their mean squares (variance) as a measure
  of goodness of fit

- In CA the expected value is derived both from SUs and species,
  and the goodness of fit is $\chi^2$

- We first scale the observed data so that their sum is one
  $\left( \sum_{i=1}^N \sum_{j=1}^S y_{ij} = 1 \right)$,
  that is, we divide the data by their total

- If all communities have identical species composition, and all
  species occur in equal proportion in all SUs, then the expected
  abundance is $r_i c_j$ where $r_i$ is row (SU) sum and $c_j$ is the
  column (species) sum: This takes the place of $\bar y_j$ of PCA

- We look at the proportional difference
  $(y_{ij} - r_i c_j)/\sqrt{r_i c_j}$, and the sum of its squares is
  $\chi^2 = \sum_i \sum_j (y_{ij} - r_i c_j)^2/(r_i c_j)$:
  This takes the place of variance of PCA

- Then we proceed with regression to find the axes, just like in PCA
  (except that we need to use weights $r_i$, $c_j$)
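
The steps above can be followed in a minimal base R sketch. The small community table is hypothetical, and singular value decomposition is used here as one convenient way to get the axes; it is an assumption of this sketch, not a description of any specific software.

```r
# Hypothetical community table: 6 SUs x 4 species
Y <- matrix(c(10, 4, 0, 0,
               8, 6, 1, 0,
               3, 9, 4, 0,
               1, 5, 8, 2,
               0, 2, 9, 6,
               0, 0, 3, 9), nrow = 6, byrow = TRUE)

P  <- Y / sum(Y)               # scale data so that the grand total is one
rs <- rowSums(P)               # r_i: SU (row) sums
cs <- colSums(P)               # c_j: species (column) sums
E  <- outer(rs, cs)            # expected proportions r_i * c_j

Z <- (P - E) / sqrt(E)         # proportional differences
inertia <- sum(Z^2)            # chi-square of the scaled data

eig <- svd(Z)$d^2              # eigenvalues of the CA axes
all.equal(sum(eig), inertia)   # eigenvalues add up to the inertia

# The same quantity from the classical chi-square statistic of the raw table
chi2 <- suppressWarnings(chisq.test(Y)$statistic)
all.equal(unname(chi2) / sum(Y), inertia)
```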

---
# Small But Important Difference

.pull-left[

- CA is based on similar regression as PCA, but with
  $\chi$-standardized data and weights

- Regression is linear, but *observed* values of data rather pack
  together so that high values are close to each other along the axis

- The response of species can be approximately unimodal: This matches
  the gradient model of species responses

- The solution can be better related to environmental variables than in PCA

- CA is still **not** a completely unimodal ordination and it still
  shows a curve artefact, but not as garbled as PCA

]

.pull-right[

```{r cagam}
palette(viridis(3))
par(mar=c(4,4,1,1)+.1)
mod <- cca(bott) # from PCA lecture
CA1 <- mod$CA$u[,1]
matplot(CA1, bott, pch=16, xlab = "CA axis 1", ylab="Species Abundance")
legend("top", colnames(bott), pch=16, lty=1, col=1:3, bty="n")
i <- order(CA1)
for(k in 1:3) {
   m <- gam(bott[,k] ~ s(CA1, k=4), family=quasipoisson)
   lines(CA1[i], fitted(m)[i], lwd=2, col = k)
}
```
]

---
# PCA Orders Linearly, CA Packs

.pull-left[
```{r pcatabasco}
tabasco(varespec, rda(varespec), scale="log", col=viridis(10))
```
**PCA:** Rows and columns ordered by PC1
]

.pull-right[
```{r catabasco}
tabasco(varespec, cca(varespec), scale="log", col=viridis(10))
```
**CA:** Rows and columns ordered by CA1
]

---
# More Clearly in Sparse Data

.pull-left[
```{r pcatabadune}
tabasco(dune, rda(dune), col=viridis(5))
```
**PCA:** Dutch Dune Meadows
]

.pull-right[
```{r catabadune}
tabasco(dune, cca(dune), col=viridis(5))
```
**CA:** Diagonally structured table
]
---
# CA Preserves Order

- CA can also have a curve artefact with a dominant long gradient, but
  it is not garbled and involuted like the PCA horseshoe

- First CA axis preserves the correct ordering: CA is a **seriation method** 

```{r abernethy, fig.width=7.5, fig.height=3.5}
par(mar=c(4,4,0.2,1)+.1, mfrow=c(1,2))
palette(viridis(8))
m0 <- rda(abernethy[,1:36])
m1 <- cca(abernethy[,1:36])
plot(m0, dis="si", scaling="si", type="n")
ordiarrows(m0, rep("a", nrow(abernethy)), scaling="si", col=7)
points(m0, dis="si", scaling="si", pch=16, col=1)
plot(m1, dis="si", scaling="si", type="n")
ordiarrows(m1, rep("a", nrow(abernethy)), scaling="si", col=7)
points(m1, dis="si", scaling="si", pch=16, col=1)
```
.pull-right[
Abernethy Forest pollen data (5515 to 12145 BP)
]
---
# Unimodal Responses in 2D

.pull-left[
- CA packs species also in multidimensional space

- Instead of an arrow of increase, the species score may be regarded
  as a *centre of abundance*

- It is said that CA approximates the unimodal model

- We may think that the species score gives the species maximum and
  the abundance decreases in every direction from the centroid given
  by the species score

- In PCA a species close to the origin changes little and is poorly
  represented by the ordination, but in CA it really may have its
  optimum there

]

.pull-right[
```{r cascores}
par(mar=c(4,4,0.9,0.5)+.1, mfrow=c(2,2))
palette(viridis(8))
mod <- cca(dune)
with(dune, tmp <- ordisurf(mod ~ Poaprat, bubble=2, family=quasipoisson, knots=2, col=6, scaling="si", main="Poa pratensis"))
abline(h=0, v=0, lty=3)
with(dune, tmp <- ordisurf(mod ~ Trifrepe, bubble=2, family=quasipoisson, knots=2, col=6, scaling="si", main="Trifolium repens"))
abline(h=0, v=0, lty=3)
with(dune, tmp <- ordisurf(mod ~ Lolipere, bubble=2, family=quasipoisson, knots=2, col=6, scaling="si",main="Lolium perenne"))
abline(h=0, v=0, lty=3)
with(dune, tmp <- ordisurf(mod ~ Juncarti, bubble=2, family=quasipoisson, knots=2, col=6, scaling="si", main="Juncus articulatus"))
abline(h=0, v=0, lty=3)
```
]

---
# Numbers and Interpretation

.pull-left[

- Sum of all eigenvalues (inertia) $\sum_k \lambda_k = \chi^2$, and
  $\lambda_k / \chi^2$ gives the proportion explained by the axis

- This inertia is **not** variance but $\chi^2$, and eigenvalues or percentages
  cannot be compared with PCA

- The eigenvalue is bound to be $\lambda \le 1$ (and $\lambda \ge 0$)

- A high eigenvalue is good, but very high values (say, $\lambda > 0.7$) are
  suspicious: disjunct or partly disjunct data where a group of SUs
  shares hardly anything (or nothing) with others

- Species scores can be seen as points indicating species optima
  instead of arrows of increase in PCA

- Both species and SU scores are **weighted averages** of each other,
  but the exact relationship depends on scaling in graphs

]

.pull-right[

```{r caplot}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
plot(mod, scaling="sites", type="n")
text(mod, scaling="sites", dis="site", col = 1, cex=0.8)
text(mod, scaling="sites", dis="species", col=4, cex=0.8)
```

]

---
# Weighted Averages and Scaling

.pull-left[
```{r caspecwa}
par(mar=c(4,4,1,1)+.1)
palette(viridis(8))
with(dune, tmp <- ordisurf(mod ~ Lolipere, bubble=2.5, family=quasipoisson, knots=2, main="scaling='species'", pch=21, col=1, bg=4))
text(mod, dis="sp", cex=0.8, col=3)
with(dune, ordispider(mod, Lolipere>0, w=Lolipere, label=FALSE, show=TRUE, col=4))
 with(dune, plot(envfit(mod ~ rep("A",nrow(dune)), w=Lolipere), labels="Lolipere", bg=8))
```
**Dots:** SUs, size: *Lolium perenne*, **Text:** species
]

.pull-right[
```{r casitewa}
par(mar=c(4,4,1,1)+.1)
palette(viridis(8))
plot(mod, scaling="si", type="n", main="scaling='sites'")
ordispider(mod, unlist(dune[18,]>0), scaling="si", w = unlist(dune[18,]), show=T, display="sp", col=4)
cl <- rep("white", nrow(dune))
cl[18] <- palette()[8]
ordilabel(mod, dis="si", cex=0.8, scaling="si", fill=cl)
points(mod, scaling="si", display="sp", cex=unlist(dune[18,]/2+0.4), pch=21, bg=4, col=1)
```
**Text:** SUs, **Dots:** species, size: abundance in SU 18

]
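
The weighted average relationship can be verified with a small base R sketch. The community table is hypothetical, and the scaling used (species scores with unit weighted variance, SU scores multiplied by the singular values) is an assumption that corresponds in spirit to scaling by sites.

```r
# Hypothetical community table: 5 SUs x 4 species
Y <- matrix(c(9, 3, 0, 0,
              5, 7, 2, 0,
              1, 6, 6, 1,
              0, 2, 8, 5,
              0, 0, 2, 9), nrow = 5, byrow = TRUE)

P  <- Y / sum(Y)
rs <- rowSums(P)                                 # SU weights
cs <- colSums(P)                                 # species weights
Z  <- (P - outer(rs, cs)) / sqrt(outer(rs, cs))  # chi-standardized data
s  <- svd(Z)

k <- 1:2                                         # first two axes
sp_std <- s$v[, k] / sqrt(cs)                    # species scores
su_pri <- sweep(s$u[, k], 2, s$d[k], "*") / sqrt(rs)  # SU scores

# Each SU is the abundance-weighted average of the species scores
su_wa <- (Y / rowSums(Y)) %*% sp_std
all.equal(su_wa, su_pri)
```

With the opposite scaling the roles are swapped, and species scores are weighted averages of SU scores.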
---
# Rare Species

.pull-left[

- Rare species are often extreme in CA

- CA explains deviation from expected abundance, and it expects every
  species to occur in every SU, whereas rare species occur only in one or
  two

- CA assesses deviation as proportional to expected &mdash; rare
  species have low expected values and look extreme

- CA is weighted, and rare species have low weights &mdash; they do
  not influence ordination of SUs or other species very much

- Some people tell you to routinely remove rare species from the data,
  but I do not recommend doing so

]

.pull-right[
```{r carare}
par(mar=c(4,4,0,1)+.1)
cl <- rev(viridis(5))
fr <- colSums(bryceveg > 0)
fr <-  cut(fr, breaks=c(0,1,3,7,15,100))
bmod <- cca(bryceveg)
plot(bmod, dis="sp", type="n")
points(bmod, dis="sp", pch=21, bg=cl[fr])
legend("topleft", c(" 1"," 2 - 3"," 4 - 7"," 8 - 15","16 - 87"), pch=21, pt.bg=cl, title="Species Frequency")
```
Bryce Canyon, species scores
]
---
class: inverse center middle

# Interpretation: Ordination and External Variables

The gradient model is the deep idea of ordination: If sampling units are
close to each other in ordination, they occur in similar environments,
and if they are far away in ordination, the environments differ. If our
ordination model is such that it can be assumed to be consistent with
the gradient model (*i.e.*, it is able to handle unimodal species
responses), we can try to identify the underlying environmental
variables that cause the ordination structure.

---
## Fitted Vectors

.pull-left[

- Most popular method of displaying continuous variables

- Can show many variables in one graph

- Implies a **linear trend** surface, like species arrows in PCA

- **Direction** shows the steepest change in variable: the direction
  of the **gradient**
  - Gradients are **not** parallel to axes: do **not** interpret axes, but gradients

- **Length** shows the relative importance or the strength of the variable

- The arrow is drawn from the origin and shows the direction of
  increase: there is an equally strong **decrease in the opposite** direction

]

.pull-right[
```{r envfit}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
vmod <- cca(varespec)
plot(vmod, dis="si", type="n")
points(vmod, pch=21, col=1, bg=4)
ef <- envfit(vmod ~ ., varechem)
plot(ef, col=1, bg=8, cex=1)
```
Reindeer Pastures and Soil data
]
---
# Linear Trend Surface: Is it Adequate?

```{r ordisurf, fig.width=10}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1, mfrow=c(1,2))
tmp <- ordisurf(vmod ~ Al, varechem, bubble=2.5, pch=21, col=4, bg=4, main="")
abline(h=0, v=0, lty=3)
plot(envfit(vmod ~ Al, varechem), col=1, cex=1, bg=8)
tmp <- ordisurf(vmod ~ N, varechem, bubble=2.5, pch=21, col=4, bg=4, main="")
abline(h=0, v=0, lty=3)
plot(envfit(vmod ~ N, varechem), col=1, cex=1, bg=8)
```

---
# Significance Tests

.pull-left[

- The strength of the fitted vector is given by the correlation $r$
  (or its square $r^2$) to the direction of the arrow head

- **Permutation tests:** Rows of environmental variables are shuffled
    into random order and $r^2$ is re-calculated

- The $P$-value is the probability that $r^2$ of randomized values is
  larger than or equal to the observed value of the fitted vector

- If this probability is low, say $P \le 0.05$, the variable is often
  regarded as significantly different from random

- Each variable is fitted independently, and other (correlated)
  variables do not influence its significance

]

.small[.pull-right[
```{r echo=FALSE, results=TRUE}
ef$vectors
```
]]
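
The logic of the permutation test can be sketched in base R. Everything here is hypothetical: the ordination scores and the environmental variable are simulated, `fit_r2()` is a helper made up for this illustration, and the code is a simplified sketch of the idea rather than the implementation of any package.

```r
set.seed(5)
# Hypothetical 2-D ordination scores and one environmental variable
scores <- cbind(ax1 = rnorm(24), ax2 = rnorm(24))
env <- 0.8 * scores[, "ax1"] + 0.4 * scores[, "ax2"] + rnorm(24, sd = 0.7)

# Hypothetical helper: r2 of the linear trend surface of y on the axes
fit_r2 <- function(y, X) summary(lm(y ~ X))$r.squared

r2_obs <- fit_r2(env, scores)

# Direction of the arrow: regression coefficients scaled to unit length
b     <- coef(lm(env ~ scores))[-1]
arrow <- b / sqrt(sum(b^2))

# Shuffle the environmental variable and recalculate r2
nperm   <- 999
r2_perm <- replicate(nperm, fit_r2(sample(env), scores))

# The observed value is counted as one of the permutations
p_value <- (sum(r2_perm >= r2_obs) + 1) / (nperm + 1)
c(r2 = r2_obs, P = p_value)
```

Counting the observed value among the permutations means that the smallest possible $P$-value with 999 permutations is 0.001.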

---
# Categorical Variables (Factors)

.pull-left[

- For one factor we can show its value for each point (and with tricks
  for two or three factors)

- The mean location or the **centroid** can be shown for several factors

- We also must assess the dispersion of points about the centroid:
  for this there are several graphical tools (next slide) for
  one factor at a time

- Goodness of fit can be assessed as the proportion of total variance
  of points that can be explained by the centroids

  - $r^2$ is the same statistic as for vectors, and for factors it
    gives the proportion of variance accounted for by the factor
    centroids


]

.pull-right[
```{r factor}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
eff <- envfit(mod ~ Management, dune.env)
plot(mod, dis="si", type="n")
with(dune.env, points(mod, dis="si", pch=21, bg=viridis(4)[as.numeric(Management)]))
plot(eff, labels=levels(dune.env$Management), cex=1, bg=8)
legend("topright", levels(dune.env$Management), pch=21, pt.bg=viridis(4), title = "Management")
```
]
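---
## Sketch: $r^2$ of Factor Centroids

The goodness of fit of a factor can be computed by hand as one minus the ratio of the within-level sum of squares to the total sum of squares. The sketch below uses simulated scores and a made-up three-level factor; `xy`, `grp` and `centroids` are hypothetical names, and the calculation is assumed to mirror what unweighted factor fitting reports.

```r
# Simulated 2-D scores for 24 SUs in three groups shifted along axis 1
set.seed(2)
grp <- factor(rep(c("A", "B", "C"), each = 8))
xy  <- matrix(rnorm(48), ncol = 2) + cbind(as.numeric(grp), 0)

# Centroids: mean location of each factor level on every axis
centroids <- apply(xy, 2, function(x) tapply(x, grp, mean))

# Sums of squares about the overall centre and about the level centroids
ss_total  <- sum(scale(xy, scale = FALSE)^2)
ss_within <- sum((xy - centroids[as.character(grp), ])^2)

# Proportion of the variance of points explained by the centroids
r2 <- 1 - ss_within / ss_total
r2
```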
---
## Displaying Factors

```{r factorshow, fig.width=8}
par(mar=c(3,4,0.8,1)+.1, mfrow=c(2,2))
palette(viridis(8))
plot(mod, dis="si", type="n", main="Spider")
ordispider(mod, dune.env$Management, col=viridis(4), label=TRUE)
with(dune.env, points(mod, dis="si", pch=21, bg=viridis(4)[as.numeric(Management)]))
plot(mod, dis="si", type="n", main="Convex Hull")
ordihull(mod, dune.env$Management, draw="poly", col=viridis(4), label=TRUE)
with(dune.env, points(mod, dis="si", pch=21, bg=viridis(4)[as.numeric(Management)]))
plot(mod, dis="si", type="n", main="SD Ellipse")
ordiellipse(mod, dune.env$Management, col=viridis(4), draw="poly", label=TRUE)
with(dune.env, points(mod, dis="si", pch=21, bg=viridis(4)[as.numeric(Management)]))
plot(mod, dis="si", type="n", main="Enclosing Ellipse")
ordiellipse(mod, dune.env$Management, kind="ehull", col=viridis(4), label=TRUE)
with(dune.env, points(mod, dis="si", pch=21, bg=viridis(4)[as.numeric(Management)]))
```

---
# Significance Tests for Factors

.pull-left[
.small[
```{r echo=FALSE, results=TRUE}
eff$factors
```
]

- **Confidence ellipse** is derived from Standard Error ellipse and
    shows the likely location of the centroid
    - SD ellipse shows the dispersion of points

- If confidence ellipses do not overlap, the centroids are approximately
  different at the given level

]

.pull-right[
```{r confell}
palette(viridis(8))
par(mar=c(4,4,0.8,1)+.1)
plot(mod, dis="si", type="n", main="95% Confidence Ellipse, Bonferroni correction")
ordiellipse(mod, dune.env$Management, kind="se", conf=1-0.05/6, draw="poly", col=viridis(4), label=TRUE, cex=1)
with(dune.env, points(mod, dis="si", pch=21, bg=viridis(4)[as.numeric(Management)]))
```
]
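
---
## Sketch: The Bonferroni Level

The plot on the previous slide draws its ellipses with `conf = 1 - 0.05/6`. A plausible reading, stated here as an assumption, is that the divisor 6 is the number of pairwise comparisons among the four `Management` centroids. The arithmetic is:

```r
k <- 4                        # Management has four levels
n_pairs <- choose(k, 2)       # pairwise comparisons among centroids
alpha <- 0.05                 # family-wise error rate
conf <- 1 - alpha / n_pairs   # confidence level used for each ellipse
c(pairs = n_pairs, conf = conf)
```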

---
# Lessons

- Fitted vectors are the most concise method of displaying the effects
  of continuous environmental variables, **but...**

- They imply a linear trend, and you should always check that this is adequate

- The environmental variables are not usually parallel to axes, but
  they are oblique

- You should interpret the directions of the gradients (and the arrow
  points to the steepest gradient) instead of axes

- You should **never** **interpret** **axes**, but directions in the
  ordination space

- Significance tests can help in interpretation of results, but they
  should not be trusted blindly (or not at all): they are only correlations

---
class: inverse center middle

# PCoA: Principal Coordinates Analysis

Principal Coordinates Analysis (PCoA) uses dissimilarities among SUs
instead of rectangular data of SUs and variables. If dissimilarities
are Euclidean distances, then PCoA is equal to PCA. However, the
method can be used with other dissimilarities, and often these are
useful for community analysis. PCoA is also known as metric or classic
multidimensional scaling.
---
# Ordination of Distances

- PCA was based on variances and linear projections of species when
  deriving principal components from regressions

- The same information can be extracted from distances among SUs

- More precisely, these distances are known as Euclidean distances in
  the species space

- Why are they called **Euclidean**? &mdash; Because they **are**
  distances in Euclidean space!

- In species space, SUs are points on species axes that are at right
  angles to each other. The *difference* in the abundance of species $j$ between two
  SUs $i$ and $k$ is $y_{ij}-y_{kj}$ which is the *leg* or *cathetus* of a
  right-angled triangle, and the *squared* Euclidean distance of these two SUs
  over all *catheti* (species) is the squared *hypotenuse*
  $d_{ik}^2 = \sum_{j=1}^S (y_{ij}-y_{kj})^2$

- It makes no sense to use Euclidean distances in PCoA, because PCA
  will give the same result more easily, but we can use other
  dissimilarities which are more useful in ecology

- These dissimilarities are handled in the same way as Euclidean distances,
  but the results may be more useful (although there may be some
  geometrical glitches that we can usually ignore)

- All ordination methods we discuss are distance-based, and
  eigenvector methods are based on Euclidean distances (PCA) or weighted
  Euclidean distances of transformed data (CA)
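
---
## Sketch: Euclidean Distances, PCoA and PCA

The two claims above (the squared distance is a sum of squared species differences, and PCoA of Euclidean distances reproduces PCA) can be checked in base R. The community matrix `y` is made up for illustration; `cmdscale()` is the base R function for classical (metric) scaling.

```r
# Made-up community matrix: 5 SUs (rows) x 4 species (columns)
y <- matrix(c(5, 0, 2, 1,
              3, 1, 0, 0,
              0, 4, 1, 2,
              1, 2, 6, 0,
              0, 0, 3, 5), nrow = 5, byrow = TRUE)

# Squared Euclidean distance of SUs 1 and 2: sum of squared catheti
d12_sq <- sum((y[1, ] - y[2, ])^2)
all.equal(sqrt(d12_sq), as.matrix(dist(y))[1, 2])

# PCoA of Euclidean distances versus PCA of the same data
pcoa <- cmdscale(dist(y), k = 2)
pca  <- prcomp(y)$x[, 1:2]

# Axes agree up to reflection, so compare absolute values
all.equal(abs(unname(pcoa)), abs(unname(pca)))
```

The sign of an axis is arbitrary in both methods, which is why only absolute values are compared.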

---
## What is Wrong with Euclidean Distances?

- **They have no fixed upper limit,** but distances can vary among SUs
    that have nothing in common, and communities with high abundances
    appear more different than communities with low abundances

    - This can be cured with Euclidean distances of **transformed
      data**
    - For instance *Hellinger* or *Chord* distances have an
      upper limit if there are no shared species
      
    - If a distance can be expressed as a Euclidean distance of
      transformed data, it is better to use PCA with transformed data

- **They are based on squared differences** giving undue influence to
    exceptionally high abundances

    - Instead of squared differences $(y_{ij}-y_{kj})^2$ we can use
      absolute differences $|y_{ij}-y_{kj}|$: *Manhattan* or
      *city-block* distances
      
    - Some dissimilarity measures combine both points: Bray-Curtis
      dissimilarity is based on absolute differences and is scaled to
      maximum value of one:
      $d_{ik} = \frac{\sum_j |y_{ij}-y_{kj}|}{\sum_j (y_{ij}+y_{kj})}$

- **Research interest may dictate** the use of a specific non-Euclidean
    dissimilarity index
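
---
## Sketch: Upper Limits of Dissimilarities

The points above can be illustrated with two made-up communities that share no species. The helper functions below are written out by hand for transparency and are hypothetical names, not the functions used elsewhere in these slides.

```r
euclid    <- function(x, y) sqrt(sum((x - y)^2))
manhattan <- function(x, y) sum(abs(x - y))
bray      <- function(x, y) sum(abs(x - y)) / sum(x + y)
# Hellinger distance: Euclidean distance of square-rooted proportions
hellinger <- function(x, y) euclid(sqrt(x / sum(x)), sqrt(y / sum(y)))

a   <- c(10, 5, 0, 0)   # made-up abundances
b   <- c(0, 0, 2, 1)    # nothing in common with a
b10 <- 10 * b           # same composition, ten times the abundance

c(euclid(a, b), euclid(a, b10))         # no fixed limit: depends on abundance
c(manhattan(a, b), manhattan(a, b10))   # absolute differences, still unbounded
c(bray(a, b), bray(a, b10))             # both at the maximum of 1
c(hellinger(a, b), hellinger(a, b10))   # both at the upper limit sqrt(2)
```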
---
# Example: beta Diversity

.pull-left[

- We want to study Whittaker's beta diversity
  $\beta = \gamma/\bar \alpha -1$, where $\gamma$ is the number of species in
  pooled communities, and $\bar \alpha$ is the average number of species in
  communities

- For two communities of $A$ and $B$ species with $J$ shared species, we have
  $\gamma = A+B-J$ and $\bar \alpha = (A+B)/2$ giving
$$\beta=\frac{A+B-J}{(A+B)/2}-1$$
$$~=\frac{2(A+B-J)}{A+B}-\frac{A+B}{A+B}$$
$$~=\frac{A+B-2J}{A+B}$$
   which is Sørensen dissimilarity

]

.pull-right[

```{r}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
d <- designdist(varespec, "(A+B-2*J)/(A+B)", "binary")
dmod <- dbrda(d ~ 1)
plot(dmod, type="n")
alpha <- diversity(varespec)
points(dmod, dis="si", cex=sqrt(alpha), pch=21, bg=4, col=1)
```
Reindeer Pastures, $\beta$-diversity, size Shannon $H'$
]
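---
## Sketch: Checking the Derivation

The algebra on the previous slide can be verified numerically. The species counts below are made up; any values with $J \le \min(A, B)$ should give the same agreement.

```r
A <- 12; B <- 8; J <- 5            # made-up species counts, J shared

gamma     <- A + B - J             # species in the pooled communities
alpha_bar <- (A + B) / 2           # average number of species
beta      <- gamma / alpha_bar - 1 # Whittaker's beta diversity

sorensen  <- (A + B - 2 * J) / (A + B)
all.equal(beta, sorensen)
```

The same expression, `(A+B-2*J)/(A+B)`, is what the chunk on the previous slide passes to `designdist()` with binary data.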
---
# Example 2: Phylogenetic Distance

.pull-left[

- In normal dissimilarities, all species are regarded as equal

- If species are related, each of them should not count as a fully
  independent species in the dissimilarities

- Here we use Rao distance which is one of many phylogenetic
  dissimilarity indices (and they can equally well be used for
  functional traits *etc.*)

- We put a limit: If species have diverged more than 65 Myr ago, they
  are regarded as completely independent

- PCoA is distance-based, and has no information on species, but these
  can be projected to the ordination afterwards in the same way as species
  scores in PCA

]

.pull-right[
```{r}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
plot(hclust(dune.phylodis), hang=-1, ylab="Time (Myr)", main="", sub="", xlab="")
abline(h=65, col=4)
```
Data for Tree: Zanne *et al.* (2014), *Nature* **506,** 89&mdash;92.
]
---
# Example 2: Phylogenetic Distance

```{r fig.width=11}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1, mfrow=c(1,2))
m0 <- rda(dune)
pl <- plot(m0, type="n")
text(pl, "sites", pch=21, col=1, bg=1)
text(pl, "species", col=3, cex=0.8)
ordispider(m0, dune.taxon$Family, display="species", label=TRUE, col=3, cex=0.8)
d <- distrao(dune, dune.phylodis, dmax=65)
md <- dbrda(d ~ 1, sqrt.dist = TRUE)
sppscores(md) <- dune
pl <- plot(md, type="n")
text(pl, "sites", pch=21, col=1, bg=1)
text(pl, "species", col=3, cex=0.8)
ordispider(md, dune.taxon$Family, display="species", label=TRUE, cex=0.8)
```
---
class: inverse center middle

# NMDS: Nonmetric Multidimensional Scaling

PCA, CA & PCoA are eigenvector methods. They map community
dissimilarities linearly so that with the maximum number of dimensions
the dissimilarities are exactly reproduced, but some first dimensions
show as much as possible of the dissimilarities. NMDS is completely
different: it maps dissimilarities non-linearly onto low-dimensional
ordination. It assumes very little of data or response models and is
therefore more robust than eigenvector methods with their strict
parametric mapping.
---
# Eigenvector Methods: Linear Mapping

.pull-left[

- Eigenvector methods *decompose* observed dissimilarities

- With all possible axes, they reproduce the observed dissimilarities exactly
- First axes show the largest possible amount of observed dissimilarities

- They map observed dissimilarities linearly so that they approach the
  original dissimilarities from below: each new axis *adds* to
  estimated dissimilarities

- Eigenvector methods may have their fixed way of defining observed
  dissimilarities: **PCA** uses Euclidean distances and **CA** uses
  $\chi$-distances (which are Euclidean distances of
  $\chi^2$-transformed data)

]

.pull-right[

```{r}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
stressplot(cca(mite), l.col=2, p.col="white", bg=6, pch=21)
```
Oribatid mites: 2-dim CA of $\chi$-distances

]

---
## NMDS: Monotone Mapping

.pull-left[
- NMDS maps dissimilarities non-linearly onto low-dimensional ordination

- **Monotone regression**: Euclidean distances of points in the
    ordination space are **rank-order similar** to community
    dissimilarities


- Rank-order similar: if *observed dissimilarity* $d_{a} < d_{b}$ then
  Euclidean *ordination distance* $\delta_{a} \le \delta_{b}$
  
- Rank orders of observed dissimilarities cannot be exactly
  preserved by rank-orders of ordination distances in low-dimensional
  solutions, and this causes **stress**

- **Stress:** scatter of ordination distances around the fitted
    monotone regression

]

.pull-right[
```{r}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
mmod <- metaMDS(mite, trace=FALSE, parallel=2)
stressplot(mmod, l.col=2, p.col="white", bg=6, pch=21)
```
.small[Oribatid mites: NMDS in 2D]

]
---
## Stress and Iterations

- Stress is the spread of points from the
  monotone regression step line

- The ordination space is *metric* and Euclidean: the *non-metric*
  refers only to the regression used to transform observed
  dissimilarities (the step line in plots)

- NMDS is computationally much more demanding than eigenvector methods
  that are based on linear algebra (and modern algorithms are
  really good at linear algebra)

- The solution is iterative and there is no guaranteed convergence,
  but we must repeat and judge the solutions for goodness and
  stability

- Stress is (in theory) in the range $0 \ldots 1$, but in practice
  stress $< 0.1$ is extremely good, stress close to $0$ hints at
  problems (like insufficient data or disjunctions), and random
  configurations in 2D have stress about $0.4$, leaving stress $0.2$
  usually as very good
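
---
## Sketch: Stress from Monotone Regression

Stress can be computed by hand from the observed dissimilarities and the ordination distances. The sketch below uses simulated values and base R `isoreg()` for the monotone (isotonic) regression, and it assumes Kruskal's stress type 1; the NMDS software may differ in details such as tie handling.

```r
# Simulated example: 20 pairs of SUs
set.seed(3)
obs  <- sort(runif(20))                       # observed dissimilarities (sorted)
ordi <- sqrt(obs) * exp(rnorm(20, sd = 0.1))  # ordination distances:
                                              # non-linear and noisy

# Monotone regression of ordination distances on observed dissimilarities;
# obs is sorted, so the fitted values are in the same order as ordi
fit <- isoreg(obs, ordi)$yf

# Kruskal's stress type 1: scatter about the step line,
# scaled by the size of the ordination distances
stress <- sqrt(sum((ordi - fit)^2) / sum(ordi^2))
stress
```

A strictly increasing but non-linear relation would give zero stress, because only the rank order needs to be preserved.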

---
# Advantages of NMDS

- Eigenvector methods can give good ordination if the
  ordination model is similar to the model of species responses to
  gradients

  - PCA model is linear and poor, but CA can cope with some regular
    unimodal response models and give good results

- NMDS does not assume any specific response model, but it can adjust
  to many models &mdash; even non-regular &mdash; and this makes it
  **robust**

- Robust: assumes little of data and tolerates violations of
  assumptions

- Can use ecologically more meaningful dissimilarities than Euclidean
  distances and $\chi$-distances of PCA and CA
  
  - Shares this feature with PCoA (which still applies Euclidean
    mapping of these dissimilarities)

- Champion in tests with simulated data with non-regular unimodal
  species responses

---
## How Do Eigenvector Methods Differ from NMDS?

- **Origin:** shows the average (expected) community composition in
    eigenvector methods, but has **no** special meaning in NMDS

    - It is convenient to have the origin in the middle of the
      ordination in NMDS, but it means nothing: All that matters is the
      configuration of points and distances between points &mdash; and
      these do not depend on the location of the origin

- **First axis** shows the major variation of the *data* (in terms of
    inertia, either variance or $\chi^2$) in eigenvector methods, but
    has **no** special meaning in NMDS

    - It is convenient to rotate NMDS space so that the first axis
      shows the major variation of the *ordination*, but it means
      nothing: All that matters is the configuration of points and
      distances between points &mdash; and these do not depend on the
      directions of the axes

    - Alternatively NMDS can be rotated parallel to environmental
      variables

- What is in common is that NMDS and Eigenvector methods (PCA, CA, PCoA)
  have **metric ordination space** that can be interpreted in the same way
  for the configuration of points and using environmental variables
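
---
## Sketch: Rotation and Origin Do Not Matter

That the configuration does not depend on the origin or on the directions of the axes can be checked directly: rotating and shifting the points leaves all distances between points unchanged. The configuration, the angle and the shift below are arbitrary made-up values.

```r
set.seed(4)
xy <- matrix(rnorm(20), ncol = 2)       # made-up 2-D configuration of 10 SUs

theta <- pi / 3                         # arbitrary rotation angle
rot <- matrix(c(cos(theta), sin(theta),
               -sin(theta), cos(theta)), nrow = 2)

# Rotate the axes, then move the origin
moved <- sweep(xy %*% rot, 2, c(5, -2), "+")

# Distances between points are identical in both configurations
all.equal(c(dist(xy)), c(dist(moved)))
```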
---
# These NMDS Ordinations are Identical

```{r fig.width=11}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1, mfrow=c(1,2))
plot(mmod, type="n")
text(mmod, dis="si", col=1, cex=0.8)
text(mmod, dis="sp", col=5, cex=0.8, xpd=TRUE)
plot(envfit(mmod ~ SubsDens + WatrCont + Shrub, mite.env), col=3, bg=8, xpd=TRUE)
mrot <- MDSrotate(mmod, mite.env$WatrCont)
plot(mrot, type="n")
text(mrot, dis="si", col=1, cex=0.8)
text(mrot, dis="sp", col=5, cex=0.8, xpd=TRUE)
plot(envfit(mrot ~ SubsDens + WatrCont + Shrub, mite.env), col=3, bg=8, xpd=TRUE)

```
---
## Example: Abernethy Pollen Profile

```{r abergrid, fig.width=8, results=FALSE}
palette(viridis(8))
par(mar=c(4,4,0.7,1)+.1, mfrow=c(2,2))
#x <- rbinom(prod(dim(mtfit)), 1, mtfit)
#dim(x) <- dim(mtfit)
x <- abernethy[,1:36]
m1 <- rda(decostand(x, "hell"), scale=FALSE)
pl <- plot(m1, dis="si", scaling="si", type="n", main="PCA: Hellinger")
tmp <- ordisurf(pl, abernethy$Age, add=T, col=4)
grp <- rep(1, nrow(x)) # for ordiarrows
ordiarrows(pl, grp, col=7)
points(pl, dis="si", pch=16, col=2)
m2 <- cca(x)
pl <- plot(m2, dis="si", scaling="si", type="n", main="CA: Chi")
tmp <- ordisurf(pl, abernethy$Age, add=T, col=4)
ordiarrows(pl, grp, col=7)
points(pl, dis="si", pch=16, col=2)
m3 <- dbrda(vegdist(wisconsin(x))~1)
pl <- plot(m3, dis="si", scaling="si", type="n", main="PCoA: Bray-Curtis / Wisconsin")
tmp <- ordisurf(pl, abernethy$Age, add=T, col=4)
ordiarrows(pl, grp, col=7)
points(pl, dis="si", pch=16, col=2)
m4 <- metaMDS(vegdist(wisconsin(x)), trace=FALSE)
m4 <- MDSrotate(m4, abernethy$Age)
pl <- plot(m4, dis="si", scaling="si", type="n", main="NMDS: Bray-Curtis / Wisconsin")
tmp <- ordisurf(pl, abernethy$Age, add=T, col=4)
ordiarrows(pl, grp, col=7)
points(pl, "sites", pch=16, col=2)
```
---
## Stress Plot Indicates NMDS Does Best

```{r aberstress, fig.width=8, results=FALSE}
par(mar=c(4,4,0.7,1)+.1, mfrow=c(2,2))
palette(viridis(8))
stressplot(m1, l.col=2, p.col="white",bg=6,pch=21, main="PCA: Hellinger")
stressplot(m2, l.col=2, p.col="white",bg=6,pch=21, main="CA: Chi")
stressplot(m3, l.col=2, p.col="white",bg=6, pch=21,main="PCoA: Bray-Curtis / Wisconsin")
stressplot(m4, l.col=2, p.col="white",bg=6,pch=21, main="NMDS: Bray-Curtis / Wisconsin")
```
---
# Is NMDS Credible?

.pull-left[

- NMDS is the only method that does not produce a curve &mdash; but is
  this credible?

- There is a cycle in SU sequence in the oldest times. Species scores
  indicate that the cycle is *Rumex* &mdash; *Anthyllis vulneraria*
  &mdash; Graminoids &mdash; *Artemisia* and then going to the linear
  succession *Juniperus communis* &mdash; *Betula* &mdash; *Pinus
  sylvestris* &mdash; *Alnus glutinosa*

- This looks like a credible cycle of pioneer communities (*Rumex*)
  and tundra steppes (*Artemisia*) before going to secular heath and
  forest succession

- All other methods had a curve that hid the periglacial pioneer cycle

]

.pull-right[

```{r abernmds}
par(mar=c(4,4,0,0)+.1)
palette(viridis(8))
sppscores(m4) <- x
pl <- plot(m4, type="n")
text(pl, "sites", labels=abernethy$Age, cex=0.7)
ordiarrows(pl, rep("a",nrow(abernethy)), col=7)
ordilabel(pl, dis="sp", cex=0.7, labels=make.cepnames(colnames(x)), priority=colSums(x), col=3)
```

]

---
class: inverse center middle

# Constrained Ordination

Constrained ordination is also known as canonical ordination. In
unconstrained ordination we used only community data to find its main
structure, and interpreted the results via fitting environmental
variables to the ordination. In constrained ordination we derive the
result directly through environmental variables (constraints) and find
only that structure in the communities that can be explained with
constraints. The results explain a smaller proportion of total
variation than unconstrained ordination, but they are more strongly
interpretable with the constraints. Constrained ordination allows
for statistical testing of the constraints, and it is most useful in
the analysis of designed experiments.

---
# Constrained Ordination

- In *unconstrained* ordination (PCA, CA, PCoA) we explained species
  with species and found the best possible linear predictors of
  observed values

- In *constrained* ordination we explain species with *external
  variables* (such as environment)

- This gives *poorer* results than unconstrained ordination, but more
  easily interpretable because we constrain the results with the very
  same variables we use in interpretation

- In unconstrained ordination we first find the ordination and then
  see how it can be explained

- In constrained ordination we build the ordination so that it can be
  explained with the environmental variables (constraints)

- We use external variables, and we can test their significance

---
# From Regression to Constrained Ordination

- Constrained Ordination: Find the *linear combination* of
  **constraints** that best explains all species


.pull-left[

**Method 1**

- Use constraints as explanatory variables, and find their linear
  combination that has the smallest residual variation for all species

- Use the same linear combination for all species and that is the
  first constrained ordination axis

- This directly gives *regression coefficients* for environmental
  variables and the first constrained axis

]

.pull-right[

**Method 2**

- Fit a linear regression separately for all species using all
  constraints as explanatory variables

- Each species has a different regression model

- Ordinate the fitted values of species as in unconstrained ordination

- This directly gives *species scores* and the first constrained axis

]

---
# Method 1: LC of Constraints

```{r rdamethod1, fig.width=11}
par(mar=c(4,4,1,1)+.1, mfrow=c(1,3))
palette(viridis(8))
cmod <- rda(bott0 ~Al + P, varechem)
rda1 <- cmod$CCA$u[,1]
koef <- coef(cmod)[,1]
koef <- paste(formatC(koef, digits=3, flag="+"), names(koef), sep="*", collapse="")
ylim <- range(bott0)
for(i in 1:3) {
   plot(rda1, bott0[,i], ylim=ylim, xlab=koef, ylab=colnames(bott0)[i], type="n", cex.lab=1.6)
   panel.lm(rda1, bott0[,i], pch=16, col=2, col.smooth=4)
}
```

---
# Method 2: Ordination of Fitted Values

```{r rdamethod2a, fig.width=8}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
pairs(fitted(cmod), pch=16, col=4, ylim=range(fitted(cmod)), panel=panel.lm, labels = paste("Fit", colnames(bott0)))
```

---
# Method 2: Ordination of Fitted Values

```{r rdamethod2b, fig.width=11}
par(mar=c(4,4,1,1)+.1, mfrow=c(1,3))
palette(viridis(8))
rda1 <- cmod$CCA$u[,1]
koef <- cmod$CCA$v[,1]
koef <- paste(formatC(koef, digits=2, flag="+"), names(koef), sep="*", collapse="")
fv <- fitted(cmod)
ylim <- range(fv)
for(i in 1:3) {
   plot(rda1, fv[,i], ylim=ylim, xlab=koef, ylab=paste("Fit", colnames(fv)[i]), type="n", cex.lab=1.5, cex=1.2)
   panel.lm(rda1, fv[,i], pch=16, col=2, col.smooth=4)
}
```
---
# What Was Explained?

- **Method 1** explained *total* variance

  - It was `r round(cmod$tot.chi, 1)`, the first axis explained
    $\lambda_1=$ `r round(cmod$CCA$eig[1], 1)`, and all constrained
    axes explained $\Lambda_k=$ `r round(cmod$CCA$tot.chi, 1)`

- **Method 2** explained the variance of *fitted values*

  - It was $\Lambda_k=$ `r round(cmod$CCA$tot.chi, 1)`, and the first
    axis explained $\lambda_1=$ `r round(cmod$CCA$eig[1], 1)`

- We had two constraints (`Al` & `P`) and therefore we had two constrained
  axes: the *rank* of constraints was 2

- Both methods gave the same two constrained axes and the same
  eigenvalues (explained variance)

- The sum of first two *un*constrained eigenvalues was
  `r round(sum(rda(bott0)$CA$eig[1:2]), 1)` which is more than
  constrained $\Lambda_k$

- Unconstrained PC axes are the best possible linear predictors and
  anything else must be worse

- Constrained ordination should have *clearly* lower eigenvalues than
  unconstrained &mdash; or constraints did not have any effect!
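
---
## Sketch: Both Methods by Hand

The equivalence of the two methods can be sketched with plain linear algebra. The data are simulated (two constraints, three species), all names are hypothetical, and the eigenvalues are scaled by $n-1$ on the assumption that inertia is variance, as in RDA.

```r
set.seed(5)
n <- 24
X <- cbind(Al = rnorm(n), P = rnorm(n))        # two made-up constraints
Y <- X %*% matrix(rnorm(6), 2, 3) +            # three made-up species
     matrix(rnorm(n * 3), n, 3)
Xc <- scale(X, scale = FALSE)
Yc <- scale(Y, scale = FALSE)

# Method 2: regress every species on the constraints,
# then ordinate (PCA via SVD) the fitted values
fit <- Xc %*% solve(crossprod(Xc), crossprod(Xc, Yc))
sv  <- svd(fit)
rank   <- sum(sv$d > 1e-8 * sv$d[1])           # number of constrained axes
lambda <- sv$d[seq_len(rank)]^2 / (n - 1)      # constrained eigenvalues

# Method 1 view: the first axis is a linear combination of constraints
axis1 <- sv$u[, 1]
b <- qr.solve(Xc, axis1)                       # regression coefficients
all.equal(c(Xc %*% b), axis1)                  # axis reproduced exactly

c(total = sum(apply(Y, 2, var)), constrained = sum(lambda))
```

The constrained eigenvalues sum to the variance of the fitted values, which cannot exceed the total variance.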

---
# Flavours of Constrained Ordination

- **Redundancy Analysis (RDA):** extends PCA

  - Explains variance, shows Euclidean distances

  - PCA was poor for community data, but RDA ordinates fitted values
    &mdash; and these are linear! &mdash; and RDA works OK with
    communities

  - It is still best to use correlations or Hellinger distances to
    tune down superdominant species

- **Constrained or Canonical Correspondence Analysis (CCA):** extends CA

  - Explains $\chi^2$, shows $\chi$-distances
  
  - The most popular approach (but this depends on your cultural
    background: Francophones prefer RDA)

- **Distance-based Redundancy Analysis (dbRDA):** extends PCoA

  - Explains any squared dissimilarities, shows any dissimilarities
  
  - If you are using dissimilarities elsewhere, it is consistent to
    use them in constrained ordination

---
class: inverse center middle

## Looking At the Constrained Ordination

---
# The Scores

- **Species scores**

  - These are also regression coefficients that derive the constrained
    axis from species data

- Two kinds of **SU scores**

  - **Linear combination (LC) scores** which are the linear
      combinations of *constraints*

  - Together with the species scores, the LC scores find the *fitted*
    values of each species (of method 2)

  - **Weighted Averages (WA) scores** which are derived from the
      species scores and observed values of the input community

  - Together with the species scores, the WA scores estimate the
    *observed* values of each species

- **Biplot scores** for constraints

  - These are correlations between the constraints and the LC scores for SUs

  - They are found separately for each constraining variable, and
    other constraining variables do not influence their values

---
# Interpretation of Biplot Scores

.pull-left[

- LC scores really are linear combinations of constraints, and when
  projected onto biplot arrows, they are the observed values of constraints

- When WA scores are projected onto biplot arrows, they give the
  prediction of environmental conditions from species data:
  bioindication, reconstruction of environment

- All biplot arrows have correlation $r=1$ over all constrained axes:
  the LC scores are made of them, and therefore the correlation is
  perfect

- We can also have *regression scores* which together with all other
  constraints define the fitted values of community, but they are
  difficult to interpret and never used in graphics

]

.pull-right[

```{r rdabp, results=FALSE}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
mrda <- rda(varespec ~ Al + P, varechem)
plot(mrda, dis=c("wa","bp"), type="n", scaling="si")
points(mrda, dis="wa", pch=21, bg=8, col=1, scaling="si")
points(mrda, dis="lc", pch=21, bg=4, col=1, scaling="si")
text(mrda, dis="bp", col=1, lwd=2, scaling="si")
Alfit <- ordisurf(mrda ~ Al, varechem, plot=FALSE, knots=1, display="lc", scaling="si")
Pfit <- ordisurf(mrda ~ P, varechem, plot=FALSE, knots=1,display="lc", scaling="si")
xy <- scores(mrda, dis="wa", choices=1:2,scaling="si")
xn1 <- seq(min(xy[,1]), max(xy[,1]), len=31)
xn2 <- seq(min(xy[,2]), max(xy[,2]), len=31)
xy <- data.frame(expand.grid(x1=xn1, x2=xn2))
fit <- predict(Alfit, newdata=xy)
dim(fit) <- c(31,31)
contour(xn1, xn2, fit, add=TRUE, col=5)
fit <- predict(Pfit, newdata=xy)
dim(fit) <- c(31,31)
contour(xn1, xn2, fit, add=TRUE, col=5)
legend("bottomleft", c("LC","WA"), pch=21, pt.bg=c(4,8), bg="white")
```
Complete Reindeer Pasture Data 
]

---
# LC Scores Are Constraints

```{r bpxy, results=FALSE, fig.width=11}
par(mar=c(4,4,0,1)+.1, mfrow=c(1,2))
palette(viridis(8))
plot(mrda, dis=c("lc","bp"), type="n")
ordisurf(mrda ~ Al, varechem, add=T, knots=1, col=5, display="lc")
ordisurf(mrda ~ P, varechem, add=T, knots=1,col=5, display="lc")
ordilabel(mrda, dis="lc", fill=8, col=2)
text(mrda, dis="bp", col=1, lwd=2)
xy <- varechem[, c("Al","P")]
plot(xy, type="n")
ordisurf(xy ~ Al, varechem, add=T, knots=1, col=5)
ordisurf(xy ~ P, varechem, add=T, knots=1,col=5)
ordilabel(xy, fill=8, col=2)
```
---
# Factor Constraints

.pull-left[

- Factors (categorical variables) are coded as **contrasts**

- For instance, a factor with four classes is coded by three indicator
  variables that express the difference against the first class

  - **0**: another class, **1**: same class

- Ordered factors can be expressed as polynomial contrasts that
  describe the linear, quadratic, cubic *etc.* effects of the variable

- These contrasts are used as ordinary continuous variables in the analysis

- Contrasts can be coded in many alternative ways, but all give the
  same fitted results and the same ordination

]

.pull-right[
```{r echo=FALSE}
data(dune.env)
cat("*Dunes -- Management:\n")
contrasts(dune.env$Management)
cat("\n*Dunes -- Moisture (ordered factor):\n")
signif(contrasts(dune.env$Moisture), 4)
```
]

---
# Ordination Display of Factors

.pull-left[

- Factors are displayed by the centroids of their levels

- The ordered factors can be described by biplot arrows for their
  polynomial contrasts in addition to their centroids

- The types of contrasts do not influence the centroids, and we rarely
  need to know the exact type of contrasts used

  - Contrasts are internal to the program, and there rarely is reason
    to show them in graphics

]

.pull-right[
```{r dunefact}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
m <- rda(dune ~ Moisture + Management, dune.env)
plot(m, dis=c("cn","wa"), type="n", scaling="si")
points(m, dis="wa", pch=21, col=1, bg=5, scaling="si")
text(m, dis="cn", col=2, xpd=T, scaling="si")

```
]

---
# LC and WA Scores for Factors

.pull-left[

- LC scores are linear combinations of constraints: and this means
  that they really are defined by constraints only

- Equal constraints means equal LC scores &mdash; and the species
  composition has no influence!

- For a single factor constraint, LC scores with the same factor level
  will fall into one and the same point

- WA scores show scatter with respect to LC scores: they tell how well
  species composition can *predict* the constraints

- LC scores are completely defined by the constraints also for several
  factors and continuous variables, but with a larger number of
  constraints this is not immediately visible

]

.pull-right[
```{r lcwa}
par(mar=c(4,4,0,1)+.1)
palette(viridis(4))
m <- rda(dune ~ Management, dune.env)
plot(m, dis=c("lc","wa"), type="n", scaling="si")
ordispider(m, scaling="si", col=3)
points(m, dis="wa", pch=21, col=1, bg=dune.env$Management, scaling="si")
points(m, dis="lc", pch=21, col=1, bg=dune.env$Management, cex=3, scaling="si")
legend("bottomleft", levels(dune.env$Management), pch=21, col=1, pt.bg=1:4)
legend("topleft", c("WA","LC"), pch=21, col=1, pt.bg=1, pt.cex=c(1,3))
```
Dune Meadows constrained by Management
]

---
# WA Scores & Prediction of Environment

.pull-left[

- LC scores *are* the environment, but WA scores are based on
  community composition

- WA scores are as similar to LC scores as possible: they predict the
  environment as well as the community composition is able (with linear
  algebra of ordination)

- This is known as bioindication, calibration or reconstruction of
  environment in the various fields of Ecology, Environmental
  Sciences and Palaeoecology

- We should always use *cross-validation* to assess the goodness of
  prediction: test performance of the method in another subset of the
  data than was used in constrained ordination

]

.pull-right[
```{r rdacalib, fig.height=4}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
mmite <- rda(decostand(mite, "hell") ~ WatrCont + SubsDens + Topo + Shrub,
   data=mite.env)
cal <- calibrate(mmite)[,"WatrCont"]
plot(cal ~ WatrCont, mite.env, xlab="Observed WatrCont", ylab="WatrCont Predicted from WA scores", pch=21, col="white", bg=4, cex=1.2)
abline(0, 1, lwd=3, col=1)
legend("topleft", "Exact Prediction", lty=1, lwd=3, col=1, bty="n")
```
.small[`r deparse(mmite$call, width.cutoff=500)`]
]

---
class: inverse center middle

## Permutation Tests

Constrained Ordination is based on regression with external variables,
and this allows for statistical analysis of constraints. The response
data are multivariate, correlated and non-Normal, and we cannot use
normal parametric tests, but we must use permutation tests.

---
# Permutation Tests

- The goodness of the ordination solution is analysed using an
  $F$-statistic similar to that of ordinary ANOVA
$$F = \frac{\Lambda_k/p}{\Lambda_r/(n-p-1)}$$
  where $\Lambda_k$ is the constrained inertia explained by $p$
  constraints, $\Lambda_r$ is the inertia of residuals, and $n$ is
  the number of SUs

- The total inertia $\Lambda$ (variance, $\chi^2$, squared
  dissimilarities) is decomposed as $\Lambda = \Lambda_k + \Lambda_r$

- In permutation test the response is shuffled into random order, the
  ordination is repeated and $F$ is re-evaluated

- This gives us a great number of permutation $F$-values of randomized data

- The observed $F$ is put among the permutation $F$, and all $F$ are
  ordered by magnitude

- The $P$-value is the proportional rank of observed $F$: this gives
  the probability to have similar or better $F$ in random data

- If we have 999 permutations and we add the observed $F$ to these,
  the proportional ranks are for 1000 values, and the lowest possible
  $P=$  0.001
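
---
## Sketch: Permutation $F$-test by Hand

The steps above can be sketched in base R for a redundancy analysis of simulated data. This is a simplified illustration: the rows of the response are shuffled directly, whereas the ordination software may permute residuals of a reduced model instead. All names and data are hypothetical.

```r
set.seed(6)
n <- 20; p <- 2
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)    # made-up constraints
Y <- scale(X %*% matrix(c(1, 0, 0.5, 0, 1, 0.5), p, 3) +
           matrix(rnorm(n * 3), n, 3), scale = FALSE)    # made-up responses

# F from constrained and residual inertia; sums of squares are used
# because the common divisor n - 1 cancels in the ratio
Fstat <- function(Y, X) {
  fit <- X %*% solve(crossprod(X), crossprod(X, Y))
  constrained <- sum(fit^2)
  residual    <- sum((Y - fit)^2)
  (constrained / ncol(X)) / (residual / (nrow(X) - ncol(X) - 1))
}

F_obs <- Fstat(Y, X)

# Shuffle the rows of the response, keep the constraints fixed
nperm  <- 999
F_perm <- replicate(nperm, Fstat(Y[sample(n), , drop = FALSE], X))

# Proportional rank of the observed F among all 1000 values
P <- (sum(F_perm >= F_obs) + 1) / (nperm + 1)
c(F = F_obs, P = P)
```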

---
# Overall Test

.pull-left[
.small[
```{r mano, echo=FALSE}
mano <- cca(dune ~ A1 + Management + Moisture, dune.env)
set.seed(4711)
ano <- anova(mano)
ano
```
]

- `Management` has 4 factor levels and uses 3 contrasts, `Moisture` is a
  4-level ordered factor and uses 3 contrasts, and `A1` is a continuous
  variable. Together these use 7 degrees of freedom, and the total
  inertia is `r round(mano$tot.chi, 3)` giving

$$F = \frac{`r round(mano$CCA$tot.chi, 3)`/`r mano$CCA$rank`}{`r round(mano$CA$tot.chi, 3)`/(`r nobs(mano)`-`r mano$CCA$rank` -1)}$$


]

.pull-right[
```{r anova, dependson='mano', echo=FALSE}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
# I hate trellis
lstrip <- trellis.par.get("strip.background")
lstrip$alpha <- 0.5
lstrip$col <- rev(viridis(8))[-1]
trellis.par.set("strip.background", lstrip)
densityplot(permustats(ano), pch=16, cex=0.6, col=1)
```
]

---
# Terms: Sequential Tests

.pull-left[
.small[
```{r manoseq, echo=FALSE, dependson='mano'}
set.seed(4711)
(ano <- anova(mano, by = "term"))
```
]

- **Sequential:** Explains residuals of the previous terms

- Order-dependent

- Decomposes inertia: the terms and the residual sum to the total inertia

]

.pull-right[
```{r anovaseq, dependson='manoseq'}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
trellis.par.set("strip.background", lstrip)
densityplot(permustats(ano), pch=16, cex=0.6, col=1, layout=c(1,3), as.table=TRUE)
```
]

---
# Terms: Marginal Tests

.pull-left[
.small[
```{r manomar, echo=FALSE, dependson='mano'}
set.seed(4711)
(ano <- anova(mano, by = "margin"))
```
]

- **Marginal:** Explains residuals after all other terms

- Order-independent

- Shows only the unique component of inertia that cannot be explained
  by any other term

]

.pull-right[
```{r anovamar, dependson='manomar'}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
trellis.par.set("strip.background", lstrip)
densityplot(permustats(ano), pch=16, cex=0.6, col=1, layout=c(1,3), as.table=TRUE)

```
]
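
---
# Sequential and Marginal: Regression Analogue

The difference between sequential and marginal tests is the same as in
ordinary regression. The sketch below is a univariate analogue with the
built-in `mtcars` data, not an ordination: `anova()` on an `lm` gives
sequential sums of squares and `drop1()` gives marginal ones.

```r
# Two fits that differ only in the order of terms
fit_ab <- lm(mpg ~ wt + hp, data = mtcars)
fit_ba <- lm(mpg ~ hp + wt, data = mtcars)

seq_ab <- anova(fit_ab)   # sequential: wt first, then hp after wt
seq_ba <- anova(fit_ba)   # sequential: hp first, then wt after hp
mar    <- drop1(fit_ab)   # marginal: each term after the other one

# Sequential sums of squares depend on the order of terms ...
c(wt_first = seq_ab["wt", "Sum Sq"], wt_last = seq_ba["wt", "Sum Sq"])

# ... but the last sequential term equals the marginal term
c(sequential = seq_ab["hp", "Sum Sq"], marginal = mar["hp", "Sum of Sq"])
```

The sequential terms add up to the same model sum of squares in both
orders, whereas the marginal terms leave out the part shared by the
correlated predictors.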

---
# Restricted Permutations

.pull-left[

- Restricted permutation allows analysis of data with dependencies

- Clusters of SUs that will not move or that can be permuted with
  other clusters

- SUs within clusters can be permuted *freely*, or as one-dimensional
  *series* or as spatial *grids*, and these can be *mirrored*

- Permutations can be *constant* across clusters or *vary* among them

- For instance, Barro Colorado Island Rainforest plots are in a
  contiguous grid, and free permutation breaks spatial structure

- BCI has $50! = `r signif(factorial(50), 4)`$ free permutations, but
  only $50 \times 4 = 200$ permutations with a mirrored grid

]

.pull-right[
```{r respermu, fig.height=4.3}
par(mar=c(0,0,0,0)+.25, mfrow=c(2,1))
palette(viridis(8))
set.seed(4711)
hab <- as.numeric(BCI.env$Habitat)
dim(hab) <- c(5,10)
N <- unique(BCI.env$UTM.NS)
E <- unique(BCI.env$UTM.EW)
image(E, N, t(hab), col=viridis(5), axes=F, xlab="",ylab="", asp=1)
text(BCI.env[,1:2], labels=shuffleFree(50,50), col=c(8,8,1,1,1)[BCI.env$Habitat])
image(E, N, t(hab), col=viridis(5), axes=F, xlab="",ylab="", asp=1)
text(BCI.env[,1:2], labels=shuffleGrid(5,10,mirror=TRUE), col=c(8,8,1,1,1)[BCI.env$Habitat])
```
Barro Colorado Island Rainforests (BCI): Free & Grid Permutations &mdash; colour: Habitat type
]
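
---
# Counting Grid Permutations

The count $50 \times 4 = 200$ can be checked by brute force. The sketch
below assumes that a mirrored grid permutation is a cyclic shift of rows
and columns, optionally reversed in either direction. It is a plain
enumeration for illustration, not the algorithm of the **permute**
package, and `grid_perms` is a hypothetical helper.

```r
# Enumerate all shifts and mirror images of an nr x nc grid
grid_perms <- function(nr, nc) {
  idx <- matrix(seq_len(nr * nc), nr, nc)
  combos <- expand.grid(dr = seq_len(nr) - 1, dc = seq_len(nc) - 1,
                        flip_r = c(FALSE, TRUE), flip_c = c(FALSE, TRUE))
  perms <- t(sapply(seq_len(nrow(combos)), function(i) {
    r <- (seq_len(nr) - 1 + combos$dr[i]) %% nr + 1   # shifted row order
    k <- (seq_len(nc) - 1 + combos$dc[i]) %% nc + 1   # shifted column order
    if (combos$flip_r[i]) r <- rev(r)                 # mirror rows
    if (combos$flip_c[i]) k <- rev(k)                 # mirror columns
    as.vector(idx[r, k])
  }))
  unique(perms)   # one row per distinct permutation
}

perms <- grid_perms(5, 10)
nrow(perms)   # 200 distinct arrangements, the identity included
```

Every arrangement keeps neighbouring plots as neighbours (on a torus),
which is what preserves the spatial structure under the null model.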

---
# Free and Spatial Error

.pull-left[
.small[
```{r bcifree, echo=FALSE, fig.height=2.9}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
trellis.par.set("strip.background", lstrip)
mbci <- cca(BCI ~ Habitat, BCI.env)
ano <- anova(mbci)
densityplot(permustats(ano), col=1, pch=16, cex=0.6)
ano
```
]
]

.pull-right[
.small[
```{r bcigrid, echo=FALSE, fig.height=2.9}
palette(viridis(8))
par(mar=c(4,4,0,1)+.1)
trellis.par.set("strip.background", lstrip)
# will do only 199 or all possible permutations
ctrl <- how(within=Within(type="grid", nrow=5, ncol=10, mirror=TRUE), nperm=999)
ano <- anova(mbci, permutations=ctrl)
densityplot(permustats(ano), col=1, pch=16, cex=0.6)
ano
```
]
]

---
class: inverse center middle

## Model Choice

Constrained ordination is an application of regression analysis to
multivariate responses. If we have a designed experiment or strong
theory, we can select the constraints in advance and test their
significance. In exploratory analysis we must choose the variables
among candidates. This means trouble.

---
# What Variables to Keep?

- If we have a designed experiment, we keep and analyse all design
  variables &mdash; even if these are insignificant

- If we have a strong prior theory, we keep and analyse all variables
  of interest &mdash; even those that prove to be insignificant

- In exploratory analysis we choose the variables among candidates,
  and normally want all kept variables to be significant

- The problem is that there may be several different sets of
  significant variables

- In particular when explanatory variables are correlated, there may
  be several alternative good choices, and picking one of correlated
  variables can have cascading effects on later choices

- Automatic procedures of model choice are popular but dangerous,
  unstable and biased

- If variables are picked by their goodness, the statistical tests on
  their goodness are biased

- All these problems are similar to those in ordinary regression
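
---
# Selection Bias: A Simulation

Why tests are biased after picking variables by their goodness can be
shown with a small base R simulation. It is a univariate sketch under
assumed settings (30 observations, 20 candidate variables, all pure
noise), not an analysis of any data set in these slides.

```r
set.seed(4711)
n <- 30      # observations
p <- 20      # candidate variables, none related to the response
nsim <- 200  # simulated data sets

# For each data set keep only the P-value of the best candidate
min_p <- replicate(nsim, {
  y <- rnorm(n)
  X <- matrix(rnorm(n * p), n, p)
  min(apply(X, 2, function(x) cor.test(x, y)$p.value))
})

# Proportion of data sets where the best noise variable looks significant
prop_signif <- mean(min_p < 0.05)
prop_signif
```

With 20 independent candidates the chance that at least one has
$P < 0.05$ is about $1 - 0.95^{20} \approx 0.64$, far above the nominal
0.05.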

---
# Automatic Model Choice

```{r ordistep, fig.width=10, echo=FALSE, results=FALSE}
par(mar=c(4,4,1,1)+.1, mfrow=c(1,2))
palette(viridis(8))
m0 <- cca(varespec ~ 1, varechem)
m1 <- cca(varespec ~ ., varechem)
set.seed(4711)
mup <- ordistep(m0, formula(m1), permutations=999)
mdown <- ordistep(m1, permutations=999, Pout=0.05)
plot(mup, main="Forward Selection", type="n")
text(mup, dis="si", col=3, cex=0.8)
text(mup, dis="sp", col=4, cex=0.8)
text(mup, dis="bp", col=1, cex=1.2, xpd=TRUE)
plot(mdown, main="Backward Elimination", type="n")
text(mdown, dis="si", col=3, cex=0.8)
text(mdown, dis="sp", col=4, cex=0.8)
text(mdown, dis="bp", col=1, cex=1.2, xpd=TRUE)
```
---
# Both Models Are Good

.pull-left[
.small[
```{r echo=FALSE, dependson='ordistep'}
set.seed(4711)
anova(mup, by="term")
```

- **Forward selection:** variables are added as long as they improve
    the model after all previously selected variables

- Similar to *sequential* tests for terms

- `Al` was the first choice: it correlates with many other
  variables and this makes it a good surrogate for them all

]]

.pull-right[
.small[
```{r echo=FALSE, dependson='ordistep'}
set.seed(4711)
anova(mdown, by="margin")
```

- **Backward elimination:** variables are removed if they do not have
    unique importance

- Similar to *marginal* tests for terms

- `Al` was one of the first dropped: it correlates with many other
  variables and has no unique contribution

]]

---
# How Many Constraints?

- Constrained ordination is based on the linear combinations of constraints

- If there are many constraints, almost anything can be made of their
  linear combinations

- **Many** constraints mean **no** constraints

- With many constraints it can be almost the same to use unconstrained
  ordination plus vector fitting &mdash; and actually this can be
  wiser

- Constrained ordination can cure curves of PCA, but then we need a
  low number of constraints

- Automatic choice of variables tries to liberate ordination from
  constraints

  - The best possible constraint is an unconstrained ordination axis
    (PC, CA), and with automatic model choice we try to get as close
    to that as possible

- The best use of constrained methods is for designed patterns, and
  the next best is strict constraints with interesting variables

---
# Ordination Wants to be Free!

```{r liberame, fig.width=12, results=FALSE, echo=FALSE}
par(mar=c(4,4,0,0.3)+.1, mfrow=c(1,3))
palette(viridis(8))
k <- complete.cases(brycesite)
m0 <- dbrda(vegdist(bryceveg) ~ 1, data=brycesite, subset=k)
m2 <- dbrda(vegdist(bryceveg) ~ elev + depth, data=brycesite, subset=k)
m <- dbrda(vegdist(bryceveg) ~ annrad + asp + av + depth + grorad + pos + elev + slope, brycesite, subset=k)
set.seed(4711)
m <- ordistep(m0, formula(m), permutations=999, parallel=2)
plot(m0, type="n", dis="si", scaling="si")
points(m0, dis="si", pch=21, col=1, bg = c(4,8)[brycesite$depth[k]], scaling="si")
legend("topleft", levels(brycesite$depth), pch=21, col=1, pt.bg=c(4,8), title="Soil Depth")
plot(m2, type="n", dis=c("si","cn"), scaling="si")
points(m2, dis="si", pch=21, col=1, bg = c(4,8)[brycesite$depth[k]], scaling="si")
text(m2, dis="cn", col=1, cex=1.2, xpd=TRUE, scaling="si")
plot(m, type="n", scaling="si", dis=c("si","cn"))
points(m, dis="si", pch=21, col=1, bg = c(4,8)[brycesite$depth[k]], scaling="si")
text(m, dis="cn", col=1, cex=1.2, xpd=TRUE, scaling="si")
```
Bryce Canyon, Bray-Curtis dissimilarity

---
# Too Many Constraints?

- Constrained ordination **is** regression, and we have all the same
  problems as in univariate regression &mdash; but multiplied

- Too many regressors mean overfitting: the absolute limit is reached
  when the number of constraints equals the number of observations,
  and we have a complete fit with no residual error

- In ordination we focus on first axes, and there the overfitting can
  appear much earlier &mdash; even with two constraints

- You may be worried when

  - Residual variation is low

  - First constrained eigenvalues are nearly as high as first
    eigenvalues of unconstrained ordination

  - First unconstrained residual eigenvalues are low compared to first
    constrained ones

  - You see a curve

- It may be better to use unconstrained analysis plus fitted vectors
  and factors than many useless constraints
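
---
# Overfitting With Random Constraints

The absolute limit of overfitting can be seen in a univariate base R
sketch where both the response and the regressors are random numbers.
The settings (20 observations, nested sets of noise regressors) are
assumptions made for illustration.

```r
set.seed(4711)
n <- 20
y <- rnorm(n)                              # random response
X <- matrix(rnorm(n * (n - 1)), n, n - 1)  # random, meaningless regressors
tss <- sum((y - mean(y))^2)                # total sum of squares

# R2 of nested models with the first k regressors plus an intercept
r2 <- sapply(seq_len(n - 1), function(k) {
  fit <- lm.fit(cbind(1, X[, seq_len(k), drop = FALSE]), y)
  1 - sum(fit$residuals^2) / tss
})
round(r2[c(1, 5, 10, 15, n - 1)], 3)
```

$R^2$ never decreases when a regressor is added, and because the
intercept also uses one degree of freedom, the fit is already complete
with $n - 1$ noise regressors.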


---

class: inverse center middle

## Partial Ordination

We can use constraints in two stages: We first remove the effects of
some background variables and then analyse the residuals with
constraints. We can also decompose total variation into several
sources of explanatory variables.

---
# Partial Ordination

- Constrained ordination actually has two components: Ordination of
  fitted values and ordination of residuals

- If we want to remove the effect of some variables, we can fit them
  first and subject residuals to constrained ordination: This gives us
  **partial ordination**

- **Conditions:** The terms partialled out

- The conditions can be background variables or confounding factors
  that we want to remove before analysing the effects of variables of
  interest

  - Partial out natural environmental differences of plots or blocks
    in designed experiments

  - Partial out initial conditions in designed experiments

  - Partial out *Plot* in split-*plot* designs

  - Remove uninteresting environmental variation before focussing on
    the interesting variables
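
---
# Partial Ordination: A Sketch

The two-stage idea can be written with ordinary regression in base R.
This is a sketch of the principle for variance-based (RDA-like)
analysis with simulated data. All names (`Z` for the condition, `X` for
the constraint, `inertia`) are hypothetical, and it is not the
implementation in **vegan**.

```r
set.seed(4711)
n <- 40
Z <- rnorm(n)              # background variable (condition)
X <- 0.5 * Z + rnorm(n)    # constraint of interest, correlated with Z
Y <- cbind(Z + rnorm(n), X + rnorm(n), rnorm(n))  # three simulated species
Y <- scale(Y, scale = FALSE)                      # centre the columns

# Inertia as the sum of column variances of a centred matrix
inertia <- function(M) sum(M^2) / (nrow(M) - 1)

Yp <- fitted(lm(Y ~ Z))    # variation partialled out
Yr <- resid(lm(Y ~ Z))     # residuals after the condition
Xr <- resid(lm(X ~ Z))     # constraint after the condition
Yk <- fitted(lm(Yr ~ Xr))  # constrained component of the residuals
Ye <- resid(lm(Yr ~ Xr))   # final residuals

parts <- c(total = inertia(Y), conditioned = inertia(Yp),
           constrained = inertia(Yk), residual = inertia(Ye))
round(parts, 3)
```

The three components add up to the total inertia, and the ordination
axes would come from a principal component analysis of `Yk`
(constrained axes) and `Ye` (residual axes).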

---

# Example: Effects of Insecticide

```{r setpyrifos, echo=FALSE, results = FALSE}
example(pyrifos)
ctrl <- how(plots=Plots(strata=ditch, type="free"), within=Within(type="series", constant=TRUE), nperm=999)
pmod <- rda(pyrifos ~ dose*week + Condition(week))
```

- Pyrifos insecticide was administered at 4 different levels (plus
  control) in 12 Dutch ditches and the development of zooplankton was
  monitored 2 times before and 9 times after the treatment (at -4
  to 24 weeks)

- Zooplankton has a natural annual succession, but we want to partial
  out that effect to see the effect of insecticide

- The interesting variable is the insecticide `dose*week`
  interaction, but the main effect of `week` will be aliased out from
  the model because it was already used as a condition and partialled
  out, leaving us `dose + dose:week`

- In mixed effect models we would use ditch as a random effect. It
  cannot be used as a condition, because the insecticide treatment is
  constant in each ditch, but *permuting* ditches takes the random
  effect into account in the error term

- Observations are repeated over weeks, and therefore we use series
  permutation within each ditch

- Time goes only forward and simultaneously in all ditches, and
  therefore we use no mirroring and have the same permutation within
  each ditch

- These restrictions reduce the number of all possible permutations
  from $`r signif(factorial(nrow(pyrifos)), 4)`$ free permutations to
  $`r signif(numPerms(pyrifos, ctrl), 4)`$ which is still a lot
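
---
# Counting Restricted Permutations

The size of the reduction can be estimated with simple arithmetic. The
sketch assumes that ditches are permuted freely as whole units and that
a constant series without mirroring contributes one cyclic shift per
sampling time. The variable names are hypothetical and the result is
not taken from `numPerms()`.

```r
n_ditch <- 12   # ditches (plots), permuted freely as whole units
n_week  <- 11   # sampling times within each ditch

free_perms   <- factorial(n_ditch * n_week)  # unrestricted: 132!
plot_perms   <- factorial(n_ditch)           # orderings of the ditches
series_perms <- n_week                       # cyclic shifts shared by all ditches
restricted   <- plot_perms * series_perms

signif(c(free = free_perms, restricted = restricted), 4)
```

Under these assumptions the restricted scheme still has billions of
permutations, so 999 random permutations sample only a tiny fraction of
them.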

---
# Example: Effects of Insecticide

.pull-left[

- The annual succession was *partialled out*

- The *random effect* of ditch (block) was handled with permutation of
  ditches

- The *temporal dependence* was handled with restricting permutation to
  constant series within ditch

.small[
```{r pyriano, echo=FALSE, dependson='setpyrifos'}
anova(pmod, permutations=ctrl, parallel=2)
```
]]

.pull-right[
```{r pyripca}
par(mar=c(4,4,0,0)+.1)
mod <- rda(pyrifos)
trellis.par.set("strip.background", lstrip)
lset <- trellis.par.get("superpose.line")
lset$col <- viridis(7)
trellis.par.set("superpose.line", lset)
lset <- trellis.par.get("superpose.symbol")
lset$col <- viridis(7)
trellis.par.set("superpose.symbol", lset)
ordixyplot(mod, form = PC2 ~ PC1|dose, groups=ditch, type=c("p","arrows"),
  len=0.05, as.table=TRUE, lwd=2, pch=16, cex=0.5, scaling="site")
```
Unconstrained PCA for the Pyrifos experiment
]

---
# Partial Model: Only for Numbers?

.pull-left[

- Complicated partial models are often useless as graphics

- Their main use may be in statistical testing of multivariate
  observations with non-parametric permutation

- In the Pyrifos example, the partial ordination shows that doses of
  insecticide treatment are located in different parts of the
  ordination, but the temporal cycles are not different

  - The model shows the `dose:week` interaction instead of `week`

]

.pull-right[
```{r pyriprda}
par(mar=c(4,4,0,0)+.1)
trellis.par.set("strip.background", lstrip)
lset <- trellis.par.get("superpose.line")
lset$col <- viridis(7)
trellis.par.set("superpose.line", lset)
lset <- trellis.par.get("superpose.symbol")
lset$col <- viridis(7)
trellis.par.set("superpose.symbol", lset)
ordixyplot(pmod, form = RDA1 ~ RDA2|dose, groups=ditch, type=c("p","arrows"),
  len=0.05, as.table=TRUE, lwd=2, pch=16, cex=0.5, scaling="site")
```
Partial RDA for the Pyrifos experiment
]

---

# Partitioning of Variation

- Partial ordination decomposes total variation $\Lambda$ into
  variation partialled out $\Lambda_p$, residual explained variation
  $\Lambda_k$ and into residuals $\Lambda_r$, or total inertia
  $\Lambda = \Lambda_p + \Lambda_k + \Lambda_r$

- Sometimes we are interested in the relative magnitudes of several
  sources of variation

- That variation can be variance (RDA), $\chi^2$ (CCA) or derived from
  squared dissimilarities (dbRDA)

- The proportional variation of a component is
  $R^2 = \Lambda_k/\Lambda$ &mdash; which really is the squared correlation
  coefficient

- $R^2$ is biased: even random data have $R^2>0$, and when you add
  terms, $R^2$ can only improve, and therefore we use adjusted $R^2$
  that has expected value $E(R^2_{\mathrm{adj}}) = 0$ in random data

- We can have several components of variation

- These sources are decomposed into their unique effect and effects
  shared with other sources of variation
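
---
# Adjusted $R^2$ in Random Data

The bias of $R^2$ and its correction can be seen in a small simulation.
This is a univariate base R sketch with assumed settings (30
observations, 5 noise regressors), using the usual adjustment
$R^2_{\mathrm{adj}} = 1 - (1 - R^2)\frac{n-1}{n-q-1}$ for $q$ regressors
as reported by `summary.lm()`.

```r
set.seed(4711)
n <- 30   # observations
q <- 5    # random regressors, unrelated to the response

sims <- replicate(500, {
  y <- rnorm(n)
  X <- matrix(rnorm(n * q), n, q)
  s <- summary(lm(y ~ X))
  c(r2 = s$r.squared, adj = s$adj.r.squared)
})

# Mean R2 is clearly positive, mean adjusted R2 is near zero
means <- rowMeans(sims)
round(means, 3)
```

In random data the expected raw $R^2$ is $q/(n-1)$, so it grows with
every added term, whereas the adjusted value stays around zero.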

---

# Partitioning of Variation: Pyrifos

.pull-left[
.small[
```{r varpart, echo=FALSE}
vp <- varpart(pyrifos, ~week, ~dose)
vp
```
]
]

.pull-right[
```{r varpartplot}
par(mar=c(4,4,0,1)+.1)
palette(viridis(8))
plot(vp, bg = c(4,7), Xnames=c("Time", "Dose"))

```
]
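
---
# Partitioning by Subtraction

With two sources of variation, the unique and shared fractions follow
from the adjusted $R^2$ of three models by subtraction. The sketch
below shows the arithmetic for a univariate response with the built-in
`mtcars` data. It is an analogue of the idea and not the output of
`varpart()`, and `adj_r2` is a hypothetical helper.

```r
# Adjusted R2 of a linear model fitted to mtcars
adj_r2 <- function(formula) summary(lm(formula, data = mtcars))$adj.r.squared

r_x1   <- adj_r2(mpg ~ wt)       # first source alone
r_x2   <- adj_r2(mpg ~ hp)       # second source alone
r_both <- adj_r2(mpg ~ wt + hp)  # both sources together

fractions <- c(unique_x1 = r_both - r_x2,         # only the first source
               shared    = r_x1 + r_x2 - r_both,  # cannot be separated
               unique_x2 = r_both - r_x1,         # only the second source
               residual  = 1 - r_both)
round(fractions, 3)
```

The shared fraction is obtained only as a difference, so it cannot be
tested, and it can even be negative.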

---

class: inverse center bottom

# The End
--

## Thank You

---
.small[
```{r end, echo=FALSE, cache=FALSE}
options(width=132)
sessionInfo()
``` 
]