# Factorial experiments {#factorial}
In Chapters \@ref(crd) and \@ref(blocking), we assumed the objective of the experiment was to investigate $t$ **unstructured** treatments, defined only as a collection of distinct entities (drugs, advertisements, recipes, etc.). That is, there was not necessarily any explicit relationship between the treatments (although we could clearly choose which particular comparisons between treatments were of interest via choice of contrast).
In many experiments, particularly in industry, engineering and the physical sciences, the treatments are actually defined via the choice of a **level** for each of a set of **factors**. We will focus on the commonly occurring case of factors at **two levels**. For example, consider the following experiment from the pharmaceutical industry.
::: {.example #factorial-desilylation}
Desilylation experiment [@Owen2001]
In this experiment, performed at GlaxoSmithKline, the aim was to optimise the desilylation^[Desilylation is the process of removing a silyl group (SiH$_3$, a silicon hydride) from a compound.] of an ether into an alcohol, which was a key step in the synthesis of a particular antibiotic. There were $t=16$ treatments, defined via the settings of four different factors, as given in Table \@ref(tab:factorial-desilylation-data).
```{r factorial-desilylation-data, message = F}
desilylation <- FrF2::FrF2(nruns = 16, nfactors = 4, randomize = F,
factor.names = list(temp = c(10, 20), time = c(19, 25),
solvent = c(5, 7), reagent = c(1, 1.33)))
yield <- c(82.93, 94.04, 88.07, 93.97, 77.21, 92.99, 83.60, 94.38,
88.68, 94.30, 93.00, 93.42, 84.86, 94.26, 88.71, 94.66)
desilylation <- data.frame(desilylation, yield = yield)
rownames(desilylation) <- paste("Trt", 1:16)
knitr::kable(desilylation,
col.names = c("Temp (degrees C)", "Time (hours)", "Solvent (vol.)",
"Reagent (equiv.)", "Yield (%)"),
caption = "Desilylation experiment: 16 treatments defined
by settings of four factors, with response (yield).")
```
Each treatment is defined by the choice of one of two levels for each of the four factors. In the `R` code above, we have used the function `FrF2` (from the package of the same name) to generate all $t = 2^4 = 16$ combinations of the two levels of these four factors. We come back to this function later in the chapter.
:::
This **factorial treatment structure** lends itself to certain treatment contrasts being of natural interest.
## Factorial contrasts
Throughout this chapter, we will assume there are no blocks or other restrictions on randomisation, and so we will assume a completely randomised design can be used. We start by assuming the same unit-treatment model as Chapter \@ref(crd):
\begin{equation}
y_{ij} = \mu + \tau_i + \varepsilon_{ij}\,, \quad i = 1, \ldots, t; j = 1, \ldots, n_i\,, (\#eq:factorial-utm)
\end{equation}
where $y_{ij}$ is the response from the $j$th application of treatment $i$, $\mu$ is a constant parameter, $\tau_i$ is the effect of the $i$th treatment, and $\varepsilon_{ij}$ is the random individual effect from each experimental unit with $\varepsilon_{ij} \sim N(0, \sigma^2)$ independent of other errors.
Now the number of treatments is $t = 2^f$, where $f$ is the number of factors in the experiment.
For Example \@ref(exm:factorial-desilylation), we have $t = 2^4 = 16$ and $n_i = 1$ for all $i=1,\ldots, 16$; that is, each of the 16 treatments is applied exactly once. In general, we shall assume common treatment replication $n_i = r \ge 1$.
If we fit model \@ref(eq:factorial-utm) and compute the ANOVA table, we notice a particular issue with this design.
```{r desilylation-anova, warning = F}
desilylation <- data.frame(desilylation, trt = factor(1:16))
desilylation.lm <- lm(yield ~ trt, data = desilylation)
anova(desilylation.lm)
```
All available degrees of freedom are being used to estimate parameters in the mean ($\mu$ and the treatment effects $\tau_i$). There are no degrees of freedom left to estimate $\sigma^2$. This is due to a lack of treatment replication. Without replication in the design, model \@ref(eq:factorial-utm) is **saturated**, with as many mean parameters as there are observations, and an unbiased estimate of $\sigma^2$ cannot be obtained. We will return to this issue later.
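A quick check confirms that the model is saturated, with as many coefficients as observations.
```{r desilylation-saturated}
## saturated model: 16 coefficients (mu plus 15 treatment effects) for 16 runs
c(coefficients = length(coef(desilylation.lm)), observations = nrow(desilylation))
```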
### Main effects
Studying Table \@ref(tab:factorial-desilylation-data), some comparisons between treatments are obviously of interest. For example, we can compare the average effect of the first 8 treatments with the average effect of the second 8, using
$$
\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} = \sum_{i=1}^tc_i\tau_i\,,
$$
with
$$
\boldsymbol{c}^{\mathrm{T}} = (-\boldsymbol{1}_{2^{f-1}}^{\mathrm{T}}, \boldsymbol{1}_{2^{f-1}}^{\mathrm{T}}) / 2^{f-1} = (-\boldsymbol{1}_8^{\mathrm{T}}, \boldsymbol{1}_8^{\mathrm{T}}) / 8\,.
$$
```{r desilylation-lm, message = F}
desilylation.emm <- emmeans::emmeans(desilylation.lm, ~ trt)
reagent_me.emmc <- function(levs, ...) data.frame('reagent m.e.' = rep(c(-1, 1), rep(8, 2)) / 8)
emmeans::contrast(desilylation.emm, 'reagent_me')
```
This contrast compares the average treatment effect from the 8 treatments which have `reagent` set to its low level (1 equiv.) to the average effect from the 8 treatments which have `reagent` set to its high level. This is a "fair" comparison, as both of these sets of treatments have each of the combinations of the factors `temp`, `time` and `solvent` occurring equally often (twice here). Hence, the **main effect** of `reagent` is averaged over the levels of the other three factors.
As in Chapter \@ref(crd), we can estimate this treatment contrast by applying the same contrast coefficients to the treatment means,
$$
\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}} = \sum_{i=1}^tc_i\bar{y}_{i.}\,,
$$
where, for this experiment, each $\bar{y}_{i.}$ is the mean of a single observation (as there is no treatment replication). We see that inference about this contrast is not possible, as no standard error can be obtained.
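Since each treatment mean is a single observation here, we can also compute the contrast estimate directly (a quick sketch; it relies on the standard run order produced by `FrF2` with `randomize = F`, in which `reagent` is at its low level for the first eight treatments):
```{r desilylation-reagent-direct}
## contrast coefficients: -1/8 for the first eight treatments (reagent low),
## +1/8 for the final eight (reagent high)
c_reagent <- rep(c(-1, 1), each = 8) / 8
sum(c_reagent * desilylation$yield)
```
This matches the estimate from `emmeans` above.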
::: {.definition #main-effect}
The **main effect** of a factor $A$ is defined as the difference in the average response from the high and low levels of the factor
$$
\mbox{ME}(A) = \bar{y}(A+) - \bar{y}(A-)\,,
$$
where $\bar{y}(A+)$ is the average response when factor $A$ is set to its high level, averaged across all combinations of levels of the other factors (with $\bar{y}(A-)$ defined similarly for the low level of $A$).
:::
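As a check of Definition \@ref(def:main-effect), we can compute $\bar{y}(A+)$ and $\bar{y}(A-)$ directly for `reagent` (a small sketch using the `desilylation` data frame created above):
```{r desilylation-reagent-definition}
## average response at each level of reagent, then their difference
ybar_reagent <- with(desilylation, tapply(yield, reagent, mean))
ybar_reagent["1.33"] - ybar_reagent["1"]
```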
As we have averaged the response across the levels of the other factors, the interpretation of the main effect extends beyond this experiment. That is, we can use it to infer something about the system under study. Assuming model \@ref(eq:factorial-utm) is correct, any variation in the main effect can only come from random error in the observations. In fact,
\begin{align*}
\mbox{var}\{ME(A)\} & = \frac{\sigma^2}{n/2} + \frac{\sigma^2}{n/2} \\
& = \frac{4\sigma^2}{n}\,,
\end{align*}
and assuming $r>1$,
\begin{equation}
\hat{\sigma}^2 = \frac{1}{2^f(r-1)} \sum_{i=1}^{2^f}\sum_{j=1}^r(y_{ij} - \bar{y}_{i.})^2\,,
(\#eq:factorial-sigma-hat-2)
\end{equation}
which is the residual mean square.
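To illustrate equation \@ref(eq:factorial-sigma-hat-2), consider a purely hypothetical replicated version of the experiment (the desilylation experiment itself is unreplicated, so the noise added below is invented for illustration only):
```{r factorial-sigma-hat-check}
set.seed(1)
r <- 2
trt_rep <- factor(rep(1:16, each = r))
y_rep <- rep(yield, each = r) + rnorm(16 * r) # hypothetical replicated responses
## direct computation of the pooled within-treatment variance estimate
ybar_rep <- tapply(y_rep, trt_rep, mean)
sum((y_rep - ybar_rep[trt_rep])^2) / (2^4 * (r - 1))
## agrees with the residual mean square from the unit-treatment model
anova(lm(y_rep ~ trt_rep))["Residuals", "Mean Sq"]
```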
For Example \@ref(exm:factorial-desilylation), we can also calculate main effect estimates for the other three factors by defining appropriate contrasts in the treatments.
```{r desilylation-all-me-contrasts, message = F}
contrast.mat <- FrF2::FrF2(nruns = 16, nfactors = 4, randomize = F,
factor.names = c("temp", "time", "solvent", "reagent"))
fac.contrasts.emmc <- function(levs, ...)
dplyr::mutate_all(data.frame(contrast.mat), function(x) scale(as.numeric(as.character(x)), scale = 8))
main_effect_contrasts <- fac.contrasts.emmc()
rownames(main_effect_contrasts) <- paste("Trt", 1:16)
knitr::kable(main_effect_contrasts, caption = 'Desilylation experiment: main effect contrast coefficients', col.names = c("Temperature", "Time", "Solvent", "Reagent"))
```
Estimates can be obtained by applying these coefficients to the observed treatment means.
```{r desilylation-all-me-estimates}
t(as.matrix(main_effect_contrasts)) %*% yield
emmeans::contrast(desilylation.emm, 'fac.contrasts')
```
Main effects are often displayed graphically, using **main effect plots** which simply plot the average response for each factor level, joined by a line. The larger the main effect, the larger the slope of the line (or the bigger the difference between the averages). Figure \@ref(fig:desilylation-me-plots) presents the four main effect plots for Example \@ref(exm:factorial-desilylation).
```{r desilylation-me-plots, fig.align = 'center', fig.keep = T, fig.cap = 'Desilylation experiment: main effect plots', fig.width = 7, fig.height = 7, fig.show = 'hold'}
## calculate the means
temp_bar <- aggregate(yield ~ temp, data = desilylation, FUN = mean)
time_bar <- aggregate(yield ~ time, data = desilylation, FUN = mean)
solvent_bar <- aggregate(yield ~ solvent, data = desilylation, FUN = mean)
reagent_bar <- aggregate(yield ~ reagent, data = desilylation, FUN = mean)
## convert factors to numeric
fac_to_num <- function(x) as.numeric(as.character(x))
temp_bar$temp <- fac_to_num(temp_bar$temp)
time_bar$time <- fac_to_num(time_bar$time)
solvent_bar$solvent <- fac_to_num(solvent_bar$solvent)
reagent_bar$reagent <- fac_to_num(reagent_bar$reagent)
## main effect plots
plotmin <- min(temp_bar$yield, time_bar$yield, solvent_bar$yield, reagent_bar$yield)
plotmax <- max(temp_bar$yield, time_bar$yield, solvent_bar$yield, reagent_bar$yield)
par(cex = 2)
layout(matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE), respect = T)
plot(temp_bar, pch = 16, type = "b", ylim = c(plotmin, plotmax))
plot(time_bar, pch = 16, type = "b", ylim = c(plotmin, plotmax))
plot(solvent_bar, pch = 16, type = "b", ylim = c(plotmin, plotmax))
plot(reagent_bar, pch = 16, type = "b", ylim = c(plotmin, plotmax))
```
### Interactions
Another contrast that could be of interest in Example \@ref(exm:factorial-desilylation) has coefficients
$$
\boldsymbol{c}^{\mathrm{T}} = (\boldsymbol{1}_4^{\mathrm{T}}, -\boldsymbol{1}_8^{\mathrm{T}}, \boldsymbol{1}_4^{\mathrm{T}}) / 8 \,,
$$
where the divisor $8 = 2^{f-1} = 2^3$.
This contrast measures the difference between the average treatment effect from treatments 1-4, 13-16 and treatments 5-12. Checking back against Table \@ref(tab:factorial-desilylation-data), we see this is comparing those treatments where `solvent` and `reagent` are both set to their low (1-4) or high (13-16) level against those treatments where one of the two factors is set high and the other is set low (5-12).
Focusing on `reagent`, if the effect of this factor on the response were independent of the level to which `solvent` has been set, you would expect this contrast to be zero: changing from the low to the high level of `reagent` should affect the response in the same way, regardless of the setting of `solvent`. This argument can be reversed, focusing on the effect of `solvent`. Therefore, if this contrast is large, we say the two factors **interact**.
```{r desilylation-solvent-reagent-interaction}
sol_reg_int.emmc <- function(levels, ...)
data.frame('reagent x solvent' = .125 * c(rep(1, 4), rep(-1, 8), rep(1, 4)))
emmeans::contrast(desilylation.emm, 'sol_reg_int')
```
For Example \@ref(exm:factorial-desilylation), this interaction contrast seems quite small, although of course without an estimate of the standard error we are still lacking a formal method to judge this.
It is somewhat more informative to consider the above interaction contrast as half the difference between two "sub-contrasts",
$$
\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} = \frac{1}{2}\left\{\frac{1}{4}\left(\tau_{13} + \tau_{14} + \tau_{15} + \tau_{16} - \tau_5 - \tau_6 - \tau_7 - \tau_8\right)
- \frac{1}{4}\left(\tau_9 + \tau_{10} + \tau_{11} + \tau_{12} - \tau_1 - \tau_2 - \tau_3 - \tau_4\right) \right\}\,.
$$
The first component in this expression is the effect of changing `reagent` from its low to its high level **given `solvent` is set to its high level**. The second component is the same effect of `reagent` **given `solvent` is set to its low level**. This leads to our definition of a two-factor interaction.
::: {.definition #two-factor-interaction}
The **two-factor interaction** between factors $A$ and $B$ is defined as half the difference between the conditional main effects of factor $A$ computed at the high and low levels of factor $B$:
\begin{align*}
\mbox{Int}(A, B) & = \frac{1}{2}\left\{\mbox{ME}(A\mid B+) - \mbox{ME}(A \mid B-)\right\} \\
& = \frac{1}{2}\left\{\mbox{ME}(B \mid A+) - \mbox{ME}(B \mid A-)\right\} \\
& = \frac{1}{2}\left\{\bar{y}(A+, B+) - \bar{y}(A-, B+) - \bar{y}(A+, B-) + \bar{y}(A-, B-)\right\}\,,
\end{align*}
where $\bar{y}(A+, B-)$ is the average response when factor $A$ is set to its high level and factor $B$ is set to its low level, averaged across all combinations of levels of the other factors, and other averages are defined similarly. The conditional main effect of factor $A$ when factor $B$ is set to its high level is defined as
$$
\mbox{ME}(A\mid B+) = \bar{y}(A+, B+) - \bar{y}(A-, B+)\,,
$$
with similar definitions for other conditional main effects.
:::
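As a check (a small sketch), the `solvent` $\times$ `reagent` interaction can be computed directly from Definition \@ref(def:two-factor-interaction) using a two-way table of mean yields:
```{r desilylation-sol-reg-definition}
## rows = solvent levels, columns = reagent levels
ybar_sr <- with(desilylation, tapply(yield, list(solvent, reagent), mean))
(ybar_sr["7", "1.33"] - ybar_sr["5", "1.33"] -
   ybar_sr["7", "1"] + ybar_sr["5", "1"]) / 2
```
This matches the `emmeans` estimate above.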
As the sum of the squared contrast coefficients is the same for two-factor interactions as for main effects, the variance of the contrast estimator is also the same:
$$
\mbox{var}\left\{\mbox{Int}(A, B)\right\} = \frac{4\sigma^2}{n}\,.
$$
For Example \@ref(exm:factorial-desilylation) we can calculate two-factor interactions for all ${4 \choose 2} = `r choose(4, 2)`$ pairs of factors. The simplest way to calculate the contrast coefficients is as the elementwise, or Hadamard, product^[For two matrices $A$ and $B$ of the same dimension $m\times n$, the Hadamard product $A\odot B$ is a matrix of the same dimension with elements given by the elementwise product, $(A\odot B)_{ij} = A_{ij}B_{ij}$.] of the unscaled main effect contrasts (before dividing by $2^{f-1}$).
```{r desilylation-2fi-interactions}
fac.contrasts.int.emmc <- function(levs, ...) {
with(sqrt(8) * main_effect_contrasts, {
data.frame('tem_x_tim' = temp * time,
'tem_x_sol' = temp * solvent,
'tem_x_rea' = temp * reagent,
'tim_x_sol' = time * solvent,
'tim_x_rea' = time * reagent,
'sol_x_rea' = solvent * reagent)
})
}
two_fi_contrasts <- fac.contrasts.int.emmc()
rownames(two_fi_contrasts) <- paste("Trt", 1:16)
knitr::kable(two_fi_contrasts, caption = 'Desilylation experiment: two-factor interaction contrast coefficients')
```
Estimates of the interaction contrasts can again be found by considering the equivalent contrasts in the observed treatment means.
```{r desilylation-all-2fi-estimates}
t(as.matrix(two_fi_contrasts)) %*% yield
emmeans::contrast(desilylation.emm, 'fac.contrasts.int')
```
As with main effects, interactions are often displayed graphically using **interaction** plots, plotting average responses for each pairwise combination of factors, joined by lines.
```{r desilylation-2fi-plots, fig.align = 'center', fig.cap = 'Desilylation experiment: two-factor interaction plots', fig.width = 7, fig.height = 7, fig.show = 'hold'}
plotmin <- min(desilylation$yield)
plotmax <- max(desilylation$yield)
par(cex = 2)
layout(matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE), respect = T)
with(desilylation, {
interaction.plot(temp, time, yield, type = "b", pch = 16, legend = F,
ylim = c(plotmin, plotmax))
legend("bottomright", legend = c("Time = 19", "Time = 25"), lty = 2:1, cex = .75)
interaction.plot(temp, solvent, yield, type = "b", pch = 16, legend = F,
ylim = c(plotmin, plotmax))
legend("bottomright", legend = c("Solvent = 5", "Solvent = 7"), lty = 2:1, cex = .75)
interaction.plot(temp, reagent, yield, type = "b", pch = 16, legend = F,
ylim = c(plotmin, plotmax))
legend("bottomright", legend = c("Reagent = 1", "Reagent = 1.33"), lty = 2:1, cex = .75)
interaction.plot(time, solvent, yield, type = "b", pch = 16, legend = F,
ylim = c(plotmin, plotmax))
legend("bottomright", legend = c("Solvent = 5", "Solvent = 7"), lty = 2:1, cex = .75)
interaction.plot(time, reagent, yield, type = "b", pch = 16, legend = F,
ylim = c(plotmin, plotmax))
legend("bottomright", legend = c("Reagent = 1", "Reagent = 1.33"), lty = 2:1, cex = .75)
interaction.plot(solvent, reagent, yield, type = "b", pch = 16, legend = F,
ylim = c(plotmin, plotmax))
legend("bottomright", legend = c("Reagent = 1", "Reagent = 1.33"), lty = 2:1, cex = .75)
})
```
Parallel lines in an interaction plot indicate no (or a very small) interaction (`time` and `solvent`, `time` and `reagent`, `solvent` and `reagent`). The three interactions involving `temp` all demonstrate much more robust behaviour at the high level of `temp`: changing `time`, `solvent` or `reagent` makes little difference to the response at the high level of `temp`, and much less difference than at the low level.
If a system displays important interactions, the main effects of factors involved in those interactions should no longer be interpreted. For example, it makes little sense to discuss the main effect of `temp` when it changes so much with the level of `reagent` (from strongly positive when `reagent` is low to quite small when `reagent` is high).
Higher order interactions can be defined similarly, as average differences in lower-order effects. For example, a three-factor interaction measures how a two-factor interaction changes with the levels of a third factor.
\begin{align*}
\mbox{Int}(A, B, C) & = \frac{1}{2}\left\{\mbox{Int}(A, B \mid C+) - \mbox{Int}(A, B \mid C-)\right\} \\
& = \frac{1}{2}\left\{\mbox{Int}(A, C \mid B+) - \mbox{Int}(A, C \mid B-)\right\} \\
& = \frac{1}{2}\left\{\mbox{Int}(B, C \mid A+) - \mbox{Int}(B, C \mid A-)\right\}\,,
\end{align*}
where
$$
\mbox{Int}(A, B \mid C+) = \frac{1}{2}\left\{\bar{y}(A+, B+, C+) - \bar{y}(A-, B+, C+) - \bar{y}(A+, B-, C+) + \bar{y}(A-, B-, C+)\right\}
$$
is the interaction between factors $A$ and $B$ using only those treatments where factor $C$ is set to its high level. Higher-order interaction contrasts can again be constructed by (multiple) Hadamard products of (unscaled) main effect contrasts.
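For instance, the `temp` $\times$ `time` $\times$ `solvent` contrast coefficients can be obtained as the triple Hadamard product of the unscaled ($\pm 1$) main effect contrasts, rescaled by $2^{f-1} = 8$ (a small sketch using the objects created above):
```{r desilylation-three-fi-sketch}
## elementwise product of three unscaled main effect contrasts, then rescale
with(8 * main_effect_contrasts, (temp * time * solvent) / 8)
```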
::: {.definition #factorial-effects}
A **factorial effect** is a main effect or interaction contrast defined on a factorial experiment. For a $2^f$ factorial experiment with $f$ factors, there are $2^f-1$ factorial effects, ranging from main effects to the interaction between all $f$ factors. The contrast coefficients in a factorial contrast all take the form $c_i = \pm 1 / 2^{f-1}$.
:::
For Example \@ref(exm:factorial-desilylation), we can now calculate all the factorial effects.
```{r desilylation-fe}
## hadamard products
unscaled_me_contrasts <- 8 * main_effect_contrasts
factorial_contrasts <- model.matrix(~.^4, unscaled_me_contrasts)[, -1] / 8
## all factorial effects - directly, as there is no treatment replication
t(factorial_contrasts) %*% yield
## using emmeans
factorial_contrasts.emmc <- function(levs, ...) data.frame(factorial_contrasts)
desilylation.effs <- emmeans::contrast(desilylation.emm, 'factorial_contrasts')
desilylation.effs
```
## Three principles for factorial effects {#fact-principles}
Empirical studies of many experiments [@BoxMeyer1986; @Li2006] have demonstrated that the following three principles **often** hold when analysing factorial experiments.
::: {.definition #effect-hierarchy}
**Effect hierarchy**: lower-order factorial effects are more likely to be important than higher-order effects; factorial effects of the same order are equally likely to be important.
For example, we would anticipate more large main effects from the analysis of a factorial experiment than two-factor interactions.
:::
::: {.definition #effect-sparsity}
**Effect sparsity**: the number of large factorial effects is likely to be small, relative to the total number under study.
This is sometimes called the **Pareto principle**.
:::
::: {.definition #effect-heredity}
**Effect heredity**: interactions are more likely to be important if at least one parent factor also has a large main effect.
:::
These three principles will provide us with some useful guidelines when analysing, and eventually constructing, factorial experiments.
## Normal effect plots for unreplicated factorial designs {#unreplicated-normal-plots}
The lack of an estimate for $\sigma^2$ means alternatives to formal inference methods (e.g. hypothesis tests) must be found to assess the size of factorial effects. We will discuss a method that essentially treats the identification of large factorial effects as an outlier identification problem.
Let $\hat{\theta}_j$ be the $j$th estimated factorial effect, with $\hat{\theta}_j = \sum_{i=1}^tc_{ij}\bar{y}_{i.}$ for $\boldsymbol{c}_j^{\mathrm{T}} = (c_{1j}, \ldots, c_{tj})$ a vector of factorial contrast coefficients (defining a main effect or interaction). Then the estimator follows a normal distribution
$$
\hat{\theta}_j \sim N\left(\theta_j, \frac{4\sigma^2}{n}\right)\,,\qquad j = 1, \ldots, 2^f-1\,,
$$
for $\theta_j$ the true, unknown, value of the factorial effect, $j = 1,\ldots, 2^f-1$. Furthermore, for $j, l = 1, \ldots, 2^f-1;\, j\ne l$,
\begin{align*}
\mbox{cov}(\hat{\theta}_j, \hat{\theta}_l) & = \mbox{cov}\left(\sum_{i=1}^tc_{ij}\bar{y}_{i.}, \sum_{i=1}^tc_{il}\bar{y}_{i.}\right) \\
& = \sum_{i=1}^tc_{ij}c_{il}\mbox{var}(\bar{y}_{i.}) \\
& = \frac{\sigma^2}{r} \sum_{i=1}^tc_{ij}c_{il} \\
& = 0\,,
\end{align*}
as $\sum_{i=1}^tc_{ij}c_{il} = 0$ for $j\ne l$. That is, the factorial contrasts are independent as the contrast coefficient vectors are orthogonal.
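We can verify this orthogonality numerically for the desilylation contrasts (a quick sketch using the `factorial_contrasts` matrix created earlier):
```{r desilylation-orthogonality}
## C^T C should be diagonal: every pair of contrast vectors is orthogonal
CtC <- crossprod(as.matrix(factorial_contrasts))
max(abs(CtC[upper.tri(CtC)])) # largest off-diagonal entry (zero, up to rounding)
```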
Hence, under the null hypothesis $H_0: \theta_1 = \cdots = \theta_{2^f-1} = 0$ (all factorial effects are zero), the $\hat{\theta}_j$ form a sample of **independent** normally distributed random variables from the distribution
\begin{equation}
\hat{\theta}_j \sim N\left(0, \frac{4\sigma^2}{n}\right)\,,\qquad j = 1, \ldots, 2^f-1\,.
(\#eq:npph0)
\end{equation}
To assess evidence against $H_0$, we can plot the ordered estimates of the factorial effects against the ordered quantiles of a standard normal distribution. Under $H_0$, the points in this plot should lie on a straight line (the slope of the line will depend on the unknown $\sigma^2$). We anticipate that the majority of the effects will be small (*effect sparsity*), and hence any large effects that lie away from the line are unlikely to come from distribution \@ref(eq:npph0) and may be significantly different from zero. We have essentially turned the problem into one of outlier identification.
For Example \@ref(exm:factorial-desilylation), we can easily produce this plot in `R`. Table \@ref(tab:desilylation-effects-plot1) gives the ordered factorial effects, which are then plotted against standard normal quantiles in Figure \@ref(fig:desilylation-effects-plot1).
```{r desilylation-effects-plot1, fig.align = 'center', fig.cap = 'Desilylation experiment: normal effects plot', results = 'hold', fig.show = 'hold'}
effs <- dplyr::arrange(transform(desilylation.effs)[,1:2], dplyr::desc(estimate))
knitr::kable(effs, caption = "Desilylation experiment: sorted estimated factorial effects")
qqnorm(effs[ ,2], ylab = "Factorial effects", main = "") # note that qqnorm/qqline/qqplot don't require sorted data
qqline(effs[ ,2])
```
In fact, it is more usual to use a **half-normal** plot to assess the size of factorial effects, where we plot the sorted absolute values of the estimated effects against the quantiles of a half-normal distribution^[The absolute value of a normally distributed random variable follows a half-normal distribution.].
```{r desilylation-half-normal-plot, fig.align = 'center', fig.cap = 'Desilylation experiment: half-normal effects plot'}
p <- .5 + .5 * (1:15 - .5) / 15 # probabilities for the 15 absolute factorial effects
qqplot(x = qnorm(p), y = abs(effs[,2]), ylab = "Absolute factorial effects",
xlab = "Half-normal quantiles")
```
The advantage of a half-normal plot such as Figure \@ref(fig:desilylation-half-normal-plot) is that we only need to look at effects appearing in the top right corner (significant effects will always appear "above" a hypothetical straight line) and we do not need to worry about comparing large positive and negative values. For these reasons, half-normal plots are usually preferred over normal plots.
For the desilylation experiment, we can see the effects fall into three groups: one effect standing well away from the line, and almost certainly significant (`temp`, from Table \@ref(tab:desilylation-effects-plot1)), then a group of six effects (`reagent`, `time`, `temp.solvent`, `solvent`, `temp.time`, `temp.reagent`) which may be significant, and then a group of 8 much smaller effects.
### Lenth's method for approximate hypothesis testing
The assessment of normal or half-normal effect plots can be quite subjective. @Lenth1989 introduced a simple method for conducting more formal hypothesis testing in unreplicated factorial experiments.
Lenth's method uses a **pseudo standard error** (PSE):
$$
\mbox{PSE} = 1.5 \times \mbox{median}_{|\hat{\theta}_i| < 2.5s_0}|\hat{\theta}_i|\,,
$$
where $s_0 = 1.5\times \mbox{median} |\hat{\theta}_i|$ is a *consistent*^[Essentially, $s_0$ tends in probability to $\sigma$ as the number of factorial effects tends to infinity.] estimator of the standard deviation of the $\hat{\theta}_i$ under $H_0: \theta_1 = \cdots=\theta_{2^f-1}=0$. The PSE trims approximately 1%^[Under $H_0$, the $\hat{\theta}_i$ come from a mean-zero normal distribution, and about 1% of deviates fall outside $\pm 2.57\sigma$.] of the $\hat{\theta}_i$ to produce a **robust** estimator of the standard deviation, in the sense that it is not influenced by large $\hat{\theta}_i$ belonging to important effects.
For Example \@ref(exm:factorial-desilylation), we can construct the PSE as follows.
```{r desilylation-pse}
s0 <- 1.5 * median(abs(effs[, 2]))
trimmed <- abs(effs[, 2]) < 2.5 * s0
pse <- 1.5 * median(abs(effs[trimmed, 2]))
pse
```
The PSE can be used to construct test statistics
$$
t_{\mbox{PSE}, i} = \frac{\hat{\theta}_i}{\mbox{PSE}}\,,
$$
which mimic the usual $t$-statistics used when an estimate of $\sigma^2$ is available. These quantities can be compared to a reference distribution, tabulated by @Lenth1989 and simulated in `R` by the `unrepx` package.
```{r desilylation-lenth}
eff_est <- effs[, 2]
names(eff_est) <- effs[, 1]
lenth_tests <- unrepx::eff.test(eff_est, method = "Lenth")
knitr::kable(lenth_tests, caption = "Desilylation experiment: hypothesis tests using Lenth's method.")
```
The function `eff.test` calculates unadjusted p-values (`p.value`) and simultaneous p-values (`simult.pval`) adjusted to account for multiple testing. Using the latter, from Table \@ref(tab:desilylation-lenth) we see that the main effects of `temp` and `reagent` are significant at the experiment-wise 5% level, as is, obeying **effect heredity**, their interaction (its p-value is borderline, hovering around 0.05 depending on simulation error).
The package `unrepx` also provides the function `hnplot` to display these results graphically by adding a reference line to a half-normal plot; see Figure \@ref(fig:desilylation-lenth-plot). The `ME` and `SME` lines indicate the absolute size of effects that would be required to reject $H_0: \theta_i = 0$ at an individual or experimentwise $100\alpha$% level, respectively.
```{r desilylation-lenth-plot, fig.cap = "Desilylation experiment: half-normal plot with reference lines from Lenth's method."}
unrepx::hnplot(eff_est, method = "Lenth", horiz = F, ID = 2.7, alpha = 0.05)
```
Informally, factorial effects with estimates greater than `SME` are thought highly likely to be significant, and effects between `ME` and `SME` are considered somewhat likely to be significant (and still worthy of further investigation if the budget allows).
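The numerical values of these margins of error can also be extracted directly, as a quick sketch shows:
```{r desilylation-me-sme}
## individual (ME) and simultaneous (SME) margins of error from Lenth's method
unrepx::ME(eff_est, method = "Lenth")
```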
## Regression modelling for factorial experiments {#fac-reg}
We have identified $d = 2^f-1$ factorial effects that we wish to estimate from our experiment. As $d < t = 2^f$, we can estimate these factorial effects using a full-rank linear regression model.
Let $t\times d$ matrix $C$ hold each factorial contrast as a column. Then
$$
\hat{\boldsymbol{\theta}} = C^{\mathrm{T}}\bar{\boldsymbol{y}}\,,
$$
with $\hat{\boldsymbol{\theta}}^{\mathrm{T}} = (\hat{\theta}_1, \ldots, \hat{\theta}_d)$ being the vector of estimated factorial effects and $\bar{\boldsymbol{y}}^{\mathrm{T}} = (\bar{y}_{1.}, \ldots, \bar{y}_{t.})$ being the vector of treatment means.
We can define an $n\times d$ expanded contrast matrix as $\tilde{C} = C \otimes \boldsymbol{1}_r$, where each row of $\tilde{C}$ gives the contrast coefficients for each run of the experiment. Then,
$$
\hat{\boldsymbol{\theta}} = \frac{1}{r}\tilde{C}^{\mathrm{T}}\boldsymbol{y}\,.
$$
To illustrate, we will imagine a hypothetical version of Example \@ref(exm:factorial-desilylation) where each treatment was repeated three times (with $y_{i1} = y_{i2} = y_{i3}$).
```{r factorial-regression-c-matrix}
y <- kronecker(desilylation$yield, rep(1, 3)) # hypothetical response vector
C <- factorial_contrasts
Ctilde <- kronecker(C, rep(1, 3))
t(Ctilde) %*% y / 3 # to check
```
If we define a model matrix $X = \frac{2^{f}}{2}\tilde{C}$, then $X$ is an $n\times d$ matrix with entries $\pm 1$ and columns equal to unscaled factorial contrasts. It follows that
\begin{align}
\left(X^{\mathrm{T}}X\right)^{-1}X^{\mathrm{T}}\bY & = \frac{1}{n} \times \frac{2^f}{2}\tilde{C}^{\mathrm{T}}\bY (\#eq:factorial-ls)\\
& = \frac{1}{2r}\tilde{C}^{\mathrm{T}}\bY \\
& = \frac{1}{2}\hat{\boldsymbol{\theta}}\,.
\end{align}
The left-hand side of equation \@ref(eq:factorial-ls) is the least squares estimator $\hat{\boldsymbol{\beta}}$ from the model
$$
\bY = \boldsymbol{1}_n\beta_0 + X\boldsymbol{\beta} + \boldsymbol{\varepsilon}\,,
$$
where $\bY$ is the response vector and $\boldsymbol{\varepsilon}$ the error vector from unit-treatment model \@ref(eq:factorial-utm). We have simply re-expressed the mean response as $\mu + \tau_i = \beta_0 + \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta}$, where $d$-vector $\boldsymbol{x}_i$ holds the unscaled contrast coefficients for the main effects and interactions.
We can illustrate these connections for Example \@ref(exm:factorial-desilylation).
```{r desilylation-X-C}
X <- 8 * C
Xt <- t(X)
XtX <- Xt %*% X
2 * solve(XtX) %*% t(X) %*% desilylation$yield
```
The more usual way to think about this modelling approach is as a regression model with $f$ (quantitative^[When qualitative factors only have two levels, each regression term only has 1 degree of freedom, and so there is practically little difference from a quantitative variable.]) variables, labelled $x_1, \ldots, x_f$, scaled to lie in the interval $[-1, 1]$ (in fact, they just take values $\pm 1$). We can then fit a regression model in these variables, and include products of these variables to represent interactions. We usually also include the intercept term. For Example \@ref(exm:factorial-desilylation):
```{r desilylation-reg}
desilylation.df <- dplyr::mutate(desilylation,
across(.cols = temp:reagent,
~ as.numeric(as.character(.x))))
desilylation.df <- dplyr::select(desilylation.df, -c(trt))
desilylation.df <- dplyr::mutate(desilylation.df, across(.cols = temp:reagent,
~ scales::rescale(.x, to = c(-1, 1))))
desilylation_reg.lm <- lm(yield ~ (.) ^ 4, data = desilylation.df)
knitr::kable(2 * coef(desilylation_reg.lm)[-1], caption = "Desilylation example: factorial effects calculated using a regression model.")
```
A regression modelling approach is usually more straightforward to apply than defining contrasts in the unit-treatment model, and makes clearer the connection between interaction contrasts and products of main effect contrasts (automatically defined in a regression model). It also enables us to make use of the `effects` package in `R` to quickly produce main effect and interaction plots.
```{r desilylation-effects, fig.cap = "Desilylation experiment: interaction plot generated using the `effects` package."}
temp_x_time <- effects::Effect(c("temp", "time"), desilylation_reg.lm, xlevels = list(time = c(-1, 1)), se = F)
plot(temp_x_time, main = "", rug = F, x.var = "temp", ylim = c(80, 100))
```
### ANOVA for factorial experiments
The basic ANOVA table has the following form.
Table: (\#tab:anova-fact) The ANOVA table for a full factorial experiment
| Source | Degrees of Freedom | (Sequential) Sum of Squares | Mean Square |
| :--- | :--- | :--- | :--- |
| Regression | $2^f-1$ | $\sum_{j=1}^{2^f-1}n\hat{\beta}_j^2$ | Reg SS/$(2^f-1)$ |
| Residual | $2^f(r-1)$ | $(\boldsymbol{Y}-X\hat{\boldsymbol{\beta}})^{\textrm{T}}(\boldsymbol{Y}-X\hat{\boldsymbol{\beta}})$ | RSS/$(2^f(r-1))$ |
| Total | $2^fr-1$ | $\boldsymbol{Y}^{\textrm{T}}\boldsymbol{Y}-n\bar{Y}^{2}$ | |
The regression sum of squares for a factorial experiment has a very simple form. If we include an intercept column in $X$, from Section \@ref(anova-revision),
\begin{align*}
\mbox{Regression SS} & = \mbox{RSS(null)} - \mbox{RSS} \\
& = \hat{\boldsymbol{\beta}}^{\mathrm{T}}X^{\mathrm{T}}X\hat{\boldsymbol{\beta}} - n\bar{y}^2 \\
& = \sum_{j=0}^{2^f-1}n\hat{\beta}_j^2 - n\bar{y}^2 \\
& = \sum_{j=1}^{2^f-1}n\hat{\beta}_j^2\,,
\end{align*}
as $X^{\mathrm{T}}X = nI_{2^f}$ and $\hat{\beta}_0 = \bar{y}$. Hence, the $j$th factorial effect contributes $n\hat{\beta}_j^2$ to the regression sum of squares, and this quantity can be used to construct a test statistic **if** $r>1$, so that an estimate of $\sigma^2$ is available. For Example \@ref(exm:factorial-desilylation), the regression sum of squares and ANOVA table are given in Tables \@ref(tab:desilylation-regss) and \@ref(tab:desilylation-anova-ss).
```{r desilylation-regss, warning = F, message = F, results = 'hold'}
ss <- nrow(desilylation) * coef(desilylation_reg.lm)^2
regss <- sum(ss) - nrow(desilylation) * mean(desilylation$yield)^2
names(regss) <- "Regression"
knitr::kable(c(regss, ss[-1]), col.names = "Sum Sq.", caption = "Desilylation experiment: regression sums of squares for each factorial effect calculated directly.")
```
```{r desilylation-anova-ss}
knitr::kable(anova(desilylation_reg.lm)[, 1:2], caption = "Desilylation experiment: ANOVA table from `anova` function.")
```
## Exercises
1. A reactor experiment presented by Box, Hunter and Hunter (2005, pp. 259-261) used a full factorial design for $f=5$ factors, each at two levels, to investigate the effect of *feed rate* (litres/min), *catalyst* (%), *agitation rate* (rpm), *temperature* (C) and *concentration* (%) on the *percentage reacted*. The levels of the experimental factors are coded as $-1$ for the low level and $1$ for the high level. Table \@ref(tab:reactor-expt) presents the true factor settings corresponding to these coded values.
Table: (\#tab:reactor-expt) Factor levels for the full factorial reactor experiment
| Factor | Low level ($-1$) | High level ($1$) |
| :----: | :--------------: | :--------------: |
| Feed Rate (litres/min) | 10 | 15 |
| Catalyst (%) | 1 | 2 |
| Agitation Rate (rpm) | 100 | 120 |
| Temperature (C) | 140 | 180 |
| Concentration (\%) | 3 | 6 |
The data from this experiment is given in Table \@ref(tab:reactor-data).
```{r reactor-data, message = F}
reactor.frf2 <- FrF2::FrF2(nruns = 32, nfactors = 5, randomize = F,
factor.names = c("FR", "Cat", "AR", "Temp", "Conc"))
y <- c(61, 53, 63, 61, 53, 56, 54, 61, 69, 61, 94, 93, 66, 60, 95, 98, 56, 63,
70, 65, 59, 55, 67, 65, 44, 45, 78, 77, 49, 42, 81, 82)
reactor <- data.frame(reactor.frf2, pre.react = y)
knitr::kable(reactor, caption = "Reactor experiment.")
```
a. Estimate all the factorial effects from this experiment, and use a half-normal plot and Lenth's method to decide which are significantly different from zero.
b. Use the `effects` package to produce main effect and/or interaction plots for each significant factorial effect from part a.
c. Now fit a regression model that only includes terms corresponding to main effects and two-factor interactions. How many degrees of freedom does this model use? What does this mean for the estimation of $\sigma^2$? How does the estimate of $\sigma^2$ from this model relate to your analysis in part a?
<details><summary><b>Solution</b></summary>
a. We will estimate the factorial effects as twice the corresponding regression parameters.
```{r reactor-estimate}
reactor <- dplyr::mutate(reactor, across(.cols = FR:Conc,
~ as.numeric(as.character(.x))))
reactor.lm <- lm(pre.react ~ (.) ^ 5, data = reactor)
fac.effects <- 2 * coef(reactor.lm)[-1]
knitr::kable(fac.effects, caption = "Reactor experiment: estimated factorial effects.")
```
There are several large factorial effects, including the main effects of Catalyst and Temperature and the interaction between these factors, and the interaction between Concentration and Temperature. We can assess their significance using a half-normal plot and Lenth's method.
```{r reactor-lenth}
unrepx::hnplot(fac.effects, horiz = F, method = "Lenth", alpha = 0.05)
```
We see that PSE = `r unrepx::PSE(fac.effects, method = "Lenth")`, giving individual and simultaneous margins of error of `r unrepx::ME(fac.effects, method = "Lenth")[1]` and `r unrepx::ME(fac.effects, method = "Lenth")[2]`, respectively (where the latter is adjusted for multiple testing). There is a very clear distinction between the five effects which are largest in absolute value and the other factorial effects, which form a very clear line. The five largest effects, given in Table \@ref(tab:reactor-largest), are all greater than both margins of error and can be declared significant.
```{r reactor-largest}
knitr::kable(fac.effects[abs(fac.effects) > unrepx::ME(fac.effects,
method = "Lenth")[2]],
caption = "Reactor experiment: factorial effects significantly different from zero via Lenth's method.")
```
b. We will produce plots for the interactions between Catalyst and Temperature and Temperature and Concentration. We will not produce main effect plots for Catalyst and Temperature, as these are involved in the large interactions.
```{r reactor-plots, warning = F, fig.show = 'hold', fig.cap = "Reactor experiment: interaction plots."}
Cat_x_Temp <- effects::Effect(c("Cat", "Temp"), reactor.lm,
xlevels = list(Cat = c(-1, 1), Temp = c(-1, 1)),
se = F)
Temp_x_Conc <- effects::Effect(c("Temp", "Conc"), reactor.lm,
xlevels = list(Conc = c(-1, 1), Temp = c(-1, 1)),
se = F)
plot(Cat_x_Temp, style = "stacked", main = "", rug = F, x.var = "Cat",
ylim = c(50, 90))
plot(Temp_x_Conc, style = "stacked", main = "", rug = F, x.var = "Conc",
ylim = c(50, 90))
```
Notice that changing the level of Temperature substantially changes the effect of both Catalyst and Concentration on the response; in particular, the effect of Concentration changes sign depending on the level of Temperature.
c. We start by fitting the reduced regression model.
```{r reactor-reduced}
reactor2.lm <- lm(pre.react ~ (.) ^ 2, data = reactor)
summary(reactor2.lm)
```
This model includes regression parameters corresponding to $5 + {5 \choose 2} = `r length(reactor2.lm$coefficients) - 1`$ factorial effects, plus the intercept, and hence uses `r length(reactor2.lm$coefficients)` degrees of freedom. The remaining 16 degrees of freedom, which were previously used to estimate three-factor and higher interactions, are now used to estimate $\sigma^2$, the background variation.
The residual mean square in the reduced model, used to estimate $\sigma^2$, is the sum of the sums of squares for the higher-order interactions in the original model, divided by 16 (the remaining degrees of freedom).
```{r reactor-compare-sigma2, warning = F}
sum(anova(reactor.lm)[16:31, 3]) / 16
summary(reactor2.lm)$sigma^2
```
This "pooling" of higher-order effects to estimate $\sigma^2$ maybe a reasonable strategy here, given that the high-order interactions are all small, but could be biased if one or more interactions were large.
</details>
2. [Adapted from @Morris2011] Consider an unreplicated ($r=1$) $2^6$ factorial experiment. The total sum of squares,
$$
\mbox{Total SS} = \sum_{i=1}^n(y_i - \bar{y})^2\,,
$$
has value 2856. Using Lenth's method, an informal analysis of the data suggests that there are only three important factorial effects, with least squares estimates
- main effect of factor $A$ = 3
- interaction between factors $A$ and $B$ = 4
- interaction between factors $A$, $B$ and $C$ = 2.
If a linear model including only an intercept and these three effects is fitted to the data, what is the value of the residual sum of squares?
<details><summary><b>Solution</b></summary>
The residual sum of squares has the form
$$
\mbox{RSS} = (\bY - X\hat{\boldsymbol{\beta}})^{\mathrm{T}}(\bY - X\hat{\boldsymbol{\beta}})\,,
$$
where in this case $X$ is a $2^6\times 4$ model matrix, with columns corresponding to the intercept, the main effect of factor $A$, the interaction between factors $A$ and $B$, and the interaction between factors $A$, $B$ and $C$. We can rewrite the RSS as
\begin{equation*}
\begin{split}
\mbox{RSS} & = (\bY - X\hat{\boldsymbol{\beta}})^{\mathrm{T}}(\bY - X\hat{\boldsymbol{\beta}}) \\
& = \bY^{\mathrm{T}}\bY - 2\bY^{\mathrm{T}}X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^{\mathrm{T}}X^{\mathrm{T}}X\hat{\boldsymbol{\beta}} \\
& = \bY^{\mathrm{T}}\bY - 2\hat{\boldsymbol{\beta}}^{\mathrm{T}}X^{\mathrm{T}}X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^{\mathrm{T}}X^{\mathrm{T}}X\hat{\boldsymbol{\beta}} \\
& = \bY^{\mathrm{T}}\bY - \hat{\boldsymbol{\beta}}^{\mathrm{T}}X^{\mathrm{T}}X\hat{\boldsymbol{\beta}}\,,
\end{split}
\end{equation*}
as $\bY^{\mathrm{T}}X = \hat{\boldsymbol{\beta}}^{\mathrm{T}}X^{\mathrm{T}}X$.
As the matrix $X$ has orthogonal columns, $X^{\mathrm{T}}X = 2^fI_{p+1}$ for a model containing coefficients corresponding to $p$ factorial effects; here, $p=3$. Hence,
$$
\mbox{RSS} = \bY^{\mathrm{T}}\bY - 2^f \sum_{i=0}^{p}\hat{\beta}_i^2\,.
$$
Finally, the estimate of the intercept takes the form $\hat{\beta}_0 = \bar{Y}$, and so
\begin{equation*}
\begin{split}
\mbox{RSS} & = \bY^{\mathrm{T}}\bY - 2^f\bar{y}^2 - 2^f\sum_{i=1}^{p}\hat{\beta}_i^2 \\
& = \sum_{i=1}^{2^f}(y_i - \bar{y})^2 - 2^f\sum_{i=1}^{p}\hat{\beta}_i^2 \\
& = \mbox{Total SS} - 2^f\sum_{i=1}^{p}\hat{\beta}_i^2\,.
\end{split}
\end{equation*}
Recalling that each regression coefficient is one-half of the corresponding factorial effect, for this example we have:
$$
\mbox{RSS} = 2856 - 2^6(1.5^2 + 2^2 + 1^2) = `r 2856 - 2^6*(1.5^2 + 2^2 + 1^2)`\,.
$$
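A quick arithmetic check in `R` (recalling that each regression coefficient is half the corresponding factorial effect):
```{r morris-rss-check}
2856 - 2^6 * sum((c(3, 4, 2) / 2)^2)
```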
</details>
3. [Adapted from @Morris2011] Consider a $2^7$ experiment with each treatment applied to two units ($r=2$). Assume a linear regression model will be fitted containing terms corresponding to all factorial effects.
a. What is the variance of the estimator of each factorial effect, up to a constant factor $\sigma^2$?
b. What is the variance of the least squares estimator of $E(y_{11})$, the expected value of an observation with the first treatment applied? You can assume the treatments are given in standard order, so the first treatment is defined by setting all factors to their low level. [The answer is, obviously, the same for $E(y_{12})$]. In a practical experimental setting, why is this not a useful quantity to estimate?
c. What is the variance of the least squares estimator of $E(y_{11}) - E(y_{21})$? You may assume that the second treatment has all factors set to their low levels except for the seventh factor.
<details><summary><b>Solutions</b></summary>
a. Each factorial contrast is scaled so the variance of its estimator is equal to $4\sigma^2/n = 4\sigma^2/2^8 = \sigma^2 / 64$.
b. $E(y_{11}) = \boldsymbol{x}_1^{\mathrm{T}}\boldsymbol{\beta}$, where $\boldsymbol{x}_1^{\mathrm{T}}$ is the row of the $X$ matrix corresponding to the first treatment and $\boldsymbol{\beta}$ are the regression coefficients. The estimator is given by
$$
\hat{E}(y_{11}) = \boldsymbol{x}_1^{\mathrm{T}}\hat{\boldsymbol{\beta}}\,,
$$
with variance
\begin{align*}
\mathrm{var}\left\{\hat{E}(y_{11})\right\} & = \mathrm{var}\left\{\boldsymbol{x}_1^{\mathrm{T}}\hat{\boldsymbol{\beta}}\right\} \\
& = \boldsymbol{x}_1^{\mathrm{T}}\mbox{var}(\hat{\boldsymbol{\beta}})\boldsymbol{x}_1 \\
& = \boldsymbol{x}_1^{\mathrm{T}}\left(X^\mathrm{T}X\right)^{-1}\boldsymbol{x}_1\sigma^2 \\
& = \frac{\boldsymbol{x}_1^{\mathrm{T}}\boldsymbol{x}_1\sigma^2}{2^8} \\
& = \frac{2^7\sigma^2}{2^8} \\
& = \sigma^2 / 2\,.
\end{align*}
This holds for the expected response from any treatment, as $\boldsymbol{x}_j^{\mathrm{T}}\boldsymbol{x}_j = 2^7$ for all treatments, as each entry of $\boldsymbol{x}_j$ is equal to $\pm 1$.
This would not be a useful quantity to estimate in a practical experiment, as it is not a contrast in the treatments. In particular, it depends on the estimate of the overall mean, $\mu$ or $\beta_0$ (in the unit-treatment or regression model) that will vary from experiment to experiment.
c. The expected values of $y_{11}$ and $y_{21}$ will only differ in terms involving the seventh factor, which is equal to its low level (-1) for the first treatment and its high level (+1) for the second treatment; all the other terms will cancel. Hence
$$
E(y_{11}) - E(y_{21}) = -2\left(\beta_7 + \sum_{j=1}^6\beta_{j7} + \sum_{j=1}^6\sum_{k=j+1}^6\beta_{jk7} + \ldots + \beta_{1234567}\right)\,.
$$
The variance of the estimator has the form
\begin{align*}
\mathrm{var}\left\{\widehat{E(y_{11}) - E(y_{21})}\right\} & = 4\times\mathrm{var}\bigg(\hat{\beta}_7 + \sum_{j=1}^6\hat{\beta}_{j7} + \sum_{j=1}^6\sum_{k=j+1}^6\hat{\beta}_{jk7} + \\
& \ldots + \hat{\beta}_{1234567}\bigg) \\
& = \frac{4\sigma^2}{2\times 2^7}\sum_{j=0}^6{6 \choose j} \\
& = \frac{\sigma^2}{2^6}\times `r sum(choose(6, 0:6))` \\
& = \sigma^2\,.
\end{align*}
Or, as this is a treatment comparison in a CRD, we have
$$
\hat{E}(y_{11}) - \hat{E}(y_{21}) = \widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}}\,,
$$
where $\boldsymbol{c}$ corresponds to a pairwise treatment comparison, and hence has one entry equal to +1 and one entry equal to -1. From Section \@ref(contrast-crd),
\begin{align*}
\mathrm{var}\left(\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}}\right)
& = \sum_{i=1}^tc_i^2\mathrm{var}(\bar{y}_{i.}) \\
& = \sigma^2\sum_{i=1}^tc_i^2/n_i\,,
\end{align*}
where in this example $n_i = 2$ for all $i$ and $\sum_{i=1}^tc_i^2 = 2$. Hence, the variance is again equal to $\sigma^2$.
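These calculations can be confirmed numerically by building the full model matrix directly (a sketch; the design is constructed in standard order with $\pm 1$ coding, so the first factor varies fastest):
```{r morris-variance-check}
d7 <- do.call(expand.grid, rep(list(c(-1, 1)), 7))
X1 <- model.matrix(~ . ^ 7, data = d7) # intercept plus all 127 factorial terms
Xr <- X1[rep(1:128, each = 2), ]       # r = 2 replicates of each treatment
x1 <- X1[1, ]                          # treatment 1: all factors low
x2 <- X1[65, ]                         # all factors low except the seventh
## variance multipliers (in units of sigma^2) for parts b and c
c(part_b = drop(t(x1) %*% solve(t(Xr) %*% Xr) %*% x1),
  part_c = drop(t(x1 - x2) %*% solve(t(Xr) %*% Xr) %*% (x1 - x2)))
```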
</details>