# Completely randomised designs {#crd}
The simplest form of experiment we will consider compares $t$ different **unstructured** treatments. By unstructured, we mean the treatments form a discrete collection, not related through the settings of other experimental features (compare with factorial experiments in Chapter \@ref(factorial)). We also make the assumption that there are no restrictions in the randomisation of treatments to experimental units (compare with Chapter \@ref(blocking) on blocking). A design for such an experiment is therefore called a **completely randomised design** (CRD).
::: {.example #one-way}
Pulp experiment [@WH2009, ch. 2]
In a paper pulping mill, an experiment was run to examine differences between the reflectance (brightness; ratio of amount of light leaving a target to the amount of light striking the target) of sheets of pulp made by $t=4$ operators. The data are given in Table \@ref(tab:pulp-expt-data) below.
```{r pulp-expt-data, warning = F}
pulp <- data.frame(operator = rep(factor(1:4), 5),
repetition = rep(1:5, rep(4, 5)),
reflectance = c(59.8, 59.8, 60.7, 61.0, 60.0, 60.2, 60.7, 60.8,
60.8, 60.4, 60.5, 60.6, 60.8, 59.9, 60.9, 60.5, 59.8, 60.0, 60.3, 60.5)
)
knitr::kable(
tidyr::pivot_wider(pulp, names_from = operator, values_from = reflectance)[, -1],
col.names = paste("Operator", 1:4),
caption = "Pulp experiment: reflectance values (unitless) from four different operators."
)
```
The experiment has one factor (operator) with four levels (sometimes called a one-way layout). The CRD employed has equal replication of each treatment (operator).
We can informally compare the responses from these four treatments graphically.
```{r pulp-boxplot, fig.align = 'center', fig.cap = 'Pulp experiment: distributions of reflectance from the four operators.'}
boxplot(reflectance ~ operator, data = pulp)
```
Figure \@ref(fig:pulp-boxplot) shows that, relative to the variation, there may be a difference in the mean response between treatments 1 and 2, and 3 and 4. In this chapter, we will see how to make this comparison formally using linear models, and to assess how the choice of design impacts our results.
:::
Throughout this chapter we will assume the $i$th treatment is applied to $n_i$ experimental units, with total number of runs $n = \sum_{i=1}^t n_i$ in the experiment.
## A unit-treatment linear model
An appropriate, and common, model to describe data from such experiments when the response is continuous is given by
\begin{equation}
y_{ij} = \mu + \tau_i + \varepsilon_{ij}\,, \quad i = 1, \ldots, t; j = 1, \ldots, n_i\,,
(\#eq:utm)
\end{equation}
where $y_{ij}$ is the response from the $j$th application of treatment $i$, $\mu$ is a constant parameter, $\tau_i$ is the effect of the $i$th treatment, and $\varepsilon_{ij}$ is the random individual effect from each experimental unit with $E(\varepsilon_{ij})=0$ and $\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$. All random errors are assumed independent and here we also assume $\varepsilon_{ij} \sim N(0, \sigma^2)$.
Model \@ref(eq:utm) assumes that each treatment can be randomly allocated to one of the $n$ experimental units, and that the response observed is independent of the allocation of all the other treatments (the stable unit treatment value assumption or SUTVA).
Why is this model appropriate and commonly used? The expected response from the application of the $i$th treatment is
$$
E(y_{ij}) = \mu + \tau_i\,.
$$
The parameter $\mu$ can be thought of as representing the impact of many different features particular to **this** experiment but common to all units, and $\tau_i$ is the deviation due to applying treatment $i$. Under the application of treatment $i$ in two different hypothetical experiments, A and B, the expected response may differ because of a different overall mean. From experiment A:
$$
E(y_{ij}) = \mu_{\mathrm{A}} + \tau_i\,.
$$
From experiment B:
$$
E(y_{ij}) = \mu_{\mathrm{B}} + \tau_i\,.
$$
But the **difference** between treatments $k$ and $l$ ($k, l = 1,\ldots, t$)
\begin{align*}
E(y_{kj}) - E(y_{lj}) & = \mu_A + \tau_k - \mu_A - \tau_l \\
& = \tau_k - \tau_l\,,
\end{align*}
is constant across different experiments. This concept of **comparison** underpins much of the design of experiments, and will be applied throughout this module.
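A small numerical illustration of this invariance (a sketch, not part of the original analysis, using the pulp data from Example \@ref(exm:one-way)): shifting every response by a constant, mimicking a different overall mean $\mu$, changes the treatment means but not their differences.
```{r mean-shift}
# treatment means from the original responses, and from responses
# shifted by a constant (a hypothetical change in mu of +10)
ybar <- tapply(pulp$reflectance, pulp$operator, mean)
ybar.shift <- tapply(pulp$reflectance + 10, pulp$operator, mean)
unname(ybar["1"] - ybar["2"])             # difference between operators 1 and 2...
unname(ybar.shift["1"] - ybar.shift["2"]) # ...is unchanged by the shift
```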
## The partitioned linear model
In matrix form, we can write model \@ref(eq:utm) as
$$
\bY = X_1\mu + X_2\btau + \bvarepsilon\,,
$$
where $X_1 = \boldsymbol{1}_n$, the $n$-vector with every entry equal to one,
$$
X_2 = \bigoplus_{i = 1}^t \boldsymbol{1}_{n_i} = \begin{bmatrix}
\boldsymbol{1}_{n_1} & \boldsymbol{0}_{n_1} & \cdots & \boldsymbol{0}_{n_1} \\
\boldsymbol{0}_{n_2} & \boldsymbol{1}_{n_2} & \cdots & \boldsymbol{0}_{n_2} \\
\vdots & & \ddots & \vdots \\
\boldsymbol{0}_{n_t} & \boldsymbol{0}_{n_t} & \cdots & \boldsymbol{1}_{n_t} \\
\end{bmatrix}\,,
$$
with $\bigoplus$ denoting "direct sum", $\boldsymbol{0}_{n_i}$ is the $n_i$-vector with every entry equal to zero, $\btau = [\tau_1, \ldots, \tau_t]^{\mathrm{T}}$ and $\bvarepsilon = [\varepsilon_{11}, \ldots, \varepsilon_{tn_t}]^{\mathrm{T}}$.
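To make this structure concrete, here is a minimal sketch (an illustration, not from the original text) constructing $X_1$ and $X_2$ for the pulp experiment; the runs are sorted by operator so that the direct-sum (block) structure of $X_2$ is visible:
```{r x-matrices}
pulp.sorted <- pulp[order(pulp$operator), ]  # sort runs so X2 is block-diagonal
X1 <- matrix(1, nrow = nrow(pulp.sorted))    # the n-vector of ones
# one indicator column per treatment (no intercept)
X2 <- model.matrix(~ operator - 1, data = pulp.sorted)
colSums(X2)                                  # column sums recover n_1, ..., n_4
```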
Why are we partitioning the model? Going back to our discussion of the role of $\mu$ and $\tau_i$, it is clear that we are not interested in estimating $\mu$, which represents an experiment-specific contribution to the expected response. Our only interest is in estimating the (differences between the) $\tau_i$. Hence, we can treat $\mu$ as a nuisance parameter.
If we define $X = [X_1\, \vert\, X_2]$ and $\bbeta^{\mathrm{T}} = [\mu \vert \btau^{\mathrm{T}}]$, we can write the usual least squares equations
\begin{equation}
X^{\mathrm{T}}X\hat{\bbeta} = X^{\mathrm{T}}\bY
(\#eq:crd-ls)
\end{equation}
as a system of two matrix equations
\begin{align*}
X_1^{\mathrm{T}}X_1\hat{\mu} + X_1^{\mathrm{T}}X_2\hat{\btau} & = X_1^{\mathrm{T}}\bY \\
X_2^{\mathrm{T}}X_1\hat{\mu} + X_2^{\mathrm{T}}X_2\hat{\btau} & = X_2^{\mathrm{T}}\bY\,. \\
\end{align*}
Assuming $(X_1^{\mathrm{T}}X_1)^{-1}$ exists, which it does in this case, we can pre-multiply the first of these equations by $X_2^{\mathrm{T}}X_1(X_1^{\mathrm{T}}X_1)^{-1}$ and subtract it from the second equation to obtain
\begin{align*}
X_2^{\mathrm{T}}[I_n - X_1(X_1^{\mathrm{T}}X_1)^{-1}X_1^{\mathrm{T}}]X_1\hat{\mu}
& + X_2^{\mathrm{T}}[I_n - X_1(X_1^{\mathrm{T}}X_1)^{-1}X_1^{\mathrm{T}}]X_2\hat{\btau} \\
& = X_2^{\mathrm{T}}[I_n - X_1(X_1^{\mathrm{T}}X_1)^{-1}X_1^{\mathrm{T}}]\bY\,.
\end{align*}
Writing $H_1 = X_1(X_1^{\mathrm{T}}X_1)^{-1}X_1^{\mathrm{T}}$, we obtain
\begin{equation}
X_2^{\mathrm{T}}[I_n - H_1]X_1\hat{\mu} + X_2^{\mathrm{T}}[I_n - H_1]X_2\hat{\btau} = X_2^{\mathrm{T}}[I_n - H_1]\bY\,.
(\#eq:almost-reduced)
\end{equation}
The matrix $H_1$ is a "hat" matrix for a linear model containing only the term $\mu$, and hence $H_1X_1 = X_1$ (see MATH2010 or STAT6123). Hence the first term in \@ref(eq:almost-reduced) is zero, and we obtain the **reduced normal equations** for $\btau$:
\begin{equation}
X_2^{\mathrm{T}}[I_n - H_1]X_2\hat{\btau} = X_2^{\mathrm{T}}[I_n - H_1]\bY\,.
(\#eq:crd-red-normal)
\end{equation}
Note that the solutions for $\hat{\btau}$ from \@ref(eq:crd-red-normal) are no different from those that would be obtained from solving \@ref(eq:crd-ls); equation \@ref(eq:crd-red-normal) is simply a re-expression in which we have eliminated the nuisance parameter $\mu$. This fact means that we rarely need to solve \@ref(eq:crd-red-normal) explicitly.
Recalling that for a hat matrix, $I_n - H_1$ is idempotent and symmetric (see MATH2010 or MATH6174), if we define
$$
X_{2|1} = (I_n - H_1)X_2\,,
$$
then we can rewrite equation \@ref(eq:crd-red-normal) as
\begin{equation}
X_{2|1}^{\mathrm{T}}X_{2|1}\hat{\btau} = X_{2|1}^{\mathrm{T}}\bY\,,
(\#eq:rne)
\end{equation}
which are the normal equations for a linear model with expectation $E(\bY) = X_{2|1}\btau$.
## Reduced normal equations for the CRD
For the CRD discussed in this chapter, $X_1^{\mathrm{T}}X_1 = n$, the total number of runs in the experiment^[In later chapters we will see examples where $X_1$ has $>1$ columns, and hence $X_1^{\mathrm{T}}X_1$ is a matrix.]. Hence $(X_1^{\mathrm{T}}X_1)^{-1} = 1/n$ and $H_1 = \frac{1}{n}J_n$, with $J_n$ the $n\times n$ matrix with all entries equal to 1.
The adjusted model matrix then has the form
\begin{align}
X_{2|1} & = (I_n - H_1)X_2 \nonumber\\
& = X_2 - \frac{1}{n}J_nX_2 \nonumber\\
& = X_2 - \frac{1}{n}[n_1\boldsymbol{1}_n \vert \cdots \vert n_t\boldsymbol{1}_n]\,. (\#eq:crd-x21)
\end{align}
That is, every column of $X_2$ has been adjusted by the subtraction of the column mean from each entry^[Often called "column centred".]. Also notice that each row of $X_{2|1}$ has a row-sum equal to zero ($1 - \sum_{i=1}^tn_i/n = 0$). Hence, $X_{2|1}$ is not of full column rank, and so the reduced normal equations do not have a unique solution^[If we recalled the material on "dummy" variables from MATH2010 or STAT6123, we would already have realised this.].
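A quick numerical check of both claims (a sketch assuming the `pulp` data frame defined above):
```{r x21-centred}
H1 <- matrix(1 / nrow(pulp), nrow(pulp), nrow(pulp))  # H_1 = J_n / n
X2 <- model.matrix(~ operator - 1, data = pulp)
X21 <- (diag(nrow(pulp)) - H1) %*% X2                 # column-centred X_2
round(colMeans(X21), 10)                              # every column mean is zero
qr(X21)$rank                                          # 3 = t - 1: not full column rank
```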
Although \@ref(eq:rne), and hence, \@ref(eq:crd-ls), have no unique solution^[That is, for any two solutions $\tilde{\boldsymbol{\beta}}_1$ and $\tilde{\boldsymbol{\beta}}_2$, $X\tilde{\boldsymbol{\beta}}_1 = X\tilde{\boldsymbol{\beta}}_2$.], it can be shown that both $\widehat{X_{2|1}\boldsymbol{\tau}}$ and $\widehat{X\boldsymbol{\beta}}$ have unique solutions. Hence fitted values $\hat{\bY} = \widehat{X\boldsymbol{\beta}}$ and the residual sum of squares
$$
RSS = \left(\bY - \widehat{X\boldsymbol{\beta}}\right)^{\mathrm{T}}\left(\bY - \widehat{X\boldsymbol{\beta}}\right)
$$
are both uniquely defined for any solution to \@ref(eq:crd-ls). That is, every solution to the normal equations leads to the same fitted values and residual sum of squares.
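This invariance is easy to demonstrate numerically; the sketch below (an illustration, not part of the original analysis) fits the pulp model under two different dummy-variable codings and compares the fitted values and residual sums of squares:
```{r fitted-invariance}
# two different (non-unique) parameterisations of the same model
fit.trt <- lm(reflectance ~ operator, data = pulp,
              contrasts = list(operator = "contr.treatment"))
fit.sum <- lm(reflectance ~ operator, data = pulp,
              contrasts = list(operator = "contr.sum"))
all.equal(fitted(fit.trt), fitted(fit.sum))     # TRUE: same fitted values
all.equal(deviance(fit.trt), deviance(fit.sum)) # TRUE: same RSS
```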
In MATH2010 and STAT6123 we fitted models with categorical variables by defining a set of dummy variables and estimating a reduced model. Here, we will take a slightly different approach and study which combinations of parameters from \@ref(eq:utm) are estimable, and in particular which linear combinations of the treatment parameters $\tau_i$ we can estimate.
Let's study equation \@ref(eq:rne) in more detail. We have
\begin{align*}
X^{\mathrm{T}}_{2|1}X_{2|1} & = X_2^{\mathrm{T}}(I_n - H_1)X_2 \\
& = X_2^{\mathrm{T}}X_2 - X_2^{\mathrm{T}}H_1X_2 \\
& = \mathrm{diag}(\boldsymbol{n}) - \frac{1}{n}X_2^{\mathrm{T}}J_nX_2 \\
& = \mathrm{diag}(\boldsymbol{n}) - \frac{1}{n}\boldsymbol{n}\boldsymbol{n}^{\mathrm{T}}\,,
\end{align*}
where $\boldsymbol{n}^{\mathrm{T}} = (n_1, \ldots, n_t)$. Hence, the reduced normal equations become
\begin{align}
\left[\mathrm{diag}(\boldsymbol{n}) - \frac{1}{n}\boldsymbol{n}\boldsymbol{n}^{\mathrm{T}}\right]\hat{\boldsymbol{\tau}} & = X_2^{\mathrm{T}}\bY - \frac{1}{n}X_2^{\mathrm{T}}J_n\bY \\
& = X_2^{\mathrm{T}}\bY - \boldsymbol{n}\bar{y}_{..}\,,
(\#eq:crd-rne)
\end{align}
where $\bar{y}_{..} = \frac{1}{n}\sum_{i = 1}^t\sum_{j = 1}^{n_i} y_{ij}$, i.e. the overall average of the observations from the experiment.
From \@ref(eq:crd-rne) we obtain a system of $t$ equations, each having the form
\begin{equation}
\hat{\tau}_i - \hat{\tau}_w = \bar{y}_{i.} - \bar{y}_{..}\,,
(\#eq:crd-irne)
\end{equation}
where $\hat{\tau}_w = \frac{1}{n}\sum_{i=1}^tn_i\hat{\tau}_i$ and $\bar{y}_{i.} = \frac{1}{n_i}\sum_{j = 1}^{n_i}y_{ij}$ $(i = 1, \ldots, t)$.
These $t$ equations are not independent; when multiplied by the $n_i$, they sum to zero due to the linear dependency between the columns of $X_{2|1}$. Hence, there is no unique solution for $\hat{\boldsymbol{\tau}}$ from equation \@ref(eq:crd-rne). However, we can estimate certain linear combinations of the $\tau_i$, called *contrasts*.
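Equation \@ref(eq:crd-irne) can be verified numerically using any particular solution; the sketch below uses the (non-unique) solution produced by `lm` with its default treatment coding, for which $\hat{\tau}_1 = 0$:
```{r rne-check}
fit <- lm(reflectance ~ operator, data = pulp)
tau <- c(0, coef(fit)[-1])                # one particular solution for tau
ni <- table(pulp$operator)                # replications n_1, ..., n_4
tau.w <- sum(ni * tau) / sum(ni)          # weighted mean of the tau_i
ybar <- tapply(pulp$reflectance, pulp$operator, mean)
all.equal(unname(tau - tau.w),
          unname(ybar - mean(pulp$reflectance)))  # TRUE: matches (eq:crd-irne)
```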
## Contrasts
::: {.definition #contrast}
A treatment **contrast** is a linear combination $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}$ with coefficient vector $\boldsymbol{c}^{\mathrm{T}} = (c_1,\ldots, c_t)$ such that $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{1} = 0$; that is, $\sum_{i = 1}^t c_i = 0$.
:::
For example, assume we have $t = 3$ treatments, then the following vectors $\boldsymbol{c}$ all define contrasts:
1. $\boldsymbol{c}^{\mathrm{T}} = (1, -1, 0)$,
2. $\boldsymbol{c}^{\mathrm{T}} = (1, 0, -1)$,
3. $\boldsymbol{c}^{\mathrm{T}} = (0, 1, -1)$.
In fact, they define all ${3\choose 2} = 3$ pairwise comparisons between treatments. The following are also contrasts:
4. $\boldsymbol{c}^{\mathrm{T}} = (2, -1, -1)$,
5. $\boldsymbol{c}^{\mathrm{T}} = (0.5, -1, 0.5)$,
each comparing the sum, or average, of expected responses from two treatments to the expected response from the remaining treatment.
The following are not contrasts, as $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{1} \ne 0$:
6. $\boldsymbol{c}^{\mathrm{T}} = (2, -1, 0)$,
7. $\boldsymbol{c}^{\mathrm{T}} = (1, 0, 0)$,
with the final example once again demonstrating that we cannot estimate the individual $\tau_i$.
## Treatment contrast estimators in the CRD {#contrast-crd}
We estimate a treatment contrast $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}$ in the CRD via linear combinations of equations \@ref(eq:crd-irne):
\begin{align*}
& \sum_{i=1}^t c_i\hat{\tau}_i - \sum_{i=1}^tc_i\hat{\tau}_w = \sum_{i=1}^tc_i\bar{y}_{i.} - \sum_{i=1}^tc_i\bar{y}_{..} \\
\Rightarrow & \sum_{i=1}^t c_i\hat{\tau}_i = \sum_{i=1}^tc_i\bar{y}_{i.}\,,
\end{align*}
as $\sum_{i=1}^tc_i\hat{\tau}_w = \hat{\tau}_w\sum_{i=1}^tc_i = 0$ and $\sum_{i=1}^tc_i\bar{y}_{..} = \bar{y}_{..}\sum_{i=1}^tc_i = 0$, because $\sum_{i=1}^tc_i = 0$ for a contrast. Hence, the unique estimator of the contrast $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}$ has the form
$$
\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}} = \sum_{i=1}^tc_i\bar{y}_{i.}\,.
$$
That is, we estimate the contrast in the treatment effects by the corresponding contrast in the treatment means.
The variance of this estimator is straightforward to obtain:
\begin{align*}
\mathrm{var}\left(\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}}\right)
& = \sum_{i=1}^tc_i^2\mathrm{var}(\bar{y}_{i.}) \\
& = \sigma^2\sum_{i=1}^tc_i^2/n_i\,,
\end{align*}
as, under our model assumptions, each $\bar{y}_{i.}$ is an average of independent observations with variance $\sigma^2$. Similarly, from model \@ref(eq:utm) we can derive the distribution of $\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}}$ as
$$
\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}} \sim N\left(\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}, \sigma^2\sum_{i=1}^tc_i^2/n_i\right)\,.
$$
Confidence intervals and hypothesis tests for $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}$ can be constructed/conducted using this distribution, e.g.
- a $100(1-\alpha)$% confidence interval:
$$
\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} \in \sum_{i=1}^tc_i\bar{y}_{i.} \pm t_{n-t, 1-\frac{\alpha}{2}}s\sqrt{\sum_{i=1}^tc_i^2/n_i}\,,
$$
where $t_{n-t, 1-\frac{\alpha}{2}}$ is the $1-\frac{\alpha}{2}$ quantile of a $t$-distribution with $n-t$ degrees of freedom and
\begin{equation}
s^2 = \frac{1}{n-t}\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2
(\#eq:crd-s2)
\end{equation}
is the estimate of $\sigma^2$.
- the hypothesis $H_0: \boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} = 0$ is rejected against the two-sided alternative $H_1: \boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} \ne 0$ at significance level $\alpha$ if
$$
\frac{|\sum_{i=1}^tc_i\bar{y}_{i.}|}{s\sqrt{\sum_{i=1}^tc_i^2/n_i}} > t_{n-t, 1-\frac{\alpha}{2}}\,.
$$
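Both the interval and the test are straightforward to compute directly. The following sketch (an illustration, assuming the `pulp` data) does so for the pairwise contrast $\boldsymbol{c} = (1, -1, 0, 0)$, comparing operators 1 and 2:
```{r contrast-by-hand}
cvec <- c(1, -1, 0, 0)                    # pairwise contrast: operator 1 vs 2
ni <- table(pulp$operator)                # replications
ybar <- tapply(pulp$reflectance, pulp$operator, mean)
est <- sum(cvec * ybar)                   # contrast estimate
# pooled estimate s^2 of sigma^2, as in (eq:crd-s2)
s2 <- sum(tapply(pulp$reflectance, pulp$operator,
                 function(y) sum((y - mean(y))^2))) / (nrow(pulp) - 4)
se <- sqrt(s2 * sum(cvec^2 / ni))         # standard error of the estimator
tstat <- est / se
c(estimate = est, t.ratio = tstat,
  p.value = 2 * pt(-abs(tstat), df = nrow(pulp) - 4))
est + c(-1, 1) * qt(0.975, nrow(pulp) - 4) * se  # 95% confidence interval
```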
## Analysing CRDs in R {#r-crd}
Let's return to Example \@ref(exm:one-way).
```{r one-way-analysis}
knitr::kable(
tidyr::pivot_wider(pulp, names_from = operator, values_from = reflectance)[, -1],
col.names = paste("Operator", 1:4),
caption = "Pulp experiment: reflectance values (unitless) from four different operators."
)
```
Clearly, we could directly calculate, and then compare, mean responses for each operator. However, there are (at least) two other ways we can proceed which use the fact we are fitting a linear model. These will be useful when we consider more complex models.
1. Using `pairwise.t.test`.
```{r one-way-t-test}
with(pulp,
pairwise.t.test(reflectance, operator, p.adjust.method = 'none'))
```
This function performs hypothesis tests for all pairwise treatment comparisons. Here, at the 5% significance level, we can see that operators 1 and 4, 2 and 3, and 2 and 4 have statistically significant differences.
2. Using `lm` and the `emmeans` package.
```{r one-way-emmeans}
pulp.lm <- lm(reflectance ~ operator, data = pulp)
anova(pulp.lm)
pulp.emm <- emmeans::emmeans(pulp.lm, ~ operator)
pairs(pulp.emm, adjust = 'none')
```
Here, we have first fitted the linear model object. The `lm` function, by default, will have set up dummy variables with the first treatment (operator) as a baseline (see MATH2010 or STAT6123). We then take the intermediate step of calculating the ANOVA table for this experiment, and use an F-test to compare the model accounting for operator differences to the null model; there are differences between operators at the 5% significance level.
The choice of dummy variables in the linear model is unimportant; any set could be used^[Recall that although $\mu$ and $\boldsymbol{\tau}$ are not uniquely estimable, fitted values $\hat{y}_i = \hat{\mu} + \hat{\tau}_i$ **are**, and hence it does not matter which dummy variables we use in `lm`.]. In the next line we use the `emmeans` function (from the package of the same name) to specify that we are interested in estimating contrasts in the factor `operator` (which defines our treatments in this experiment). Finally, the `pairs` command performs hypothesis tests for all pairwise comparisons between the four treatments. The results are the same as those obtained from `pairwise.t.test`.
Our preferred approach is using method 2 (`lm` and `emmeans`), for four main reasons:
a. The function `contrast` in the `emmeans` package can be used to estimate arbitrary treatment contrasts (see `help("contrast-methods")`).
```{r one-way-contrasts}
# same as `pairs` above
emmeans::contrast(pulp.emm, "pairwise", adjust = "none")
# estimating single contrast c = (1, -.5, -.5, 0)
# comparing operator 1 with operators 2 and 3
contrast1v23.emmc <- function(levs, ...)
data.frame('t1 v avg t2 t3' = c(1, -.5, -.5, 0))
emmeans::contrast(pulp.emm, 'contrast1v23')
```
a. It more easily generalises to the more complicated models we will see in Chapter \@ref(blocking).
a. It explicitly acknowledges that we have fitted a linear model, and so encourages us to check the model assumptions (see [Exercise 3](#nap-black-ex)).
a. It is straightforward to apply adjustments for [multiple comparisons](#multiple-comp).
## Multiple comparisons {#multiple-comp}
When we perform hypothesis testing, we choose the critical region (i.e. the rule that decides if we reject $H_0$) to control the probability of a type I error; that is, we control the probability of incorrectly rejecting $H_0$. If we need to test multiple hypotheses, e.g. to test all pairwise differences, we need to consider the overall probability of incorrectly rejecting **one or more** null hypotheses. This is called the **experiment-wise** or **family-wise** error rate.
For Example \@ref(exm:one-way), there are ${4 \choose 2} = 6$ pairwise comparisons. Under the assumption that all tests are independent^[They aren't, but it simplifies the maths!], assuming each individual test has type I error 0.05, the experiment-wise type I error rate is given by:
```{r type-I}
alpha <- 0.05
1 - (1 - alpha)^6
```
An experiment-wise error rate of `r 1 - (1 - alpha)^6` is substantially greater than 0.05. Hence, we would expect to make many more type I errors than may be desirable. [xkcd](https://xkcd.com/882) has a fun example:
```{r xkcd-jelly-beans, echo = F, out.width = '75%', fig.align = 'center'}
if (knitr::is_html_output())
{
knitr::include_graphics("https://imgs.xkcd.com/comics/significant.png")
} else {
knitr::include_url("https://imgs.xkcd.com/comics/significant.png")
}
```
For the 20 different colours of jelly bean tested in the comic, the experiment-wise error rate is even larger:
```{r type-I-jb}
alpha <- 0.05
1 - (1 - alpha)^20
```
Therefore it is usually desirable to maintain some control of the experiment-wise type I error rate. We will consider two methods.
1. The **Bonferroni method**. An upper bound on the experiment-wise type I error rate for testing $k$ hypotheses can be shown to be
\begin{align*}
P(\mbox{wrongly reject at least one of } H_{0}^1, \ldots, H_{0}^k) & = P\left(\bigcup_{i=1}^{k}\{\mbox{wrongly reject } H_{0}^i\}\right) \\
& \leq \sum_{i=1}^{k}\underbrace{P(\mbox{wrongly reject } H_{0}^i)}_{\leq \alpha} \\
& \leq k\alpha\,.
\end{align*}
Hence a *conservative*^[So the experiment-wise type I error will actually be less than the prescribed $\alpha$] adjustment for multiple comparisons is to test each hypothesis at size $\alpha / k$, i.e. for the CRD compare to the quantile $t_{n-t, 1-\frac{\alpha}{2k}}$ (or multiply each individual p-value by $k$).
For Example \@ref(exm:one-way), we can test all pairwise comparisons, each at size $\alpha/k$ using the `adjustment` argument in `pairs`.
```{r bonferroni}
pairs(pulp.emm, adjust = 'bonferroni')
```
Now, only one comparison is significant at an experiment-wise type I error rate of $\alpha = 0.05$ (operators 2 and 4).
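Equivalently (a sketch of the same adjustment done by hand), we can multiply the unadjusted p-values by $k = 6$, capping them at 1, using `p.adjust`:
```{r bonferroni-by-hand}
# unadjusted p-values from the pairwise comparisons, then Bonferroni-adjusted
p.unadj <- transform(pairs(pulp.emm, adjust = 'none'))$p.value
p.adjust(p.unadj, method = "bonferroni")
```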
1. **Tukey's method**. An alternative approach that gives an exact experiment-wise error rate of $100\alpha$% compares the $t$ statistic to a critical value from the studentised range distribution^[Given $t$ independent samples of size $n$ from the same normal distribution, the studentised range distribution is the distribution of $\frac{R}{S/\sqrt{n}}$, where $R = \bar{y}_{max}-\bar{y}_{min}$ is the difference between the largest and smallest sample means, and $S = \sqrt{\frac{1}{tn-1}\sum_{i=1}^t\sum_{j=1}^n(y_{ij} - \bar{y})^2}$ is the pooled sample standard deviation.], given by $\frac{1}{\sqrt{2}}q_{t, n-t, 1-\alpha}$ with $q_{t, n-t, 1-\alpha}$ the $1-\alpha$ quantile from the studentised range distribution (available in `R` as `qtukey`).
For Example \@ref(exm:one-way):
```{r tukey}
pairs(pulp.emm)
```
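For reference, the critical value described above can be computed directly; here with $t = 4$ treatments and $n - t = 16$ degrees of freedom:
```{r tukey-critical}
# Tukey critical value for pairwise comparisons at experiment-wise level 0.05
qtukey(0.95, nmeans = 4, df = 16) / sqrt(2)
```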
The default adjustment in the `pairs` function is the Tukey method. Comparing the p-values for each comparison using unadjusted t-tests, the Bonferroni and Tukey methods:
```{r bonf-tukey}
pairs.u <- pairs(pulp.emm, adjust = 'none')
pairs.b <- pairs(pulp.emm, adjust = 'bonferroni')
pairs.t <- pairs(pulp.emm)
data.frame(transform(pairs.b)[, 1:5],
           Bonf.p.value = transform(pairs.b)[, 6],
           Tukey.p.value = transform(pairs.t)[, 6],
           unadjust.p.value = transform(pairs.u)[, 6])
```
Although the decision on which hypotheses to reject (comparison 2 - 4) is the same here for both methods, the p-values from the Bonferroni method are all larger, reflecting its more conservative nature.
## Impact of design choices on estimation
Recall from Section \@ref(contrast-crd) that the width of a confidence interval for a contrast $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}$ is given by $2t_{n-t, 1-\frac{\alpha}{2}}s\sqrt{\sum_{i=1}^tc_i^2/n_i}$. The expectation of the square of this quantity is given by
$$
4t^2_{n-t, 1-\frac{\alpha}{2}}\sigma^2\sum_{i=1}^tc_i^2/n_i\,,
$$
as $E(s^2) = \sigma^2$. It is intuitive that a good design should have small values of the square root of this quantity (divided by $2\sigma$),
$$
t_{n-t, 1-\frac{\alpha}{2}}\sqrt{\sum_{i=1}^tc_i^2/n_i}\,,
$$
which can be achieved either by increasing $n$, and hence reducing the size of the $t$-quantile, or through the choice of the $n_i$ for fixed $n$, i.e. through the choice of replication of each treatment.
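As a small illustration (a sketch assuming a pairwise contrast $\boldsymbol{c} = (1, -1, 0, 0)$ and $t = 4$ equally replicated treatments, so $\sum_i c_i^2/n_i = 2/(n/4)$), the following tabulates this quantity for increasing $n$:
```{r ci-halfwidth}
n.seq <- seq(8, 40, by = 4)                # total runs, with n_i = n / 4 each
halfwidth <- qt(0.975, df = n.seq - 4) * sqrt(2 / (n.seq / 4))
round(data.frame(n = n.seq, halfwidth = halfwidth), 3)
```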
### Optimal treatment allocation {#crd-opt-all}
It is quite common that although the total number, $n$, of runs in the experiment may be fixed, the number $n_1, n_2, \ldots, n_t$ applied to the different treatments is under the experimenter's control. Choosing $n_1, n_2$ subject to $n_1+n_2 = n$ was the first **optimal design** problem we encountered in Chapter \@ref(intro).
Assume interest lies in estimating the set of $p$ contrasts $\boldsymbol{c}_1^{\mathrm{T}}\boldsymbol{\tau}, \ldots, \boldsymbol{c}_p^{\mathrm{T}}\boldsymbol{\tau}$, with $\boldsymbol{c}_l^{\mathrm{T}} = (c_{l1}, \ldots, c_{lt})$. One useful measure of the overall quality of the estimators of these $p$ contrasts is the average variance, given by
$$
\frac{\sigma^2}{p}\sum_{l=1}^p\sum_{i=1}^tc_{li}^2/n_i\,.
$$
Intuitively, we can reduce this variance by allocating larger values of $n_i$ to treatments with correspondingly larger values of the contrast coefficients $c_{li}$. An approach to optimal allocation is therefore to choose $\boldsymbol{n} = (n_1, \ldots, n_t)^{\mathrm{T}}$ so as to
\begin{equation}
\mbox{minimise} \quad \phi(\boldsymbol{n}) = \sum_{l=1}^p\sum_{i=1}^tc_{li}^2/n_i\,\qquad \mbox{subject to} \quad \sum_{i=1}^tn_i = n\,.
(\#eq:opt-all)
\end{equation}
This is a discrete optimisation problem (the $n_i$ are integers). It is usually easier to solve the relaxed problem, where we allow continuous $0\le n_i \le n$, and round the resulting solution to obtain integers. There is no guarantee that such a rounded allocation will actually be the optimal integer-valued solution, but it is usually fairly close.
To solve the continuous version of \@ref(eq:opt-all) we will use the method of Lagrange multipliers, where we define the function
$$
h(\boldsymbol{n}, \lambda) = \phi(\boldsymbol{n}) + \lambda\left(\sum_{i=1}^tn_i - n\right)\,,
$$
introducing the new scalar variable $\lambda$, and solve the set of $t+1$ equations:
\begin{align*}
\frac{\partial h}{\partial n_1} & = 0 \\
\vdots & \\
\frac{\partial h}{\partial n_t} & = 0 \\
\frac{\partial h}{\partial \lambda} & = 0\,.
\end{align*}
In this case, we have
\begin{equation}
\frac{\partial h}{\partial n_i} = -\sum_{l=1}^pc_{li}^2/n_i^2 + \lambda = 0\,,\quad i=1,\ldots t,
(\#eq:lm-ni)
\end{equation}
and
$$
\frac{\partial h}{\partial \lambda} = \sum_{i=1}^t n_i - n = 0\,.
$$
This last equation ensures $\sum_{i=1}^tn_i = n$. From the $t$ equations described by \@ref(eq:lm-ni), we get
$$
n_i \propto \sqrt{\sum_{l=1}^pc_{li}^2}\,.
$$
We do not need to solve explicitly for $\lambda$ to find the normalising constant for each $n_i$: as we know $\sum_{i=1}^tn_i = n$, we obtain
\begin{equation}
n_i = \frac{\sqrt{\sum_{l=1}^pc_{li}^2}}{\sum_{i=1}^t\sqrt{\sum_{l=1}^pc_{li}^2}}n\,.
(\#eq:opt-ni)
\end{equation}
Let's return to Example \@ref(exm:one-way) and calculate the optimal allocations under two different sets of contrasts. First, we define an `R` function for calculating \@ref(eq:opt-ni).
```{r opt-ni}
opt_ni <- function(C, n) {
CtC <- t(C) %*% C
n * sqrt(diag(CtC)) / sum(sqrt(diag(CtC)))
}
```
Checking that the function `opt_ni` matches \@ref(eq:opt-ni) is left as an exercise.
Consider two sets of contrasts:
1. All pairwise comparisons between the four treatments
\begin{align*}
c_1 & = (-1, 1, 0, 0) \\
c_2 & = (-1, 0, 1, 0) \\
c_3 & = (-1, 0, 0, 1) \\
c_4 & = (0, -1, 1, 0) \\
c_5 & = (0, -1, 0, 1) \\
c_6 & = (0, 0, -1, 1)\,.
\end{align*}
Calculating \@ref(eq:opt-ni), we obtain
```{r opt-pairwise}
C <- matrix(
c(
-1, 1, 0, 0,
-1, 0, 1, 0,
-1, 0, 0, 1,
0, -1, 1, 0,
0, -1, 0, 1,
0, 0, -1, 1),
nrow = 6, byrow = T
)
opt_ni(C, 20)
```
This confirms that equal replication of the treatments is optimal for minimising the average variance of the estimators of all pairwise treatment differences.
1. If operator 4 is new to the mill, it may be desired to compare their output with the average output from the other three operators, using a contrast with coefficients $c = (1/3, 1/3, 1/3, -1)$. The allocation to minimise the variance of the corresponding estimator is given by:
```{r opt-op4}
C <- matrix(
c(1/3, 1/3, 1/3, -1),
nrow = 1
)
opt_ni(C, 20)
```
So the optimal allocation splits 10 units between operators 1-3, and allocates 10 units to operator 4. There is no exact integer rounding possible, so we will use $n_1 = 4$, $n_2=n_3 = 3$, $n_4 = 10$ and calculate the efficiency by comparing the variance of this allocation to that from the equally allocated design.
```{r compare-allocations}
crd_var <- function(C, n) {
CtC <- t(C) %*% C
sum(diag(CtC) / n)
}
n_equal <- rep(5, 4)
n_opt <- c(4, 3, 3, 10)
crd_var(C, n_opt) / crd_var(C, n_equal)
```
So the efficiency of the equally allocated design for estimating this contrast is `r 100 * round(crd_var(C, n_opt) / crd_var(C, n_equal), 4)` %.
### Overall size of the experiment {#crd-size}
We can also consider the complementary question: suppose the proportion of runs that is to be allocated to each treatment has been fixed in advance, what size of experiment should be performed to meet the objectives? That is, given a fixed proportion, $w_i$, of resource to be allocated to the $i$th treatment, so that $n_i = nw_i$ units will be allocated to that treatment, what value of $n$ should be chosen?
One way of thinking about this question is to consider the ratio
\begin{align*}
\frac{|\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}|}{\sqrt{\mbox{Var}(\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}})}} & = \frac{|\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}|}{\sqrt{\frac{\sigma^2}{n} \sum_{i=1}^tc_i^2/w_i}} \\
& = \sqrt{n}\frac{|\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}| / \sigma}{\sqrt{\sum_{i=1}^tc_i^2/w_i}}\,,
\end{align*}
which is analogous to the test statistic for $H_0: \boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} = 0$. For a given value of the signal-to-noise ratio $d = |\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}| / \sigma$, we can choose $n$ to result in a specified value of $T = |\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}| / \sqrt{\mbox{Var}(\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}})}$:
$$
n = T^2\frac{\sum_{i=1}^t c_i^2/w_i}{d^2}\,.
$$
Returning to Example \@ref(exm:one-way), assume we are testing a single pairwise comparison and that we require $T = 3$, so that the null hypothesis would be comfortably rejected at the 5% level (cf 1.96 for a standard z-test). For equal allocation of the units to each treatment ($w_1 = \cdots = w_4 = 1/4$) and a variety of different values of the signal-to-noise ratio $d$, we obtain the following experiment sizes:
```{r exp-size}
opt_n <- function(cv, prop, snr, target) {
  target ^ 2 * c(t(cv) %*% diag(1 / prop) %*% cv) / snr ^ 2
}
cv <- c(-1, 1, 0, 0)
w <- rep(1/4, 4)
snr <- c(0.5, 1, 1.5, 2, 2.5, 3)
cbind('Signal-to-noise' = snr, 'n' = opt_n(cv, w, snr, 3))
```
So, for example, to achieve $T = 3$ with a signal-to-noise ratio of $d=0.5$ requires $n=288$ runs. As would be expected, the number of runs required to achieve this value of $T$ decreases as the signal-to-noise ratio increases. For $d=3$, only a very small experiment with $n=8$ runs is needed.
## Exercises
1.
a. For Example \@ref(exm:one-way), calculate the mean response for each operator and show that the treatment differences and results from hypothesis tests using the results in Section \@ref(contrast-crd) are the same as those found in Section \@ref(r-crd) using `pairwise.t.test`, and `emmeans`.
a. Also check the results in Section \@ref(multiple-comp) by (i) adjusting individual p-values (for Bonferroni) and (ii) using the `qtukey` command.
<details>
<summary><b>Solution</b></summary>
As a reminder, the data from the experiment is as follows.
```{r exercise-1-data, echo = F}
knitr::kable(
tidyr::pivot_wider(pulp, names_from = operator, values_from = reflectance)[, -1],
col.names = paste("Operator", 1:4)
)
```
The mean response, and variance, from each treatment is given by
```{r exercise-1-means, echo = F}
ybars2 <- pulp |>
dplyr::group_by(operator) |>
dplyr::summarise(n_i = dplyr::n(), mean = mean(reflectance),
variance = var(reflectance))
knitr::kable(ybars2)
s2 <- (1 / (nrow(pulp) - nrow(ybars2))) * sum(ybars2$variance
* (ybars2$n_i - 1))
```
The sample variance from \@ref(eq:crd-s2) is $s^2 = `r round(s2, 3)`$. As $\sum_{i=1}^tc_i^2/n_i = \frac{2}{5}$ for contrast vectors $\boldsymbol{c}$ corresponding to pairwise differences, the standard error of each pairwise difference is $\sqrt{\frac{2s^2}{5}} = `r round(sqrt(2 * s2 / 5), 3)`$. Hence, we can create a table of pairwise differences, standard errors and test statistics.
```{r exercise-1-results, echo = F}
res.tab <- transform(pairs.b)
t.pvalue <- 2*(1 - pt(abs(res.tab[, 5]), res.tab[, 4]))
pulp.results <- data.frame(res.tab[, 1:5],
unadjust.p.value = t.pvalue,
Bonferroni = pmin(1, 6 * t.pvalue),
Tukey = 1 - ptukey(abs(res.tab[, 5]) * sqrt(2),
4, res.tab[, 4]))
knitr::kable(pulp.results, digits = 3)
```
Unadjusted p-values are obtained from the t-distribution, as twice the tail probabilities (`2 * (1 - pt(abs(t.ratio), 16))`). For Bonferroni, we simply multiply these p-values by ${t \choose 2} = 6$, and then take the minimum of this value and 1. For the Tukey method, we use `1 - ptukey(abs(t.ratio) * sqrt(2), 4, 16)` (see `?ptukey`).
Alternatively, to test each hypothesis at the 5% level, we can compare each t.ratio to (i) `qt(0.975, 16)` = `r round(qt(0.975, 16), 3)` (unadjusted); (ii) `qt(1 - 0.025/6, 16)` = `r round(qt(1 - 0.025/6, 16), 3)` (Bonferroni); or (iii) `qtukey(0.95, 4, 16) / sqrt(2)` = `r round(qtukey(0.95, 4, 16) / sqrt(2), 3)`.
</details>
2. \ [Adapted from @WH2009] The bioactivity of four different drugs $A$, $B$, $C$ and $D$ for treating a particular illness was compared in a study and the following ANOVA table was given for the data:
| Source | Degrees of freedom | Sums of squares | Mean square |
| :----- | :----------------: | :-------------: | :---------: |
| Treatment | 3 | 64.42 | 21.47 |
| Residual | 26 | 62.12 | 2.39 |
| Total | 29 | 126.54 | |
i. What considerations should be made when assigning drugs to patients, and why?
i. Use an $F$-test to test at the 0.01 level the null hypothesis that the four drugs have the same bioactivity.
i. The average response from each treatment is as follows: $\bar{y}_{A.}=66.10$ ($n_A=7$ patients), $\bar{y}_{B.}=65.75$ ($n_B=8$), $\bar{y}_{C.} = 62.63$ ($n_C=9$), and $\bar{y}_{D.}=63.85$ ($n_D=6$). Conduct hypothesis tests for all pairwise comparisons using the Bonferroni and Tukey methods for an experiment-wise error rate of 0.05.
i. In fact, $A$ and $B$ are brand-name drugs and $C$ and $D$ are generic drugs. Test the null hypothesis at the 5% level that brand-name and generic drugs have the same bioactivity.
<details>
<summary><b>Solution</b></summary>
i. Each patient should be randomly allocated to one of the drugs. This is to protect against possible bias from lurking variables, e.g. demographic variables or subjective bias from the study administrator (blinding the study can also help to protect against this).
i. Test statistic = (Treatment mean square)/(Residual mean square) = 21.47/2.39 = 8.98. Under $H_0$: no difference in bioactivity between the drugs, the test statistic follows an $F_{3,26}$ distribution, which has a 1% critical value of `qf(0.99, 3, 26)` = `r round(qf(0.99, 3, 26), 3)`. Hence, we can reject $H_0$.
i. For each difference, the test statistic has the form
$$
\frac{|\bar{y}_{i.}-\bar{y}_{j.}|}{s\sqrt{\frac{1}{n_i}+\frac{1}{n_j}}}\,,
$$
for $i, j = A, B, C, D;\, i\ne j$. The treatment means and repetitions are given in the question (note that not all $n_i$ are equal). From the ANOVA table, we get $s^2 = 62.12/26 = 2.389$. The following table summarises the differences between drugs:
| | $A-B$ | $A-C$ | $A-D$ | $B-C$ | $B-D$ | $C-D$ |
|:- | :---- | :---- | :---- | :---- | :---- | :---- |
| Abs. difference | 0.35 | 3.47 | 2.25 | 3.12 | 1.9 | 1.22 |
| Test statistic | 0.44 | 4.45 | 2.62 | 4.15 | 2.28 | 1.50 |
The Bonferroni critical value is $t_{26, 1-0.05/12} = `r round(qt(1 - 0.05/12, 26), 3)`$. The Tukey critical value is $q_{4,26, 0.95}/\sqrt{2} = `r round(qtukey(0.95, 4, 26) / sqrt(2), 3)`$ (available in `R` as `qtukey(0.95, 4, 26) / sqrt(2)`). Hence under both methods, the bioactivities of drugs $A$ and $C$, and $B$ and $C$, are significantly different.
i. A suitable contrast has $\boldsymbol{c} = (0.5, 0.5, -0.5, -0.5)$, with $\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau} = (\tau_A + \tau_B) / 2 - (\tau_C + \tau_D) / 2$ (the difference in average treatment effects).
An estimate for this contrast is given by $(\bar{y}_{A.} + \bar{y}_{B.}) / 2 - (\bar{y}_{C.} + \bar{y}_{D.}) / 2$, with variance
$$\mbox{Var}\left(\frac{1}{2}(\bar{y}_{A.}+\bar{y}_{B.}) - \frac{1}{2}(\bar{y}_{C.}+\bar{y}_{D.})\right) = \frac{\sigma^2}{4}\left(\frac{1}{n_A} + \frac{1}{n_B} + \frac{1}{n_C} + \frac{1}{n_D}\right)\,.$$
Hence, a test statistic for $H_0:\, \frac{1}{2}(\tau_A+\tau_B) - \frac{1}{2}(\tau_C+\tau_D)=0$ is given by
$$
\frac{\frac{1}{2}(\bar{y}_{A.}+\bar{y}_{B.}) - \frac{1}{2}(\bar{y}_{C.}+\bar{y}_{D.})}{\sqrt{\frac{s^2}{4}\left(\frac{1}{n_A} + \frac{1}{n_B} + \frac{1}{n_C} + \frac{1}{n_D}\right)}} = \frac{2.685}{\frac{\sqrt{2.389}}{2}\sqrt{\frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \frac{1}{6}}} = 4.70\,.
$$
The critical value is $t_{26, 1-0.05/2} = `r qt(0.975, 26)`$. Hence, we can reject $H_0$ and conclude there is a difference between brand-name and generic drugs.
</details>
3. <a id = "nap-black-ex"></a>The below table gives data from a completely randomised design to compare six different batches of hydrochloric acid on the yield of a dye (naphthalene black 12B).
```{r nap-black}
napblack <- data.frame(batch = rep(factor(1:6), rep(5, 6)),
repetition = rep(1:5, 6),
yield = c(145, 40, 40, 120, 180, 140, 155, 90, 160, 95,
195, 150, 205, 110, 160, 45, 40, 195, 65, 145,
195, 230, 115, 235, 225, 120, 55, 50, 80, 45)
)
knitr::kable(
tidyr::pivot_wider(napblack, names_from = batch, values_from = yield)[, -1],
col.names = paste("Batch", 1:6),
caption = "Naphthalene black experiment: yields (grams of standard colour) from six different batches of hydrochloric acid."
)
```
Conduct a full analysis of this experiment, including
a. exploratory data analysis;
a. fitting a linear model, and conducting an F-test to compare to a model that explains variation using the six batches to the null model;
a. usual linear model diagnostics;
a. multiple comparisons of all pairwise differences between treatments.
<details>
<summary><b>Solution</b></summary>
a. Two of the simplest ways of examining the data are to calculate basic descriptive statistics, e.g. the mean and standard deviation of the yield in each batch, and to plot the data in the different batches using a simple graphical display, e.g. box plots of the yields in each batch. Notice that in both `aggregate` and `boxplot` we use the formula `yield ~ batch`. This formula splits the data into groups defined by batch.
```{r napblack-summary, fig.align = 'center', fig.cap = 'Naphthalene black experiment: distributions of dye yields from the six batches.'}
aggregate(yield ~ batch, data = napblack, FUN = function(x) c(mean = mean(x),
st.dev = sd(x)))
boxplot(yield ~ batch, data = napblack)
```
Notice that even within any particular batch, the number of grams of standard dyestuff colour determined by the dye trial varies from observation to observation. This *within-group* variation is considered to be random or residual variation. This cannot be explained by any differences between batches. However, a second source of variation in the overall data set can be explained by variation between the batches, i.e. between the different batch means themselves. We can see from the box plots (Figure \@ref(fig:napblack-summary)) and the mean yields in each batch that observations from batch number five appear to have given higher yields (in grams of standard colour) than those from the other batches.
a. When we fit linear models and compare them using analysis of variance (ANOVA), it enables us to decide whether the differences that seem to be evident in these simple plots and descriptive statistics are statistically significant or whether this kind of variation could have arisen by chance, even though there are no real differences between the batches.
An ANOVA table may be used to compare a linear model including differences between the batches to the null model. The linear model we will fit is a simple unit-treatment model:
\begin{equation}
Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \,,\qquad i=1,\ldots,6;~j=1,\ldots,5\,,
(\#eq:linmod)
\end{equation}
where $Y_{ij}$ is the response obtained from the $j$th repetition of the $i$th batch, $\mu$ is a constant term, $\tau_i$ is the expected effect due to the observation being in the $i$th batch $(i=1,\ldots,6)$ and $\varepsilon_{ij}$ are the random errors.
A test of the hypothesis that the group means are all equal is equivalent to a test that the $\tau_i$ are all equal to 0 $(H_0:\, \tau_1 = \tau_2 = \cdots = \tau_6 = 0)$. We can use `lm` to fit model \@ref(eq:linmod), and `anova` to test the hypothesis. Before we fit the linear model, we need to make sure `batch` has type `factor`^[Factors are variables in `R` which take on a limited number of different values (e.g. categorical variables). We need to define a categorical variable, like `batch` as a `factor` to ensure they are treated correctly by functions such as `lm`.].
```{r linmod}
napblack$batch <- as.factor(napblack$batch)
napblack.lm <- lm(yield ~ batch, data = napblack)
anova(napblack.lm)
```
The p-value of `r round(anova(napblack.lm)$"Pr(>F)"[1], digits = 4)` indicates significant differences between at least two of the batch means. Therefore $H_0$ is rejected and a suitable multiple comparison test should be carried out.
a. To perform our analysis, we have fitted a linear model. Therefore, we should use some plots of the residuals $y_{ij} - \hat{y}_{ij}$ to check the model assumptions, particularly that the errors are independently and identically normally distributed. We use the function `rstandard`, which produces residuals that have been standardised to have variance equal to 1.
```{r residuals, fig.show = "hold", fig.align = "center", fig.cap = "Residuals against batch (left) and fitted values (right) for the linear model fit to the naphthalene black data.", out.width = "100%"}
standres <- rstandard(napblack.lm)
fitted <- fitted(napblack.lm)
par(mfrow = c(1, 2), pty = "s")
with(napblack, {
plot(batch, standres, xlab = "Batch", ylab = "Standardised residuals")
plot(fitted, standres, xlab = "Fitted value", ylab = "Standardised residuals")
})
```
The plots (Figure \@ref(fig:residuals)) show no large standardised residuals ($>2$ in absolute value^[We would anticipate 95\% of the standardised residuals to lie in [-1.96, 1.96], as they will follow a standard normal distribution if the model assumptions are correct.]). While there is some evidence of unequal variation across batches, there is no obvious pattern with respect to fitted values (e.g. no "funnelling").
We can also plot the standardised residuals against the quantiles of a standard normal distribution to assess the assumption of normality.
```{r normalplot, fig.align = "center", fig.cap = "Normal probability plot for the standardised residuals for the linear model fit to the naphthalene black data.", out.width = "100%"}
par(pty = "s")
qqnorm(standres, main = "")
```
The points lie quite well on a straight line (see Figure \@ref(fig:normalplot)), suggesting the assumption of normality is valid. Overall, the residual plots look reasonable; some investigation of transformations to correct for non-constant variance could be investigated (see MATH2010/STAT6123).
a. When a significant difference between the treatments has been indicated, the next stage is to try to determine which treatments differ. In some cases a specific difference is of interest, a control versus a new treatment for instance, in which case that difference could now be inspected. However, usually no specific differences are to be considered a priori, and *any* difference is of practical importance. A multiple comparison procedure is required to investigate all possible differences, which takes account of the number of possible differences available amongst the treatments (15 differences between the six batches here).
We will use Tukey's method for controlling the experiment-wise type I error rate, fixed here at 5%, as implemented by `emmeans`.
```{r nap-black-tukey}
napblack.emm <- emmeans::emmeans(napblack.lm, 'batch')
pairs(napblack.emm)
```
We have two significant differences, between batches 4-5 and 5-6.
```{r nap-black-sig}
subset(transform(pairs(napblack.emm)), p.value < 0.05)
```
</details>
4. [Adapted from @Morris2011] Consider a completely randomised design with $t = 5$ treatments and $n=50$ units. The contrasts
$$
\tau_2 - \tau_1, \quad \tau_3 - \tau_2, \quad \tau_4 - \tau_3, \quad \tau_5 - \tau_4
$$
are of primary interest to the experimenter.
a. Find an allocation of the 50 units to the 5 treatments, i.e. find $n_1, \ldots, n_5$, that minimises the average variance of the corresponding contrast estimators.
a. Fixing the proportions of experimental effort applied to each treatment to those found in part (a), i.e. to $w_i = n_i/50$, find the value of $n$ required to make the ratio $T = |\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}|/\sqrt{\mbox{var}\left(\widehat{\boldsymbol{c}^{\mathrm{T}}\boldsymbol{\tau}}\right)} = 2$ assuming a signal-to-noise ratio of 1.
<details>
<summary><b>Solution</b></summary>
a. We can use the function `opt_ni` given in Section \@ref(crd-opt-all):
```{r opt-alloc}
n <- 50
C <- matrix(
c(
-1, 1, 0, 0, 0,
0, -1, 1, 0, 0,
0, 0, -1, 1, 0,
0, 0, 0, -1, 1
), nrow = 4, byrow = T
)
opt_ni(C, n)
```
Rounding, we obtain a solution of the form $n_1 = n_5 =8$, $n_2 = n_4 = 11$ and $n_3 = 12$. Any of $n_2, n_3, n_4$ may be rounded up to 12 to form a design with the same variance.
```{r opt-round, results = "hold"}
nv <- c(8, 11, 11, 11, 8)
crd_var(C, nv + c(0, 1, 0, 0, 0))
crd_var(C, nv + c(0, 0, 1, 0, 0))
crd_var(C, nv + c(0, 0, 0, 1, 0))
```
a. The optimal ratios for each treatment from part (a) are $w_1 = w_5 = `r opt_ni(C, n)[1]/n`$ and $w_2 = w_3 = w_4 = `r opt_ni(C, n)[2]/n`$. Fixing these, we can use code from Section \@ref(crd-size) to find the required value of $n$ for each contrast.
```{r crd-opt-n}
nv <- NULL
for(i in 1:4) nv[i] <- opt_n(C[i, ], opt_ni(C, n) / n, 1, 2) # snr = 1, target = 2
nv
```
Hence, we need $n = `r ceiling(nv[1])`$ to achieve $T = 2$ for the first and last contrasts, and $n = `r ceiling(nv[2])`$ for the second and third. The differences are due to the different proportions $w_i$ assumed for each treatment. To achieve $T=2$ for all contrasts, we pick the larger number, $n = `r ceiling(nv[1])`$.
</details>