---
title: "ETC3550: Applied forecasting for business and economics"
author: "Ch8. ARIMA models"
date: "OTexts.org/fpp2/"
fontsize: 14pt
output:
  beamer_presentation:
    fig_width: 7
    fig_height: 3.5
    highlight: tango
    theme: metropolis
    includes:
      in_header: header.tex
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, cache=TRUE, warning=FALSE, message=FALSE)
library(fpp2)
```

# Stationarity and differencing

## Stationarity

\begin{block}{Definition}
If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t,\dots,y_{t+s})$ does not depend on $t$.
\end{block}\pause

A \textbf{stationary series} is:

*  roughly horizontal
*  of constant variance
*  without patterns that are predictable in the long term

## Stationary?

```{r}
autoplot(dj) + ylab("Dow Jones Index") + xlab("Day")
```

## Stationary?

```{r}
autoplot(diff(dj)) + ylab("Change in Dow Jones Index") + xlab("Day")
```

## Stationary?

```{r}
autoplot(strikes) + ylab("Number of strikes") + xlab("Year")
```

## Stationary?

```{r}
autoplot(hsales) + xlab("Year") + ylab("Total sales") +
  ggtitle("Sales of new one-family houses, USA")
```

## Stationary?
```{r}
autoplot(eggs) + xlab("Year") + ylab("$") +
  ggtitle("Price of a dozen eggs in 1993 dollars")
```

## Stationary?
```{r}
autoplot(window(pigs/1e3, start=1990)) + xlab("Year") + ylab("thousands") +
  ggtitle("Number of pigs slaughtered in Victoria")
```

## Stationary?

```{r}
autoplot(lynx) + xlab("Year") + ylab("Number trapped") +
  ggtitle("Annual Canadian Lynx Trappings")
```

## Stationary?

```{r}
autoplot(window(ausbeer, start=1992)) + xlab("Year") + ylab("megalitres") +
  ggtitle("Australian quarterly beer production")
```

## Stationarity

\begin{block}{Definition}
If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t,\dots,y_{t+s})$ does not depend on $t$.
\end{block}\pause\vspace*{0.4cm}

Transformations help to \textbf{stabilize the variance}.

For ARIMA modelling, we also need to \textbf{stabilize the mean}.

## Non-stationarity in the mean
\structure{Identifying non-stationary series}

* Time plot.

* The ACF of stationary data drops to zero relatively quickly
* The ACF of non-stationary data decreases slowly.
* For non-stationary data, the value of $r_1$ is often
     large and positive.

## Example: Dow-Jones index

```{r}
autoplot(dj) + ylab("Dow Jones Index") + xlab("Day")
```

## Example: Dow-Jones index

```{r}
ggAcf(dj)
```

## Example: Dow-Jones index

```{r}
autoplot(diff(dj)) + ylab("Change in Dow Jones Index") + xlab("Day")
```

## Example: Dow-Jones index

```{r}
ggAcf(diff(dj))
```

## Differencing

* Differencing helps to \textbf{stabilize the mean}.
* The differenced series is the \emph{change} between each observation
in the original series: ${y'_t = y_t - y_{t-1}}$.
* The differenced series will have only $T-1$ values since it is not possible to calculate a difference $y_1'$ for the first observation.

## Second-order  differencing

Occasionally the differenced data will not appear stationary and it
may be necessary to difference the data a second time:\pause
\begin{align*}
y''_{t}  &=  y'_{t}  - y'_{t - 1} \\
&= (y_t - y_{t-1}) - (y_{t-1}-y_{t-2})\\
&= y_t - 2y_{t-1} +y_{t-2}.
\end{align*}\pause

* $y_t''$ will have  $T-2$  values.
* In practice,  it is almost never necessary to go beyond second-order
differences.
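
## Differencing in R
\fontsize{10}{11}\sf

A minimal base-R sketch checking the definitions above. The toy vector `y` is made up for illustration: `diff()` gives the $T-1$ first differences, and `diff(y, differences = 2)` gives the $T-2$ second-order differences $y_t - 2y_{t-1} + y_{t-2}$.

```r
# Toy series (hypothetical values, for illustration only)
y <- c(3, 5, 4, 8, 12, 11, 15)
n <- length(y)                                  # T = 7

d1 <- diff(y)                                   # y'_t = y_t - y_{t-1}
d2 <- diff(y, differences = 2)                  # y''_t = y'_t - y'_{t-1}

# Second-order differences written out: y_t - 2 y_{t-1} + y_{t-2}
d2_manual <- y[3:n] - 2 * y[2:(n - 1)] + y[1:(n - 2)]

length(d1)                                      # T - 1 = 6
length(d2)                                      # T - 2 = 5
all.equal(d2, d2_manual)                        # TRUE
```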

## Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year.\pause
$${y'_t = y_t - y_{t-m}}$$
where $m=$ number of seasons.\pause

* For monthly data $m=12$.
* For quarterly data $m=4$.

## Electricity production
```{r, echo=TRUE, fig.height=4}
usmelec %>% autoplot()
```

## Electricity production
```{r, echo=TRUE, fig.height=4}
usmelec %>% log() %>% autoplot()
```

## Electricity production

```{r, echo=TRUE, fig.height=3.5}
usmelec %>% log() %>% diff(lag=12) %>%
  autoplot()
```

## Electricity production
```{r, echo=TRUE, fig.height=3.5}
usmelec %>% log() %>% diff(lag=12) %>%
  diff(lag=1) %>% autoplot()
```

## Electricity production

* The seasonally differenced series is closer to being stationary.
* The remaining non-stationarity can be removed with a further first difference.

If $y'_t = y_t - y_{t-12}$ denotes the seasonally differenced series, then the twice-differenced series is

\begin{block}{}
\begin{align*}
y^*_t &= y'_t - y'_{t-1} \\
      &= (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) \\
      &= y_t - y_{t-1} - y_{t-12} + y_{t-13}\: .
\end{align*}
\end{block}\vspace*{10cm}

## Seasonal differencing

When both seasonal and first differences are applied\dots\pause

* it makes no difference
which is done first: the result will be the same.
* If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
\pause

It is important that if differencing is used, the differences are
interpretable.
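
## Seasonal and first differences in R
\fontsize{10}{11}\sf

A sketch on a simulated monthly series (the data are made up) showing that the order of seasonal and first differencing does not matter, and that both orders give $y_t - y_{t-1} - y_{t-12} + y_{t-13}$.

```r
# Simulated monthly series with trend and seasonality (made-up data)
set.seed(42)
y <- ts(1:60 + 5 * sin(2 * pi * (1:60) / 12) + rnorm(60), frequency = 12)
n <- length(y)

seasonal_first <- diff(diff(y, lag = 12), lag = 1)   # seasonal, then first
first_seasonal <- diff(diff(y, lag = 1), lag = 12)   # first, then seasonal

# Written out: y_t - y_{t-1} - y_{t-12} + y_{t-13}, for t = 14, ..., T
v <- as.numeric(y)
manual <- v[14:n] - v[13:(n - 1)] - v[2:(n - 12)] + v[1:(n - 13)]

all.equal(as.numeric(seasonal_first), as.numeric(first_seasonal))  # TRUE
all.equal(as.numeric(seasonal_first), manual)                      # TRUE
```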

## Interpretation of differencing

* first differences are the change between \textbf{one observation and the
next};
* seasonal differences are the change \textbf{from one year to the
next}.
\pause

But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.

## Unit root tests

\structure{Statistical tests to determine the required order of differencing.}

  1. Augmented Dickey-Fuller (ADF) test: the null hypothesis is that the data are non-stationary and non-seasonal.
  2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: the null hypothesis is that the data are stationary and non-seasonal.
  3. Other tests are available for seasonal data.
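
## KPSS statistic: a sketch
\fontsize{10}{11}\sf

To show what the KPSS test measures, here is a hand-rolled version of its level-stationarity statistic. `kpss_stat()` is a hypothetical helper, not the `urca::ur.kpss()` implementation used in these slides, and its lag choice $\lfloor 4(T/100)^{1/4}\rfloor$ is an assumption. Large values are evidence against stationarity (5% critical value 0.463).

```r
# Hypothetical helper: KPSS level-stationarity statistic (not urca's code)
kpss_stat <- function(y, lags = trunc(4 * (length(y) / 100)^0.25)) {
  n <- length(y)
  e <- y - mean(y)                       # residuals from a constant
  S <- cumsum(e)                         # partial sums S_t
  # Long-run variance with Bartlett (Newey-West) weights
  lrv <- sum(e^2) / n
  for (j in seq_len(lags)) {
    gamma_j <- sum(e[(j + 1):n] * e[1:(n - j)]) / n
    lrv <- lrv + 2 * (1 - j / (lags + 1)) * gamma_j
  }
  sum(S^2) / (n^2 * lrv)
}

set.seed(1)
wn <- rnorm(500)            # stationary: white noise
rw <- cumsum(rnorm(500))    # non-stationary: random walk
kpss_stat(wn)               # small
kpss_stat(rw)               # large, well above 0.463
```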

## KPSS test
\fontsize{10}{11}\sf

```{r, echo=TRUE}
library(urca)
summary(ur.kpss(goog))
```

\pause


```{r, echo=TRUE}
ndiffs(goog)
```


## Automatically selecting differences

STL decomposition: $y_t = T_t+S_t+R_t$

Seasonal strength $F_s = \max\big(0, 1-\frac{\text{Var}(R_t)}{\text{Var}(S_t+R_t)}\big)$

If $F_s > 0.64$, do one seasonal difference.

\pause\fontsize{12}{15}\sf\vspace*{1cm}

```{r, echo=TRUE}
usmelec %>% log() %>% nsdiffs()
usmelec %>% log() %>% diff(lag=12) %>% ndiffs()
```
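
## Seasonal strength: a sketch
\fontsize{10}{11}\sf

A sketch of $F_s$ using base R's `stl()` on the built-in `AirPassengers` data. `seasonal_strength()` is a hypothetical helper, and `nsdiffs()` may compute the decomposition differently.

```r
# Hypothetical helper: F_s = max(0, 1 - Var(R) / Var(S + R)) from an STL fit
seasonal_strength <- function(y, s.window = "periodic") {
  comp <- stl(y, s.window = s.window)$time.series   # seasonal, trend, remainder
  R <- comp[, "remainder"]
  S <- comp[, "seasonal"]
  max(0, 1 - var(R) / var(S + R))
}

Fs <- seasonal_strength(log(AirPassengers))   # monthly data, datasets package
Fs
Fs > 0.64                                     # TRUE suggests one seasonal difference
```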


## Your turn

For the `visitors` series, find an appropriate differencing (after transformation if necessary) to obtain stationary data.


## Backshift notation

A  very useful notational device is the backward  shift operator,  $B$,  which is used as follows:
$$
{B y_{t}  =  y_{t - 1}} \: .
$$\pause
In  other  words,   $B$,  operating on  $y_{t}$,   has  the
effect   of   \textbf{shifting  the  data  back  one   period}. \pause
Two applications of  $B$  to  $y_{t}$ \textbf{shifts the data  back  two
periods}:
$$
B(By_{t})  =  B^{2}y_{t}  =  y_{t-2}\: .
$$\pause
For  monthly  data, if we wish to shift attention  to  ``the
same  month last year,''  then  $B^{12}$
is used,  and  the
notation is  $B^{12}y_{t}$  =  $y_{t-12}$.

## Backshift notation

The   backward   shift  operator  is  convenient   for describing  the
process  of  \textit{differencing}. \pause
A first difference can be written as
$$
y'_{t}  =  y_{t} -   y_{t-1} = y_t - By_{t}  =  (1 - B)y_{t}\: .
$$\pause
Note  that a first difference is represented by  $(1 -  B)$.
\pause

Similarly,   if second-order differences (i.e.,   first
differences  of  first differences) have  to  be  computed,
then:
\[
y''_{t}  =  y_{t} -   2y_{t - 1}  +  y_{t - 2} = (1 - B)^{2} y_{t}\: .
\]

## Backshift notation

* Second-order difference  is  denoted   $(1- B)^{2}$.

* A \textit{second-order difference} is not the same as a \textit{second difference}, which would be denoted $1- B^{2}$.

* In general,  a  $d$th-order difference can be written as
$$(1 - B)^{d} y_{t}.$$

* A seasonal difference followed by a first difference can be written as
$$ (1-B)(1-B^m)y_t\: .$$

## Backshift notation

The ``backshift'' notation is convenient because the terms can be multiplied
together to see the combined effect.
\begin{align*}
(1-B)(1-B^m)y_t &= (1 - B - B^m + B^{m+1})y_t \\
&= y_t-y_{t-1}-y_{t-m}+y_{t-m-1}.
\end{align*}\pause
For monthly data, $m=12$ and we obtain the same result as earlier.
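
## Multiplying backshift polynomials in R
\fontsize{10}{11}\sf

Multiplying backshift polynomials amounts to convolving their coefficient vectors. The sketch uses a hypothetical helper `poly_mult()` (element $i+1$ multiplies $B^i$) and checks that applying $(1-B)(1-B^{12})$ with `stats::filter()` matches differencing twice.

```r
# Hypothetical helper: product of two polynomials in B (coefficient vectors)
poly_mult <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

m <- 12
one_minus_B  <- c(1, -1)                  # 1 - B
one_minus_Bm <- c(1, rep(0, m - 1), -1)   # 1 - B^m
coefs <- poly_mult(one_minus_B, one_minus_Bm)
which(coefs != 0) - 1                     # powers of B: 0 1 12 13
coefs[coefs != 0]                         # 1 -1 -1 1

# Applying the combined operator matches differencing twice
y <- as.numeric(log(AirPassengers))
combined <- stats::filter(y, coefs, method = "convolution", sides = 1)
all.equal(as.numeric(na.omit(combined)),
          as.numeric(diff(diff(y, lag = m), lag = 1)))   # TRUE
```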

# Non-seasonal ARIMA models

## Autoregressive models

\begin{block}{Autoregressive (AR) models:}
$$
  y_{t}  =  c  +  \phi_{1}y_{t - 1}  +  \phi_{2}y_{t - 2}  +  \cdots  +  \phi_{p}y_{t - p}  + \varepsilon_{t},
$$
where $\varepsilon_t$ is white noise.  This is a multiple regression with \textbf{lagged values} of $y_t$ as predictors.
\end{block}

```{r arp, echo=FALSE, fig.height=3}
set.seed(1)
p1 <- autoplot(10 + arima.sim(list(ar = -0.8), n = 100)) +
  ylab("") + ggtitle("AR(1)")
p2 <- autoplot(20 + arima.sim(list(ar = c(1.3, -0.7)), n = 100)) +
  ylab("") + ggtitle("AR(2)")
gridExtra::grid.arrange(p1,p2,nrow=1)
```

## AR(1) model

\begin{block}{}
\centerline{$y_{t}    =   18 -0.8 y_{t - 1}  +  \varepsilon_{t}$}
\end{block}
\rightline{$\varepsilon_t\sim N(0,1)$,\quad $T=100$.}

```{r, echo=FALSE, out.width="50%", fig.height=2.2, fig.width=2.2}
p1
```

## AR(1) model

\begin{block}{}
\centerline{$y_{t}    =   c + \phi_1 y_{t - 1}  +  \varepsilon_{t}$}
\end{block}


* When $\phi_1=0$, $y_t$ is **equivalent to WN**
* When $\phi_1=1$ and $c=0$, $y_t$ is **equivalent to a RW**
* When $\phi_1=1$ and $c\ne0$, $y_t$ is **equivalent to a RW with drift**
* When $\phi_1<0$, $y_t$ tends to **oscillate around the mean**.

## AR(2) model

\begin{block}{}
\centerline{$y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \varepsilon_t$}
\end{block}
\rightline{$\varepsilon_t\sim N(0,1)$, \qquad $T=100$.}

```{r, fig.height=2.2, fig.width=2.2, out.width="50%"}
p2
```


## Stationarity conditions


We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.

\begin{block}{General condition for stationarity}
All (possibly complex) roots of $1-\phi_1 z - \phi_2 z^2 - \dots - \phi_pz^p$ lie outside the unit circle on the complex plane.
\end{block}\pause

* For $p=1$:  $-1<\phi_1<1$.
* For $p=2$:\newline $-1<\phi_2<1\qquad \phi_2+\phi_1 < 1 \qquad \phi_2 -\phi_1 < 1$.
* More complicated conditions hold for $p\ge3$.
* Estimation software takes care of this.
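
## Checking stationarity numerically
\fontsize{10}{11}\sf

A sketch using base R's `polyroot()` (coefficients in increasing powers of $z$) to test the root condition for the AR examples shown earlier; `ar_is_stationary()` is a hypothetical helper.

```r
# Hypothetical helper: are all roots of 1 - phi_1 z - ... - phi_p z^p
# outside the unit circle (modulus > 1)?
ar_is_stationary <- function(phi) {
  roots <- polyroot(c(1, -phi))    # coefficients in increasing powers of z
  all(Mod(roots) > 1)
}

ar_is_stationary(-0.8)             # AR(1) example: TRUE
ar_is_stationary(c(1.3, -0.7))     # AR(2) example: TRUE
ar_is_stationary(1.2)              # explosive AR(1): FALSE

# The p = 2 conditions agree for the AR(2) example
phi <- c(1.3, -0.7)
c(abs(phi[2]) < 1, phi[2] + phi[1] < 1, phi[2] - phi[1] < 1)   # TRUE TRUE TRUE
```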

## Moving Average (MA) models

\begin{block}{Moving Average (MA) models:}
$$
  y_{t}  =  c +  \varepsilon_t + \theta_{1}\varepsilon_{t - 1}  +  \theta_{2}\varepsilon_{t - 2}  +  \cdots  + \theta_{q}\varepsilon_{t - q},
$$
where $\varepsilon_t$ is white noise.
This is a multiple regression with  \textbf{past \emph{errors}}
as predictors. \emph{Don't confuse this with moving average smoothing!}
\end{block}

```{r maq, fig.height=2.5}
set.seed(2)
p1 <- autoplot(20 + arima.sim(list(ma = 0.8), n = 100)) +
  ylab("") + ggtitle("MA(1)")
p2 <- autoplot(arima.sim(list(ma = c(-1, +0.8)), n = 100)) +
  ylab("") + ggtitle("MA(2)")
gridExtra::grid.arrange(p1,p2,nrow=1)
```

## MA(1) model

\begin{block}{}
\centerline{$y_t = 20 + \varepsilon_t + 0.8 \varepsilon_{t-1}$}
\end{block}
\rightline{$\varepsilon_t\sim N(0,1)$,\quad $T=100$.}

```{r, fig.height=2.2, fig.width=2.2, out.width="50%"}
p1
```

## MA(2) model

\begin{block}{}
\centerline{$y_t = \varepsilon_t -\varepsilon_{t-1} + 0.8 \varepsilon_{t-2}$}
\end{block}
\rightline{$\varepsilon_t\sim N(0,1)$,\quad $T=100$.}

```{r, fig.height=2.2, fig.width=2.2, out.width="50%"}
p2
```

## MA($\infty$) models

It is possible to write any stationary AR($p$) process as an MA($\infty$) process.

**Example: AR(1)**
\begin{align*}
y_t &= \phi_1y_{t-1} + \varepsilon_t\\
&= \phi_1(\phi_1y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t\\
&= \phi_1^2y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\
&= \phi_1^3y_{t-3} + \phi_1^2\varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\
&\dots
\end{align*}\pause
Provided $-1 < \phi_1 < 1$:
\[ y_t = \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \phi_1^3 \varepsilon_{t-3} + \cdots
\]
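
## AR(1) as MA($\infty$) in R
\fontsize{10}{11}\sf

`stats::ARMAtoMA()` returns the MA($\infty$) weights $\psi_1,\psi_2,\dots$; for an AR(1) these should equal $\phi_1^j$, as in the expansion above. The value $\phi_1=0.5$ is an arbitrary choice.

```r
# psi-weights of the MA(infinity) form of a stationary AR(1)
phi1 <- 0.5
psi  <- ARMAtoMA(ar = phi1, lag.max = 6)   # coefficients of eps_{t-1}, ..., eps_{t-6}
psi                                        # 0.5 0.25 0.125 ...
all.equal(psi, phi1^(1:6))                 # TRUE: psi_j = phi_1^j
```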


## Invertibility

* Any MA($q$) process can be written as an AR($\infty$) process if we impose some constraints on the MA parameters.
* Then the MA model is called "invertible".
* Invertible models have some mathematical properties that make them easier to use in practice.
* Invertibility of an ARIMA model is equivalent to forecastability of an ETS model.

## Invertibility

\begin{block}{General condition for invertibility}
All (possibly complex) roots of $1+\theta_1 z + \theta_2 z^2 + \dots + \theta_qz^q$ lie outside the unit circle on the complex plane.
\end{block}\pause

* For $q=1$:  $-1<\theta_1<1$.
* For $q=2$:\newline $-1<\theta_2<1\qquad \theta_2+\theta_1 >-1 \qquad \theta_1 -\theta_2 < 1$.
* More complicated conditions hold for $q\ge3$.
* Estimation software takes care of this.
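
## Checking invertibility numerically
\fontsize{10}{11}\sf

The same root check applies to the MA polynomial $1+\theta_1 z+\dots+\theta_q z^q$. `ma_is_invertible()` is a hypothetical helper, tried here on the MA examples shown earlier.

```r
# Hypothetical helper: are all roots of 1 + theta_1 z + ... + theta_q z^q
# outside the unit circle?
ma_is_invertible <- function(theta) all(Mod(polyroot(c(1, theta))) > 1)

ma_is_invertible(0.8)            # MA(1) example: TRUE
ma_is_invertible(c(-1, 0.8))     # MA(2) example: TRUE
ma_is_invertible(1.25)           # root at -0.8: FALSE

# The q = 2 conditions agree for the MA(2) example
theta <- c(-1, 0.8)
c(abs(theta[2]) < 1, theta[2] + theta[1] > -1, theta[1] - theta[2] < 1)  # TRUE TRUE TRUE
```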


## ARIMA models

\begin{block}{Autoregressive Moving Average models:}
\begin{align*}
y_{t}  &=  c  +  \phi_{1}y_{t - 1}  +  \cdots  +  \phi_{p}y_{t - p} \\
& \hspace*{2.4cm}\text{} + \theta_{1}\varepsilon_{t - 1} +  \cdots  + \theta_{q}\varepsilon_{t - q} +  \varepsilon_{t}.
\end{align*}
\end{block}\pause

* Predictors include both **lagged values of $y_t$ and lagged errors.**
* Conditions on coefficients ensure stationarity.
* Conditions on coefficients ensure invertibility.
\pause

### Autoregressive Integrated Moving Average models
* Combine ARMA model with **differencing**.
* $(1-B)^d y_t$ follows an ARMA  model.

## ARIMA models

\structure{Autoregressive Integrated Moving Average models}
\begin{block}{ARIMA($p, d, q$) model}
\begin{tabular}{rl}
AR:& $p =$  order of the autoregressive part\\
I: & $d =$  degree of first differencing involved\\
MA:& $q =$  order of the moving average part.
\end{tabular}
\end{block}

* White noise model:  ARIMA(0,0,0)
* Random walk:  ARIMA(0,1,0) with no constant
* Random walk with drift:  ARIMA(0,1,0) with \rlap{const.}
* AR($p$): ARIMA($p$,0,0)
* MA($q$): ARIMA(0,0,$q$)

## Backshift notation for ARIMA

* ARMA model:\vspace*{-1cm}\newline
\parbox{12cm}{\small\begin{align*}
\hspace*{-1cm} 
y_{t}  &=  c + \phi_{1}By_{t} + \cdots + \phi_pB^py_{t}
           +  \varepsilon_{t}  +  \theta_{1}B\varepsilon_{t} + \cdots + \theta_qB^q\varepsilon_{t} \\
\hspace*{-1cm} 
\text{or}\quad & (1-\phi_1B - \cdots - \phi_p B^p) y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t
\end{align*}}

* ARIMA(1,1,1) model:

\[
\begin{array}{c c c c}
(1 - \phi_{1} B) & (1  -  B) y_{t} &= &c + (1  + \theta_{1} B) \varepsilon_{t}\\
{\uparrow}  & {\uparrow}    &   &{\uparrow}\\
{\text{AR(1)}} & {\text{First}}   &     &{\text{MA(1)}}\\
& {\hbox to 0cm{\hss\text{difference}\hss}}\\
\end{array}
\]\pause
Written out:
$$y_t =   c + y_{t-1} + \phi_1 y_{t-1}- \phi_1 y_{t-2} + \theta_1\varepsilon_{t-1} + \varepsilon_t $$

## R model

\fontsize{13}{16}\sf

\begin{block}{Intercept form}
\centerline{$(1-\phi_1B - \cdots - \phi_p B^p) y_t' = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$}
\end{block}

\begin{block}{Mean form}
\centerline{$(1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$}
\end{block}

 * $y_t' = (1-B)^d y_t$ 
 * $\mu$ is the mean of $y_t'$. 
 * $c = \mu(1-\phi_1 - \cdots - \phi_p )$.
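
## Intercept vs mean in R
\fontsize{10}{11}\sf

A sketch with simulated data (the values of $\phi_1,\phi_2$ and $\mu$ are arbitrary). `stats::arima()`, which `Arima()` builds on, labels $\mu$ as `intercept`, so $c$ has to be recovered as $\mu(1-\phi_1-\phi_2)$.

```r
# Simulate an AR(2) in mean form with mu = 10, so c = 10 * (1 - 0.5 - 0.2) = 3
set.seed(123)
phi <- c(0.5, 0.2)
mu  <- 10
y   <- mu + arima.sim(list(ar = phi), n = 2000)

fit    <- arima(y, order = c(2, 0, 0))
mu_hat <- unname(coef(fit)["intercept"])                  # estimate of mu (about 10)
c_hat  <- mu_hat * (1 - sum(coef(fit)[c("ar1", "ar2")]))  # estimate of c (about 3)
c(mu_hat = mu_hat, c_hat = c_hat)
```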

## US personal consumption

```{r}
autoplot(uschange[,"Consumption"]) +
  xlab("Year") +
  ylab("Quarterly percentage change") +
  ggtitle("US consumption")
```

## US personal consumption
\fontsize{10}{11}\sf

```{r, echo=TRUE}
(fit <- auto.arima(uschange[,"Consumption"]))
```

\pause\vfill


```{r usconsumptioncoefs, echo=FALSE}
coef <- coefficients(fit)
intercept <- coef['intercept'] * (1-coef['ar1'] - coef['ar2'])
```

```{r, include=FALSE}
if(!identical(arimaorder(fit),c(p=2L,d=0L,q=2L)))
  stop("Different model from expected")
```

### ARIMA(2,0,2) model:
\centerline{$
  y_t = c + `r format(coef['ar1'], nsmall=3, digits=3)`y_{t-1}
          `r format(coef['ar2'], nsmall=3, digits=3)` y_{t-2}
          `r format(coef['ma1'], nsmall=3, digits=3)` \varepsilon_{t-1}
          + `r format(coef['ma2'], nsmall=3, digits=3)` \varepsilon_{t-2}
          + \varepsilon_{t},
$}
where $c= `r format(coef['intercept'], nsmall=3, digits=3)` \times (1 - `r format(coef['ar1'], nsmall=3, digits=3)` + `r format(-coef['ar2'], nsmall=3, digits=3)`) = `r format(intercept, nsmall=3, digits=3)`$
and $\varepsilon_t$ is white noise with a standard deviation of $`r format(sqrt(fit$sigma2), nsmall=3, digits=3)` = \sqrt{`r format(fit$sigma2, nsmall=3, digits=3)`}$. 


## US personal consumption
\fontsize{12}{15}\sf

```{r, echo=TRUE, fig.height=4}
fit %>% forecast(h=10) %>% autoplot(include=80)
```


## Understanding ARIMA models
\fontsize{14}{16}\sf

* If $c=0$ and $d=0$, the long-term forecasts will go to zero.
* If $c=0$ and $d=1$, the long-term forecasts will go to a non-zero constant.
* If $c=0$ and $d=2$, the long-term forecasts will follow a straight line.

* If $c\ne0$ and $d=0$, the long-term forecasts will go to the mean of the data.
* If $c\ne0$ and $d=1$, the long-term forecasts will follow a straight line.
* If $c\ne0$ and $d=2$, the long-term forecasts will follow a quadratic trend.
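
## Long-term forecasts in R
\fontsize{10}{11}\sf

A sketch with simulated series (all values arbitrary) illustrating two of the cases above with `stats::arima()` and `predict()`: with $c\ne0$, $d=0$ the forecasts settle at the estimated mean, and with $c=0$, $d=1$ they stay at the last observation.

```r
set.seed(7)
h <- 60

# c != 0, d = 0: AR(1) with non-zero mean; forecasts converge to the mean
y0   <- 5 + arima.sim(list(ar = 0.5), n = 200)
fit0 <- arima(y0, order = c(1, 0, 0))
fc0  <- predict(fit0, n.ahead = h)$pred
c(last_forecast = tail(as.numeric(fc0), 1),
  mean_estimate = unname(coef(fit0)["intercept"]))

# c = 0, d = 1: random walk without drift; forecasts stay at the last value
y1   <- cumsum(rnorm(200))
fit1 <- arima(y1, order = c(0, 1, 0))
fc1  <- predict(fit1, n.ahead = h)$pred
range(fc1 - tail(y1, 1))                 # essentially zero: a flat line
```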


## Understanding ARIMA models
\fontsize{14}{15.5}\sf

### Forecast variance and $d$
  * The higher the value of $d$, the more rapidly the prediction intervals increase in size.
  * For $d=0$, the long-term forecast standard deviation will go to the standard deviation of the historical data.

### Cyclic behaviour
  * For cyclic forecasts,  $p\ge2$ and some restrictions on coefficients are required.
  * If $p=2$, we need $\phi_1^2+4\phi_2<0$. Then the average cycle length is
\[
  (2\pi)/\left[\text{arc cos}(-\phi_1(1-\phi_2)/(4\phi_2))\right].
\]
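
## Cycle length in R
\fontsize{10}{11}\sf

A direct transcription of the cycle-length formula, evaluated for the AR(2) example $y_t = 8 + 1.3y_{t-1} - 0.7y_{t-2} + \varepsilon_t$; `ar2_cycle_length()` is a hypothetical helper.

```r
# Hypothetical helper: average cycle length of an AR(2) with complex roots
ar2_cycle_length <- function(phi1, phi2) {
  stopifnot(phi1^2 + 4 * phi2 < 0)     # complex roots are needed for cycles
  2 * pi / acos(-phi1 * (1 - phi2) / (4 * phi2))
}

ar2_cycle_length(1.3, -0.7)            # about 9.5 periods
```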


# Estimation and order selection

## Maximum likelihood estimation

Having identified the model order, we need to estimate the
parameters $c$, $\phi_1,\dots,\phi_p$,
$\theta_1,\dots,\theta_q$.\pause


* MLE is very similar to the least squares estimates obtained by minimizing
$$\sum_{t=1}^T e_t^2.$$
* The `Arima()` command allows CLS (`method="CSS"`) or MLE (`method="ML"`) estimation.
* Non-linear optimization must be used in either case.
* Different software will give slightly different estimates.
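
## CSS vs ML in R
\fontsize{10}{11}\sf

A sketch comparing the two estimation methods of `stats::arima()` (`method = "CSS"` and `method = "ML"`) on one simulated AR(1) series; the estimates should be close but not identical.

```r
# One simulated AR(1) series (phi_1 = 0.7 is arbitrary)
set.seed(2024)
y <- arima.sim(list(ar = 0.7), n = 150)

fit_css <- arima(y, order = c(1, 0, 0), method = "CSS")  # conditional sum of squares
fit_ml  <- arima(y, order = c(1, 0, 0), method = "ML")   # exact maximum likelihood
rbind(CSS = coef(fit_css), ML = coef(fit_ml))            # close, but not identical
```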

## Partial autocorrelations

\structure{Partial autocorrelations} measure the relationship\newline
between $y_{t}$  and  $y_{t - k}$, when
the effects of other time lags --- $1,
2, 3, \dots, k - 1$ --- are removed.\pause
\begin{block}{}
\begin{align*}
\alpha_k&= \text{$k$th partial autocorrelation coefficient}\\
&= \text{the estimate of $\phi_k$ in the regression:}\\
& \hspace*{0.8cm} y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k} + \varepsilon_t.
\end{align*}
\end{block}\pause

* Varying number of terms on RHS gives $\alpha_k$ for different values of $k$.
* There are more efficient ways of calculating $\alpha_k$.
* $\alpha_1=\rho_1$
* same critical values of $\pm 1.96/\sqrt{T}$ as for ACF.
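
## PACF by regression
\fontsize{10}{11}\sf

A sketch checking that the lag-$k$ regression coefficient on $y_{t-k}$ is close to the value `pacf()` reports. They are computed differently, so small differences are expected. The AR(2) data are simulated and `pacf_by_regression()` is a hypothetical helper.

```r
set.seed(11)
y <- as.numeric(arima.sim(list(ar = c(0.6, 0.2)), n = 1000))

# Hypothetical helper: alpha_k = last coefficient of a lag-k regression
pacf_by_regression <- function(y, k) {
  X <- embed(y, k + 1)            # column 1 = y_t, column j + 1 = y_{t-j}
  fit <- lm(X[, 1] ~ X[, -1])
  unname(tail(coef(fit), 1))      # coefficient on y_{t-k}
}

alpha_reg  <- sapply(1:4, function(k) pacf_by_regression(y, k))
alpha_pacf <- as.numeric(pacf(y, lag.max = 4, plot = FALSE)$acf)
round(rbind(regression = alpha_reg, pacf = alpha_pacf), 3)
```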


## Example: US consumption

```{r}
uschange[,"Consumption"] %>%
  autoplot() +
    xlab("Year") +
    ylab("Quarterly percentage change") +
    ggtitle("US consumption")
```

## Example: US consumption


```{r usconsumptionacf}
p1 <- ggAcf(uschange[,"Consumption"],main="")
p2 <- ggPacf(uschange[,"Consumption"],main="")
gridExtra::grid.arrange(p1,p2,nrow=1)
```

## ACF and PACF interpretation

**AR(1)**
\begin{align*}
\hspace*{1cm}\rho_k &= \phi_1^k\qquad\text{for $k=1,2,\dots$};\\
\alpha_1 &= \phi_1 \qquad\alpha_k = 0\qquad\text{for $k=2,3,\dots$}.
\end{align*}

So we have an AR(1) model when

  * autocorrelations exponentially decay
  * there is a single significant partial autocorrelation.
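
## AR(1): theoretical ACF and PACF
\fontsize{10}{11}\sf

`stats::ARMAacf()` computes theoretical autocorrelations, or partial autocorrelations with `pacf = TRUE`. For an AR(1) with an arbitrary $\phi_1 = 0.7$ it should reproduce $\rho_k=\phi_1^k$ and a single PACF spike.

```r
# Theoretical ACF/PACF of an AR(1) with phi_1 = 0.7 (an arbitrary value)
phi1  <- 0.7
rho   <- ARMAacf(ar = phi1, lag.max = 5)               # lags 0, 1, ..., 5
alpha <- ARMAacf(ar = phi1, lag.max = 5, pacf = TRUE)  # lags 1, ..., 5

all.equal(unname(rho), phi1^(0:5))   # TRUE: rho_k = phi_1^k decays exponentially
round(alpha, 10)                     # one spike (0.7) at lag 1, zeros after
```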


## ACF and PACF interpretation


**AR($p$)**

  * ACF dies out in an exponential or damped sine-wave manner
  * PACF has all zero spikes beyond the $p$th spike

So we have an AR($p$) model when

  * the ACF is  exponentially decaying or sinusoidal
  * there is a significant spike at lag $p$ in PACF, but none beyond $p$

## ACF and PACF interpretation

**MA(1)**
\begin{align*}
\hspace*{1cm}\rho_1 &= \frac{\theta_1}{1+\theta_1^2}\qquad \rho_k = 0\qquad\text{for $k=2,3,\dots$};\\
\alpha_k &= -(-\theta_1)^k\,\frac{1-\theta_1^2}{1-\theta_1^{2(k+1)}}\qquad\text{for $k=1,2,\dots$}.
\end{align*}

So we have an MA(1) model when

 * the PACF is  exponentially decaying and
 * there is a single significant spike in ACF
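
## MA(1): theoretical ACF and PACF
\fontsize{10}{11}\sf

For an MA(1), $\rho_1=\theta_1/(1+\theta_1^2)$, $\rho_k = 0$ for $k\ge2$, and $\alpha_k=-(-\theta_1)^k(1-\theta_1^2)/(1-\theta_1^{2(k+1)})$. A sketch checking these with `ARMAacf()` for an arbitrary $\theta_1=0.8$:

```r
# Theoretical ACF/PACF of an MA(1) with theta_1 = 0.8 (an arbitrary value)
theta1 <- 0.8
k      <- 1:5
rho    <- unname(ARMAacf(ma = theta1, lag.max = 5))[-1]   # lags 1..5 (drop lag 0)
alpha  <- unname(ARMAacf(ma = theta1, lag.max = 5, pacf = TRUE))
alpha_formula <- -(-theta1)^k * (1 - theta1^2) / (1 - theta1^(2 * (k + 1)))

round(rho, 4)                     # 0.4878 0 0 0 0: a single ACF spike
all.equal(alpha, alpha_formula)   # TRUE: the PACF decays with alternating sign
```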

## ACF and PACF interpretation

**MA($q$)**

 * PACF dies out in an exponential or damped sine-wave manner
 * ACF has all zero spikes beyond the $q$th spike

So we have an MA($q$) model when

  * the PACF is  exponentially decaying or sinusoidal
  * there is a significant spike at lag $q$ in ACF, but none beyond $q$
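
## Cut-offs for AR(2) and MA(2)
\fontsize{10}{11}\sf

Using `ARMAacf()` on the AR(2) and MA(2) models simulated earlier: the AR(2) PACF should cut off after lag 2 (with $\alpha_2=\phi_2$), and the MA(2) ACF should cut off after lag 2.

```r
# The AR(2) and MA(2) models simulated earlier in these slides
ar2_pacf <- ARMAacf(ar = c(1.3, -0.7), lag.max = 6, pacf = TRUE)  # lags 1..6
ma2_acf  <- ARMAacf(ma = c(-1, 0.8), lag.max = 6)[-1]             # lags 1..6

round(ar2_pacf, 4)   # non-zero at lags 1-2 (lag 2 equals phi_2 = -0.7), then 0
round(ma2_acf, 4)    # non-zero at lags 1-2, then 0
```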

## Example: Mink trapping

```{r}
autoplot(mink) +
  xlab("Year") +
  ylab("Minks trapped (thousands)") +
  ggtitle("Annual number of minks trapped")
```

## Example: Mink trapping

```{r}
p1 <- ggAcf(mink,main="")
p2 <- ggPacf(mink,main="")
gridExtra::grid.arrange(p1,p2,nrow=1)
```


## Information criteria

\structure{Akaike's Information Criterion (AIC):}
\centerline{$\text{AIC} = -2 \log(L) + 2(p+q+k+1),$}
where $L$ is the likelihood of the data,\newline
$k=1$ if $c\ne0$ and $k=0$ if $c=0$.\pause\vspace*{0.2cm}

\structure{Corrected AIC:}
\centerline{$\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}.$}\pause\vspace*{0.2cm}

\structure{Bayesian Information Criterion:}
\centerline{$\text{BIC} = \text{AIC} + [\log(T)-2](p+q+k+1).$}
\pause\vspace*{-0.2cm}
\begin{block}{}Good models are obtained by minimizing either the AIC, \text{AICc}\ or BIC\@. Our preference is to use the \text{AICc}.\end{block}
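
## Information criteria in R
\fontsize{10}{11}\sf

A sketch computing the criteria by hand for an ARIMA(2,0,0) with a constant ($k=1$) fitted to simulated data, using $\text{AIC} = -2\log(L) + 2(p+q+k+1)$, the AICc correction, and $\text{BIC} = -2\log(L) + \log(T)(p+q+k+1)$. The AIC should agree with the `aic` that `stats::arima()` reports.

```r
set.seed(3)
y   <- 50 + arima.sim(list(ar = c(0.5, 0.3)), n = 120)
fit <- arima(y, order = c(2, 0, 0), method = "ML")

p <- 2; q <- 0; k <- 1
n_obs <- length(y)                       # T (no differencing here)
npar  <- p + q + k + 1                   # AR terms, constant, and sigma^2
aic   <- -2 * fit$loglik + 2 * npar
aicc  <- aic + 2 * npar * (npar + 1) / (n_obs - npar - 1)
bic   <- aic + (log(n_obs) - 2) * npar
c(AIC = aic, AICc = aicc, BIC = bic)
all.equal(aic, fit$aic)                  # TRUE
```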


# ARIMA modelling in R

## How does auto.arima() work?

\begin{block}{A non-seasonal ARIMA process}
\[
\phi(B)(1-B)^dy_{t} = c + \theta(B)\varepsilon_t
\]
Need to select appropriate orders: \alert{$p,q, d$}
\end{block}

\structure{Hyndman and Khandakar (JSS, 2008) algorithm:}

  * Select no.\ differences \alert{$d$} and \alert{$D$} via KPSS test and seasonal strength measure.
  * Select \alert{$p,q$} by minimising AICc.
  * Use stepwise search to traverse model space.

## How does auto.arima() work?
\fontsize{12}{13}\sf

\begin{block}{}
\centerline{$\text{AICc} = -2 \log(L) + 2(p+q+k+1)\left[1 +
\frac{(p+q+k+2)}{T-p-q-k-2}\right].$}
where $L$ is the maximised likelihood fitted to the \textit{differenced} data,
$k=1$ if $c\neq 0$ and $k=0$ otherwise.
\end{block}\pause

Step 1:
:  Select current model (with smallest AICc) from:\newline
ARIMA$(2,d,2)$\newline
ARIMA$(0,d,0)$\newline
ARIMA$(1,d,0)$\newline
ARIMA$(0,d,1)$
\pause\vspace*{-0.1cm}

Step 2:
:  Consider variations of current model:

    * vary one of $p, q$ from the current model by $\pm1$;
    * vary both $p$ and $q$ from the current model by $\pm1$;
    * include/exclude $c$ from the current model.

  Model with lowest AICc becomes current model.

\structure{Repeat Step 2 until no lower AICc can be found.}
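
## A stepwise search: a sketch
\fontsize{10}{11}\sf

A simplified sketch of the stepwise idea, not the `auto.arima()` implementation. `aicc_of()` and `stepwise_arima()` are hypothetical helpers built on `stats::arima()`: $d$ is taken as given, seasonal terms are ignored, the constant is only toggled when $d=0$, and the neighbour set (all $\pm1$ moves of $p$ and/or $q$) is an assumption.

```r
# Hypothetical helper: AICc of an ARIMA(p,d,q) fit, Inf if fitting fails.
# stats::arima() ignores include.mean when d > 0, so then k = 0.
aicc_of <- function(y, p, d, q, const) {
  fit <- tryCatch(
    suppressWarnings(arima(y, order = c(p, d, q), include.mean = const,
                           method = "ML")),
    error = function(e) NULL)
  if (is.null(fit)) return(Inf)
  npar <- length(fit$coef) + 1                  # p + q + k + 1
  val <- fit$aic + 2 * npar * (npar + 1) / (fit$nobs - npar - 1)
  if (is.finite(val)) val else Inf
}

# Hypothetical helper: simplified stepwise search over (p, q, constant)
stepwise_arima <- function(y, d = 0, max_pq = 5) {
  score <- function(m) aicc_of(y, m[1], d, m[2], m[3] == 1)
  const <- as.numeric(d == 0)
  # Step 1: starting models, each coded as c(p, q, constant)
  models <- list(c(2, 2, const), c(0, 0, const), c(1, 0, const), c(0, 1, const))
  scores <- vapply(models, score, numeric(1))
  best <- models[[which.min(scores)]]
  best_score <- min(scores)
  repeat {
    # Step 2: move p and/or q by +-1; toggle the constant when d = 0
    nbrs <- list()
    for (dp in -1:1) for (dq in -1:1) {
      if (dp != 0 || dq != 0) nbrs[[length(nbrs) + 1]] <- best + c(dp, dq, 0)
    }
    if (d == 0) nbrs[[length(nbrs) + 1]] <- c(best[1:2], 1 - best[3])
    nbrs <- Filter(function(m) all(m[1:2] >= 0 & m[1:2] <= max_pq), nbrs)
    sc <- vapply(nbrs, score, numeric(1))
    if (min(sc) >= best_score) break              # no lower AICc: stop
    best <- nbrs[[which.min(sc)]]
    best_score <- min(sc)
  }
  c(p = best[1], d = d, q = best[2], constant = best[3])
}

set.seed(99)
y <- 10 + arima.sim(list(ar = 0.6), n = 200)
stepwise_arima(y, d = 0)
```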


## Choosing your own model

```{r, echo=TRUE, fig.height=4}
ggtsdisplay(internet)
```

## Choosing your own model

```{r, echo=TRUE, fig.height=4}
ggtsdisplay(diff(internet))
```

## Choosing your own model
\fontsize{13}{14}\sf

```{r, echo=TRUE, fig.height=4}
(fit <- Arima(internet,order=c(3,1,0)))
```

## Choosing your own model
\fontsize{13}{14}\sf

```{r, echo=TRUE, fig.height=4}
auto.arima(internet)
```

## Choosing your own model
\fontsize{13}{14}\sf

```{r internettryharder, echo=TRUE, fig.height=4}
auto.arima(internet, stepwise=FALSE,
  approximation=FALSE)
```


## Choosing your own model

```r
checkresiduals(fit)
```

```{r, echo=FALSE, fig.height=4}
checkresiduals(fit, test=FALSE)
```


## Choosing your own model

```{r, echo=FALSE}
checkresiduals(fit, plot=FALSE)
```


## Choosing your own model

```{r, echo=TRUE, fig.height=4}
fit %>% forecast %>% autoplot
```


## Modelling procedure with `Arima`
\fontsize{12}{13}\sf

1. Plot the data. Identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3. If the data are non-stationary: take first differences of the data until the data are stationary.
4. Examine the ACF/PACF: Is an AR($p$) or MA($q$) model appropriate?
5. Try your chosen model(s), and  use the \text{AICc} to search for a better model.
6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7. Once the residuals look like white noise, calculate forecasts.


## Modelling procedure with `auto.arima`
\fontsize{12}{13}\sf

1. Plot the data. Identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.

\vspace*{1.15cm}

3. Use `auto.arima` to select a model.

\vspace*{1.15cm}

6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7. Once the residuals look like white noise, calculate forecasts.

## Modelling procedure

\centerline{\includegraphics[height=8.cm]{Figure-8-10}}


## \large Seasonally adjusted electrical equipment
\fontsize{11.5}{15}\sf

```{r ee1, fig.height=3.3, echo=TRUE}
eeadj <- seasadj(stl(elecequip, s.window="periodic"))
autoplot(eeadj) + xlab("Year") +
  ylab("Seasonally adjusted new orders index")
```


## \large Seasonally adjusted electrical equipment

1. The time plot shows sudden changes, particularly a big drop in 2008/2009 due to the global economic environment. Otherwise there is nothing unusual and no need for data adjustments.
2. No evidence of changing variance, so no Box-Cox transformation.
3. Data are clearly non-stationary, so we take first differences.


## \large Seasonally adjusted electrical equipment

```{r ee2, echo=TRUE, fig.height=4}
ggtsdisplay(diff(eeadj))
```

## \large Seasonally adjusted electrical equipment

4. The PACF is suggestive of an AR(3). So the initial candidate model is ARIMA(3,1,0). There are no other obvious candidates.
5. Fit the ARIMA(3,1,0) model along with variations: ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. ARIMA(3,1,1) has the smallest \text{AICc} value.

## \large Seasonally adjusted electrical equipment
\fontsize{10}{10}\sf

```{r, echo=TRUE}
(fit <- Arima(eeadj, order=c(3,1,1)))
```

## \large Seasonally adjusted electrical equipment

6. The ACF plot shows that the residuals from the ARIMA(3,1,1) model look like white noise.

\fontsize{11}{11}\sf

```r
checkresiduals(fit)
```

```{r, echo=FALSE, fig.height=3.4}
checkresiduals(fit, test=FALSE)
```

## \large Seasonally adjusted electrical equipment

```{r, echo=FALSE}
checkresiduals(fit, plot=FALSE)
```

## \large Seasonally adjusted electrical equipment

```{r, echo=TRUE}
fit %>% forecast %>% autoplot
```

# Forecasting

## Point forecasts

1. Rearrange ARIMA equation so $y_t$ is on LHS.
2. Rewrite equation by replacing $t$ by $T+h$.
3. On RHS, replace future observations by their forecasts, future errors by zero, and past errors by corresponding residuals.

Start with $h=1$. Repeat for $h=2,3,\dots$.


## Point forecasts
\fontsize{14}{14}\sf

\structure{ARIMA(3,1,1) forecasts: Step 1}
\begin{block}{}
\centerline{$(1-\phi_1B -\phi_2B^2-\phi_3B^3)(1-B) y_t = (1+\theta_1B)\varepsilon_{t},$}
\end{block}
\pause\vspace*{-0.4cm}
\begin{align*}
\left[1-(1+\phi_1)B +(\phi_1-\phi_2)B^2 + (\phi_2-\phi_3)B^3 +\phi_3B^4\right] y_t\\ = (1+\theta_1B)\varepsilon_{t},
\end{align*}\pause\vspace*{-0.4cm}
\begin{align*}
y_t - (1+\phi_1)y_{t-1} +(\phi_1-\phi_2)y_{t-2} + (\phi_2-\phi_3)y_{t-3}\\ \mbox{}+\phi_3y_{t-4} = \varepsilon_t+\theta_1\varepsilon_{t-1}.
\end{align*}\pause\vspace*{-0.4cm}
\begin{align*}
y_t = (1+\phi_1)y_{t-1} -(\phi_1-\phi_2)y_{t-2} - (\phi_2-\phi_3)y_{t-3}\\\mbox{} -\phi_3y_{t-4} + \varepsilon_t+\theta_1\varepsilon_{t-1}.
\end{align*}


## Point forecasts (h=1)
\fontsize{14}{14}\sf

\begin{block}{}
\begin{align*}
y_t = (1+\phi_1)y_{t-1} -(\phi_1-\phi_2)y_{t-2} - (\phi_2-\phi_3)y_{t-3}\\\mbox{} -\phi_3y_{t-4} + \varepsilon_t+\theta_1\varepsilon_{t-1}.
\end{align*}
\end{block}\pause
\structure{ARIMA(3,1,1) forecasts: Step 2}
\begin{align*}
y_{T+1} = (1+\phi_1)y_{T} -(\phi_1-\phi_2)y_{T-1} - (\phi_2-\phi_3)y_{T-2}\\\mbox{} -\phi_3y_{T-3} + \varepsilon_{T+1}+\theta_1\varepsilon_{T}.
\end{align*}\pause
\structure{ARIMA(3,1,1) forecasts: Step 3}
\begin{align*}
\hat{y}_{T+1|T} = (1+\phi_1)y_{T} -(\phi_1-\phi_2)y_{T-1} - (\phi_2-\phi_3)y_{T-2}\\\mbox{} -\phi_3y_{T-3} + \theta_1 e_{T}.
\end{align*}

## Point forecasts (h=2)
\fontsize{14}{14}\sf

\begin{block}{}
\begin{align*}
y_t = (1+\phi_1)y_{t-1} -(\phi_1-\phi_2)y_{t-2} - (\phi_2-\phi_3)y_{t-3}\\\mbox{} -\phi_3y_{t-4} + \varepsilon_t+\theta_1\varepsilon_{t-1}.
\end{align*}
\end{block}\pause
\structure{ARIMA(3,1,1) forecasts: Step 2}
\begin{align*}
y_{T+2} = (1+\phi_1)y_{T+1} -(\phi_1-\phi_2)y_{T} - (\phi_2-\phi_3)y_{T-1}\\\mbox{} -\phi_3y_{T-2} + \varepsilon_{T+2}+\theta_1\varepsilon_{T+1}.
\end{align*}\pause
\structure{ARIMA(3,1,1) forecasts: Step 3}
\begin{align*}
\hat{y}_{T+2|T} = (1+\phi_1)\hat{y}_{T+1|T} -(\phi_1-\phi_2)y_{T} - (\phi_2-\phi_3)y_{T-1}\\\mbox{} -\phi_3y_{T-2}.
\end{align*}
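
## Point forecasts: checking by hand
\fontsize{10}{11}\sf

A sketch (simulated data, arbitrary parameters) that applies the Step 3 equations for $h=1,2$ to a fitted ARIMA(3,1,1) and compares them with `predict()`. The last residual from `stats::arima()` plays the role of $e_T$; agreement is expected to be close rather than exact, because `stats::arima()` uses a Kalman filter.

```r
set.seed(5)
x <- arima.sim(list(order = c(3, 1, 1), ar = c(0.3, 0.2, -0.2), ma = 0.4),
               n = 300)
fit    <- arima(x, order = c(3, 1, 1))
phi    <- coef(fit)[c("ar1", "ar2", "ar3")]
theta1 <- coef(fit)[["ma1"]]
y <- as.numeric(x); n <- length(y)
e_T <- as.numeric(residuals(fit))[n]       # last residual e_T

# h = 1: future error -> 0, past error -> residual
f1 <- (1 + phi[1]) * y[n] - (phi[1] - phi[2]) * y[n - 1] -
      (phi[2] - phi[3]) * y[n - 2] - phi[3] * y[n - 3] + theta1 * e_T
# h = 2: y_{T+1} -> its forecast; both errors are future, so zero
f2 <- (1 + phi[1]) * f1 - (phi[1] - phi[2]) * y[n] -
      (phi[2] - phi[3]) * y[n - 1] - phi[3] * y[n - 2]

rbind(by_hand = unname(c(f1, f2)),
      predict = as.numeric(predict(fit, n.ahead = 2)$pred))
```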


## Prediction intervals

\begin{block}{95\% prediction interval}
$$\hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}}$$
where $v_{T+h|T}$ is estimated forecast variance.
\end{block}\pause

* $v_{T+1|T}=\hat{\sigma}^2$ for all ARIMA models regardless of parameters and orders.
* Multi-step prediction intervals for ARIMA(0,0,$q$):
\centerline{$\displaystyle y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}.$}
\centerline{$\displaystyle
v_{T+h|T} = \hat{\sigma}^2 \left[ 1 + \sum_{i=1}^{h-1} \theta_i^2\right], \qquad\text{for~} h=2,3,\dots,\ \text{with $\theta_i=0$ for $i>q$.}$}


## Prediction intervals

\begin{block}{95\% Prediction interval}
$$\hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}}$$
where $v_{T+h|T}$ is estimated forecast variance.
\end{block}

* Multi-step prediction intervals for ARIMA(0,0,$q$):
\centerline{$\displaystyle y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}.$}
\centerline{$\displaystyle
v_{T+h|T} = \hat{\sigma}^2 \left[ 1 + \sum_{i=1}^{h-1} \theta_i^2\right], \qquad\text{for~} h=2,3,\dots,\ \text{with $\theta_i=0$ for $i>q$.}$}

\pause

* AR(1): Rewrite as MA($\infty$) and use above result.
* Other models are beyond the scope of this subject.
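
## Forecast variance from $\psi$-weights
\fontsize{10}{11}\sf

A sketch for an AR(1) fitted to simulated data: rewriting it as MA($\infty$) gives $v_{T+h|T} = \hat\sigma^2(1+\psi_1^2+\dots+\psi_{h-1}^2)$ with $\psi_i = \phi_1^i$, which should match the standard errors from `predict()`.

```r
set.seed(8)
y   <- arima.sim(list(ar = 0.6), n = 300)
fit <- arima(y, order = c(1, 0, 0))
h   <- 8

psi  <- ARMAtoMA(ar = coef(fit)[["ar1"]], lag.max = h - 1)   # psi_1, ..., psi_{h-1}
v    <- fit$sigma2 * cumsum(c(1, psi^2))                     # v_{T+1|T}, ..., v_{T+h|T}
pred <- predict(fit, n.ahead = h)

all.equal(v, as.numeric(pred$se)^2)            # TRUE
lower <- pred$pred - 1.96 * sqrt(v)            # 95% interval, lower limit
upper <- pred$pred + 1.96 * sqrt(v)            # 95% interval, upper limit
```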


## Prediction intervals

* Prediction intervals **increase in size with forecast horizon**.
* Prediction intervals can be difficult to calculate by hand
* Calculations assume residuals are **uncorrelated** and **normally distributed**.
* Prediction intervals tend to be too narrow.
    * the uncertainty in the parameter estimates has not been accounted for.
    * the ARIMA model assumes historical patterns will not change during the forecast period.
    * the ARIMA model assumes uncorrelated future \rlap{errors}


## Your turn

For the `usgdp` data:

 * if necessary, find a suitable Box-Cox transformation for the data;
 * fit a suitable ARIMA model to the transformed data using `auto.arima()`;
 * check the residual diagnostics;
 * produce forecasts of your fitted model. Do the forecasts look reasonable?

<!-- # Backshift notation revisited

## Backshift notation

A  very useful notational device is the backward  shift operator,  $B$,  which is used as follows:
$$
{B y_{t}  =  y_{t - 1}} \: .
$$\pause
In  other words, $B$, operating on $y_{t}$, has the effect of **shifting the  data back one period**. \pause
Two applications of  $B$  to  $y_{t}$ **shifts the data  back  two periods**:
$$
B(By_{t})  =  B^{2}y_{t}  =  y_{t-2}\: .
$$\pause
For  monthly  data, if we wish to shift attention  to  ``the same  month last year,''  then  $B^{12}$ is used,  and  the notation is  $B^{12}y_{t}$  =  $y_{t-12}$.


## Backshift notation

  * First difference: $1-B$.
  * Double difference:  $(1- B)^{2}$.
  * $d$th-order difference: $(1 - B)^{d} y_{t}.$
  * Seasonal difference: $1-B^m$.
  * Seasonal difference followed by a first difference: $(1-B)(1-B^m)$.
  * Multiply terms together to see the combined effect:
\begin{align*}
(1-B)(1-B^m)y_t &= (1 - B - B^m + B^{m+1})y_t \\
&= y_t-y_{t-1}-y_{t-m}+y_{t-m-1}.
\end{align*}

## Backshift notation for ARIMA
\fontsize{13}{15}\sf

  * ARMA model:

  \vspace*{-0.9cm}
\begin{align*}
\hspace*{0.5cm} y_{t}  &=  c  +  \phi_{1}y_{t - 1}  +  \cdots  +  \phi_{p}y_{t - p}
 + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} +  \cdots  + \theta_{q}\varepsilon_{t - q}\\
       &=  c + \phi_{1}By_{t} + \cdots + \phi_pB^py_{t}
           +  \varepsilon_{t}  +  \theta_{1}B\varepsilon_{t} + \cdots + \theta_qB^q\varepsilon_{t} \\
\phi(B)y_t & = c + \theta(B) \varepsilon_t\\
  & \hspace*{0.6cm} \text{where $\phi(B)= 1-\phi_1B - \cdots - \phi_p B^p$}\\
  & \hspace*{0.6cm} \text{and $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$.}
\end{align*}

\pause

  * ARIMA(1,1,1) model:

$$
\begin{array}{c c c c}
(1 - \phi_{1} B) & (1  -  B) y_{t} &= &c + (1  + \theta_{1} B) \varepsilon_{t}\\
\uparrow  & \uparrow    &   &\uparrow\\
{\text{AR(1)}} & {\text{First}}   &     &{\text{MA(1)}}\\
& {\hbox to 0cm{\hss\text{difference}\hss}}\\
\end{array}
$$


## Backshift notation for ARIMA
\fontsize{13}{15}\sf


  * ARIMA($p,d,q$) model:


\begin{equation*}
  \arraycolsep=0.1cm
  \begin{array}{c c c c}
    (1-\phi_1B - \cdots - \phi_p B^p) & (1-B)^d y_{t} &= &c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t\\
    {\uparrow} & {\uparrow} & &{\uparrow}\\
    {\text{AR($p$)}} & \hbox to 0cm{\hss\text{$d$ differences}\hss} & &{\text{MA($q$)}}\\
  \end{array}
\end{equation*}


 -->
# Seasonal ARIMA models

## Seasonal ARIMA models

| ARIMA | $~\underbrace{(p, d, q)}$ | $\underbrace{(P, D, Q)_{m}}$ |
| ----: | :-----------------------: | :--------------------------: |
|       | ${\uparrow}$              | ${\uparrow}$                 |
|       | Non-seasonal part         | Seasonal part                |
|       | of the model              | of the model                 |


where $m =$ number of observations per year.


## Seasonal ARIMA models

E.g., ARIMA$(1, 1, 1)(1, 1, 1)_{4}$  model (without constant)\pause
$$(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} ~= ~
(1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\varepsilon_{t}.
$$\pause

\setlength{\unitlength}{1mm}
\begin{footnotesize}
\begin{picture}(100,25)(-5,0)
\thinlines
{\put(5,22){\vector(0,1){6}}}
{\put(22,10){\vector(0,1){18}}}
{\put(38,22){\vector(0,1){6}}}
{\put(52,10){\vector(0,1){18}}}
{\put(77,22){\vector(0,1){6}}}
{\put(95,10){\vector(0,1){18}}}
{\put(-10,17){$\left(\begin{array}{@{}c@{}} \text{Non-seasonal} \\ \text{AR(1)}
                    \end{array}\right)$}}
{\put(12,5){$\left(\begin{array}{@{}c@{}} \text{Seasonal} \\ \text{AR(1)}
                    \end{array}\right)$}}
{\put(25,17){$\left(\begin{array}{@{}c@{}} \text{Non-seasonal} \\ \text{difference}
                    \end{array}\right)$}}
{\put(40,5){$\left(\begin{array}{@{}c@{}} \text{Seasonal} \\ \text{difference}
                    \end{array}\right)$}}
{\put(65,17){$\left(\begin{array}{@{}c@{}} \text{Non-seasonal} \\ \text{MA(1)}
                    \end{array}\right)$}}
{\put(85,5){$\left(\begin{array}{@{}c@{}} \text{Seasonal} \\ \text{MA(1)}
                    \end{array}\right)$}}
\end{picture}
\end{footnotesize}


\vspace*{10cm}


## Seasonal ARIMA models

E.g., ARIMA$(1, 1, 1)(1, 1, 1)_{4}$  model (without constant)
$$(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} ~= ~
(1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\varepsilon_{t}.
$$

All the factors can be multiplied out and the general model
written as follows:
\begin{align*}
y_{t}  &= (1 + \phi_{1})y_{t - 1} - \phi_1y_{t-2} + (1 + \Phi_{1})y_{t - 4}\\
&\text{}
 -  (1  + \phi_{1}  +  \Phi_{1} + \phi_{1}\Phi_{1})y_{t - 5}
 +  (\phi_{1}  +  \phi_{1} \Phi_{1}) y_{t - 6} \\
& \text{}  - \Phi_{1} y_{t - 8} +  (\Phi_{1}  +  \phi_{1} \Phi_{1}) y_{t - 9}
  - \phi_{1} \Phi_{1} y_{t  -  10}\\
  &\text{}
+    \varepsilon_{t} + \theta_{1}\varepsilon_{t - 1} + \Theta_{1}\varepsilon_{t - 4}  + \theta_{1}\Theta_{1}\varepsilon_{t - 5}.
\end{align*}
\vspace*{10cm}
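
## Seasonal ARIMA models
\fontsize{9}{10}\sf

You can check the expansion numerically by multiplying the lag polynomials. This is a base-R illustration only. `polymul()` is a hypothetical helper written for this slide, and the values of $\phi_1$, $\Phi_1$, $\theta_1$ and $\Theta_1$ are arbitrary.

```r
# Multiply polynomials in B; coefficient vectors are ordered by
# increasing power of B (element 1 holds the B^0 term).
polymul <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a))
    for (j in seq_along(b))
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
  out
}
phi <- 0.5; Phi <- -0.3; theta <- 0.4; Theta <- 0.2  # illustrative values
ar_side <- Reduce(polymul, list(
  c(1, -phi),           # 1 - phi_1 B
  c(1, 0, 0, 0, -Phi),  # 1 - Phi_1 B^4
  c(1, -1),             # 1 - B
  c(1, 0, 0, 0, -1)))   # 1 - B^4
ma_side <- polymul(c(1, theta), c(1, 0, 0, 0, Theta))
# Moving AR terms to the right-hand side flips their sign:
# the coefficient on y_{t-k} is -ar_side[k + 1].
round(-ar_side[-1], 4)  # lags 1..10 of y
round(ma_side, 4)       # lags 0..5 of epsilon
```

Each entry should match the corresponding coefficient in the multiplied-out equation, including the zeros at lags 3 and 7.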


## Common ARIMA models

The US Census Bureau uses the following models most often:\vspace*{0.5cm}

\begin{tabular}{|ll|}
\hline
ARIMA(0,1,1)(0,1,1)$_m$& with log transformation\\
ARIMA(0,1,2)(0,1,1)$_m$& with log transformation\\
ARIMA(2,1,0)(0,1,1)$_m$& with log transformation\\
ARIMA(0,2,2)(0,1,1)$_m$& with log transformation\\
ARIMA(2,1,2)(0,1,1)$_m$& with no transformation\\
\hline
\end{tabular}
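
## Common ARIMA models
\fontsize{10}{12}\sf

Below is a sketch of fitting the first model in the table, often called the airline model. It uses base R's `arima()` on the built-in monthly `AirPassengers` series, which we chose for illustration. The `order` argument holds $(p,d,q)$. The `seasonal` argument holds $(P,D,Q)$ and the period $m$. In this deck's `forecast` style, the equivalent call would be `Arima(AirPassengers, order=c(0,1,1), seasonal=c(0,1,1), lambda=0)`, with $m$ taken from `frequency()`.

```r
# ARIMA(0,1,1)(0,1,1)_12 with a log transformation
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),               # (p, d, q)
             seasonal = list(order = c(0, 1, 1),  # (P, D, Q)
                             period = 12))        # m
coef(fit)   # ma1 = theta_1, sma1 = Theta_1
# Back-transform the log-scale point forecasts
fc <- exp(predict(fit, n.ahead = 24)$pred)
```

Because `exp()` is applied to log-scale point forecasts, `fc` estimates the median rather than the mean on the original scale.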


## Seasonal ARIMA models
The seasonal part of an AR or MA model will be seen in the seasonal lags of
the PACF and ACF.

\structure{ARIMA(0,0,0)(0,0,1)$_{12}$ will show:}

  * a spike at lag 12 in the ACF but no other significant spikes;
  * exponential decay in the seasonal lags of the PACF, that is, at lags 12, 24, 36, \dots.

\structure{ARIMA(0,0,0)(1,0,0)$_{12}$ will show:}

  *  exponential decay in the seasonal lags of the ACF
  * a single significant spike at lag 12 in the PACF.
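
## Seasonal ARIMA models
\fontsize{9}{10}\sf

You can check these signatures against the theoretical ACF and PACF. The base-R sketch below uses `ARMAacf()`. It writes each seasonal model with $m=12$ as a non-seasonal model whose only non-zero coefficient sits at lag 12. The coefficient value 0.8 is an arbitrary illustrative choice.

```r
Theta <- 0.8                   # illustrative Theta_1
Phi   <- 0.8                   # illustrative Phi_1
sma <- c(rep(0, 11), Theta)    # (1 + Theta_1 B^12) eps_t
sar <- c(rep(0, 11), Phi)      # (1 - Phi_1 B^12) y_t = eps_t

# ARMAacf(): ACF for lags 0..lag.max, PACF for lags 1..lag.max
sma_acf  <- ARMAacf(ma = sma, lag.max = 36)
sma_pacf <- ARMAacf(ma = sma, lag.max = 36, pacf = TRUE)
sar_acf  <- ARMAacf(ar = sar, lag.max = 36)
sar_pacf <- ARMAacf(ar = sar, lag.max = 36, pacf = TRUE)

lags <- c(12, 24, 36)
round(rbind(
  "SMA(1) ACF"  = sma_acf[lags + 1],  # spike at lag 12 only
  "SMA(1) PACF" = sma_pacf[lags],     # shrinking in size
  "SAR(1) ACF"  = sar_acf[lags + 1],  # 0.8, 0.64, 0.512
  "SAR(1) PACF" = sar_pacf[lags]), 3) # spike at lag 12 only
```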

## European quarterly retail trade

```{r, echo=TRUE, fig.height=3.6}
autoplot(euretail) +
  xlab("Year") + ylab("Retail index")
```

## European quarterly retail trade

```{r, echo=TRUE, fig.height=4}
euretail %>% diff(lag=4) %>% ggtsdisplay()
```

## European quarterly retail trade

```{r, echo=TRUE, fig.height=3.8}
euretail %>% diff(lag=4) %>% diff() %>%
  ggtsdisplay()
```

## European quarterly retail trade

  * $d=1$ and $D=1$ seem necessary.
  * Significant spike at lag 1 in ACF suggests non-seasonal MA(1) component.
  * Significant spike at lag 4 in ACF suggests seasonal MA(1) component.
  * Initial candidate model: ARIMA(0,1,1)(0,1,1)$_4$.
  * We could also have started with ARIMA(1,1,0)(1,1,0)$_4$.

## European quarterly retail trade

```{r, echo=TRUE, fig.height=3.5}
fit <- Arima(euretail, order=c(0,1,1),
  seasonal=c(0,1,1))
checkresiduals(fit)
```

## European quarterly retail trade

\fontsize{13}{14}\sf

```{r, echo=FALSE}
checkresiduals(fit, plot=FALSE)
```
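
## Residual diagnostics
\fontsize{10}{12}\sf

The portmanteau test in this output is a Ljung-Box test. Below is a base-R sketch of the same kind of test using `Box.test()`. It is applied to the residuals of an ARIMA(0,1,1)(0,1,1)$_{12}$ model for the built-in `log(AirPassengers)` series. We use that series instead of `euretail`, which needs the `fpp2` package. Setting `fitdf` to the number of estimated coefficients reduces the degrees of freedom, much like the `df` reported by `checkresiduals()`. The choice `lag = 24` is our assumption for monthly data.

```r
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
k  <- length(coef(fit))   # estimated ARMA coefficients (here 2)
lb <- Box.test(residuals(fit), lag = 24,
               type = "Ljung-Box", fitdf = k)
lb$statistic   # Q*
lb$parameter   # df = lag - fitdf
lb$p.value     # small values suggest remaining autocorrelation
```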

## European quarterly retail trade

```{r, echo=FALSE}
aicc <- c(
  Arima(euretail, order=c(0,1,2), seasonal=c(0,1,1))$aicc,
  Arima(euretail, order=c(0,1,3), seasonal=c(0,1,1))$aicc
  )
```

  * The ACF and PACF of the residuals show significant spikes at lag 2, and maybe lag 3.
  * AICc of ARIMA(0,1,2)(0,1,1)$_4$ model is `r round(aicc[1],2)`.
  * AICc of ARIMA(0,1,3)(0,1,1)$_4$ model is `r round(aicc[2],2)`.
\pause\vfill

```r
fit <- Arima(euretail, order=c(0,1,3),
  seasonal=c(0,1,1))
checkresiduals(fit)
```


## European quarterly retail trade

\fontsize{12}{15}\sf

```{r}
(fit <- Arima(euretail, order=c(0,1,3),
  seasonal=c(0,1,1)))
```

## European quarterly retail trade
\fontsize{13}{15}\sf

```{r, echo=TRUE, fig.height=4}
checkresiduals(fit)
```

## European quarterly retail trade
\fontsize{13}{15}\sf

```{r, echo=FALSE}
checkresiduals(fit, plot=FALSE)
```

## European quarterly retail trade

```{r, echo=TRUE, fig.height=4}
autoplot(forecast(fit, h=12))
```

## European quarterly retail trade
\fontsize{12}{14}\sf

```{r, echo=TRUE}
auto.arima(euretail)
```

## European quarterly retail trade
\fontsize{12}{14}\sf

```{r euretailtryharder, echo=TRUE}
auto.arima(euretail, 
  stepwise=FALSE, approximation=FALSE)
```


## Corticosteroid drug sales


```{r h02}
lh02 <- log(h02)
tmp <- cbind("H02 sales (million scripts)" = h02,
             "Log H02 sales"=lh02)
autoplot(tmp, facets=TRUE) + xlab("Year") + ylab("")
```

## Corticosteroid drug sales
```{r h02b}
ggtsdisplay(diff(lh02,12), xlab="Year",
  main="Seasonally differenced H02 scripts")
```

## Corticosteroid drug sales

  * Choose $D=1$ and $d=0$.
  * Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term.
  * Spikes in the PACF at non-seasonal lags suggest a possible non-seasonal AR(3) term.
  * Initial candidate model: ARIMA(3,0,0)(2,1,0)$_{12}$.


## Corticosteroid drug sales


```{r h02aicc, echo=FALSE}
models <- rbind(
  c(3,0,0,2,1,0),
  c(3,0,1,2,1,0),
  c(3,0,2,2,1,0),
  c(3,0,1,1,1,0),
  c(3,0,1,0,1,1),
  c(3,0,1,0,1,2),
  c(3,0,1,1,1,1))
aicc <- numeric(NROW(models))
modelname <- character(NROW(models))
for(i in seq_along(aicc))
{
  fit <- Arima(lh02, order=models[i,1:3],
          seasonal=models[i,4:6])
  aicc[i] <- fit$aicc
  modelname[i] <- as.character(fit)
}
modelname <- sub("\\[12\\]","$_{12}$",modelname)
j <- order(aicc)
knitr::kable(data.frame(Model=modelname,AICc=aicc)[j,], escape=FALSE,
             digits=2, row.names=FALSE, align='cc', booktabs=TRUE)
```

## Corticosteroid drug sales
\fontsize{11}{12}\sf

```{r arimah02, echo=TRUE}
(fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2),
   lambda=0))
```


## Corticosteroid drug sales

```{r h02res, echo=TRUE, fig.height=4, dependson='arimah02'}
checkresiduals(fit, lag=36)
```

## Corticosteroid drug sales
\fontsize{11}{15}\sf


```{r h02resb, echo=FALSE, fig.height=4, dependson='arimah02'}
checkresiduals(fit, plot=FALSE,lag=36)
```


## Corticosteroid drug sales
\fontsize{8}{10}\sf

```{r h02auto, echo=TRUE, fig.height=3.6}
(fit <- auto.arima(h02, lambda=0))
```

## Corticosteroid drug sales
\fontsize{10}{12}\sf

```{r, echo=TRUE, fig.height=4, dependson='h02auto'}
checkresiduals(fit, lag=36)
```


## Corticosteroid drug sales
\fontsize{11}{15}\sf

```{r, echo=FALSE, dependson='h02auto'}
checkresiduals(fit, plot=FALSE, lag=36)
```


## Corticosteroid drug sales
\fontsize{8}{10}\sf

```{r h02tryharder, echo=TRUE, fig.height=3.6}
(fit <- auto.arima(h02, lambda=0, max.order=9,
  stepwise=FALSE, approximation=FALSE))
```


## Corticosteroid drug sales
\fontsize{10}{12}\sf

```{r, echo=TRUE, fig.height=4, dependson='h02tryharder'}
checkresiduals(fit, lag=36)
```


## Corticosteroid drug sales
\fontsize{11}{15}\sf

```{r, echo=FALSE, dependson='h02tryharder'}
checkresiduals(fit, plot=FALSE, lag=36)
```

## Corticosteroid drug sales
\fontsize{10}{12}\sf

Training data: July 1991--June 2006

Test data: July 2006--June 2008

```r
getrmse <- function(x,h,...)
{
  train.end <- time(x)[length(x)-h]
  test.start <- time(x)[length(x)-h+1]
  train <- window(x,end=train.end)
  test <- window(x,start=test.start)
  fit <- Arima(train,...)
  fc <- forecast(fit,h=h)
  return(accuracy(fc,test)[2,"RMSE"])
}
getrmse(h02,h=24,order=c(3,0,0),seasonal=c(2,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(2,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,2),seasonal=c(2,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(1,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(0,1,1),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(0,1,2),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(1,1,1),lambda=0)
getrmse(h02,h=24,order=c(3,0,3),seasonal=c(0,1,1),lambda=0)
getrmse(h02,h=24,order=c(3,0,2),seasonal=c(0,1,1),lambda=0)
getrmse(h02,h=24,order=c(2,1,3),seasonal=c(0,1,1),lambda=0)
getrmse(h02,h=24,order=c(2,1,4),seasonal=c(0,1,1),lambda=0)
getrmse(h02,h=24,order=c(2,1,5),seasonal=c(0,1,1),lambda=0)
getrmse(h02,h=24,order=c(4,1,1),seasonal=c(2,1,2),lambda=0)
```
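
## Test-set RMSE
\fontsize{10}{12}\sf

Below is a base-R sketch of the test-set RMSE that `getrmse()` computes. It uses the built-in `AirPassengers` series, which we chose for illustration. The last 24 months are held out, and an ARIMA(0,1,1)(0,1,1)$_{12}$ model is fitted to the log of the training data. The RMSE is $\sqrt{\frac{1}{h}\sum_{j=1}^{h}(y_{T+j}-\hat{y}_{T+j|T})^2}$, the quantity `accuracy()` reports in its test-set row.

```r
x <- AirPassengers
h <- 24
train <- window(x, end = time(x)[length(x) - h])
test  <- window(x, start = time(x)[length(x) - h + 1])

# Fit on log(train), then back-transform the point forecasts
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fc <- exp(predict(fit, n.ahead = h)$pred)

rmse <- sqrt(mean((test - fc)^2))   # root mean squared test-set error
rmse
```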

## Corticosteroid drug sales
\fontsize{12}{14}\sf

```{r, cache=TRUE}
models <- rbind(
  c(3,0,0,2,1,0),
  c(3,0,1,2,1,0),
  c(3,0,2,2,1,0),
  c(3,0,1,1,1,0),
  c(3,0,1,0,1,1),
  c(3,0,1,0,1,2),
  c(3,0,1,1,1,1),
  c(3,0,3,0,1,1),
  c(3,0,2,0,1,1),
  c(2,1,3,0,1,1),
  c(2,1,4,0,1,1),
  c(2,1,5,0,1,1),
  c(4,1,1,2,1,2))
h <- 24
train.end <- time(h02)[length(h02)-h]
test.start <- time(h02)[length(h02)-h+1]
train <- window(h02,end=train.end)
test <- window(h02,start=test.start)

rmse <- numeric(NROW(models))
modelname <- character(NROW(models))
for(i in seq_along(rmse))
{
  fit <- Arima(train, order=models[i,1:3],
          seasonal=models[i,4:6], lambda=0)
  fc <- forecast(fit,h=h)
  rmse[i] <- accuracy(fc, test)[2,"RMSE"]
  modelname[i] <- as.character(fit)
}
k <- order(rmse)
knitr::kable(data.frame(Model=modelname[k],RMSE=rmse[k]),
             digits=4)
```


## Corticosteroid drug sales

  * Models with the lowest AICc values tend to give slightly better results than the other models.
  * AICc comparisons are only valid between models with the same orders of differencing, whereas test-set RMSE comparisons can involve any models.
  * Use the best model available, even if it does not pass all tests.
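
## AICc and differencing
\fontsize{10}{12}\sf

Here is a base-R sketch of why the orders of differencing must match. Models with different $d$ are fitted to different numbers of differenced observations, so their likelihoods, and hence their AICc values, are not on the same footing. The helper `aicc()` is hypothetical. It applies the small-sample correction $\text{AICc} = \text{AIC} + 2k(k+1)/(n-k-1)$, where $k$ counts the estimated parameters including $\sigma^2$ and $n$ is the number of observations after differencing. We believe this matches the `$aicc` value stored by `Arima()`. The built-in annual `LakeHuron` series is used purely for illustration.

```r
fit1 <- arima(LakeHuron, order = c(0, 1, 1))  # d = 1
fit2 <- arima(LakeHuron, order = c(0, 2, 2))  # d = 2
c(fit1$nobs, fit2$nobs)   # 97 vs 96: likelihoods use different data

aicc <- function(fit) {   # hypothetical helper
  k <- length(coef(fit)) + 1   # coefficients plus sigma^2
  n <- fit$nobs                # observations after differencing
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
c(aicc(fit1), aicc(fit2)) # not comparable with each other
```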

## Corticosteroid drug sales
\fontsize{11}{14}\sf

```{r h02f, echo=TRUE, fig.height=3}
fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2),
  lambda=0)
autoplot(forecast(fit)) +
  ylab("H02 sales (million scripts)") + xlab("Year")
```


# ARIMA vs ETS


## ARIMA vs ETS


  * It is a myth that ARIMA models are more general than exponential smoothing.

  * Linear exponential smoothing models are all special cases of ARIMA models.

  * Non-linear exponential smoothing models have no ARIMA counterparts.

  * Many ARIMA models have no exponential smoothing counterparts.

  * ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit \rlap{root.}


\vspace*{10cm}


## Equivalences

\fontsize{12}{14}\sf

|**ETS model**  | **ARIMA model**             | **Parameters**                       |
| :------------ | :-------------------------- | :----------------------------------- |
| ETS(A,N,N)    | ARIMA(0,1,1)                | $\theta_1 = \alpha-1$                |
| ETS(A,A,N)    | ARIMA(0,2,2)                | $\theta_1 = \alpha+\beta-2$          |
|               |                             | $\theta_2 = 1-\alpha$                |
| ETS(A,A$_d$,N) | ARIMA(1,1,2)                | $\phi_1=\phi$                        |
|               |                             | $\theta_1 = \alpha+\phi\beta-1-\phi$ |
|               |                             | $\theta_2 = (1-\alpha)\phi$          |
| ETS(A,N,A)    | ARIMA(0,0,$m$)(0,1,0)$_m$   |                                      |
| ETS(A,A,A)    | ARIMA(0,1,$m+1$)(0,1,0)$_m$ |                                      |
| ETS(A,A$_d$,A) | ARIMA(1,0,$m+1$)(0,1,0)$_m$ |                                      |

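
## Equivalences
\fontsize{8}{9}\sf

The first two rows of the table can be checked with the one-step forecasting recursions. This base-R sketch uses a simulated series and arbitrary values of $\alpha$, $\beta$ and the initial states, all illustrative assumptions. Here $\beta$ is the trend smoothing parameter in the error-correction form $b_t = b_{t-1} + \beta\varepsilon_t$. When the $\theta$s are set as in the table, the ETS and ARIMA one-step forecasts should agree, and `all.equal()` returns `TRUE`.

```r
set.seed(123)
y <- 10 + cumsum(rnorm(60))    # illustrative series
n <- length(y)
alpha <- 0.3; beta <- 0.1      # illustrative smoothing parameters
l0 <- y[1]; b0 <- 0            # illustrative initial states

## ETS(A,N,N) vs ARIMA(0,1,1) with theta1 = alpha - 1
f_ets <- numeric(n); l <- l0
for (t in 1:n) {
  f_ets[t] <- l                        # one-step forecast of y[t]
  l <- l + alpha * (y[t] - f_ets[t])   # l_t = l_{t-1} + alpha * e_t
}
theta1 <- alpha - 1
f_arima <- numeric(n); f_arima[1] <- l0
for (t in 2:n) {
  e_prev <- y[t - 1] - f_arima[t - 1]
  f_arima[t] <- y[t - 1] + theta1 * e_prev
}
all.equal(f_ets, f_arima)              # TRUE

## ETS(A,A,N) vs ARIMA(0,2,2): theta1 = alpha + beta - 2, theta2 = 1 - alpha
f_holt <- e_holt <- numeric(n); l <- l0; b <- b0
for (t in 1:n) {
  f_holt[t] <- l + b
  e_holt[t] <- y[t] - f_holt[t]
  l_new <- l + b + alpha * e_holt[t]   # l_t = l_{t-1} + b_{t-1} + alpha * e_t
  b <- b + beta * e_holt[t]            # b_t = b_{t-1} + beta * e_t
  l <- l_new
}
th1 <- alpha + beta - 2; th2 <- 1 - alpha
f_a <- f_holt; e_a <- e_holt           # t = 1, 2 taken from ETS as start-up values
for (t in 3:n) {
  f_a[t] <- 2 * y[t - 1] - y[t - 2] + th1 * e_a[t - 1] + th2 * e_a[t - 2]
  e_a[t] <- y[t] - f_a[t]
}
all.equal(f_holt, f_a)                 # TRUE
```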

## Your turn

\fontsize{13}{13}\sf

For the `condmilk` series:

  * Do the data need transforming? If so, find a suitable transformation.
  * Are the data stationary? If not, find an appropriate differencing which yields stationary data.
  * Identify a couple of ARIMA models that might be useful in describing the time series.
  * Which of your models is the best according to their AIC values?
  * Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
  * Forecast the next 24 months of data using your preferred model.
  * Compare your forecasts with those obtained using `ets()`.