\documentclass[conference]{IEEEtran}
\IEEEoverridecommandlockouts
% The preceding line is only needed to identify funding in the first footnote. If that is unneeded, please comment it out.
\usepackage{cite}
\usepackage{amsmath,amssymb,amsfonts}
\usepackage{algorithmic}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\usepackage{url}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\begin{document}
\title{Day-Ahead Load Forecasting for the Modern Distribution Network: A Tasmanian Case Study}
\author{\IEEEauthorblockN{Michael Jurasovic}
\IEEEauthorblockA{\textit{School of Engineering} \\
\textit{University of Tasmania}\\
Hobart, Australia \\
}
\and
\IEEEauthorblockN{Evan Franklin}
\IEEEauthorblockA{\textit{School of Engineering} \\
\textit{University of Tasmania}\\
Hobart, Australia \\
}
\and
\IEEEauthorblockN{Michael Negnevitsky}
\IEEEauthorblockA{\textit{School of Engineering} \\
\textit{University of Tasmania}\\
Hobart, Australia \\
}
\and
\IEEEauthorblockN{Paul Scott}
\IEEEauthorblockA{\textit{Research School of Computer Science} \\
\textit{Australian National University}\\
Canberra, Australia \\
}
}
\maketitle
\begin{abstract}
Penetration of distributed energy resources in distribution networks is predicted to increase dramatically in the next seven years, bringing with it the opportunity for utilities to have a greater presence at low levels of the network.
To achieve this effectively, utilities will require accurate short term load forecasts.
This paper presents a novel neural network-based load forecasting system that applies recent advances in neural attention mechanisms.
The forecasting system is trained and assessed on ten years of historical half-hourly load, weather, and calendar data to produce a 24-hour horizon half-hourly online forecast.
When forecasting during anomalous peak holiday periods on a feeder with a typical load of less than 1000 kVA, the forecasting system achieves a MAPE of 7.4\% and a mean error of $-15$ kVA.
The forecasting system is implemented in a residential battery trial and is able to successfully forecast major peaks with sufficient lead time and accuracy to enable the fleet of batteries to charge ahead of time and provide network support.
\end{abstract}
\begin{IEEEkeywords}
load forecasting, machine learning, DER
\end{IEEEkeywords}
\section{Introduction}
Electricity distribution networks, and the way in which they are managed, are currently going through a significant transition, with perhaps more change over the last ten years than in the previous hundred.
Until recently, generation and load were largely viewed and managed separately: power was produced almost exclusively by large, centralised generating units, and was delivered to customers via the transmission and distribution networks.
Networks were designed and built for demand profiles which were relatively stable over time, with demand forecasting at the distribution feeder level required primarily for long-term network planning.
Increasingly, power is both consumed by customers connected to the distribution network and is also generated and manipulated by distributed energy resources (DER) within the distribution network, often behind the meter of individual consumers.
Increasing levels of DER create, among other things, both a need and an opportunity to more actively manage distribution networks, while also making the net demand profile generally less predictable.
DERs are controllable devices in the power network that generate, store, and/or consume power.
This includes solar photovoltaic (PV) generation, battery storage, and electric vehicles.
In Australia, the dominant DER technology deployed to date is solar PV, with over 1.8 million systems now reportedly installed on residential properties \cite{apvi2017}.
It is widely anticipated that battery storage and electric vehicle uptake may be just as rapid \cite{bloomberg2018}.
The Tasmanian distribution network, meanwhile, is forecast to experience significant uptake of these technologies by 2025:
\begin{itemize}
\item battery storage capacity increasing almost sevenfold, from 11 MWh to 75 MWh \cite{Jacobs2017}
\item installed PV capacity increasing by around 70\%, from 130 MW to 220 MW \cite{Jacobs2017}
\item 39\% of new car sales expected to be electric vehicles --- the highest proportion in the country \cite{AEMO2016}
\end{itemize}
The changing nature of the distribution network presents an opportunity to maximize the use of existing assets by delaying the need for network augmentations, while also providing customers with a more reliable supply of power.
For example, batteries could be used to peak-shift, reducing maximum feeder load.
However, achieving this reliably, and with optimal use of the available DER, generally requires sophisticated methods to optimize the power flow to and from the distributed resources.
\begin{figure}[htbp]
\centerline{\includegraphics[width=.35\textwidth]{images/bruny_island_map.png}}
\caption{Bruny Island in southeast Tasmania (shown) experiences significant increases in load during holiday periods and was used as a case study for the load forecaster.}
\label{fig:bruny_map}
\end{figure}
One method to achieve this coordination of DER in distribution networks is presented in \cite{Scott2014} and has been implemented on Bruny Island, Tasmania as part of the CONSORT project (CONSumer energy systems providing cost-effective grid suppORT).
Bruny Island, shown in Figure \ref{fig:bruny_map}, is a popular holiday destination and during peak periods --- such as Easter morning and afternoon peaks --- the submarine feeder supplying the island becomes overloaded and is supplemented by a diesel generator located on the island.
The aim of the CONSORT project was to effectively peak shift the load away from the morning and afternoon peaks to prevent the use of the generator.
To fulfil such an objective while using the available distributed resources optimally, the CONSORT project relies upon having an accurate, online, 24-hour horizon forecast at the feeder level.
However, load forecasting methods commonly employed in industry are intended neither to forecast with high accuracy over a time period this short, nor at the feeder level \cite{CIGRE2016}.
An improved method for producing accurate feeder-level forecasts is not only highly desirable for this project, but will also in the future become a critical element of active distribution network management more generally.
In this paper, a novel neural network-based day-ahead feeder level load forecasting system is developed, with its performance evaluated using ten years of historical demand data for training and testing. We implement the forecast live on Tasmania's Bruny Island distribution network, enabling the CONSORT project's residential battery systems to effectively support the network during periods of peak demand.
\section{Forecasting System Architecture}
Recurrent Neural Networks (RNN) have recently been popular for load forecasting in electricity networks \cite{Kong2018}.
However, RNNs have been outperformed by the Transformer model \cite{Vaswani2017} in several domains, including machine translation \cite{Vaswani2017} (where the architecture was first proposed and applied), medical time series forecasting and regression \cite{Song2017}, and image generation \cite{Parmar2018}.
The Transformer is a purely attention-based neural network model and does not rely on recursion like traditional RNNs.
This allows a larger degree of parallelism to reduce training time, while also allowing the model to effectively handle long-range dependencies in the inputs.
The proposed forecasting system uses a Transformer neural network model combined with similar period selection.
\subsection{Similar Period Selection} \label{simperiod}
Load profiles are influenced by exogenous data such as weather, day of the week, and holiday type \cite{Weron2006}.
A simple and intuitive method of load forecasting is to find periods in the past with similar exogenous data to the period being forecast and then use the load profiles from these past periods to form a forecast \cite{Senjyu1998}.
However, these similar period methods can be insufficient to capture complex patterns, especially over holiday periods which occur only once per year \cite{Chen2010}.
The forecasting system was provided with historical load and weather profiles from periods that had similar exogenous data to the period of the load profile being forecast.
Similar periods were identified by first finding candidate similar periods an integer multiple of 1 year $\pm$30 days away from the period being forecast.
These candidates were then filtered down to periods with exactly matching hour and minute.
Then the weighted Euclidean distance between the period being forecast and each candidate similar period was calculated using the following features:
maximum future temperature,
minimum future temperature,
maximum past load,
current holiday type,
current day of the week,
current day of the month, and
current month of the year.
The holiday type indicates the current holiday --- Easter or Christmas for example --- and is encoded as a time series of integers with a different integer for each holiday.
When a holiday always falls on the same date each year, the month of the year and day of the month were used; when a holiday always falls on the same day of the week each year, the day of the week was used instead.
The candidate similar periods with the lowest Euclidean distance were selected as the final set of similar periods.
When training and testing the model the similar periods were selected from both the past and the future, as the train and test datasets were only five years each.
It was assumed that, for testing, there were no changes in the patterns underlying the load profile over the duration of the testing set.
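For concreteness, the selection step amounts to a weighted nearest-neighbour search over the candidate periods. The following Python sketch is illustrative only, assuming the per-period features have already been extracted into arrays; it is not the production implementation.
\begin{verbatim}
import numpy as np

def select_similar_periods(target, candidates,
                           weights, n=5):
    # target:     1-D feature vector for the period
    #             being forecast.
    # candidates: 2-D array, one row of the same
    #             features per candidate period
    #             (already filtered by date and time
    #             of day as described above).
    # weights:    1-D array of per-feature weights.
    diffs = candidates - target
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    return np.argsort(dists)[:n]
\end{verbatim}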
\begin{figure}[htbp]
\centerline{\includegraphics[trim=0 0.8cm 0 0, width=.35\textwidth]{images/transformer.pdf}}
\caption{The Transformer architecture.}
\label{fig:transformer}
\end{figure}
\subsection{Transformer}
The Transformer neural network architecture, shown in Figure \ref{fig:transformer}, was introduced in \cite{Vaswani2017} in 2017 and was at the time the state of the art in neural machine translation.
This architecture follows the standard sequence-to-sequence/encoder-decoder pattern: the encoder transforms an input sequence $X = (\boldsymbol{x}_1, ..., \boldsymbol{x}_P)$ into a latent representation $Z = (\boldsymbol{z}_1, ..., \boldsymbol{z}_M)$, and the decoder transforms $Z$ into an output sequence $Y = (\boldsymbol{y}_1, ..., \boldsymbol{y}_R)$, where $\boldsymbol{x}_t$, $\boldsymbol{y}_t$, and $\boldsymbol{z}_t$ are of arbitrary dimension.
This matches the requirements of a load forecasting system, where $P$ is the length of the time series input, $x_t$ is of dimension equal to the number of variables in the input time series, $R$ is the length of the time series output, and $y_t$ is of dimension equal to the number of variables in the output time series.
The latent representation $Z$ is used solely by the model as an internal representation of the input data.
The encoder is constructed of a stack of $L$ identical layers, each containing two sub-layers.
The first is multi-head self-attention and the second is a position-wise feed-forward network.
Both sub-layers have a residual connection around them and are fed into a normalization layer.
The decoder is similar to the encoder except for a third sub-layer which applies multi-head attention over the outputs of the encoder.
The input to the decoder is the decoder's own previous output, shifted right by one position.
This requires an iterative approach when predicting all points in the time series.
The self-attention in the decoder is masked so that when evaluating a query at time $t$ it does not assign large weights to keys/values occurring after $t$ in time, making the decoder autoregressive.
\subsection{Input Embedding}
The input $\boldsymbol{X} \in \mathbb{R}^{T \times N}$, where the rows represent $T$ points in time and the columns represent $N$ time series, is embedded by applying a dense layer to produce an embedded $\boldsymbol{X'} \in \mathbb{R}^{T \times d}$, where $d$ is the hidden dimension of the model and $d$ is the same for both the encoder and the decoder.
This is intended to allow the neural network to learn the relationships and dependencies between the different input time series.
The embedded representation is given by Equation \ref{dense_layer}, with learned weights $\boldsymbol{W} \in \mathbb{R}^{N \times d}$ and a learned bias vector $\boldsymbol{b} \in \mathbb{R}^{d}$.
\begin{equation} \label{dense_layer}
\text{dense}(\boldsymbol{X}) = \text{max}(0, \boldsymbol{XW} + \boldsymbol{b})
\end{equation}
\subsection{Positional Encoding}
The model has no way of telling the position or order of each element in the input, so this information is injected in the positional encoding layer.
This is done with a learned lookup table: a value determined solely by an element's position in the sequence is added to that element, identically at train and test time.
Specifically, the positional encoding layer adds a learned matrix of embeddings $\boldsymbol{E}$ to the input, where $\boldsymbol{E}$ has the same dimensions as the input.
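As a minimal sketch (not the production code), the layer can be expressed in a few lines of Python, where the lookup table \texttt{E} would in practice be a trainable parameter updated by gradient descent:
\begin{verbatim}
import numpy as np

T, d = 96, 32   # input length, hidden dimension

# Learned lookup table of positional embeddings;
# randomly initialised here for illustration.
E = np.random.randn(T, d) * 0.01

def positional_encoding(X):
    # X: (T, d). The same E is added at train and
    # test time, tagging each row of X with its
    # position in the sequence.
    return X + E
\end{verbatim}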
\subsection{Multi-Head Attention} \label{multihead_attention}
The primary innovation of the Transformer architecture is multi-head attention.
Generic attention and scaled dot-product attention are described first, as these are prerequisites to describing multi-head attention.
Given a single query vector and a set of key and value pairs (with each key and each value being a vector), an attention function matches the query to the keys to produce a weight for each key by applying an arbitrary fitness function.
These key weights are then used to create an output vector comprised of the weighted sum of the values, where each value's weight is the weight assigned to its corresponding key.
\begin{figure}[htbp]
\centerline{\includegraphics[width=.35\textwidth]{images/multihead_attn.pdf}}
\caption{Multi-head attention, the primary component of the Transformer architecture.}
\label{fig:multihead}
\end{figure}
Scaled dot-product attention, shown in Figure \ref{fig:multihead} (a), is a specific implementation of an attention function.
It uses the dot product of the query and each key to generate the weights, which are then passed through a softmax function such that the sum of all weights is equal to 1.
In practice, the query row vectors are combined into a single matrix, $\boldsymbol{Q}$, allowing cheap matrix calculations to be used to evaluate the attention outputs in parallel.
The keys and values are represented by row vectors in $\boldsymbol{K}$ and $\boldsymbol{V}$, respectively.
The dot product of the keys and queries is scaled (hence the name) by a factor of $1 / \sqrt{d_k}$, where $d_k$ is the key and query dimension. This prevents the dot product from becoming large when $d_k$ is large, which would otherwise push the softmax into regions of very small gradient and hinder training.
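In matrix form, scaled dot-product attention is therefore given by \cite{Vaswani2017}
\begin{equation} \label{sdpa}
\text{attention}(\boldsymbol{Q}, \boldsymbol{K}, \boldsymbol{V}) = \text{softmax}\left(\frac{\boldsymbol{Q}\boldsymbol{K}^\top}{\sqrt{d_k}}\right)\boldsymbol{V}
\end{equation}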
The causal mask shown in Figure \ref{fig:multihead} is used exclusively in the decoder self-attention to prevent the attention function from matching any query to a key that occurs after itself in time.
This is achieved by leaving the lower triangular portion of the matrix untouched and setting the other values to be large negative numbers, indicating a very poor match.
Multi-head attention, shown in Figure \ref{fig:multihead} (b), applies a separate dense layer to each of the values, queries, and keys.
The dense layer is applied per Equation \ref{dense_layer} with learned weights $\boldsymbol{W} \in \mathbb{R}^{d \times d}$ and a learned bias vector $\boldsymbol{b} \in \mathbb{R}^{d}$.
The outputs of the dense layers are then split along the last axis into $h$ sets, or heads.
As a result the key, query, and value dimension is reduced by a factor of $h$ to $\frac{d}{h}$.
Scaled dot-product attention is then run independently on each set.
The results are concatenated and put through a final dense layer to produce the output of the attention function.
The dense layer function on the output is defined by Equation \ref{dense_layer} where $\boldsymbol{W} \in \mathbb{R}^{d \times d}$ is a learned weight matrix and $\boldsymbol{b} \in \mathbb{R}^{d}$ is a learned bias vector.
The dense layer combined with the split allows the multi-head attention to pick out information from different subspaces in the input and direct these to different attention heads.
This is in contrast to a single head which must average all subspaces.
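The head-splitting mechanics can be illustrated with the following simplified Python sketch, which omits the learned dense layers applied before and after the attention operation:
\begin{verbatim}
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(Q, K, V, h, mask=None):
    # Q, K, V: (T, d) arrays. mask: (T, T) boolean
    # array, True where a query may attend to a key
    # (e.g. np.tril(...) for the causal mask).
    T, d = Q.shape
    dk = d // h                 # per-head dimension
    heads = []
    for i in range(h):
        s = slice(i * dk, (i + 1) * dk)
        w = Q[:, s] @ K[:, s].T / np.sqrt(dk)
        if mask is not None:
            w = np.where(mask, w, -1e9)
        heads.append(softmax(w) @ V[:, s])
    # Concatenate the heads; the final dense layer
    # would then follow.
    return np.concatenate(heads, axis=-1)
\end{verbatim}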
\subsection{Feed-forward}
The feed-forward layer is a two-layer network with a rectified linear unit in the middle.
Given an input $\boldsymbol{X} \in \mathbb{R}^{T \times d}$, the output $\boldsymbol{X'} \in \mathbb{R}^{T \times d}$ is populated by Equation \ref{feedforward} where $\boldsymbol{W}_1 \in \mathbb{R}^{d \times 4d}$, $\boldsymbol{b}_1 \in \mathbb{R}^{4d}$, $\boldsymbol{W}_2 \in \mathbb{R}^{4d \times d}$, and $\boldsymbol{b}_2 \in \mathbb{R}^{d}$ are learned weights and biases.
\begin{equation} \label{feedforward}
\text{feedforward}(\boldsymbol{X}) = \text{max}(0, \boldsymbol{X} \boldsymbol{W}_1 + \boldsymbol{b}_1) \boldsymbol{W}_2 + \boldsymbol{b}_2
\end{equation}
\subsection{Decoder Dense Output}
The output of the decoder is passed through a dense layer to project the hidden dimension to the desired dimension of 1.
The layer is implemented per Equation \ref{dense_layer} where $\boldsymbol{W} \in \mathbb{R}^{d \times 1}$ is a learned weight matrix and $\boldsymbol{b} \in \mathbb{R}^{1}$ is a learned bias vector.
By adjusting the dimension of $\boldsymbol{W}$ and $\boldsymbol{b}$ the network could be modified to perform multiple forecasts simultaneously.
\subsection{Residuals \& Normalization}
Residual connections \cite{He2015} are applied around each sub-layer.
That is, the output of each sub-layer is given by $\boldsymbol{X'} = \boldsymbol{X} + \text{subLayer}(\boldsymbol{X})$ where subLayer$(\boldsymbol{X})$ is the original output of the sub-layer.
The outputs are then normalized by applying layer normalization \cite{Ba2016}.
\subsection{Dropout and Training}
To help prevent overfitting to the training data, dropout \cite{srivastava14a} is applied during training at the output of both positional encoding layers, and immediately after the softmax operation in all multi-head attention layers.
Additionally, the encoder inputs, decoder inputs, and expected outputs were normalized to the range 0 to 1 and perturbed with random noise of mean 0 and standard deviation 0.01 before being supplied to the model during training.
The model is trained using the Adam optimizer \cite{Kingma2014} and a modified sum-of-squared-errors loss function.
Given a vector $\boldsymbol{y}$ of predictions from the model and a vector $\boldsymbol{y'}$ of expected predictions, the loss function $l$ is given by
\begin{equation}
l = \sum_{t=1}^{R}\left(y_t - y'_t\right)^2 |y'_t|^c
\end{equation}
where $c$ is a model hyperparameter.
For $c>0$ this function accentuates the loss when the actual value is large, making the model more accurate at forecasting peaks.
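Transcribed directly into Python (for illustration only, not the training code):
\begin{verbatim}
import numpy as np

def peak_weighted_loss(y, y_true, c=3):
    # Squared error scaled by |y_true|^c so that
    # errors at high actual load (peaks) dominate.
    return np.sum((y - y_true) ** 2
                  * np.abs(y_true) ** c)
\end{verbatim}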
When testing or being used for inference, the decoder outputs are generated sequentially one at a time.
After each output value is generated, it is shifted right by one position and written into the decoder input, and the model is executed again; this repeats until all outputs have been generated.
Values that have not yet been generated are set to zero in the decoder input.
These zero values do not affect the output of the decoder, as the decoder self-attention is masked so that it does not make use of them.
When training, the decoder input is set to the known expected value and the model is executed --- and the learnable parameters updated --- a single time.
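The inference loop can be sketched as follows, where \texttt{model} stands in for a full forward pass of the trained network (a hypothetical callable, shown only to illustrate the shifting procedure):
\begin{verbatim}
import numpy as np

def forecast(model, encoder_input, R=48):
    # Positions not yet generated stay zero; the
    # causal mask prevents them from influencing
    # earlier outputs.
    decoder_input = np.zeros(R)
    y = np.zeros(R)
    for t in range(R):
        y = model(encoder_input, decoder_input)
        if t + 1 < R:
            # Shift right by one and populate.
            decoder_input[t + 1] = y[t]
    return y
\end{verbatim}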
\section{Case Study}
\subsection{Bruny Island}
Bruny Island, shown in Figure \ref{fig:bruny_map}, is located approximately two kilometres off the coast of south-east Tasmania with a permanent resident population of approximately 800 people.
The island is a popular holiday destination, experiencing an influx of up to 500 cars in a single day during Easter periods.
The island is supplied by two feeders, depicted in Figure \ref{fig:bruny_network}, with one feeder supplying a small portion of the island to the North and the other supplying the main portion of the island to the South.
This case study deals only with the feeder supplying the main portion of the island to the South.
During holiday period morning and afternoon peaks the submarine feeder reaches its capacity and a diesel generator located on the island is used to reduce the feeder load.
The substantial increase in load over the Easter holiday period for multiple years can be seen in Figure \ref{fig:bruny_easter}.
To avoid the use of the generator, the CONSORT project installed a set of residential batteries on the island for the purposes of peak-shifting.
These batteries are coordinated by the network-aware coordination (NAC) algorithm.
In order to peak-shift while making efficient use of the batteries, the NAC requires an accurate forecast of load with a 24-hour horizon and 30-minute resolution.
The proposed forecasting system was implemented for this purpose.
\begin{figure}[htbp]
\centerline{\includegraphics[width=.25\textwidth]{images/bruny_single_line.pdf}}
\caption{Single line schematic of distribution network on Bruny Island.}
\label{fig:bruny_network}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=.40\textwidth]{images/easter_bruny.png}}
\caption{Easter load on Bruny Island, 2008 through 2017.
Periods of unusable missing or bad data are also visible in this graph.}
\label{fig:bruny_easter}
\end{figure}
\subsection{Data and Model Configuration}
The following data was available for 2009--2018:
\begin{itemize}
\item Apparent power at reclosers R1 through R4 (Figure \ref{fig:bruny_network}).
\item Temperature at Lenah Valley, Tasmania (50 km from Bruny Island).
\item Apparent power consumption at St Helens, Tasmania.
\end{itemize}
This data was averaged to 30-minute resolution and split into a training set containing data from October 2009 through September 2014, and a testing set containing data from October 2014 through April 2018.
The network was supplied with data from the previous and future 24 hours, for a total input sequence length of 96 (representing 48 hours at 30-minute resolution).
The output sequence length was 48 (24 hours).
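The resampling and chronological split can be sketched with pandas (file and column names are hypothetical):
\begin{verbatim}
import pandas as pd

# Hypothetical raw recloser measurements,
# indexed by timestamp.
raw = pd.read_csv("r1_kva.csv", index_col=0,
                  parse_dates=True)

# Average to 30-minute resolution.
load = raw["kva"].resample("30min").mean()

# Chronological train/test split.
train = load["2009-10":"2014-09"]
test = load["2014-10":"2018-04"]
\end{verbatim}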
The following time series were supplied to the model input:
\begin{itemize}
\item Apparent power from recloser R1 (Figure \ref{fig:bruny_network}), with future values set to zero.
\item Temperature.
\item Day of the week as an integer from 0 to 6 (local time).
\item Minutes since midnight (local time).
\item Boolean 1 or 0 indicating whether it is a holiday.
\item Holiday type.
\end{itemize}
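The calendar inputs above can be derived from the timestamp index alone; a minimal sketch, assuming a hypothetical \texttt{holidays} mapping from date to integer holiday identifier (0 for normal days):
\begin{verbatim}
import numpy as np
import pandas as pd

def calendar_features(index, holidays):
    # index: 30-minute DatetimeIndex covering the
    # input window.
    dow = index.dayofweek.values          # 0..6
    mins = (index.hour * 60
            + index.minute).values
    hol = np.array([holidays.get(t.date(), 0)
                    for t in index])
    is_hol = (hol > 0).astype(int)
    return np.stack([dow, mins, is_hol, hol],
                    axis=1)
\end{verbatim}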
When used for inference, temperature forecasts were obtained from the Bureau of Meteorology.
Additionally, five similar periods were identified using data from R1 by the process described in Section \ref{simperiod}.
The data over the similar periods for each of the following time series was provided as input:
\begin{itemize}
\item Reclosers R1, R2, R3, and R4 (Figure \ref{fig:bruny_network}), as separate time series.
\item St Helens recloser.
\item Lenah Valley temperature.
\end{itemize}
In total, 36 time series were provided as input to the model.
St Helens was included because it was observed to display similar patterns to Bruny Island around holiday periods.
The forecasting system was configured with the parameters in Table \ref{table:parameters}, with the upper section giving Transformer model parameters and the lower section giving weights used for similar period selection.
\begin{table}[htbp]
\caption{Case study model parameters.}
\begin{center}
\begin{tabular}{clc}
\hline
\textbf{Parameter}&\textbf{Description}&\textbf{Value} \\
\hline
$L$ & Number of encoder and decoder layers & 4 \\
$d$ & Hidden dimension & 32 \\
$h$ & Number of attention heads & 4 \\
$D$ & Dropout fraction & 0.2 \\
$c$ & Loss function modifier & 3 \\
- & Training batch size & 16 \\
\hline
- & Maximum future temperature weight & 10 \\
- & Minimum future temperature weight & 20 \\
- & Maximum past load weight & 30 \\
- & Holiday type weight & $10^9$ \\
- & Day of week weight & $10^6$ \\
- & Day of month weight & $10^6$ \\
- & Month of year weight & $10^6$ \\
\hline
\end{tabular}
\label{table:parameters}
\end{center}
\end{table}
\subsection{Offline Evaluation}
The forecaster was first evaluated on historical data around Easter 2018, shown in Figure \ref{fig:easter_forecasts}.
Notably, the forecaster was able to accurately predict the first large peak while also transitioning smoothly between normal and holiday periods.
This is in contrast to load forecasting models which sometimes tend toward trivially repeating the previous day's load.
\begin{figure}[htbp]
\centerline{\includegraphics[width=.40\textwidth]{images/easter_2018_all_forecast.png}}
\caption{Forecasts over the Easter 2018 period.
The black dashed line is the actual recorded load, and all previous forecasts are shown in grey.}
\label{fig:easter_forecasts}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=.40\textwidth]{images/bruny_mape.png}}
\caption{Mean absolute percentage error at each point in the forecast horizon, evaluated over a variety of holiday periods.}
\label{fig:bruny_mape}
\end{figure}
The performance of the forecaster was evaluated on every Easter, Queen's Birthday, and July school holiday period from 2015 through 2018 (excluding July 2018).
Figure \ref{fig:bruny_mape} shows the resulting mean absolute percentage error (MAPE) as a function of forecast horizon.
The mean MAPE is 7.4\%.
Furthermore, the errors between predicted and actual load are fairly evenly distributed around $-15$ kVA, as shown in Figure \ref{fig:bruny_hist}.
This indicates that the model has some room for improvement when predicting large holiday peaks, but overall it is evidence that the model has generalized from the training data, which consists mostly of normal days.
\begin{figure}[htbp]
\centerline{\includegraphics[width=.35\textwidth]{images/errors_histogram.png}}
\caption{Distribution of forecast error when evaluated over the same periods as Figure \ref{fig:bruny_mape}.}
\label{fig:bruny_hist}
\end{figure}
\subsection{Online Evaluation}
When implemented live on the Bruny Island distribution network during the July 2018 school holiday period, as part of the CONSORT project, the forecaster was observed to reliably forecast large demand peaks.
This enabled the fleet of distributed batteries to be used effectively in providing network support via net demand peak reduction. An accurate forecast, issued sufficiently far in advance of the occurrence of the demand peak, was observed to give the batteries adequate time to store energy in the lead-up to the demand peak period and to discharge during it. In at least one instance over the test period this was sufficient to prevent the island's diesel generator from being used at all, when it otherwise almost certainly would have been required.
Data collected during this peak demand period can be seen in Figure \ref{fig:bruny_nac}.
The upper section shows 24 hours of historical load in black, plus the most recent 24-hour horizon forecast in dashed black (recalculated every five minutes) and all old forecasts in grey.
The lower section shows the battery charge rate, where a negative value of battery charge rate indicates the batteries are supporting the grid.
Typically the generator is switched on when load exceeds 1050 kVA.
During the first peak the graph shows the batteries supplying between 50 and 100 kW to the island.
Without this support from the batteries, the generator would have been required to operate.
\begin{figure}[htbp]
\centerline{\includegraphics[width=.35\textwidth]{images/bruny_nac.pdf}}
\caption{Results from the forecasting system's implementation in the CONSORT project.}
\label{fig:bruny_nac}
\end{figure}
\section{Conclusion}
This paper presents a novel neural network-based load forecasting system and applies it to Bruny Island, Tasmania, allowing a fleet of residential batteries to effectively support the network during major peaks.
The single load forecasting system was able to accurately predict peaks both during anomalous holiday periods and during periods of normal load, with a mean MAPE of 7.4\% and a mean error of $-15$ kVA.
It is expected that this system would be equally applicable to any distribution feeder, and could be trivially expanded to perform multiple forecasts simultaneously by adjusting the output dimension of the final dense layer of the neural network.
\section*{Acknowledgment}
This work has been supported by TasNetworks, who also provided network and demand data. It has also been supported by researchers from the ARENA-funded CONSORT Bruny Island Battery Trial.
\bibliographystyle{IEEEtran}
\bibliography{aupec}
\end{document}