Time-Series-Forecasting-with-Deep-Learning

Implemented by Spiros Chalkias & Harry Maraziaris

This project is separated into four topics:

A: Time series forecasting
B: Time series anomaly detection with LSTM autoencoders
C: Autoencoders for the compression of stock market time series
D: Comparison of the original and compressed datasets, using kNN-and-Clustering-on-Curves-and-Time-Series

Usage

To move to the project's parent directory, type:
$ cd ~/Time-Series-Forecasting-with-Deep-Learning

Prerequisites

In order to run the project, you need to install the following:

Python 3
pip
pandas
NumPy
Matplotlib
seaborn
TensorFlow
scikit-learn (sklearn)
tqdm
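To check that these packages are importable before running anything, a small helper like the one below can be used. It is a hypothetical sketch, not part of the repository; it assumes the import names pandas, numpy, matplotlib, seaborn, tensorflow, sklearn and tqdm, and it skips pip because pip is a command-line tool rather than a library the scripts import.

```python
import importlib.util

# Import names of the Python packages listed above (scikit-learn is imported as "sklearn");
# pip is a command-line tool, so it is not checked here.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "tensorflow", "sklearn", "tqdm"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```

Any package reported as missing can then be installed with pip.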

Project's Structure

The filesystem structure is as follows:

/Time-Series-Forecasting-with-Deep-Learning/ : Project's main directory.
/src/ : Project's source code.
/data/ : Input data used by the project.
/out_files/ : Files generated by the compression of time series.
/saved models/ : Folder where all models are saved, so that their usage can be demonstrated quickly without retraining them.
/reports/ : Experiment reports for topics A, B and C individually.
/reports/d_comparison_results/ex4_results/ : Comparison of the original dataset versus the compressed dataset using this project's executables.
/src/preprocess.py : File containing all the utility functions used by the project's main files.
/src/forecast.py : File used for time series forecasting.
/src/detect.py : File used for time series anomaly detection with LSTM autoencoders.
/src/reduce.py : File used for the compression of stock market time series using autoencoders.
/src/time-series-forecasting.ipynb : The Python notebook used to train the models and tune the data.
/data/nasdaq2007_17.csv : Data file used in topics A and B.
/data/input.csv : Input file used in topic C.
/data/query.csv : Query file used in topic C.
/out_files/output_dataset_file.csv : Compressed time series file used as the input file in this project.
/out_files/output_query_file.csv : Compressed time series file used as the query file in this project.

Build & Run

From the project's parent directory, run the following commands to execute each topic's corresponding file.

Run A - Time Series forecasting

python3 ./src/forecast.py -d <dataset> -n <number of time series selected>

Run B - Time Series Anomaly Detection with LSTM Autoencoders

python3 ./src/detect.py -d <dataset> -n <number of time series selected> -mae <error value as double>

Run C - Autoencoders for the compression of stock market time series

python3 ./src/reduce.py -d <dataset> -q <queryset> -od <output_dataset_file> -oq <output_query_file>
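The three commands above share a small set of flags. Purely as an illustration, they could be declared with Python's standard argparse module as in the sketch below; build_parser and the dest names are hypothetical, and the actual scripts may parse their arguments differently. Following the General Notes, -mae is treated as optional, with None meaning "compute the threshold from the training set".

```python
import argparse


def build_parser(topic):
    """Hypothetical CLI parser mirroring the flags shown above for topics A, B and C.

    Illustrative sketch only; the real scripts may handle their arguments differently.
    """
    scripts = {"A": "forecast.py", "B": "detect.py", "C": "reduce.py"}
    parser = argparse.ArgumentParser(prog=scripts[topic])
    parser.add_argument("-d", dest="dataset", required=True, help="input dataset file")
    if topic in ("A", "B"):
        parser.add_argument("-n", dest="num_series", type=int, required=True,
                            help="number of time series selected")
    if topic == "B":
        # Optional anomaly threshold; None means "derive it from the training-set MAE"
        parser.add_argument("-mae", dest="mae", type=float, default=None,
                            help="anomaly threshold (error value as a float)")
    if topic == "C":
        parser.add_argument("-q", dest="queryset", required=True, help="query file")
        parser.add_argument("-od", dest="output_dataset", required=True,
                            help="output file for the compressed dataset")
        parser.add_argument("-oq", dest="output_query", required=True,
                            help="output file for the compressed queries")
    return parser
```

For example, build_parser("B").parse_args(["-d", "data/nasdaq2007_17.csv", "-n", "5"]) yields num_series == 5 and mae set to None.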

General Notes

The project was written in Python 3 using TensorFlow, specifically its Keras API.
The assignment's code was inspired by the three (3) articles provided in the lectures and displayed in the Resources section.
To prevent overfitting, early stopping is applied to every model.
Each model is compiled with:
- Mean Squared Error (MSE) as the loss function.
- Adam as the optimizer.
- Mean Absolute Error (MAE) as an evaluation metric.
A MinMax scaler is used to scale the data.
In anomaly detection, if the user does not provide an anomaly threshold, it is computed automatically as the maximum of the training set's Mean Absolute Error (MAE) values.
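The scaling and default-threshold rules above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's code: it assumes the autoencoder's reconstructions are already available as arrays (the Keras model itself, compiled with model.compile(loss="mse", optimizer="adam", metrics=["mae"]), is omitted), it assumes the MAE is taken per window, and the function names are hypothetical. scikit-learn's MinMaxScaler performs the same [0, 1] scaling.

```python
import numpy as np


def minmax_fit(train):
    """Per-feature minimum and range of the training data, for MinMax scaling to [0, 1]."""
    train = np.asarray(train, dtype=float)
    data_min = train.min(axis=0)
    data_range = train.max(axis=0) - data_min
    # Use a range of 1 for constant features so they map to 0 instead of dividing by zero
    data_range = np.where(data_range == 0, 1.0, data_range)
    return data_min, data_range


def minmax_transform(x, data_min, data_range):
    """Scale x with statistics fitted on the training data only."""
    return (np.asarray(x, dtype=float) - data_min) / data_range


def per_window_mae(true, pred):
    """Mean Absolute Error of each window (first axis) between input and reconstruction."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float))
    return err.reshape(len(err), -1).mean(axis=1)


def detect_anomalies(train_true, train_pred, test_true, test_pred, threshold=None):
    """Flag test windows whose reconstruction MAE exceeds the threshold.

    When no threshold is given, it defaults to the maximum per-window MAE on the training set.
    """
    if threshold is None:
        threshold = float(per_window_mae(train_true, train_pred).max())
    flags = per_window_mae(test_true, test_pred) > threshold
    return flags, threshold
```

Fitting the scaler on the training split only keeps test statistics from leaking into the model, and the returned threshold makes it easy to report which value was actually used.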

Fine-tuning

Fine-tuning reports showcasing our experiments for topics A, B and C can be found in the additional PDFs in the /reports/ directory.

Comparison with kNN-and-Clustering-on-Curves-and-Time-Series

Search

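MAF : Maximum Approximation Factor (the largest ratio, over all queries, of the approximate nearest-neighbor distance to the true one)
AAT : Average time taken by the Approximate 1-NN search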

We observe that our search algorithms run two to three orders of magnitude faster on the compressed dataset (roughly 370 to 3500 times faster, per Tables 1 and 2), which is expected. Our approximate algorithms also perform well, achieving a perfect MAF of 1 on the reduced datasets and staying below 4 on the original dataset.

Stats	LSH-Euclidean	LSH-Discrete-Frechet	LSH-Continuous-Frechet
MAF	3.43	3.84	2.67
AAT (sec)	40.61	3.66	105.39

Table 1: Original input and query files

Stats	LSH-Euclidean	LSH-Discrete-Frechet	LSH-Continuous-Frechet
MAF	1	1	1
AAT (sec)	0.03	0.01	0.03

Table 2: Reduced input and query files
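The two search statistics and the speed-up between Tables 1 and 2 can be reproduced with a few lines of Python. This is a sketch under assumptions: it takes MAF as the maximum over queries of the approximate-to-true nearest-neighbor distance ratio (with strictly positive true distances) and AAT as the mean wall-clock time per query; the function names are hypothetical and the search callable stands in for any 1-NN routine.

```python
import time


def max_approximation_factor(approx_dists, true_dists):
    """MAF: largest ratio of approximate to true nearest-neighbor distance over all queries.

    Assumes every true distance is strictly positive.
    """
    return max(a / t for a, t in zip(approx_dists, true_dists))


def average_query_time(search, queries):
    """AAT: mean wall-clock time (seconds) of search(query) over all queries."""
    total = 0.0
    for query in queries:
        start = time.perf_counter()
        search(query)
        total += time.perf_counter() - start
    return total / len(queries)


# AAT values in seconds, copied from Tables 1 (original) and 2 (reduced)
original_aat = {"LSH-Euclidean": 40.61, "LSH-Discrete-Frechet": 3.66, "LSH-Continuous-Frechet": 105.39}
reduced_aat = {"LSH-Euclidean": 0.03, "LSH-Discrete-Frechet": 0.01, "LSH-Continuous-Frechet": 0.03}
speedups = {name: original_aat[name] / reduced_aat[name] for name in original_aat}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: ~{factor:.0f}x faster on the reduced files")
```

Because the reduced AAT values are rounded to two decimals, these ratios are only rough estimates.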

Clustering

Clustering - Mean Vector

We observe that our clustering algorithms run faster on the reduced datasets, as expected, by a factor of roughly 18 to 48 (Tables 3 to 8). The overall Silhouette scores on the original datasets are very high (about 0.97 for every assignment method), while on the reduced datasets they range from 0.46 (Hypercube) to 0.91 (LSH). Thus we could argue that much of the information used to cluster our time series is preserved after compression, especially with LSH, while the Hypercube assignment loses noticeably more clustering quality.

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	1	21	1	326	349
Silhouette	1	0.24274	1	0.75713	0.72757
Clustering Time : 0.002 sec

Table 3: Reduced clustering: Lloyd's assignment

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	1	21	1	326	349
Silhouette	1	0.44962	1	0.97536	0.97249
Clustering Time : 0.092 sec

Table 4: Original clustering: Lloyd's assignment

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	2	1	1	345	349
Silhouette	0.81551	1	1	0.9105	0.91047
Clustering Time : 0.003 sec

Table 5: Reduced clustering: LSH

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	1	1	1	346	349
Silhouette	1	1	1	0.9769	0.97718
Clustering Time : 0.053 sec

Table 6: Original clustering: LSH

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	1	154	1	193	349
Silhouette	1	0.12298	1	0.71973	0.45801
Clustering Time : 0.002 sec

Table 7: Reduced clustering: Hypercube

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	2	1	1	345	349
Silhouette	0.44962	1	1	0.97536	0.97249
Clustering Time : 0.095 sec

Table 8: Original clustering: Hypercube
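The Silhouette values above can be illustrated with the standard per-point formula s(i) = (b - a) / max(a, b), where a is the mean distance to the rest of the point's cluster and b the mean distance to the nearest other cluster. The sketch below assumes Euclidean distance (the mean-Frechet experiments would use a Fréchet distance instead) and scores points in singleton clusters as 1, which matches the size-1 clusters in the tables; the usual definition, and scikit-learn, assign 0 there. The Overall column is consistent with the size-weighted mean of the per-cluster scores (e.g. Tables 3 and 5), which overall_silhouette computes. Both function names are hypothetical.

```python
import numpy as np


def silhouette_samples(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), using Euclidean distance.

    Singleton clusters get s(i) = 1, as in the tables above (scikit-learn uses 0 instead).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distance matrix, shape (n, n)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    scores = np.empty(len(points))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            scores[i] = 1.0
            continue
        # a: mean distance to the other members of i's cluster (dists[i, i] is 0)
        a = dists[i, same].sum() / (same.sum() - 1)
        # b: mean distance to the nearest other cluster
        b = min(dists[i, labels == other].mean() for other in clusters if other != own)
        scores[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return scores


def overall_silhouette(sizes, cluster_scores):
    """Overall score as the size-weighted mean of the per-cluster average silhouettes."""
    sizes = np.asarray(sizes, dtype=float)
    return float(np.dot(sizes, np.asarray(cluster_scores, dtype=float)) / sizes.sum())
```

For instance, overall_silhouette([2, 1, 1, 345], [0.81551, 1, 1, 0.9105]) gives about 0.91047, the Overall value of Table 5.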

Clustering - Mean Frechet

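We observe that our clustering algorithms run much faster on the reduced datasets, by a factor of roughly 47,000 to 67,000 (about 2551 and 3556 seconds on the original datasets versus about 0.05 seconds on the reduced ones). On the reduced datasets we also obtain overall Silhouette scores of 0.57 with Lloyd's assignment and 0.91 with LSH, indicating a reasonably good clustering; Silhouette scores were not reported for the original datasets.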

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	264	1	1	83	349
Silhouette	0.6975	1	1	0.1360	0.5657
Clustering Time : 0.054 sec

Table 9: Reduced clustering: Lloyd's assignment

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	345	1	2	1	349
Silhouette	-	-	-	-	-
Clustering Time : 2551.43 sec

Table 10: Original clustering: Lloyd's assignment

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	1	2	1	345	349
Silhouette	1	0.74056	1	0.90767	0.90724
Clustering Time : 0.053 sec

Table 11: Reduced clustering: LSH

Stats	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Overall
Cluster Size	1	345	1	2	349
Silhouette	-	-	-	-	-
Clustering Time : 3555.52 sec

Table 12: Original clustering: LSH
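In the mean-Frechet experiments, every distance evaluation between two curves is a dynamic program with one table cell per pair of points. The sketch below is the standard iterative discrete Fréchet distance, written for illustration rather than taken from the kNN-and-Clustering-on-Curves-and-Time-Series code; series_to_curve, which maps a series to (t, value) points, is a hypothetical helper. The quadratic table size is one plausible reason why clustering the shorter, compressed curves is so much faster.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two curves given as sequences of points.

    Fills a len(p) x len(q) table iteratively, so the cost is O(len(p) * len(q)).
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("curves must be non-empty")
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best coupling reaching (i, j) from any of the three predecessor cells
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def series_to_curve(values):
    """Map a 1-D time series to a 2-D curve of (t, value) points (hypothetical mapping)."""
    return [(float(t), float(v)) for t, v in enumerate(values)]
```

Shrinking both curves by a factor k shrinks the table, and therefore the work per distance evaluation, by roughly k².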
