This project was built on Ubuntu 18.04 running on Windows Subsystem for Linux (WSL), kernel 4.19.75-microsoft-standard, and tested on Ubuntu 18.04 (Bionic Beaver, kernel 5.2.0).
- OS: Linux (Ubuntu 18.04)
- Docker
- Flink (v1.9.1)
- Kafka (v2.12)
- Elasticsearch (v5.6.0)
- Kibana (v5.6.0)
- Redis (latest Docker image)
- Python (v3.*)
Note: All the commands below should be run from the root directory of the repository. Some of the bash scripts require root access, so provide the root credentials if asked. Also note that starting the Flink slave nodes requires adding them to ~/.ssh/known_hosts, so you have to explicitly type yes when prompted.
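On first connection, OpenSSH asks something like the following (the host name and fingerprint will differ on your machine):

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)? yes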
Install Maven
sudo apt install maven
Install OpenJDK8
sudo apt install openjdk-8-jdk
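You can verify both installations before continuing:

mvn -version
java -version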
Install Docker
To install Docker, follow the instructions here.
Install Python
To install Python, follow the instructions here.
Install pip3
sudo apt install python3-pip
pip3 install -r code/customer-code/requirements.txt
The following script deploys the whole pipeline and downloads the test data:
code/deployment-scripts/deploy-all
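Once it finishes, you can sanity-check that the pipeline containers came up (the exact container names depend on the deployment script):

docker ps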
Upload the schema of the final Elasticsearch sink (mysimpbdp-coredms):
code/customer-code/coredms-schema-upload
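For a sense of what the upload involves, here is a minimal sketch of PUTting an index mapping to Elasticsearch 5.x with `requests`. The endpoint, index name, type name, and fields are all assumptions for illustration; the real schema lives in the script above.

```python
import json
import requests

ES_URL = "http://localhost:9200"  # assumed Elasticsearch endpoint
INDEX = "mysimpbdp-coredms"       # hypothetical index name

# Hypothetical mapping; the actual schema is defined by coredms-schema-upload.
# Elasticsearch 5.x still uses mapping types ("customer" here).
mapping = {
    "mappings": {
        "customer": {
            "properties": {
                "location": {"type": "geo_point"},
                "timestamp": {"type": "date"},
            }
        }
    }
}

resp = requests.put(
    ES_URL + "/" + INDEX,
    data=json.dumps(mapping),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```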
Map each location id to a (lat, lon) pair and populate Redis:
python3 code/customer-code/customer_transformer.py
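Conceptually, the transformer builds a location-id-to-coordinates lookup in Redis, roughly along these lines. The file name, column names, and key layout are assumptions; see customer_transformer.py for the real logic.

```python
import csv
import redis

r = redis.Redis(host="localhost", port=6379)  # default Dockerised Redis

# Hypothetical lookup file with columns: location_id, lat, lon
with open("data/locations.csv") as f:
    for row in csv.DictReader(f):
        # Store "lat,lon" under the location id so downstream
        # consumers can resolve coordinates with a single GET.
        r.set(row["location_id"], row["lat"] + "," + row["lon"])
```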
Run the customerstreamapp:
code/customer-code/run-customerstreamapp <parallelism degree>
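For example, to run the stream job with a parallelism degree of 4:

code/customer-code/run-customerstreamapp 4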
Run the customer real-time view:
python3 code/customer-code/customer_realtime-view.py --n <number of locations>
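For example, to view 10 locations:

python3 code/customer-code/customer_realtime-view.py --n 10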
Start streaming customer data to Kafka:
python3 code/customer-code/customer_producer.py --rows <number of rows>
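At its core, such a producer just serialises rows and sends them to a Kafka topic. Below is a minimal sketch using kafka-python; the broker address, topic name, and record shape are assumptions, and the real producer is customer_producer.py.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical record shape; customer_producer.py defines the real one.
producer.send("customer-topic", {"location_id": 42, "amount": 12.5})
producer.flush()  # block until buffered messages are actually delivered
```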
Finally, to clean up the whole deployment:

code/deployment-scripts/cleanup