In this project, we are using Apache Flink as a stream-processing framework to execute queries in a multi-cluster network, the results of which we then join and aggregate. We aim to emulate various network conditions by manipulating different network parameters such as delays, window sizes, and buffer sizes, which allows us to measure the impact of these on throughput and latency.
For further information, check out the Wiki pages.
Technology | Version | Download link |
---|---|---|
Java SE | 11.0.19 | https://www.oracle.com/java/technologies/javase/jdk11-archive-downloads.html |
Apache Kafka | 2.12-3.4.0 | https://kafka.apache.org/downloads |
Apache Flink | 1.17.1 | https://flink.apache.org/downloads/ |
Apache Maven | 3.9.4 | https://maven.apache.org/download.cgi |
Docker | latest | https://docs.docker.com/get-docker/ |
Download the Kafka binary kafka_2.12-3.4.0
and the Apache Flink binary flink-1.17.1
, unzip them, and don't change anything in the Flink config directory.
Download the following libraries required by the C Kafka client and JSON-formatted string processing:
sudo apt update
sudo apt install librdkafka-dev
sudo apt install libcjson-dev
Python was only used for evaluating our experiments. Our application can still run without Python.
pip install kafka-python3
pip install matplotlib
- Clone the repository. If you need help, follow these instructions.
- Create a
config.sh
file from theconfig.sh.template
and add the paths from your system. Then place it in theshell-scripts/
directory. - In the
docker-app
folder you need a folderjobs
where the jars for both clusters will be stored when using Docker.
Note: Before starting the app, make sure that you have set the correct value (false
or true
) for the variable DOCKER_DEPLOYMENT
in the config.sh
file depending on what environment you want to work with.
From now on, it is only necessary to use two commands which are shown below:
./start.sh
calls a script that starts the whole system. The first time you execute it another flink directory for the second cluster will be created locally.
./stop.sh
stops everything (Flink, Kafka, Zookeeper, ...).
For our experiments, we utilized two different machines running Ubuntu 22.04 as the operating system. The first machine runs Ubuntu on a virtual machine using VirtualBox 5, while the second machine employs Ubuntu as the native operating system.
Parameter | Virtual machine | Native OS |
---|---|---|
RAM | 11 GB | 32 GB |
Processor | Intel i7-10750H CPU 2.60GHz × 6 cores | AMD Ryzen 5 5600 × 6 cores |
Storage | 70 GB | 1000 GB |
GPU | NVIDIA GeForce RTX 3060 6 GB | NVIDIA GeForce RTX 3070 8 GB |
The experiments were conducted on the machine with native Ubuntu as OS.
Try to change the sleep
value in shell-scripts.sh
. Increase the value if necessary. This problem may occur when the Kafka broker is still starting, but still not running fully and the user tries to connect to it.
The Flink clusters need some time to be up and running. Increase the sleep
value in shell-scripts.sh
if necessary.
Developers:
Supervisor: