Performance comparison between the two data-streaming platforms.
This section explains, step by step, how to install Kafka and RabbitMQ on Arch Linux with Docker.
Note: you only really need to install Docker and then run Docker Compose to have both Kafka and RabbitMQ ready to use. Alternatively, when you run make, the Docker images are also downloaded and the containers started.
The following steps are taken from the Arch Linux installation guide in the Docker docs.
wget https://download.docker.com/linux/static/stable/x86_64/docker-27.4.0.tgz -qO- | tar xvfz - docker/docker --strip-components=1
sudo mv ./docker /usr/local/bin
The latest Docker Desktop package can be found in the release notes.
wget -q https://desktop.docker.com/linux/main/amd64/178034/docker-desktop-x86_64.pkg.tar.zst
sudo pacman -U ./docker-desktop-x86_64.pkg.tar.zst
These steps can be found in the Apache Kafka Docker docs. To start the broker, execute the following command (the image will be pulled automatically):
docker run -d --name kafka-broker apache/kafka:latest
This can also be done from Docker Desktop.
To test the installation, two terminals are needed: one for the publisher and one for the subscriber.
To open a broker terminal, execute:
docker exec --workdir /opt/kafka/bin/ -it kafka-broker sh
To create a topic, execute this in one of the broker terminals (the topic name must match the one used by the producer and consumer below):
./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test-topic
For the publisher, type the following command in one of the terminals:
./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic
And for the subscriber, type this in the other:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning
As we write in the publisher terminal, the same text should appear in the subscriber one.
In the directory tests/kafka there are Python scripts to check that the installation is correct.
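The actual check scripts live in tests/kafka; as a rough sketch of what such a round-trip test can look like (this assumes the kafka-python client and a broker on localhost:9092, both assumptions, since the repo's scripts may use another client):

```python
# Sketch of a Kafka round-trip check (hypothetical: assumes the
# kafka-python client and a broker listening on localhost:9092).
def encode(msg: str) -> bytes:
    return msg.encode("utf-8")   # Kafka payloads are raw bytes

def decode(raw: bytes) -> str:
    return raw.decode("utf-8")

def round_trip(topic: str = "test-topic") -> str:
    # imported here so the helpers above work even without the client installed
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send(topic, encode("hello"))
    producer.flush()  # block until the broker has the message

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",   # read from the start, like --from-beginning
        consumer_timeout_ms=5000,       # give up after 5 s
    )
    for record in consumer:
        return decode(record.value)
    raise RuntimeError("no message received")
```

Run it with the broker container up; the client sketched here can be installed with pip install kafka-python.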
To delete a topic, execute this in a broker terminal:
./kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic test-topic
To delete the container, type this in a normal terminal:
docker rm -f kafka-broker
As in the previous case, it is only necessary to run the following command:
docker run -d --name rabbitmq-broker -p 5672:5672 -p 15672:15672 rabbitmq:3
The testing scripts are inside the directory tests/rabbitmq.
In one terminal execute the subscriber:
python3 tests/rabbitmq/rabbitmq_subscriber.py "#"
In the other terminal execute the publisher:
python3 tests/rabbitmq/rabbitmq_publisher.py
A "Hello World" should appear in the subscriber terminal.
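The subscriber above is bound with the routing pattern "#", which in an AMQP topic exchange matches every routing key. A minimal pure-Python sketch of that matching rule ("*" matches exactly one dot-separated word, "#" matches zero or more):

```python
# Sketch of AMQP topic-exchange routing semantics, to show why binding
# the subscriber with "#" receives every message.
def topic_matches(binding: str, routing_key: str) -> bool:
    def match(b, k):
        if not b:
            return not k          # binding exhausted: key must be too
        head, rest = b[0], b[1:]
        if head == "#":
            # '#' can swallow zero or more words
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])
    return match(binding.split("."), routing_key.split("."))

print(topic_matches("#", "anything.at.all"))        # True
print(topic_matches("kern.*", "kern.critical"))     # True
print(topic_matches("kern.*", "kern.critical.x"))   # False
```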
As with Kafka, to delete the container type:
docker rm -f rabbitmq-broker
On the base directory, execute the following command:
docker compose up -d
This will initialize both Kafka and RabbitMQ.
To remove the containers:
docker compose down
Note: this is executed by the "execute_all.bash" script, so if you are going to use that script, there is no need to do it manually.
Make sure to complete all the Docker installation steps and to have installed the Python modules listed in "requirements.txt". Then type the following command:
make
This will run the performance measurements and generate the graphs.
The Executer class handles initialization and time measurement. MainExecuter inherits from it: it generates the publisher's data and implements the iteration and execution of the executers that call the publisher and the subscriber. These two are invoked the same way as the main one, except that they perform all the iterations indicated by num_execs and that they carry out a different action (publishing to a topic or reading its content). All processes are created with the Python multiprocessing module.
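The structure described above can be sketched roughly like this (class and method names here are illustrative, not the project's exact API; a local multiprocessing.Queue stands in for the broker):

```python
# Illustrative sketch of the executer hierarchy: a base class times a run,
# and publisher/subscriber variants repeat their action in child processes.
import time
from multiprocessing import Process, Queue

class Executer:
    def measure(self) -> float:
        start = time.perf_counter()
        self.run()
        return time.perf_counter() - start

class PublisherExecuter(Executer):
    def __init__(self, queue, messages, num_execs):
        self.queue, self.messages, self.num_execs = queue, messages, num_execs
    def run(self):
        # repeat the publish action num_execs times
        for _ in range(self.num_execs):
            for msg in self.messages:
                self.queue.put(msg)
        self.queue.put(None)  # sentinel: nothing more to read

class SubscriberExecuter(Executer):
    def __init__(self, queue):
        self.queue = queue
    def run(self):
        # consume until the sentinel arrives
        while self.queue.get() is not None:
            pass

class MainExecuter(Executer):
    """Generates the publisher's data and spawns both child processes."""
    def __init__(self, size_msg=8, num_execs=3):
        self.queue = Queue()
        self.messages = ["x" * size_msg]  # the publisher's data
        self.num_execs = num_execs
    def run(self):
        pub = Process(target=PublisherExecuter(self.queue, self.messages,
                                               self.num_execs).run)
        sub = Process(target=SubscriberExecuter(self.queue).run)
        sub.start(); pub.start()
        pub.join(); sub.join()

if __name__ == "__main__":
    print(f"total time: {MainExecuter().measure():.4f}s")
```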
The key measure used to analyze both platforms is the size_msg parameter (the number of characters in a string message). This measure is complemented by the incremental option, which makes the message size grow progressively between executions.
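How size_msg and incremental can drive message generation, as a sketch (illustrative, not the project's exact code): with incremental off, every message has size_msg characters; with it on, the size grows with each execution up to size_msg.

```python
# Sketch of message-size generation from the size_msg / incremental
# parameters (hypothetical helper, not the project's exact API).
def message_sizes(size_msg: int, num_execs: int, incremental: bool):
    if not incremental:
        return [size_msg] * num_execs
    # evenly spaced sizes growing from the first execution up to size_msg
    step = max(size_msg // num_execs, 1)
    return [min((i + 1) * step, size_msg) for i in range(num_execs)]

def make_message(size: int) -> str:
    return "a" * size  # the payload is just a string of `size` characters

print(message_sizes(100, 4, incremental=False))  # [100, 100, 100, 100]
print(message_sizes(100, 4, incremental=True))   # [25, 50, 75, 100]
```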
Subscriber measurements take into account the time to read all the messages once the first one is available. For the publisher, the measurement goes from the sending of the first message to the acknowledgement that the subscriber has received the last one. The main executer measures the time of both, plus the initialization.
A description of the parameters can be seen by executing the following command:
python3 src/main_executer.py -h
The only two parameters that need clarification (because of their similarity) are:

- --iteration-size: the number of messages sent in each iteration, all of which are measured as one unit. The default is 10.
- --number-iterations: the total number of measurements done with the same parameters. The default is 100 (their mean is calculated to generate more precise graphs).
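How those two parameters combine into one graph point can be sketched as follows (illustrative; the project's exact aggregation may differ): each of the number-iterations runs times iteration-size messages, and the runs are averaged.

```python
# Sketch: averaging number-iterations timings, each covering
# iteration-size messages, into a single data point.
from statistics import mean

def average_point(timings, iteration_size):
    """timings: one measurement per run of the same parameter set."""
    avg = mean(timings)
    return {
        "mean_time": avg,                        # mean over number-iterations runs
        "time_per_message": avg / iteration_size,
        "samples": len(timings),
    }

point = average_point([0.10, 0.12, 0.14], iteration_size=10)
print(point["mean_time"], point["time_per_message"])
```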
As can be seen in the graphs (especially the incremental ones), there are some critical points at which Kafka's execution time jumps. This is probably due to the partitioned nature of Kafka's topics: these critical points are likely moments when Kafka is allocating more memory in the topic's partition (or creating new partitions for the same topic) because the messages are longer.
Here the difference commented on previously is more notable. As can be seen, RabbitMQ's time grows exponentially, probably because the message queue becomes congested as the message size grows. In contrast, Kafka is faster once the partition is already allocated.
The subscriber results strengthen the partition-memory-allocation theory, because the subscribe operation only retrieves information that is probably already cached.
While Kafka handles big messages better, RabbitMQ has the advantage that no partition allocation is needed, so it does not pay that penalty in the first iterations. On the other hand, RabbitMQ's queue can become a drawback when the messages are too big.
With this in mind, for short messages that do not require persistence (such as social media notifications), RabbitMQ could be a much more interesting choice. But for big-data workloads, or those that need the publish-consume time to stay mostly constant, Kafka is the better choice.
├── data
│ ├── KafkaExecuter.csv
│ ├── KafkaExecuter_publisher.csv
│ ├── KafkaExecuter_subscriber.csv
│ ├── RabbitMQExecuter.csv
│ ├── RabbitMQExecuter_publisher.csv
│ └── RabbitMQExecuter_subscriber.csv
├── docker-compose.yml
├── graphs
│ ├── general_execution_incremental.png
│ ├── general_execution_not_incremental.png
│ ├── publisher_execution_incremental.png
│ ├── publisher_execution_not_incremental.png
│ ├── subscriber_execution_incremental.png
│ └── subscriber_execution_not_incremental.png
├── LICENSE
├── makefile
├── README.md
├── requirements.txt
├── scripts
│ ├── execute_all.bash
│ └── generate_graphs.py
├── src
│ ├── Executer
│ │ ├── Executer.py
│ │ ├── __init__.py
│ │ ├── KafkaExecuter
│ │ │ ├── __init__.py
│ │ │ ├── KafkaExecuter.py
│ │ │ ├── KafkaPublisherExecuter.py
│ │ │ └── KafkaSubscriberExecuter.py
│ │ ├── MainExecuter.py
│ │ └── RabbitMQExecuter
│ │ ├── __init__.py
│ │ ├── RabbitMQExecuter.py
│ │ ├── RabbitMQPublisherExecuter.py
│ │ └── RabbitMQSubscriberExecuter.py
│ └── main_executer.py
└── tests
├── kafka
│ ├── kafka_admin.py
│ ├── kafka_publisher.py
│ └── kafka_subscriber.py
└── rabbitmq
├── rabbitmq_publisher.py
└── rabbitmq_subscriber.py
- data: Data files are stored there by default.
- docker-compose.yml: Docker compose that initializes Kafka and RabbitMQ.
- graphs: Graphs are stored there by default.
- LICENSE: License file.
- makefile: Make file to execute all.
- README.md: Markdown file with the documentation.
- requirements.txt: Requirements for the Python program.
- scripts: Scripts folder.
- src: Source code folder.
- tests: Tests code folder.