Apache Kafka is an open-source distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale. Originally created to handle real-time data feeds at LinkedIn in 2011, Kafka quickly evolved from a messaging queue into a full-fledged event streaming platform capable of handling over 1 million messages per second, or trillions of messages per day.
Kafka has numerous advantages. Today, Kafka is used by over 80% of the Fortune 100 across virtually every industry, for countless use cases big and small. It is the de facto technology developers and architects use to build the newest generation of scalable, real-time data streaming applications. While such applications can be built with a range of technologies available in the market, below are the main reasons Kafka is so popular.
High Throughput
Capable of handling high-velocity and high-volume data, Kafka can process millions of messages per second.

Scalable
Scale Kafka clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing.

Low Latency
Deliver these high volumes of messages across a cluster of machines with latencies as low as 2 ms.
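To give a sense of how the throughput and latency figures above translate into client code, here is a minimal sketch of a Java producer that batches and compresses records before sending. The broker address localhost:9092, the topic name events, and the specific batch and linger values are illustrative assumptions, not recommended settings.

```java
// Minimal sketch (not official tuning guidance): a Java producer whose batching and
// compression settings trade a few milliseconds of latency for higher throughput.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThroughputProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Larger batches and a short linger let the producer amortize network round trips,
        // which is how a single client can push millions of small messages per second.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // 64 KB batches
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);            // wait up to 5 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress batches on the wire

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000_000; i++) {
                producer.send(new ProducerRecord<>("events", Integer.toString(i), "payload-" + i));
            }
        } // close() flushes any remaining batched records
    }
}
```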
Permanent Storage
Safely and securely store streams of data in a distributed, durable, fault-tolerant cluster.
High Availability
Extend clusters efficiently over availability zones or connect clusters across geographic regions, making Kafka highly available and fault tolerant with no risk of data loss.
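The durability and availability described above come from replication, which is configured per topic. The sketch below is a rough illustration rather than a production recipe: it uses Kafka's Java AdminClient to create a topic whose partitions are each replicated to three brokers, and notes the producer settings that make a write wait for the in-sync replicas. The topic name orders, the partition count, and the localhost:9092 address are assumptions made for the example.

```java
// Minimal sketch of how durability and availability show up in configuration,
// assuming a three-broker cluster reachable at localhost:9092.
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ReplicatedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Each of the 12 partitions is copied to 3 brokers; at least 2 copies must be
            // in sync for a write to succeed, so losing one broker loses no acknowledged data.
            NewTopic orders = new NewTopic("orders", 12, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(orders)).all().get();
        }

        // On the producer side, acks=all makes a send complete only after the in-sync
        // replicas have the record, which is what backs the durability guarantee.
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
        producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    }
}
```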