Photo by Mark Decile on Unsplash

Processing streaming events with Apache Kafka

The Streamer

Article from ADMIN 61/2021

By Kai Wähner

Apache Kafka reads and writes events virtually in real time, and you can extend it to take on a wide range of roles in today's world of big data and event streaming.

Event streaming is a modern concept that aims at continuously processing large volumes of data. The open source Apache Kafka [1] has established itself as a leader in the field. Kafka was originally developed by the career platform LinkedIn to process massive volumes of data in real time. Today, Kafka is used by more than 80 percent of the Fortune 100 companies, according to the project's own information.

Apache Kafka captures, processes, stores, and integrates data on a large scale. The software supports numerous applications, including distributed logging, stream processing, data integration, and pub/sub messaging. Kafka continuously processes the data almost in real time, without first writing to a database.

Kafka can connect virtually to any other data source in traditional enterprise information systems, modern databases, or the cloud. Together with the connectors available for Kafka Connect, Kafka forms an efficient integration point without hiding the logic or routing within the centralized infrastructure.

Many organizations use Kafka to monitor operating data. Kafka collects statistics from distributed applications to create centralized feeds with real-time metrics. Kafka also serves as a central source of truth for bundling data generated by various components of a distributed system. Kafka fields, stores, and processes data streams of any size (both real-time streams and from other interfaces, such as files or databases). As a result, companies entrust Kafka with technical tasks such as transforming, filtering, and aggregating data, and they also use it for critical business applications, such as payment transactions.

Kafka also works well as a modernized version of the traditional message broker, efficiently decoupling a process that generates events from one or more other processes that receive events.