What is Kafka, and What Does it Bring to In-memory Databases like VoltDB?
Kafka is a persistent, high-performance message queue developed at LinkedIn and contributed to the Apache Software Foundation. Kafka is highly available, partitions (or shards) messages, and is simple and efficient to use. Well suited to serializing and multiplexing streams of data, Kafka provides “at least once” delivery and gives clients (subscribers) the ability to rewind and replay streams.
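To make partitioning and replay concrete, here is a minimal sketch of a partitioned, append-only log in plain Python. It is a toy model rather than Kafka's implementation: the `PartitionedLog` class, its method names, and the CRC32 hash are assumptions chosen for illustration, and no Kafka client library is involved.

```python
import zlib


class PartitionedLog:
    """Toy stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # A stable hash sends the same key to the same partition every time.
        # (CRC32 is an illustrative choice, not Kafka's partitioner.)
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        # Reading never removes anything, so any offset can be read again.
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
for i in range(6):
    log.append("sensor-a", f"reading-{i}")

p = log.partition_for("sensor-a")
first_pass = log.read(p, 0)
replay = log.read(p, 0)  # "rewind" to offset 0 and read the stream again
tail = log.read(p, 4)    # or resume from a later offset
```

Because messages stay in the log after they are read, a subscriber that tracks its own offset can go back and reprocess the stream, which is the rewind-and-replay behavior described above.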
Kafka is one of the most popular message queues for streaming data, in part because of its simple and efficient architecture, and also due to its LinkedIn pedigree and status as an Apache project. Because of its persistence capabilities, it is often used to front-end Hadoop data feeds.
VoltDB can ingest, transact and provide event-based decisions on high-velocity data feeds, making a Kafka loader extremely interesting in the big data/fast data application space. With the Kafka loader, VoltDB can subscribe to topics and transact on incoming messages, even faster than Kafka can deliver! This capability allows VoltDB applications to process and make decisions on data the moment it arrives, rather than waiting for business logic to batch-process data in the Hadoop data lake.
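With “at least once” delivery, a subscriber can see the same message twice, for example after a restart. The sketch below shows one common way to handle that: commit the offset only after a message has been applied, and make the apply step idempotent. It is a generic illustration in plain Python, not the VoltDB Kafka loader's actual mechanism; the `consume` function, the `crash_at` parameter, and the dictionary standing in for a table are all hypothetical.

```python
def consume(messages, committed, table, deliveries, crash_at=None):
    """Apply messages starting at the committed offset; return the new offset.

    The offset advances only after a message has been applied, so a crash
    between the two steps means that message is delivered again on restart.
    """
    for offset in range(committed, len(messages)):
        deliveries.append(offset)
        # Idempotent apply: the row is keyed by offset, so a redelivery
        # overwrites the same row instead of counting the message twice.
        table[offset] = messages[offset]
        if offset == crash_at:
            return committed  # simulated crash before the offset is committed
        committed = offset + 1
    return committed


messages = [("acct-1", 10), ("acct-2", 5), ("acct-1", 7)]
table = {}       # hypothetical stand-in for a database table
deliveries = []  # every offset handed to the subscriber, in order

committed = consume(messages, 0, table, deliveries, crash_at=1)
committed = consume(messages, committed, table, deliveries)  # restart
total = sum(amount for _, amount in table.values())
```

Offset 1 is delivered twice here, yet the table ends up with exactly one row per message, so the total is unaffected by the redelivery.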
Like VoltDB, Kafka is typically run in a clustered topology. Kafka partitions and maintains messages in broker nodes as logs. A typical Kafka cluster deployment is shown below:
Kafka Use Cases
Unlike traditional message queues, Kafka can scale to handle hundreds of thousands of messages per second, thanks to the partitioning built into a Kafka cluster. Common use cases include the following (among many more):
- Log aggregation
- Stream processing
- Event sourcing
- Commit log for distributed systems
Why Choose a Kafka Importer?
Kafka, though suitable for the above use cases, only moves data: its value depends on what producers publish and what consumers do with the messages. It is often used as a component in complex solution stacks, such as the ZFSC and SMACK stacks, because it needs complementary products in the following areas:
- Data integration
Data consumed needs to be processed. For any use case that requires downstream availability of data combined with high ingest rates, an in-memory database such as VoltDB provides an ideal integration.
- Real-time data processing
Data in Kafka comes in a variety of shapes and sizes, making analysis difficult without extraction to mature analytics/processing products. VoltDB is able to ingest data as fast as it arrives, at measured rates of more than 3 million transactions a second. This makes VoltDB well suited to real-time data processing and decisioning on vast amounts of rapidly moving data.
- Easy access to data in various SQL-like or fast serving use cases
Once data is transacted on and organized, VoltDB, with its rich interface to SQL tables and views, makes it easier to write applications. Applications that aggregate metrics and counters are good examples of how VoltDB makes data more meaningful and actionable.
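As a rough illustration of the metrics-and-counters pattern, the sketch below loads a few messages into a SQL table and reads per-device aggregates back through a view. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; it does not use VoltDB or its client API, and the `events` table, `device_stats` view, and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table that incoming messages are written to (hypothetical schema).
conn.execute(
    "CREATE TABLE events ("
    "device TEXT NOT NULL, metric TEXT NOT NULL, value REAL NOT NULL)"
)

# View that exposes per-device counters to applications.
conn.execute(
    "CREATE VIEW device_stats AS "
    "SELECT device, COUNT(*) AS event_count, SUM(value) AS total_value "
    "FROM events GROUP BY device"
)

# Messages as they might arrive from a topic (made-up sample data).
incoming = [
    ("dev-1", "temp", 20.0),
    ("dev-2", "temp", 18.5),
    ("dev-1", "temp", 22.0),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", incoming)
conn.commit()

rows = conn.execute(
    "SELECT device, event_count, total_value FROM device_stats ORDER BY device"
).fetchall()
```

The application only queries the view; it never has to recompute the counters itself. SQLite evaluates this view at query time, and how VoltDB maintains its own views is outside the scope of this sketch.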
Learn more about real-time decisions with VoltDB and Apache Kafka when you listen to our pre-recorded webinar.