Apache Kafka | VoltDB


VoltDB Kafka Importer

VoltDB has provided Kafka support across multiple releases. Given the name of a Kafka topic to consume from and a destination table in VoltDB, the VoltDB Kafka importer automatically imports data as it arrives. Because the importer is an internal service that consumes continuously, you can set up importers for both staging and production database instances that read from the same Kafka cluster.
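The importer's behavior can be pictured with a small, self-contained Python sketch. It does not use VoltDB or a Kafka client: `Topic`, `Importer` and the CSV message format are hypothetical stand-ins. The point it illustrates is that each importer tracks its own read offset, so two importers (for example, staging and production) can consume the same topic independently and each ends up with every row.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for a Kafka topic: an append-only list of messages."""
    name: str
    messages: list[str] = field(default_factory=list)

    def publish(self, message: str) -> None:
        self.messages.append(message)


@dataclass
class Importer:
    """Hypothetical importer: reads new CSV messages and inserts them as table rows."""
    topic: Topic
    table: list[tuple[str, ...]]
    offset: int = 0  # position of the next unread message in the topic

    def poll(self) -> int:
        """Import every message published since the last poll; return how many."""
        new_messages = self.topic.messages[self.offset:]
        for line in new_messages:
            row = next(csv.reader(io.StringIO(line)))
            self.table.append(tuple(row))
        self.offset += len(new_messages)
        return len(new_messages)


# Two importers, e.g. staging and production, consuming the same topic.
topic = Topic("employees")
staging: list[tuple[str, ...]] = []
production: list[tuple[str, ...]] = []
importers = [Importer(topic, staging), Importer(topic, production)]

topic.publish("1,Alice,Engineering")
topic.publish("2,Bob,Sales")
for importer in importers:
    importer.poll()
```

Polling again only picks up messages published since the previous poll, which mirrors the "import data as it arrives" behavior described above.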

Coupling VoltDB’s high-velocity database technology with Kafka’s fast ingestion makes fast data more actionable. Pairing a Kafka importer with VoltDB fills the gaps Kafka leaves in data extraction, integration, processing and analytics.

VoltDB Kafka Connect Sink Connector

If you prefer to run your connectors within your Kafka environment, you can instead use the Confluent-certified VoltDB Kafka Connect Sink Connector to import data into VoltDB from Kafka.

How VoltDB Kafka Export Works

VoltDB can use Kafka to export data at high speed. Developers can specify certain tables in the schema as sources for export. At runtime, any data written to the specified tables is sent to the VoltDB Export Connector, which queues the data for export, then sends it to the selected output target. VoltDB provides connectors for exporting to files, for exporting to other business processes, and for exporting to a distributed message queue such as Kafka. The VoltDB Kafka Export Connector writes export data to a Kafka distributed message queue, where one or more other processes can read the data.
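The export flow (rows written to export-source tables are queued, then sent to an output target) can be modeled in a few lines of Python. This is an illustrative sketch only: `ExportConnector`, `on_insert`, `flush` and the in-memory `kafka_topics` dictionary are hypothetical, and the assumption that each export table writes to a Kafka topic of the same name is made purely for the example.

```python
from collections import deque
from typing import Callable

Row = tuple


class ExportConnector:
    """Hypothetical sketch of an export connector: rows inserted into
    export-source tables are queued, then flushed to an output target."""

    def __init__(self, export_tables: set[str], target: Callable[[str, Row], None]):
        self.export_tables = export_tables
        self.target = target
        self.queue: deque = deque()

    def on_insert(self, table: str, row: Row) -> None:
        # Only tables declared as export sources feed the connector.
        if table in self.export_tables:
            self.queue.append((table, row))

    def flush(self) -> int:
        """Send every queued row to the target, oldest first; return how many."""
        sent = 0
        while self.queue:
            table, row = self.queue.popleft()
            self.target(table, row)
            sent += 1
        return sent


# Stand-in for a Kafka cluster: topic name -> list of records.
# Assumption for this sketch: each export table writes to a topic of the same name.
kafka_topics: dict[str, list[Row]] = {}


def send_to_kafka(table: str, row: Row) -> None:
    kafka_topics.setdefault(table, []).append(row)


connector = ExportConnector({"ALERTS"}, send_to_kafka)
connector.on_insert("ALERTS", (1, "high temperature"))
connector.on_insert("READINGS", (1, 98))  # not an export source: ignored
```

Queuing decouples the database write from delivery: the insert returns immediately, and the connector forwards rows to the target at its own pace.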

What is Kafka, and What Does it Bring to In-memory Databases like VoltDB?

Kafka is a persistent, high-performance message queue developed by the folks at LinkedIn and contributed to the Apache Software Foundation. Kafka is highly available, partitions (or shards) messages, and is simple and efficient to use. It excels at serializing and multiplexing streams of data, provides “at least once” delivery, and gives clients (subscribers) the ability to rewind and replay streams.
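To make “at least once” delivery and replay concrete, here is a minimal Python model of one log and one consumer. It is not the Kafka client API; `Log`, `Consumer`, `run`, `seek` and the `crash_after` parameter are hypothetical names chosen for the sketch. Because the offset is committed only after a message has been handled, a crash between handling and committing causes that message to be delivered again: duplicates are possible, but nothing is skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Log:
    """Append-only log; a message's index in `entries` is its offset."""
    entries: list[str] = field(default_factory=list)


@dataclass
class Consumer:
    """Hypothetical consumer that commits its offset after handling each message."""
    log: Log
    committed: int = 0  # where reading resumes after a restart

    def run(self, handler: Callable[[int, str], None],
            crash_after: Optional[int] = None) -> None:
        """Deliver messages from the committed offset onward.

        crash_after simulates a failure after a message is handled but
        before its offset is committed.
        """
        for offset in range(self.committed, len(self.log.entries)):
            handler(offset, self.log.entries[offset])
            if offset == crash_after:
                return  # handled, but the commit below never happens
            self.committed = offset + 1

    def seek(self, offset: int) -> None:
        """Rewind (or skip ahead) so the next run replays from `offset`."""
        self.committed = offset


seen: list[str] = []
consumer = Consumer(Log(["a", "b", "c"]))
consumer.run(lambda offset, msg: seen.append(msg), crash_after=1)
consumer.run(lambda offset, msg: seen.append(msg))  # "b" is delivered again
```

After the two runs, `seen` holds `["a", "b", "b", "c"]`; calling `consumer.seek(0)` and running again replays the whole stream.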

Kafka is one of the most popular message queues for streaming data, in part because of its simple and efficient architecture, and also due to its LinkedIn pedigree and status as an Apache project. Because of its persistence capabilities, it is often used to front-end Hadoop data feeds.

VoltDB can ingest, transact on and make event-based decisions on high-velocity data feeds, making a Kafka loader extremely interesting in the big data/fast data application space. With the Kafka loader, VoltDB can subscribe to topics and transact on incoming messages, even faster than Kafka can deliver! This capability allows VoltDB applications to process and make decisions on data the moment it arrives, rather than waiting for business logic to batch-process data in the Hadoop data lake.
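A hedged sketch of what “transacting on incoming messages” can look like: each message is handled by one small function that reads state, makes a decision and updates state in a single step. In VoltDB this logic would typically live in a stored procedure invoked per message; here plain Python stands in, and the `account_id,amount` message format, the `Account` record and the approve/decline rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int  # in cents, to avoid floating-point rounding


def on_message(accounts: dict[str, Account], message: str) -> str:
    """Handle one hypothetical 'account_id,amount' message as a single transaction.

    The approve/decline decision is made as soon as the message arrives,
    rather than in a later batch job.
    """
    account_id, amount_text = message.split(",")
    amount = int(amount_text)
    account = accounts.get(account_id)
    if account is None or account.balance < amount:
        return "DECLINE"
    account.balance -= amount
    return "APPROVE"


accounts = {"acct-1": Account(10_000)}
decisions = [on_message(accounts, m) for m in ("acct-1,6000", "acct-1,6000", "acct-2,500")]
```

The second withdrawal is declined because the first one has already been applied: each decision sees the state left by every earlier message.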

Like VoltDB, Kafka is typically run in a clustered topology. Kafka partitions and maintains messages in broker nodes as logs. A typical Kafka cluster deployment is shown below:

Kafka Use Cases

Unlike traditional message queues, Kafka can scale to handle hundreds of thousands of messages per second, thanks to the partitioning built into a Kafka cluster. Kafka suits use cases such as the following, among many others:

  • Messaging
  • Log aggregation
  • Stream processing
  • Event sourcing
  • Commit log for distributed systems
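Partitioning is what lets Kafka spread a single topic across brokers and serve it to consumers in parallel. The Python sketch below models only the key-to-partition mapping and per-partition offsets; `PartitionedTopic` is a hypothetical name, and the hash function is a stand-in, as noted in the code. Kafka guarantees ordering only within a partition, which is why records sharing a key (for example, all events for one user) should land in the same partition.

```python
import zlib


class PartitionedTopic:
    """Sketch of a topic split into partitions, each an append-only log.

    zlib.crc32 stands in for Kafka's default partitioner, which hashes keys
    with murmur2; what matters is that equal keys always map to the same partition.
    """

    def __init__(self, num_partitions: int):
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append a record to its key's partition; return (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1


topic = PartitionedTopic(num_partitions=4)
first = topic.append("user-42", "login")
second = topic.append("user-42", "logout")
```

Both records for `user-42` go to the same partition with consecutive offsets, so a consumer of that partition sees them in order.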

Why Choose a Kafka Importer?

Kafka, though well suited to the use cases above, does nothing with the data by itself: what happens to it depends entirely on its producers and consumers. It is often used as one component in complex solution stacks, such as the ZFSC and SMACK stacks, because it needs complementary products in the following areas:

  • Data integration
    Data consumed needs to be processed. For any use case that requires downstream availability of data combined with high ingest rates, an in-memory database such as VoltDB provides an ideal integration.
  • Real-time data processing
    Data in Kafka comes in a variety of shapes and sizes, making analysis difficult without extracting it into mature analytics/processing products. VoltDB can ingest data at the speed it arrives, at measured rates of over 3 million transactions per second, which makes it well suited to real-time processing and decisioning on large volumes of rapidly moving data.
  • Easy access to data in various SQL-like or fast serving use cases
    Once data has been transacted on and organized, VoltDB’s rich SQL interface to tables and views makes it easier to write applications. Applications that aggregate metrics and counters are a good example of how VoltDB makes data more meaningful and actionable.
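The metrics-and-counters case can be pictured as an aggregate that is updated on every insert instead of being recomputed by a batch job. VoltDB’s views can serve this role; the Python below only models the idea and does not use VoltDB, and the table, column and class names (`hits`, `page`, `ms`, `CounterView`, `Totals`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Totals:
    count: int = 0
    total: int = 0


class CounterView:
    """Incrementally maintained aggregate, similar in spirit to a SQL view such as
    SELECT page, COUNT(*), SUM(ms) FROM hits GROUP BY page (names hypothetical)."""

    def __init__(self) -> None:
        self.groups: defaultdict = defaultdict(Totals)

    def on_insert(self, key: str, value: int) -> None:
        # Update only the affected group; nothing is recomputed from scratch.
        group = self.groups[key]
        group.count += 1
        group.total += value


view = CounterView()
for page, ms in [("home", 120), ("search", 80), ("home", 95)]:
    view.on_insert(page, ms)
```

Each insert touches a single group, so reading the current count or sum for any key is a simple lookup, no matter how many rows have arrived.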

Learn more about real-time decisions with VoltDB and Apache Kafka when you listen to our pre-recorded webinar.