VoltDB's new Connector for Kafka -- Tapping into the "Data Main" of the Enterprise

Fast Data -- some of our customers have told us that managing fast data is like drinking from a fire hose.

I find it interesting how often water is used when talking about data. Clive Humby, back in 2006, called data the new "oil", but I like the water metaphor -- water is a natural resource just like oil is, and it is much less damaging to the environment if you accidentally leave the spigot open.

The water metaphor extends to the big data side of the data management picture, which is often called the data lake.

Managing all that data (water) has gotten easier with the wider adoption of Kafka. VoltDB is a really, really fast operational in-memory transactional database. Our customers, whether in Telco, FinServ, Adtech or Online Gaming, have been using it as the database for their high-performance applications in this Fast Data space for many years, since before Kafka arrived on the scene. But now more people and more applications need to access Fast Data, and Kafka is a great way to provide that wider access. In a way, Kafka is becoming the water main of data.

Back in the early 2000s, I worked for a company that sold messaging software. We talked with customers about the need to standardize data flow throughout an enterprise through the use of an Enterprise Service Bus (ESB). A typical large enterprise, even back then, had lots of different places it got data from (data sources) and many different applications and data management platforms that needed the data (data sinks). Enterprises would need to create a separate communications interface for each source-and-sink pair, leading to what we belittlingly called "spaghetti interfacing" for reasons obvious to anyone looking at the architectural diagrams.

The ESB was an attempt at a solution for that interfacing challenge -- you created one ESB and all interfacing between data sources and data sinks was done through the ESB. If some source needed to send data to some application, you created one connector from the source to the ESB and another from the application to the ESB. The communication could be point-to-point (data is consumed by a single sink), using something called a queue as the middleman, or it could be one-to-many, where the data could be used by many downstream applications, which was accomplished using a publish/subscribe (pub/sub) model. In the pub/sub model, the source would publish data to a particular topic. Any application that needed data from that source simply needed to connect to the ESB, through its own connector, and "subscribe" to the topic. Sources could dump data into multiple queues (or topics), and sinks could read from any available queue or subscribe to any available topic.
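For readers who have not worked with messaging software, here is a minimal sketch of the two delivery styles described above. It is a toy, in-memory stand-in written for illustration only: the MiniBus class, its method names, and the "orders" and "clicks" names are all hypothetical, and it is not the API of any real ESB, of Kafka, or of VoltDB.

```python
from collections import defaultdict, deque


class MiniBus:
    """Toy in-memory bus (hypothetical, for illustration only)."""

    def __init__(self):
        self.queues = defaultdict(deque)      # queue name -> pending messages
        self.subscribers = defaultdict(list)  # topic name -> callbacks

    # Point-to-point: each message is consumed by exactly one sink.
    def send(self, queue, message):
        self.queues[queue].append(message)

    def receive(self, queue):
        pending = self.queues[queue]
        return pending.popleft() if pending else None

    # Publish/subscribe: every subscriber to the topic gets a copy.
    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


bus = MiniBus()

# Queue: the first receiver takes the message; nothing is left for the next.
bus.send("orders", "order-1")
first = bus.receive("orders")
second = bus.receive("orders")

# Topic: two independent sinks both see the same published message.
billing, analytics = [], []
bus.subscribe("clicks", billing.append)
bus.subscribe("clicks", analytics.append)
bus.publish("clicks", "click-1")
```

The point of the sketch is that neither sources nor sinks know about each other; each one talks only to the bus, which is what removes the need for a separate interface per source-and-sink pair.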

Sound familiar? If you are familiar with Kafka, it should -- Kafka has a very similar data access model.

Increasingly, our customers are standardizing their IT system architectures so that all their applications and downstream systems can make use of Fast Data sources, and they are also adding more and more Fast Data sources to feed those applications. Many of them have found Kafka to be very useful for doing both.

VoltDB introduced the ability to interface with Kafka back in v4.3 in May 2014 through VoltDB's own import and export framework, which is part of our core product. With the creation of Confluent's Kafka Connect framework, we decided to give our customers a choice -- we created a new connector, a Sink Connector, based on the new framework, and completed certification with Confluent.

Customers can use (or continue to use) VoltDB's Kafka import or export capabilities, or they can choose to use the new Kafka Connect Sink Connector. (We are considering adding a Source Connector in the future -- we are keeping a close eye on how our customers want to use VoltDB and Kafka together and will continue to make that combination more powerful and more useful.)

VoltDB continues to offer one of the fastest transactional databases in the world, allowing customers to make real-time decisions on their real-time data. Now, we're making it more convenient to use VoltDB with Kafka, allowing you to harness your fast data even more easily.

by Dennis Duckworth