Streaming Data Pipeline

A streaming data system is traditionally used to handle a “stream” of arriving events. These systems are built to ingest fast-moving data feeds, but they lack context and state, which are necessary for decision-making. Unlike OLAP and OLTP systems, streaming systems are not optimized to store data or produce fast lookups.

VoltDB enables applications to use real-time streaming data to enrich user experience, optimize interactions, and create value. The scale-out, SQL ACID-compliant architecture ensures data durability and provides standard application interfaces (JDBC) with broad ad-hoc query capability. Applications can take action on real-time, per-event data as it streams in, then export it to the long-term data warehouse or analytics store for reporting and analysis. VoltDB offers real-time ingest for real-time applications while statefully buffering the feed for downstream batch processing, so it meets both sets of requirements.

VoltDB scales to "firehose" speeds and includes a built-in pipeline connector called "VoltDB Export". You can use VoltDB Export to stream data that has been processed by VoltDB to downstream systems. We ship export connectors to Hadoop (HDFS), HPE Vertica, message queues like Kafka and RabbitMQ, and local file systems. Export data can be formatted as CSV, JDBC rows, or Avro messages. The export connector API is open source and can be extended if you need to connect VoltDB to other downstream components or format data using a different serialization.

Sample Apps

Check out our sample apps to see VoltDB in action.

White Paper Download

Fast Data Application Requirements for CTOs and Architects

Learn More

Documentation

Export is automatic and asynchronous. Learn more in our documentation.

Ingest

Connect to VoltDB using a native VoltDB driver, by POSTing requests directly over HTTP or by using one of the pre-built loaders to connect to an existing data source.

Native Drivers                     Loaders
C++                                Kafka
Java                               CSV
PHP                                JDBC
Python                             SQL Command Tool
C#                                 Hadoop OutputFormat
JDBC
Others: see Clients and Drivers    Vertica UDx
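
The HTTP option mentioned above can be sketched roughly as follows. This assumes VoltDB's JSON interface is enabled and listening on port 8080 at the path /api/1.0/, and that it accepts Procedure and Parameters form fields. The host name, the RecordClick procedure, and its arguments are hypothetical, and the sketch only builds the request rather than sending it.

```python
import json
from urllib.parse import urlencode


def build_voltdb_json_request(host, procedure, parameters, port=8080):
    """Build the URL and form body for a stored-procedure call over HTTP.

    Assumption: the JSON interface lives at /api/1.0/ on the given port and
    takes the procedure name plus a JSON array of its parameters.
    """
    url = f"http://{host}:{port}/api/1.0/"
    body = urlencode({
        "Procedure": procedure,
        "Parameters": json.dumps(parameters),  # parameters travel as a JSON array
    })
    return url, body


# Hypothetical procedure name and arguments, for illustration only.
url, body = build_voltdb_json_request("localhost", "RecordClick", [42, "home_page"])
```

The resulting body can be POSTed with any HTTP client using the application/x-www-form-urlencoded content type.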

Example: Use the VoltDB Kafka Loader to feed events from Kafka to VoltDB

More on the VoltDB Ecosystem

Two of these loaders deserve a closer look. The VoltDB Kafka Loader makes it easy for VoltDB to ingest streams of data from Kafka message queues. The VoltDB JDBC Loader loads data into VoltDB from relational data stores: it efficiently retrieves all records from a specified table in a remote database and inserts them into a matching table in VoltDB. This pattern is often used to install result data, such as computed user segments for digital ad network applications, into your fast data pipeline.

VoltDB also includes a Hadoop OutputFormat implementation to import job data from Hadoop into VoltDB. This comes in handy when performing historical analysis and capturing the intelligence needed when making real-time decisions. For example, a digital ad decision-making application may use a user/household segmentation data set, computed in Hadoop, to target different demographics with specific ads. 

This output format is used by our Hadoop connectors for both Apache Pig and Apache Hive. The connectors can be found in our GitHub repository.

VoltDB also built a Vertica UDx, a user-defined extension, that takes a Vertica result set and loads it into VoltDB. You can download the UDx here.

Analyze and Decide

VoltDB processes each incoming event or request as a discrete ACID transaction. A transaction can be one or more SQL statements pre-defined in a DDL file, an ad-hoc SQL statement issued by the application, or a combination of SQL and Java encapsulated in a VoltDB stored procedure.

Use VoltDB’s scale-out, shared-nothing, fault-tolerant architecture and its ACID-compliant SQL capabilities to:

  • Filter duplicate events
  • Sessionize real-time click streams
  • Enrich (denormalize) incoming data using reference tables
  • Classify events in real time using analytics from Hadoop/Warehouses
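
The first three tasks in the list above can be illustrated with a small in-memory sketch. It is not VoltDB code: in VoltDB this logic would typically live in SQL or a stored procedure running as a transaction. The Event fields, the USER_SEGMENTS reference table, and the 30-minute session timeout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    user_id: int
    url: str


# Hypothetical reference table: user_id -> segment (e.g. computed offline).
USER_SEGMENTS = {1: "sports", 2: "travel"}


class EventProcessor:
    """Filters duplicate events and enriches the rest from a reference table."""

    def __init__(self, segments):
        self.segments = segments
        self.seen_ids = set()

    def process(self, event):
        """Return an enriched row, or None if the event was already seen."""
        if event.event_id in self.seen_ids:
            return None  # duplicate: drop it
        self.seen_ids.add(event.event_id)
        return {
            "event_id": event.event_id,
            "user_id": event.user_id,
            "url": event.url,
            "segment": self.segments.get(event.user_id, "unknown"),
        }


def sessionize(clicks, timeout=1800):
    """Assign a per-user session number to (user_id, timestamp) clicks.

    Clicks must be ordered by timestamp. A gap longer than `timeout`
    seconds (an assumed 30 minutes) starts a new session for that user.
    """
    last_seen, session_no, out = {}, {}, []
    for user_id, ts in clicks:
        if user_id not in last_seen or ts - last_seen[user_id] > timeout:
            session_no[user_id] = session_no.get(user_id, 0) + 1
        last_seen[user_id] = ts
        out.append((user_id, session_no[user_id]))
    return out


processor = EventProcessor(USER_SEGMENTS)
events = [Event("e1", 1, "/home"), Event("e1", 1, "/home"), Event("e2", 7, "/cart")]
rows = [row for row in map(processor.process, events) if row is not None]
sessions = sessionize([(1, 0), (2, 50), (1, 100), (1, 5000)])
```

Here the repeated "e1" event is dropped, user 7 has no entry in the reference table and is tagged "unknown", and the last click from user 1 opens a second session.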

Export

Solving data-at-scale problems requires using multiple tools together: “one size does not fit all.” VoltDB embraces this point of view and includes a native, high-performance integration interface: VoltDB Export. Use VoltDB to process discrete events in real time (thousands to millions of events per second with per-event responses in milliseconds). Connect VoltDB processed data feeds to downstream pipeline components like Hadoop, Kafka, or enterprise data warehouse (OLAP) systems using VoltDB Export.

VoltDB Export is fully parallel: it gets faster as you add nodes to the cluster. Export is fault tolerant, implementing at-least-once delivery to the downstream system. (Optional unique row identifiers let the destination handle duplicates that may be delivered when a fault occurs.) If the connection between VoltDB and the destination is broken or interrupted, VoltDB buffers export data to local disks until export can resume, ensuring durability.
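
The interplay of buffering, at-least-once delivery, and unique row identifiers can be sketched in a few lines. This is a simplified model, not VoltDB's implementation: BufferedExporter and IdempotentSink are hypothetical names, an in-memory queue stands in for the on-disk buffer, and the lose_ack flag simulates a connection that drops after a row is delivered but before it is acknowledged.

```python
from collections import deque


class IdempotentSink:
    """Downstream system that uses the row id to ignore redelivered rows."""

    def __init__(self):
        self.rows = {}
        self.deliveries = 0

    def write(self, row_id, payload):
        self.deliveries += 1
        self.rows.setdefault(row_id, payload)  # a duplicate id changes nothing


class BufferedExporter:
    """Keeps each row buffered until its delivery has been acknowledged."""

    def __init__(self, sink):
        self.sink = sink
        self.pending = deque()  # stands in for the on-disk export buffer
        self.connected = True

    def export(self, row_id, payload):
        self.pending.append((row_id, payload))
        self.flush()

    def flush(self, lose_ack=False):
        while self.connected and self.pending:
            row_id, payload = self.pending[0]
            self.sink.write(row_id, payload)
            if lose_ack:
                # Delivered, but the ack was lost: keep the row for a retry.
                self.connected = False
                return
            self.pending.popleft()


sink = IdempotentSink()
exporter = BufferedExporter(sink)
exporter.export(1, "a")          # delivered and acknowledged
exporter.connected = False       # connection drops
exporter.export(2, "b")          # buffered
exporter.export(3, "c")          # buffered
exporter.connected = True
exporter.flush(lose_ack=True)    # row 2 delivered, ack lost
exporter.connected = True
exporter.flush()                 # row 2 redelivered, row 3 delivered
```

Row 2 reaches the sink twice, which is what at-least-once delivery permits, yet the sink ends up with exactly one copy of each row because it keys on the row identifier.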

Common Export Use Cases

  • Enrich incoming events with static metadata or computed aggregates and export Avro data from VoltDB to Hadoop (HDFS).
  • Evaluate conditions and rules for each incoming event and export alerts or notifications to Kafka, RabbitMQ, or Amazon SNS.
  • Batch incoming data to local flat files (CSV) for collection.
  • Filter duplicate events and send unique rows to HPE Vertica or another OLAP DB.
  • Re-assemble split messages using VoltDB and stream the transformed feed to SparkML.

Export is simple to use. Declare a VoltDB table as “EXPORTED” in the VoltDB DDL file and all rows inserted into that table are handed to the export connector. Connectors serialize content to the required format (CSV, Avro, JDBC) and push it downstream to the destination system.
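
As a rough illustration of the serialization step described above, the following sketch turns a batch of rows into CSV text. It does not use VoltDB's export connector API; the function name, the column list, and the sample rows are hypothetical.

```python
import csv
import io


def serialize_rows_to_csv(rows, columns):
    """Serialize dict rows to CSV text, one line per row, in column order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return buf.getvalue()


# Hypothetical rows as they might be handed to a connector.
exported_rows = [
    {"id": 1, "msg": "hello, world"},
    {"id": 2, "msg": "ok"},
]
csv_text = serialize_rows_to_csv(exported_rows, ["id", "msg"])
```

Fields that contain the delimiter are quoted by the csv module, so the comma inside the first message does not break the row layout.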

Example Export Configuration


Developer Resources:

There are numerous resources available to developers.

Get Connected:

  • Developer Central

    One centralized place with all developer resources. Go There

  • A Look at a VoltDB Sample App

    In this blog, John Hugg walks us through a sample app in VoltDB. Read More

  • How VoltDB Works

    Take a simple dive into the VoltDB structure. Read More

  • Build a Sample App

    After downloading VoltDB, follow this tutorial to build a sample app. Dive In

Get Started Today

It shouldn't take weeks to begin building blazing apps with real-time personalization and fast transactions. Developers: Download VoltDB and spin through our Quick Start Guide in less than 30 minutes.

Download & Quick Start