A streaming data pipeline system, sometimes called a data stream management system, is used to consume and make sense of a “stream”, that is, a continuous flow of new data. Traditional streaming data systems were built to ingest fast-moving data feeds, but they lack context and don’t maintain state, both of which are necessary for decision-making. Unlike OLAP and OLTP systems, streaming systems are not optimized to store data, perform fast lookups, or run complex analyses against historical data; they are built only to consume the data, apply queries to it continuously, and gracefully pass the processed results along to backend systems.
VoltDB enables applications to use real-time streaming data to enrich the user experience, optimize interactions, and create value. Its scale-out, ACID-compliant SQL architecture ensures data durability and provides standard application interfaces such as JDBC, along with broad ad-hoc query capability. Applications can act on each event as it streams in, then export the data to a long-term data warehouse or analytics store for reporting and analysis. VoltDB is a platform that offers real-time ingest capabilities to real-time applications while also supporting stateful buffering of the feed for downstream batch processing, meeting both sets of requirements.
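Because VoltDB speaks standard JDBC, ad-hoc queries over freshly ingested data look like ordinary Java database code. The sketch below is illustrative, not a definitive client: the host names, table, and column names are assumptions, and 21212 is VoltDB's default client port.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VoltAdHocQuery {

    // Build a VoltDB JDBC URL; listing multiple hosts gives the driver
    // alternative cluster nodes to connect to.
    static String buildUrl(String... hosts) {
        return "jdbc:voltdb://" + String.join(",", hosts);
    }

    // Run a hypothetical ad-hoc aggregate over data VoltDB has ingested.
    // The "events" table and its columns are assumptions for illustration.
    static void printEventCounts(String url) throws Exception {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT event_type, COUNT(*) FROM events GROUP BY event_type")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getLong(2));
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical cluster nodes; no connection is attempted here.
        String url = buildUrl("node1:21212", "node2:21212");
        System.out.println(url);
        // Against a live cluster you would call: printEventCounts(url);
    }
}
```

The same URL and query could be issued from any JDBC-aware tool, which is what gives applications the broad ad-hoc query capability mentioned above.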
VoltDB scales to “firehose” speeds and includes a built-in pipeline connector facility called “VoltDB Export”. You can use VoltDB Export to stream data that has been processed by VoltDB to downstream systems. We have export connectors for Hadoop (HDFS); major data warehouses such as Teradata and HPE Vertica; message queues such as Kafka, Kinesis, and RabbitMQ; and local file systems. Export data can be formatted as CSV, JDBC rows, or Avro messages. The export connector API is open source and easily extensible if you need to connect VoltDB to other downstream components or serialize data in a different format.
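In recent VoltDB releases, export is declared in the database schema: a stream's rows are queued to a named export target rather than stored. A minimal hedged sketch, assuming a target named `offsite` has been configured in the deployment file and using illustrative column names:

```sql
-- Hypothetical schema fragment: rows inserted into this stream are handed
-- to the "offsite" export target as transactions commit. The connector
-- bound to that target formats them (e.g., CSV, JDBC rows, or Avro) and
-- delivers them downstream.
CREATE STREAM alerts
  PARTITION ON COLUMN device_id
  EXPORT TO TARGET offsite (
    device_id  BIGINT       NOT NULL,
    alert_time TIMESTAMP    NOT NULL,
    message    VARCHAR(256)
);
```

Applications insert into the stream exactly as they would into a table, so the same transaction can update state and emit an export row.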