As the volume and velocity of data grow, so do the challenges of building fast data applications. The fast data stack is emerging across verticals and industries for building applications that process high-velocity streams of data before they accumulate in a big data lake.
This new stack has a unique purpose: to grab real-time data and output recommendations, decisions, and analyses in milliseconds. Over the next several years, the fast data stack will gain prominence and serve as a starting point for developers writing applications for streaming data.
An ACID-compliant operational database like VoltDB, in combination with message queues like Kafka or Kinesis for data ingestion and export, can process each incoming event or request as a discrete transaction for analytics, decisions and real-time action or interaction.
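To make the per-event transaction idea concrete, here is a minimal sketch in Python. A `Queue` stands in for a message bus such as Kafka or Kinesis, and the `process_event` function, the account-balance schema, and the accept/reject logic are all illustrative assumptions, not VoltDB's API:

```python
from queue import Queue

# In-memory "table" keyed by account id; in a real deployment this state
# would live in the operational database, not a Python dict.
balances = {}

def process_event(event):
    """Handle one inbound event as a discrete unit of work: validate,
    update state, and return a decision, all in a single step."""
    account, amount = event["account"], event["amount"]
    new_balance = balances.get(account, 0) + amount
    if new_balance < 0:
        return {"account": account, "action": "reject"}
    balances[account] = new_balance
    return {"account": account, "action": "accept", "balance": new_balance}

# The queue stands in for the ingestion bus (Kafka/Kinesis).
inbound = Queue()
for e in [{"account": "a1", "amount": 100},
          {"account": "a1", "amount": -30},
          {"account": "a1", "amount": -200}]:
    inbound.put(e)

results = []
while not inbound.empty():
    results.append(process_event(inbound.get()))
```

The key property the sketch illustrates is that every event is examined, applied, and decided on individually at arrival time, rather than batched for later analysis.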
Fast streaming data use cases require a fast data stack that performs three major functions: ingest, analyze, and act.
Data ingestion is the first stage in the fast data stack. Its job is to interface with inbound streaming data sources and to accept, transform, or normalize incoming data. Ingestion marks the first point at which data can be transacted against, applying key functions and processes to produce value from the data: insight, intelligence, and action.
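A small sketch of what "transform or normalize" can mean in practice: raw events from different sources arrive with inconsistent field names and timestamp formats, and ingestion maps them onto one schema. The field names (`user_id`, `uid`, `ts`) and the two source formats here are assumptions for illustration:

```python
import json
from datetime import datetime, timezone

def normalize(raw: str) -> dict:
    """Map a raw JSON event from any source onto one canonical schema."""
    event = json.loads(raw)
    # Different sources name the user field differently; unify it.
    user = event.get("user_id") or event.get("uid")
    # Accept either epoch seconds or ISO-8601 strings; store UTC ISO-8601.
    ts = event.get("ts")
    if isinstance(ts, (int, float)):
        ts = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return {"user_id": user, "ts": ts, "value": event.get("value", 0)}

normalized = [normalize(r) for r in (
    '{"uid": "u7", "ts": 1700000000, "value": 3}',
    '{"user_id": "u8", "ts": "2023-11-14T22:13:20+00:00", "value": 5}',
)]
```

Once events share a schema, every downstream stage (analytics, decisions, export) can treat the stream uniformly.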
As data is ingested, one or more analytic and decision engines use it to accomplish specific tasks on the stream. The challenge for the analysis and decision-making portion of the fast data stack is to keep pace with the velocity of the data stream: streaming analytics must consume high-velocity data while maintaining real-time aggregates in the form of counters, aggregations, and leaderboards.
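The counters and leaderboards mentioned above can be sketched as incrementally maintained state: each arriving event updates the aggregates in place, so queries never need to rescan history. The event shape and key names here are illustrative assumptions:

```python
from collections import Counter

# Running per-player score totals, updated as each event arrives.
counts = Counter()

def on_event(event):
    """Fold one streaming event into the live aggregates."""
    counts[event["player"]] += event["score"]

def leaderboard(top_n=3):
    """Answer a real-time analytics query from the maintained state."""
    return counts.most_common(top_n)

for e in [{"player": "ann", "score": 5}, {"player": "bob", "score": 2},
          {"player": "ann", "score": 1}, {"player": "cy", "score": 4}]:
    on_event(e)
```

Because the aggregate is updated per event, the cost of answering `leaderboard()` stays constant no matter how much history has flowed through the stream.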
Real-time decisions influence the next step of processing. Real-time decision engines do substantial work: they consume the full velocity of the data stream while processing complex logic, all in time to close the real-time decision feedback loop.
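A minimal sketch of that feedback loop, assuming a hypothetical per-user rate limit as the "complex logic": each decision updates state that changes how the next event for the same user is handled. The `decide` function, the limit, and the serve/throttle actions are all assumptions for illustration:

```python
# Per-user count of requests seen so far; the decision engine's state.
state = {}

def decide(event, limit=3):
    """Score one event against a rule and feed the result back into state,
    so the decision made now shapes the handling of the next event."""
    user = event["user"]
    seen = state.get(user, 0) + 1
    state[user] = seen
    # Once a user crosses the limit, subsequent events are diverted
    # instead of served: the feedback loop in action.
    return "serve" if seen <= limit else "throttle"

decisions = [decide({"user": "u1"}) for _ in range(5)]
```

The point is not the rule itself but the shape of the loop: consume an event, decide, update state, and let that state steer the very next event, all within the latency budget of the stream.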
Unlike traditional databases, new in-memory OLTP databases like VoltDB can process streams of data and produce analyses and decisions in milliseconds. As a single integrated platform, the VoltDB in-memory OLTP database reduces the complexity of building fast data applications by eliminating the need to stitch together streaming systems and non-relational data stores. It also provides a familiar, proven interaction model (SQL), simplifying application development and capturing real-time analytics with industry-standard SQL-based tools.
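To show the interaction model rather than the product, here is a hedged sketch that uses SQLite purely as a stand-in for any SQL-speaking operational store: rows are ingested transactionally, then a real-time analytic question is answered in plain SQL. The table and column names are illustrative, not a VoltDB schema:

```python
import sqlite3

# SQLite as a stand-in for an SQL-speaking operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, amount INTEGER)")

# Ingest a batch of events; the `with` block commits as one transaction.
with conn:
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     [("u1", 10), ("u2", 7), ("u1", 5)])

# A real-time analytic query expressed in ordinary SQL.
total_by_user = dict(conn.execute(
    "SELECT user_id, SUM(amount) FROM events GROUP BY user_id"))
```

Because the query language is standard SQL, the same aggregation works unchanged in any SQL-based reporting or analytics tool, which is the simplification the article is pointing at.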
One action many companies take with the data is to send it to a long-term storage platform, whether a data lake or a data warehouse, where it can be explored along with the rest of their historical data once fast data analytics are completed. Whether enterprises use Vertica, Teradata or Hadoop – maybe with Spark – they want to use that large dataset to help them with business insights. Data exported from the fast data pipeline can be used to build new models or refine existing ones; those models can then be applied in the fast data engine for execution against newly arriving streams of data.
VoltDB uniquely combines a fast in-memory OLTP database with multi-source ingestion and multi-target export, performing analytics on incoming streams of data while also managing large volumes of transactions on live data, all in real time.