Fast data is data in motion, streaming into applications and computing environments from hundreds of thousands to millions of endpoints – mobile devices, sensor networks, financial transactions, stock tick feeds, logs, retail systems, telco call routing, authorization systems, and more.
Systems and applications designed to take advantage of fast data enable companies to make real-time, per-event decisions that have a direct, immediate impact on business interactions and observations. Fast data systems operationalize the learning and insights that companies derive from big data.
Big data is data at rest. Big data describes data's volume – petabytes to exabytes – and variety: structured, semi-structured, and unstructured data that has the potential to be analyzed for information. Big data systems facilitate the exploration and analysis of large, stored data sets.
VoltDB supports Big Data analytical capabilities through ecosystem support for Big Data offerings based on Hadoop (Cloudera, MapR, Hortonworks) and data warehouse offerings from HPE Vertica, Teradata, IBM Netezza and others.
Fast data is different from big data. Fast data has different requirements and uses a different technology stack – one with the ability to analyze, decide, and act, extracting value in the form of recommendations, decisions, and actions as fast as data arrives, typically in milliseconds.
Enterprises need a technology stack that not only is capable of ingesting and analyzing fast streams of incoming data, but also has the ability to enrich live streams of fast data with analytical insights gleaned from big data stores – all as fast data enters the pipeline.
Databases used to handle fast data and big data are generally grouped into two camps: online analytical processing systems (OLAP), and online transactional processing systems (OLTP). Let’s look at the differences between these two approaches to data management.
OLAP databases are geared towards analyzing data at rest: looking at data that’s been saved to understand trends, without taking immediate action. Examples of OLAP analytic output include:
None of these use cases involve making a change or update to the fast stream of data. Information is gathered for purely analytical purposes: what happened, when did it happen, why did it happen, and who caused it to happen. OLAP systems look for patterns, but aren’t architected to act on those patterns to change outcomes.
OLTP systems organize data as a series of records. OLTP systems are designed for fast record lookups using indexes. They provide a query/transaction model that allows applications to query and read/write record data coherently and consistently.
OLTP databases are geared toward transactional use cases, including request-response applications: does a mobile subscriber have enough minutes left to make a call? If the answer is yes, the system lets the call go through; if no, it ends the call or offers the subscriber a plan upgrade. These systems are operational in nature: they involve the dollars-and-cents decisions that can make or break a business.
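The subscriber check above boils down to a fast record lookup followed by a transactional decision. The following is an illustrative Python sketch against an in-memory dictionary – not a real OLTP engine – and all names and the data model are assumptions:

```python
# Hypothetical OLTP-style request-response decision: look up a
# subscriber's record and decide whether to connect the call.
subscribers = {
    "alice": {"minutes_left": 12},
    "bob": {"minutes_left": 0},
}

def authorize_call(subscriber_id: str, estimated_minutes: int = 1) -> str:
    record = subscribers.get(subscriber_id)  # fast indexed lookup
    if record is None:
        return "reject"
    if record["minutes_left"] >= estimated_minutes:
        # Decrement the balance and connect the call in one step.
        record["minutes_left"] -= estimated_minutes
        return "connect"
    # No minutes left: end the call or offer a plan upgrade.
    return "offer-upgrade"
```

In a real OLTP system the lookup-decrement-decide sequence would run as a single ACID transaction so that concurrent calls cannot double-spend the same minutes.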
Most businesses rely on a data pipeline that includes systems to handle fast and big data. These systems may be vastly different and not designed to work well together to manage the volume and variety of big data, as well as the velocity of fast data. Let’s look at an alternative: the fast/big data pipeline.
Big data systems are centered on a data lake or warehouse, a storage location in which an enterprise stores and analyzes its data. This component is a critical element of a data pipeline that must capture all information. The big data platform’s core requirements are to store historical data that will be sent or shared with other data management products, and also to support frameworks for executing jobs directly against the data in the data lake.
Fast data systems include a fast in-memory database component. These databases have a number of critical requirements, including the ability to ingest and interact with live data feeds, make decisions on each event in the feed, and apply real-time analytics to provide visibility into fast streams of incoming data.
The fast/big data pipeline supports fast incoming streams of live data created at a multitude of new endpoints. It operationalizes the use of that data in applications and exports data to a data lake or data warehouse for deep, long-term storage and analytics.
The fast/big data pipeline unifies applications, analytics, and application interaction across multiple functions, products, and disciplines.
Applications are the main point of entry for data streaming into the enterprise. They are the initial collection point for data, and are responsible for interactions – personalized offers, decisions, updates to balances or accounts, adjustments to the distribution of power in an electrical grid.
Application interaction has the same characteristics as those described for fast data—it ingests events, interacts with data for decisions, uses real-time analytics to enhance the experience, and exports the data for storage and further analysis.
The application is both the organization’s and the consumer’s “interface” to the data. Applications are responsible for interaction. The greatest value from applications and the data they process comes with interactions that are accurately performed in real time. Fast data systems make better, faster real-time applications.
What’s needed to build a data-driven application that runs on streams of fast and big data? It comes down to four general requirements to get it right:
Ingesting streams of fast data isn't enough. Remember, an application faces the stream of data, and the "thing" at the other end is usually looking for some form of interaction.
Applications need to act on each event with the benefit of context, i.e., stateful, stored data. The ability to interact with the ingest/data feed means businesses can know what the customer wants at the exact moment of need.
Real-time analytics analyze streams of incoming data, per-event, at ingestion. Analytic results are used in real-time to guide application interactions.
Once fast data analytics are complete, the data moves through the pipeline for storage and long-term analytical processing; data ingestion and export flow at the same rate.
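The four requirements above – ingest, act with context, analyze in real time, export – can be sketched as a single per-event handler. This is a minimal, illustrative Python sketch; the event shape, the dictionary-as-state, and all names are assumptions, not a real pipeline implementation:

```python
from collections import deque

state = {}                    # stored context, e.g. running balances per user
recent = deque(maxlen=1000)   # rolling window feeding real-time analytics
exported = []                 # stand-in for the export stream to the data lake

def handle_event(event: dict) -> str:
    # 1. Ingest: the event arrives from the live feed.
    user = event["user"]
    # 2. Act with context: consult and update stored state for this user.
    balance = state.get(user, 0) + event["amount"]
    state[user] = balance
    decision = "accept" if balance >= 0 else "flag"
    # 3. Real-time analytics: maintain a rolling view of the stream.
    recent.append(user)
    # 4. Export: push the enriched event downstream at ingest rate.
    exported.append({**event, "decision": decision})
    return decision
```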
Data-driven applications need to manage and drive value from fast-moving streams of data. Traditional database tools are too slow to ingest data, analyze it in real time, and make decisions. They can't meet fast data's demands.
Successfully interacting with fast data requires a new approach to handling these new data streams. Several alternatives are available, which fall into three categories: fast OLAP systems (the province of Business Intelligence applications), stream-processing systems (Storm, Spark), and OLTP (database) systems.
Each of these solutions has its advantages, but some are better suited to fast data than others. Let’s evaluate their core strengths and weaknesses for the requirements of fast data applications.
Fast OLAP systems organize data to enable efficient queries across multiple dimensions of terabytes to petabytes of stored data. Where OLAP solutions fall short in fast data use cases is in response times – they typically run batch processes that take minutes or hours to complete – and in interactive, transactional (multi-factor) decisions.
Streaming systems’ main purpose is to capture data. Unlike OLAP and OLTP systems, streaming systems are not optimized to store data, nor do they optimize for fast record lookup. And since they aren’t storing data, they are not optimized for scans across different dimensions of the data set. Instead, streaming systems are optimized for running computations across a “stream” of arriving events.
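A computation over a "stream" of arriving events, as opposed to a lookup against stored records, might look like the following. This is an illustrative Python sketch of a sliding-window aggregate, not code from any particular streaming system; the class and its parameters are assumptions:

```python
from collections import deque

class WindowedAverage:
    """Maintains a running average over the last `size` arriving events,
    without ever storing or indexing the full data set."""

    def __init__(self, size: int):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def push(self, value: float) -> float:
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # evict the oldest value
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)
```

Note that only the window itself is retained; events outside it are gone, which is exactly why streaming systems are poor at scans across arbitrary dimensions of historical data.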
Fast OLTP systems offer per-event decision making (requiring ACID semantics and database transactions), real-time data enrichment, and streaming analytics as tools to build smart, fast applications.
Within OLTP systems, there are two types of architectures: traditional SQL systems and NewSQL/NoSQL systems.
Traditional SQL systems are general-purpose, disk-based systems that can be challenging to scale to the throughput required by today's fast data applications.
NewSQL and NoSQL solutions can provide the speed and availability required by Fast Data applications. Each comes with its own specialty. NoSQL systems trade query expressiveness (SQL) and schema for a flexible data model, low-latency lookup, and high availability. NewSQL solutions provide similar scalability but also offer the expressiveness of SQL queries, strong consistency, and high availability, while providing a strong schema contract.
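The trade-off can be illustrated in miniature: a NoSQL-style store gives a flexible, low-latency lookup by key, while a SQL system lets you express ad hoc, declarative queries over the same records. The following Python sketch (using SQLite purely as a stand-in SQL engine; table and field names are assumptions) contrasts the two:

```python
import sqlite3

# NoSQL-style: schemaless records, fast lookup by key.
kv = {"alice": {"plan": "gold", "minutes": 120},
      "bob": {"plan": "basic", "minutes": 30}}
minutes = kv["alice"]["minutes"]          # one key, one record

# SQL-style: a schema contract plus expressive queries across the data set.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE subscribers (name TEXT, plan TEXT, minutes INT)")
db.executemany("INSERT INTO subscribers VALUES (?, ?, ?)",
               [("alice", "gold", 120), ("bob", "basic", 30)])
rows = db.execute(
    "SELECT plan, AVG(minutes) FROM subscribers GROUP BY plan"
).fetchall()
```

The aggregate query in the last statement has no direct equivalent in a pure key-value model without application-side iteration, which is the expressiveness NewSQL systems aim to preserve at NoSQL-like scale.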
Until recently, there hasn't been a viable option for combining the best of big data with the best of fast data while providing the familiarity and expressiveness of the SQL query language, the speed and scale of NoSQL, and the relational, transactional strength and consistency of traditional RDBMS. We created VoltDB to fill the gap.
Say goodbye to stitching together several products, and hello to simplicity:
It shouldn't take weeks to begin building blazing-fast apps with real-time personalization and fast transactions. Developers: download VoltDB and spin through our Quick Start Guide in less than 30 minutes.