Apache Hadoop is an open-source big data framework and ecosystem that enables distributed processing of large data sets across clusters of computers.
Using Hadoop and Big Data with VoltDB
VoltDB serves as a real-time application database that works alongside Hadoop. Applications such as real-time scoring, policy enforcement, and customer interaction combine live data in VoltDB with analytical results derived from Hadoop and big data. VoltDB can ingest data as fast as it arrives, perform real-time analytics in memory, make automated decisions in real time, and continuously pass, or export, processed data into Hadoop.
A Hadoop data pipeline with VoltDB is shown below:
VoltDB supports high-velocity export of processed data via a built-in, transactional extract feature. VoltDB Export feeds processed data to HDFS/Hadoop. Application developers can automate the export process by specifying tables in the schema as sources for export. At runtime, any data written to the specified tables is sent to an export connector, whose job is to move these tuples to the export target safely and with the lowest possible latency. VoltDB provides connectors for export to files (CSV); to Hadoop via WebHDFS; to data serialization and exchange formats such as Avro; and to other relational databases via JDBC. VoltDB also provides Kafka connectors.
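The flow described above (rows written to an export table are handed to a connector, which moves them to a target) can be pictured with a small toy model. This is only a conceptual sketch in Python, not VoltDB's implementation: the names ExportTable and CsvConnector are hypothetical, and a real connector also handles batching, acknowledgement, and retry, which are omitted here.

```python
import csv
import io
from typing import Sequence, TextIO


class CsvConnector:
    """Hypothetical connector: writes each tuple to the target as one CSV line."""

    def __init__(self, target: TextIO) -> None:
        self._writer = csv.writer(target, lineterminator="\n")

    def send(self, row: Sequence[object]) -> None:
        self._writer.writerow(row)


class ExportTable:
    """Toy stand-in for a table declared as an export source (not a VoltDB API)."""

    def __init__(self, name: str, connector: CsvConnector) -> None:
        self.name = name
        self._connector = connector

    def insert(self, *row: object) -> None:
        # Every row written to the table is forwarded to the connector.
        self._connector.send(row)


def demo() -> str:
    """Insert two rows and return what the connector wrote to its target."""
    target = io.StringIO()  # stands in for a file or remote export target
    events = ExportTable("events", CsvConnector(target))
    events.insert(1, "login", 0.25)
    events.insert(2, "purchase", 19.99)
    return target.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

The point of the split is that the table knows nothing about the destination. Swapping the connector changes where the rows go without touching the code that writes them.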
VoltDB, the HTTP connector and WebHDFS
VoltDB’s connector to Hadoop receives serialized data from Export tables and writes it out to Hadoop via HTTP requests to WebHDFS.
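To make the "HTTP requests to WebHDFS" concrete, the sketch below builds the kind of request the WebHDFS REST API expects: file operations are addressed under /webhdfs/v1/ with an op query parameter, CREATE is sent as an HTTP PUT and APPEND as an HTTP POST. The host, port, and file path are placeholders, the helper name webhdfs_request is hypothetical, and nothing here is taken from VoltDB's own connector code. No network call is made.

```python
from typing import Tuple
from urllib.parse import quote, urlencode

# HTTP method used by the WebHDFS REST API for each operation shown here.
_METHODS = {"CREATE": "PUT", "APPEND": "POST"}


def webhdfs_request(host: str, port: int, path: str, op: str, **params: str) -> Tuple[str, str]:
    """Return (http_method, url) for a WebHDFS file operation.

    `host`, `port` and `path` are caller-supplied placeholders; `op` must be
    CREATE or APPEND in this simplified sketch.
    """
    method = _METHODS[op.upper()]
    query = urlencode({"op": op.upper(), **params})
    url = f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"
    return method, url


if __name__ == "__main__":
    # Placeholder NameNode address and target file.
    print(webhdfs_request("namenode.example.com", 9870, "/export/events.csv", "CREATE", overwrite="false"))
    print(webhdfs_request("namenode.example.com", 9870, "/export/events.csv", "APPEND"))
```

In a real exchange the NameNode normally answers CREATE and APPEND with a redirect to a DataNode, and the client sends the file data to that second location. A client library or the connector handles that step.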
The VoltDB HTTP connector is a general-purpose export utility that can export to any number of destinations, from simple messaging services to more complex REST APIs. The connector's configuration properties work together to create a consistent export process.
The HTTP connector contains optimizations to support exporting data to Hadoop via the WebHDFS protocol. Developers can choose between two formats for export data when using WebHDFS: comma-separated values (CSV) and Apache Avro. By default, data is written as CSV; developers can switch the output format to Avro by setting the connector's type property. Avro is a data serialization system that includes a binary format used natively by Hadoop utilities such as Pig and Hive. Because it is a binary format, Avro data typically takes up less network bandwidth than text-based formats such as CSV.
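One reason a binary format can be smaller than CSV is the way numbers are encoded. The Avro specification writes int and long values as zig-zag variable-length integers. The sketch below implements only that one encoding so its size can be compared with the same value written as CSV text. It is not a full Avro writer (there is no schema, file header, or block structure), and the function name zigzag_varint is hypothetical.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag variable-length integer."""
    # Zig-zag step: map small negative and positive values to small unsigned values.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F          # low 7 bits of the remaining value
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


if __name__ == "__main__":
    value = 1234567890
    csv_bytes = len(str(value).encode("ascii"))
    binary_bytes = len(zigzag_varint(value))
    print(csv_bytes, binary_bytes)
```

For this value the text form needs 10 bytes and the binary form 5. The saving depends on the data: Avro stores strings as length-prefixed UTF-8, so text-heavy columns shrink far less than numeric ones.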
VoltDB with Hadoop and big data provides developers with a closed-loop system to deliver full visibility into an organization’s data, enriching vast incoming streams of event data with historical analytics to support business decisions.
VoltDB offers a broad set of Big Data ecosystem integrations, certifications, industry partnerships and connectors to enable high-speed data export to Hadoop-based data warehouses and long-term analytics stores such as HPE Vertica, Teradata, and IBM Netezza.
VoltDB Big Data integrations enable developers to take advantage of the speed and cyclical nature of the import-export data pipeline.