As VoltDB is a specialized system, it can be better at many tasks than a more general system. The VoltDB architecture focuses on strong transactions, low latency, high throughput, and high availability. These strengths come at the expense of larger-than-memory data sets and long-running, OLAP-style analytics queries.
We know VoltDB is likely part of a broader infrastructure, and it needs to integrate well with other tools and platforms.
VoltDB approaches the integration problem first by supporting standards where possible.
- Connect to VoltDB with JDBC/ODBC, or via the JSON/HTTP(S) interface.
- Save snapshots to CSV/TSV.
- Monitor with SNMP or JMX.
- Configure logging through standard Log4j configuration files.
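The JSON interface, for example, makes VoltDB callable from any language with an HTTP client: a stored-procedure name and a JSON-encoded parameter list go into the query string of the `/api/1.0/` endpoint. A minimal sketch of building such a request URL (the host, procedure name, and parameters here are placeholders):

```python
import json
from urllib.parse import urlencode

def voltdb_json_call_url(host, procedure, params, port=8080):
    """Build a request URL for VoltDB's JSON/HTTP interface.

    The endpoint takes the procedure name and a JSON-encoded
    parameter array as query arguments.
    """
    query = urlencode({
        "Procedure": procedure,
        "Parameters": json.dumps(params),
    })
    return f"http://{host}:{port}/api/1.0/?{query}"

# Hypothetical host and procedure, for illustration only.
url = voltdb_json_call_url("db1.example.com", "Vote", [4155551234, 2])
print(url)
```

An HTTP GET or POST of that URL returns the procedure's results as JSON, so even environments without a native VoltDB driver can participate.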
But sometimes you need to build pipelines to get data into and out of VoltDB. For this, VoltDB offers both multi-source ingestion and multi-target export. Like VoltDB itself, these features are fully fault-tolerant, with out-of-the-box support for common systems and hooks for fully custom integrations with other tools.
Ingest and Bulk Loading
VoltDB has a bulk-loading framework that accepts big chunks of data, batches them up in groups of partitioned inserts, and feeds them into a table in a cluster.
Currently, we support loading CSV/TSV data with csvloader, loading from RDBMSs with jdbcloader, and bulk loading from Kafka clusters with kafkaloader. Each run produces detailed logs, including lists of tuples that could not be inserted, whether due to constraint violations or malformed data.
The framework isn’t specific to the source and can be customized to bulk-load from anywhere.
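The core idea behind these loaders can be sketched in a few lines: group incoming rows by partition, then slice each group into bounded batches of inserts. This is a simplified illustration, not VoltDB's actual loader code; Python's built-in `hash` stands in for VoltDB's internal partitioning hash.

```python
from collections import defaultdict

def partition_batches(rows, partition_col, n_partitions, batch_size):
    """Group rows by a (simplified) partition hash, then slice each
    group into fixed-size insert batches -- roughly the shape of work
    a bulk loader hands to the cluster. hash() is a stand-in for
    VoltDB's real partitioning function."""
    by_partition = defaultdict(list)
    for row in rows:
        p = hash(row[partition_col]) % n_partitions
        by_partition[p].append(row)
    batches = []
    for p, group in by_partition.items():
        for i in range(0, len(group), batch_size):
            batches.append((p, group[i:i + batch_size]))
    return batches

rows = [{"id": i, "val": i * 10} for i in range(10)]
batches = partition_batches(rows, "id", n_partitions=2, batch_size=3)
```

Because each batch targets a single partition, the cluster can apply batches in parallel as single-partition transactions, which is where the loaders get their throughput.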
Ingest and Continuous Importers
VoltDB importers connect streams of data to VoltDB tables or stored procedures, usually with no code.
VoltDB ships importers for Kafka, AWS Kinesis, and RabbitMQ. Let's use Kafka as an example.
Using declarative configuration, VoltDB will connect to a specific Kafka cluster and do all the work needed to ingest from a Kafka topic into VoltDB tables or into a specific stored procedure.
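A minimal importer configuration in the deployment file might look like the following sketch; the broker address, topic, and procedure names are placeholders, and the available properties vary by importer type and VoltDB version.

```xml
<import>
  <configuration type="kafka" format="csv" enabled="true">
    <!-- placeholder broker, topic, and procedure names -->
    <property name="brokers">kafka1.example.com:9092</property>
    <property name="topic">votes</property>
    <property name="procedure">VOTES.insert</property>
  </configuration>
</import>
```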
The VoltDB cluster divides the topic's partitions among cluster nodes using internal consensus elections. Those nodes consume data, transactionally memoizing Kafka offsets as data is ingested. If one or more VoltDB or Kafka nodes fail, a new election re-assigns topic partitions to the surviving nodes, and the committed offsets are read back to ensure all data is processed.
This is all done with a durable at-least-once guarantee. Combined with idempotent processing, you get effective exactly-once processing.
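The interplay of at-least-once delivery and idempotence can be shown abstractly: if offsets are committed together with the writes, and the writes are keyed upserts, then replaying after a failure re-delivers some messages but leaves the final state unchanged. This is a toy model of that effect, not VoltDB code:

```python
# At-least-once delivery + idempotent writes = effectively exactly-once.
# The "table" is keyed upserts, so replayed messages change nothing.

table = {}            # keyed state: message key -> value
committed_offset = 0  # offset durably recorded alongside each ingest

def ingest(messages, start, crash_at=None):
    """Consume messages from index `start`; optionally 'crash' before
    committing the offset for message index `crash_at`."""
    global committed_offset
    for i in range(start, len(messages)):
        key, value = messages[i]
        table[key] = value            # idempotent upsert
        if crash_at is not None and i == crash_at:
            return                    # crash before committing offset i
        committed_offset = i + 1      # offset committed with the write

msgs = [("a", 1), ("b", 2), ("c", 3)]
ingest(msgs, committed_offset, crash_at=1)  # fails mid-stream
ingest(msgs, committed_offset)              # replay from last commit
```

After the simulated failure, message "b" is delivered twice, but the upsert makes the second delivery a no-op, so the final state is the same as a single clean run.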
The result is less glue code that has to be fault-tolerant, and more value for your business.
You can add custom formatting code that is run for every message, say to convert from JSON to row data, or to decompress messages.
You can even write a fully custom importer to support a new streaming data source; the same core APIs and code underpin both our Kafka and Kinesis importers.
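Conceptually, a formatter is just a function from one raw message to one row. VoltDB's formatter and importer plugins are Java classes, but the transform itself looks like this sketch (the field names are made up):

```python
import json

def json_formatter(message_bytes):
    """Sketch of an importer formatter: turn one raw message into a
    row (a tuple of column values). VoltDB's real formatter plugins
    are Java classes; this only illustrates the transform."""
    obj = json.loads(message_bytes.decode("utf-8"))
    # Hypothetical fields, for illustration only.
    return (obj["id"], obj["name"], obj["score"])

row = json_formatter(b'{"id": 7, "name": "ada", "score": 99.5}')
```

The same hook is where you would decompress messages or strip envelopes before the row reaches the target table or procedure.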
Egress and Export
When data is changing thousands, or sometimes millions, of times a second, getting data out of a system may require a bit of help. You can't just select all the data with a SQL query every few minutes and diff it to see what's new.
VoltDB provides two tools: transactionally consistent, point-in-time global snapshots, and VoltDB Export.
Rather than poll VoltDB for changed data, Export lets you push data from your event-processing logic into a downstream system. Example downstream systems include HDFS, analytical databases, message queues and distributed logs, or even HTTP APIs like AWS Simple Notification Service.
When a stored procedure makes a decision, you usually record that decision in VoltDB state. Using Export, you can also push a record of that decision to another system, which is valuable for many reasons:
- Do your reporting and exploratory analytics on an analytical database.
- Train new ML models on that data in SparkML, then load models back into VoltDB to improve low-latency decisions.
- Push alerts into AWS Simple Notification Service so humans get notified of important events.
- Feed events into Kafka to be processed as part of a pipeline.
None of this is exclusive; you can push to Kafka and AWS SNS in the same transaction. All of it is fault-tolerant: records in flight are acknowledged by the downstream system before VoltDB drops them, and until then they remain as redundant and safe as any other VoltDB data.
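An export target is declared in the deployment file and referenced from the schema. As an illustration, here is a sketch of a file-based target; the target name `decisions` and the paths are placeholders, and property names differ per connector and VoltDB version.

```xml
<export>
  <configuration target="decisions" enabled="true" type="file">
    <!-- "decisions" is a placeholder target name -->
    <property name="type">csv</property>
    <property name="nonce">decisions</property>
    <property name="outdir">/var/voltdb/export</property>
  </configuration>
</export>
```

A stream or table declared in the schema as exporting to the `decisions` target then feeds rows to that configuration as transactions commit.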
Finally, configuration is declarative, and many hooks are provided for monitoring the state of the Export subsystem. This is a hardened feature that has been in the product since version 1.0, and it's in production for all kinds of demanding users.