Introducing VoltDB v7.0
The VoltDB engineering team spent the past 12 months improving and extending every facet of our innovative database. Now that we’ve turned the calendar to 2017, we’re ready to introduce the newest version of the fastest in-memory SQL operational database: VoltDB v7.0.
VoltDB v7.0 delivers predictable performance, 24×7, through operational events, all while transactionally processing high-rate streams of data. With v7.0, we’ve added new ways to stream data into VoltDB, along with new analytical SQL support for interacting with it. We’ve also made numerous improvements to the VoltDB command line that make it easier to deploy VoltDB in auto-provisioning and container environments. Read on for more about VoltDB v7.0.
Modern applications and what they require from a database
Today, applications – and the infrastructure software that supports them – must be distributed, cloud-ready, highly available, responsive, automated and fast. Modern applications need to be agile and adaptable to fast-changing business models, all while processing thousands to hundreds of thousands of transactions per second, whether they run in the cloud or on bare-metal servers.
In this context, do these application requirements seem familiar?
- Low, predictable latency: one or two milliseconds on average, with 99.999% of responses completing within an order of magnitude of the average.
- The ability to make decisions on flows of incoming data on a per-event basis, using historical insight to inform that decision.
- The ability to run globally, deploying geo-distributed active/active databases across regions.
- The ability to ingest and perform real-time analytics on fast-moving streams of data before passing that data through to the data lake.
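The first requirement is a tail-latency target, and it is easy to check against measured data. The following is a minimal Python sketch, not a VoltDB tool: it assumes the nearest-rank definition of a percentile and reads "within an order of magnitude of the average" as "at most 10× the mean". The function names are hypothetical.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(samples, pct):
    """pct-th percentile using the nearest-rank method (pct in (0, 100])."""
    ordered = sorted(samples)
    # Fraction(str(pct)) keeps values like 99.999 exact, so float rounding
    # cannot push the rank up by one on large sample counts.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_tail_target(latencies_ms, pct=99.999, factor=10):
    """True if the pct-th percentile latency is at most `factor` x the mean.

    Assumes "within an order of magnitude of the average" means factor = 10.
    """
    mean = sum(latencies_ms) / len(latencies_ms)
    return nearest_rank_percentile(latencies_ms, pct) <= factor * mean


# 100,000 requests at 1 ms with a single 50 ms outlier: p99.999 is still 1 ms.
print(meets_tail_target([1.0] * 99_999 + [50.0]))      # True
# Ten 50 ms outliers push p99.999 to 50 ms, far above 10x the ~1 ms mean.
print(meets_tail_target([1.0] * 99_990 + [50.0] * 10))  # False
```

At 99.999%, only 1 response in 100,000 may exceed the bound, which is why a handful of slow requests is enough to miss the target.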
These are the core requirements of modern Telco/Communications Service Provider (CSP), IoT, financial, gaming and digital ad tech applications, as well as many other emerging application domains. Yes, they’re tough requirements to deliver on a 24×7×365 basis, but they are what VoltDB customers demand, and what VoltDB delivers.
With release v7.0, VoltDB strengthens its commitment to each of these requirements. The v7.0 release includes multi-datacenter database replication (XDCR); improved real-time analytics through materialized views on table joins, which enable continuous queries over streams of data; high availability (HA) improvements; SNMP traps for telco and IoT monitoring and health alerts; and numerous performance improvements. For more details on v7.0, read on. If you’d like to get your hands on it right now, you can download it at http://voltdb.com/download/software.
Multi-region VoltDB Cross Datacenter Replication (XDCR)
VoltDB v6.0 introduced active-active Cross-Datacenter Replication (XDCR), which lets two active VoltDB databases replicate to each other, with conflict notification and resolution. Throughout the past year, we’ve continued to enhance XDCR, adding the ability to replicate between clusters with different node counts or hardware types. XDCR also supports replication between different versions of VoltDB, which provides the foundation for in-service upgrades with no maintenance window.
With v7.0, XDCR lets you operate active copies of the database in three or more locations. Users can then work with a copy that is geographically close to them instead of suffering the unacceptable latency that comes from being far away from the database. The XDCR conflict detection and resolution introduced in v6.0 is supported in this new configuration. For more details on conflict resolution in VoltDB, see “Understanding Conflict Resolution” in Using VoltDB.
Continuous Queries over Streams (Materialized Views over Table Joins)
Materialized views form the foundation of real-time analytics in VoltDB. Use materialized views to define continuous queries on fast-changing data. A continuous query avoids costly from-scratch computation by serving its cached (pre-computed) result, which keeps responses fast and scalable. VoltDB supports materialized views on individual tables or streams and, new in VoltDB v7.0, on joins of multiple tables. VoltDB maintains materialized views transparently, with no configuration or tuning required from the user. See the sample below.
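CREATE VIEW V (REGION_ID, RECORD_COUNT) AS
  SELECT REGIONS.REGION_ID, COUNT(*)
  FROM REGIONS JOIN TAXI_LOCATIONS
    -- join condition assumed for illustration
    ON REGIONS.REGION_ID = TAXI_LOCATIONS.REGION_ID
  GROUP BY REGIONS.REGION_ID;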
For an in-depth look at this powerful new capability, please take a look at Ethan Zhang’s blog on continuous queries with joins.
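To make "avoids costly from-scratch computation" concrete, here is a toy Python sketch of incremental view maintenance. It is not how VoltDB implements views internally. It only illustrates the idea: each insert or delete on the joined table adjusts the cached count of the one group it affects, and the cached result always matches re-running the full join-and-aggregate query (executed here with Python's built-in sqlite3). The table layout, the REGION_ID join column, and all function names are hypothetical.

```python
import sqlite3
from collections import Counter


def create_schema(conn):
    """Hypothetical stand-ins for the REGIONS and TAXI_LOCATIONS tables."""
    conn.execute("CREATE TABLE regions (region_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE taxi_locations (taxi_id INTEGER, region_id INTEGER)")


def recompute_from_scratch(conn):
    """Full join + aggregate: the work a materialized view avoids repeating."""
    rows = conn.execute(
        "SELECT r.region_id, COUNT(*) "
        "FROM regions r JOIN taxi_locations t ON r.region_id = t.region_id "
        "GROUP BY r.region_id"
    ).fetchall()
    return dict(rows)


class RegionCountView:
    """Toy incrementally maintained view: region_id -> COUNT(*) of joined rows.

    Assumes region_id is unique in REGIONS, so each location row joins to
    at most one region.
    """

    def __init__(self, region_ids):
        self.region_ids = set(region_ids)
        self.counts = Counter()  # the cached, pre-computed result

    def insert_location(self, region_id):
        # A new location row touches only the one group it joins to;
        # rows with no matching region are dropped, as in an inner join.
        if region_id in self.region_ids:
            self.counts[region_id] += 1

    def delete_location(self, region_id):
        if self.counts[region_id] > 0:
            self.counts[region_id] -= 1
            if self.counts[region_id] == 0:
                del self.counts[region_id]  # empty groups vanish, as in SQL


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO regions VALUES (?)", [(1,), (2,), (3,)])
view = RegionCountView([1, 2, 3])
for taxi_id, region_id in [(10, 1), (11, 1), (12, 2), (13, 99)]:
    conn.execute("INSERT INTO taxi_locations VALUES (?, ?)", (taxi_id, region_id))
    view.insert_location(region_id)  # constant-time update, no join re-run
print(dict(view.counts), recompute_from_scratch(conn))  # both {1: 2, 2: 1}
```

The cost of each update is independent of how many rows the tables already hold, which is what lets a view serve fresh results on fast-changing data.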
SQL Window Functions
Window functions let you perform more selective calculations on query results than plain aggregate functions such as COUNT() or SUM() allow. A window function applies the specified operation to a subset of the full result set (a window, if you will), controlled by the PARTITION BY and ORDER BY clauses. Unlike a GROUP BY aggregate, which collapses each group into a single row, a window function returns a value for every row. This capability also helps when computing real-time analytics over streams of data, perhaps over moving time windows.
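The difference between plain aggregation and a window function is easiest to see side by side. This Python sketch uses SQLite (via the built-in sqlite3 module, which needs SQLite 3.25 or newer for window functions) as a stand-in for VoltDB, on a small hypothetical table named after the voter sample's per-state vote counts.

```python
import sqlite3


def load_votes(conn):
    """Create a small hypothetical stand-in for per-state vote counts."""
    conn.execute(
        "CREATE TABLE v_votes_by_contestant_number_state "
        "(state TEXT, contestant_number INTEGER, num_votes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO v_votes_by_contestant_number_state VALUES (?, ?, ?)",
        [("MA", 1, 10), ("MA", 2, 30), ("NH", 1, 5)],
    )


def grouped_totals(conn):
    # Plain aggregation: each state's rows collapse into one output row.
    return conn.execute(
        "SELECT state, SUM(num_votes) FROM v_votes_by_contestant_number_state "
        "GROUP BY state ORDER BY state"
    ).fetchall()


def windowed_totals(conn):
    # Window function: every input row is kept, and each row also carries
    # the total for its own partition (state).
    return conn.execute(
        "SELECT state, contestant_number, num_votes, "
        "SUM(num_votes) OVER (PARTITION BY state) AS state_total "
        "FROM v_votes_by_contestant_number_state "
        "ORDER BY state, contestant_number"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_votes(conn)
print(grouped_totals(conn))   # [('MA', 40), ('NH', 5)]
print(windowed_totals(conn))  # [('MA', 1, 10, 40), ('MA', 2, 30, 40), ('NH', 1, 5, 5)]
```

Because each row keeps its own values alongside the partition total, queries can compare a row against its window (for example, a contestant's share of a state's votes) without a self-join.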
VoltDB v7.0 supports RANK, DENSE_RANK, MIN, MAX, COUNT, and SUM as window functions. As a simple example, the voter sample application has been simplified by using RANK to find the contestant ranked 1st (winning) in votes in each U.S. state:
SELECT state, contestant_number, num_votes
FROM ( SELECT state, contestant_number, num_votes,
              RANK() OVER ( PARTITION BY state
                            ORDER BY num_votes DESC ) AS vrank
       FROM v_votes_by_contestant_number_state ) AS sub
WHERE sub.vrank = 1;
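One consequence of using RANK is worth seeing with data: tied contestants share a rank, so a state can have more than one row with vrank = 1. The Python sketch below runs the query above (plus an ORDER BY for deterministic output) in SQLite, as a stand-in for VoltDB, over hypothetical vote counts with a tie in MA. It also shows how DENSE_RANK differs from RANK after a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_votes_by_contestant_number_state "
    "(state TEXT, contestant_number INTEGER, num_votes INTEGER)"
)
# Hypothetical data: contestants 1 and 2 are tied for first place in MA.
conn.executemany(
    "INSERT INTO v_votes_by_contestant_number_state VALUES (?, ?, ?)",
    [("MA", 1, 30), ("MA", 2, 30), ("MA", 3, 10), ("NH", 3, 7), ("NH", 1, 2)],
)

# RANK skips positions after a tie (1, 1, 3); DENSE_RANK does not (1, 1, 2).
RANKS = """
SELECT state, contestant_number, num_votes,
       RANK()       OVER (PARTITION BY state ORDER BY num_votes DESC) AS vrank,
       DENSE_RANK() OVER (PARTITION BY state ORDER BY num_votes DESC) AS vdense
FROM v_votes_by_contestant_number_state
ORDER BY state, num_votes DESC, contestant_number
"""

# The winners query from the text, with ORDER BY added for stable output.
WINNERS = """
SELECT state, contestant_number, num_votes
FROM ( SELECT state, contestant_number, num_votes,
              RANK() OVER ( PARTITION BY state
                            ORDER BY num_votes DESC ) AS vrank
       FROM v_votes_by_contestant_number_state ) AS sub
WHERE sub.vrank = 1
ORDER BY state, contestant_number
"""

for row in conn.execute(RANKS):
    print(row)
# Both tied MA contestants come back as winners.
print(conn.execute(WINNERS).fetchall())
```

If an application needs exactly one winner per state, it has to break ties explicitly, for example by adding a second key to the window's ORDER BY.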
Increased High Availability
VoltDB is a distributed database designed to provide the highest levels of consistency and correctness in the face of many kinds of machine and network failures. Last summer we worked with Kyle Kingsbury to externally validate VoltDB’s resiliency under the brutal “Jepsen Test” he created (https://voltdb.com/jepsen).
In VoltDB 7.0, we’ve changed the way we replicate data and assign it to individual machines in a cluster, making VoltDB even more robust. Without affecting performance or correctness, VoltDB clusters with more than three nodes can now survive additional failures without losing availability or using more memory. The improvement grows as cluster size increases.
The bottom line: your clusters will have less downtime, without any configuration changes. Users who have chosen to run with triple redundancy on larger clusters may find they can achieve nearly the same level of fault tolerance with single redundancy under the new scheme. And VoltDB 7.0 still fully passes Kingsbury’s Jepsen tests (see below for Jepsen details).
When you call someone on your cell phone, there’s a good chance a VoltDB application is approving the call: VoltDB is the operational database used by Communications Service Providers (CSPs) around the globe. Many of these CSPs monitor the health of their systems with established SNMP monitoring and alerting tools. With v7.0, VoltDB provides a set of SNMP traps that integrate tightly with these tools. For a complete list of the traps VoltDB now supports, see the Monitoring chapter of the VoltDB Administration Guide.
In addition to SNMP, you can monitor VoltDB with Nagios and New Relic, through our REST API, and through the VoltDB Management Center (VMC).
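For custom monitoring scripts, the HTTP interface is the simplest entry point. Here is a minimal Python sketch. It assumes that VoltDB's JSON-over-HTTP interface is served at /api/1.0/ on the default HTTP port 8080 and that it accepts Procedure and Parameters query arguments, with the parameters encoded as a JSON array; check the Using VoltDB documentation for your version before relying on it. The helper names are hypothetical, and authentication and TLS are omitted.

```python
import json
import urllib.parse
import urllib.request


def json_api_url(host, procedure, params=None, port=8080):
    """Build a JSON-interface URL (assumed path /api/1.0/, default port 8080)."""
    query = {"Procedure": procedure}
    if params is not None:
        query["Parameters"] = json.dumps(params)  # parameters as a JSON array
    return f"http://{host}:{port}/api/1.0/?{urllib.parse.urlencode(query)}"


def call_procedure(host, procedure, params=None, port=8080, timeout=5.0):
    """Invoke a procedure over HTTP and return the decoded JSON response."""
    url = json_api_url(host, procedure, params, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Against a running server (not executed here), a script polling memory
# statistics might call:
#   stats = call_procedure("localhost", "@Statistics", ["MEMORY", 0])
print(json_api_url("localhost", "@Statistics", ["MEMORY", 0]))
```

Keeping URL construction separate from the network call makes the request format easy to inspect and test without a live cluster.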
VoltDB continued to expand its integrations over the past year. We now integrate with AWS Kinesis through a new Kinesis importer and a new Kinesis Firehose exporter. For an overview of this new capability, see Peter Shaw’s summary: https://www.voltdb.com/blog/connecting-voltdb-to-amazon-kinesis-streams
Though VoltDB has included a Kafka importer and exporter for quite a while, we recently added support for loading data from Kafka into VoltDB through the newly released Kafka Connect framework. Our new VoltDB Kafka Sink Connector has been certified by Confluent, the company that provides commercial support for Kafka. You can find the connector in our GitHub repository: https://github.com/VoltDB/voltdb-kafka-connector
In the “You May Have Missed It” Department …
The Jepsen Test is an important test in the field of distributed data stores. Created and administered by “Breaker of Databases” Kyle Kingsbury, Jepsen tests and validates (or invalidates, as the case may be) the documented guarantees a database makes in the face of failure modes such as node failures and network partitions. In the first half of 2016, VoltDB commissioned Kyle to run VoltDB through Jepsen and worked closely with him throughout. Jepsen found issues, we fixed them, and VoltDB became a better product for the effort. We now run a rigorous Jepsen test suite nightly as part of our system tests. You can browse the full set of Jepsen analyses of other vendors’ databases at https://aphyr.com/tags/jepsen, read Kyle’s findings on VoltDB at https://aphyr.com/posts/331-jepsen-voltdb-6-3, and read VoltDB’s summary of this effort at https://www.voltdb.com/jepsen.
Download VoltDB v7.0
To sum up, we’ve spent the past year making VoltDB more available, both geographically and in the face of failures of all types. VoltDB v7.0 delivers predictable performance, 24×7, through operational events, all while transactionally processing high-rate streams of data. With v7.0, we’ve added new ways to stream data into VoltDB, along with new analytical SQL support for interacting with it. We’ve also made numerous improvements to the VoltDB command line that make it easier to deploy VoltDB in auto-provisioning and container environments. For a full view of the improvements delivered over the past year, see the feature list in the “Changes Since the Last Release” chapter of our release notes.
Finally, thanks for reading! Download VoltDB 7.0 at http://voltdb.com/download/software and give it a try. We’d love your feedback. Send us a note at email@example.com and we’ll be sure to respond.