Database Super Friends
In 2007, the research team behind the Aurora Complex Event Processing system (commercialized as Streambase) and the C-Store Analytics (OLAP) system (commercialized as Vertica) turned their attention to the problem of building an operational database for the 21st century.
Led by Turing Laureate Mike Stonebraker, the academic team behind the original VoltDB research included Dan Abadi, Pat Helland, Sam Madden, Andy Pavlo, and Stan Zdonik. The team, at this writing, has more than 150 years of collective database experience.
As with their previous projects, the team focused its research on two basic ideas:
How has computing changed in the 21st century and how can we leverage that to build a better system?
How can we make the system better by adding focus: if our goal is not to solve every problem in the world, but to focus on operational stores, how much faster can we go?
Memory, Memory, Memory
The team’s first key insight was that memory was getting cheaper faster than operational stores were growing. When systems such as MySQL were designed, memory was costly, and used mostly as a sliding window over slower, disk-encumbered state. Today, many operational datasets fit in memory, and tomorrow even more will. While analytical datasets are exploding in size, the operational systems of the future will be memory-centric.
But there was a catch. While memory was 100 times faster than SSDs and 10,000 times faster than spinning disks, in-memory datastores weren’t similarly faster. RDBMSs built to be in-memory disappointed with less than 10X performance increases. Experimenters who backed traditional systems like MySQL with a RAM-based filesystem were often similarly disappointed. What was holding back these systems?
To find out, the researchers did something rather clever. They took an open source RDBMS that followed traditional RDBMS architecture, ran it on a memory-backed filesystem, and measured where it spent its time. The research showed that traditional databases spent less than 10% of their time doing actual work; most of the time was spent in two places.
Page Buffer Management: The page buffer system assigns database records to fixed-size pages, organizes their placement within pages, and then manages which pages are loaded into memory and which are on disk. Additionally, pages must be tracked as dirty or clean as they are read and written to. This entire subsystem exists to manage the storage of data on disk; it does not add value to an in-memory system.
Concurrency Management: The database must solve two concurrency problems. First, there exists a logical problem that multiple user transactions operating concurrently must not conflict, and must read consistent data. This is accomplished using high-level locks on resources like tables and rows, and tracking those locks across transactions. At the same time, the actual database software has multiple threads of execution. Data structures must be thread safe, which adds execution cost. For example, there is a large body of research on building concurrent B-Tree indexes, but the takeaway is that there is tremendous complexity for only moderate speed increases.
Most memory-centric systems address the page buffer, but few are able to drastically reduce concurrency costs. In fact, as other parts of a system get faster, the concurrency problem actually gets worse; competing processes just get to contention points faster.
The conclusion from the research was clear: to go 10X faster, the researchers needed a radical new approach to concurrency.
Horizontal Scale Out
Beyond memory, the second major shift is the move to horizontal, scale-out systems. Moore’s law has been used as justification to give us more cores, rather than faster individual cores. Likewise, many small machines can be more effective and efficient than one really large machine. Clustering on commodity hardware is expected and hardware fault-tolerance is table stakes.
Horizontal scaling has been very difficult for traditional RDBMSs. Many are of the opinion that the relational model can’t scale. Data movement is an issue; executing arbitrary joins can mean tremendous data movement, and commodity networking can present bottlenecks. Moreover, as locks are held over network round trips, concurrency becomes an even bigger problem.
But the team had hope. The C-Store system showed the relational model could be scaled for analytical SQL. The researchers conjectured that an architecture with the same domain-specific assumptions would work for operational stores as well.
VoltDB was born.