When learning about VoltDB for the first time, people often ask how VoltDB executes transactions in a distributed environment. So, here’s how…
Let’s say a developer, Dan, is working on a new website. He’s decided to use VoltDB as part of his data management layer. One of Dan’s users requests a dynamically generated page and the code that generates that page sends a request to Dan’s VoltDB cluster. The excitement begins.
Dan’s code asks the VoltDB client library to execute his procedure named “GetUserProfile”, providing the user’s handle, “dbNerd”, as a parameter. The client library maintains persistent connections to the three nodes (servers) in Dan’s VoltDB cluster. If Dan’s cluster had 48 nodes, the client library might only connect to a subset, but the default behavior is to round-robin requests across all connected nodes. Since the previous request was sent to node 0, this request goes to node 1. Our wire protocol document (PDF) describes the actual bytes sent.
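The round-robin behavior can be sketched in a few lines. This is an illustrative stand-in, not VoltDB's client internals; the class and method names here are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of round-robin request distribution across the
// client library's connected nodes. Each invocation picks the next
// node in rotation.
class RoundRobinRouter {
    private final List<String> nodes;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    // Pick the node that should receive the next procedure invocation.
    String nextNode() {
        int idx = (int) (counter.getAndIncrement() % nodes.size());
        return nodes.get(idx);
    }
}
```

With three connected nodes, successive calls cycle node 0, node 1, node 2, and back to node 0, which is why Dan's request lands on node 1 after the previous one went to node 0.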
So the request arrives at node 1. Its first stop is a VoltDB component called the Initiator. There is one Initiator at each node, and Initiators are responsible for managing client connections and routing work to the parts of VoltDB that store and manipulate data. The Initiator at node 1 handles this client request. First, the request is assigned a transaction id; it is now officially a VoltDB transaction.
Second, the parameters are examined to determine where the data for the transaction lives. VoltDB knows that the first parameter corresponds to the column “userhandle” in the table “profiles”, and that this column is the table’s partition key. This information is stored in the VoltDB Application Catalog that was loaded when Dan started the VoltDB cluster. So, based on a hash of the string “dbNerd”, the Initiator knows the user’s profile data is stored in partition 1. It also knows that copies of partition 1 are stored at sites 1 and 4, and that sites 1 and 4 are located at nodes 0 and 2. The Initiator forwards the transaction information to both relevant sites on those two nodes.
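The lookup chain (partition key → partition → replica sites → nodes) can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, `String.hashCode()` stands in for VoltDB's actual hash function, and the topology maps stand in for what the catalog and cluster metadata provide.

```java
import java.util.Map;

// Illustrative sketch of partition-key routing: hash the key to a
// partition, look up which sites replicate that partition, and which
// node hosts each site. Not VoltDB's internal API.
class PartitionRouter {
    private final int partitionCount;
    private final Map<Integer, int[]> partitionToSites; // partition -> replica sites
    private final Map<Integer, Integer> siteToNode;     // site -> hosting node

    PartitionRouter(int partitionCount,
                    Map<Integer, int[]> partitionToSites,
                    Map<Integer, Integer> siteToNode) {
        this.partitionCount = partitionCount;
        this.partitionToSites = partitionToSites;
        this.siteToNode = siteToNode;
    }

    // Hash the partition-key value to a partition id (floorMod keeps
    // the result non-negative even for negative hash codes).
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // All sites holding a copy of the key's partition.
    int[] sitesFor(String key) {
        return partitionToSites.get(partitionFor(key));
    }

    int nodeForSite(int site) {
        return siteToNode.get(site);
    }
}
```

In the story above, “dbNerd” hashes to partition 1, `sitesFor` would yield sites 1 and 4, and `nodeForSite` would map those to nodes 0 and 2.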
Site 1 and site 4 both receive the message from the Initiator, and each then does the same work in parallel. Since each individual site performs transactions one at a time, the transaction information is put into a queue at site 1 and a queue at site 4. Work is pulled from each site’s queue and executed serially. These queues are actually quite special, as they ensure all transactions run in a single global order. Another blog post will explain how that ordering works in more detail.
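The one-at-a-time execution model can be sketched as a single worker thread draining a queue. This is a hypothetical minimal version: it shows why no locking is needed inside a transaction, but it does not attempt the global-ordering guarantee the real queues provide.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a site's transaction queue: a single worker thread pulls
// work items and runs them strictly one at a time, so transactions at
// a site never execute concurrently. (Illustrative only; the real
// queues also enforce a single global transaction order.)
class SiteQueue {
    private final BlockingQueue<Runnable> work = new LinkedBlockingQueue<>();
    private final Thread worker;

    SiteQueue() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    work.take().run(); // one transaction at a time
                }
            } catch (InterruptedException e) {
                // queue shut down
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Enqueue a transaction; it runs after everything queued before it.
    void submit(Runnable txn) {
        work.add(txn);
    }
}
```

Because only the worker thread ever touches the site's data, submitted transactions observe each other's effects in strict queue order.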
Each site has a VoltDB Execution Engine (EE) that stores relational data and executes pre-optimized SQL plans. At both sites, when this specific transaction to get the user profile is released from the queue, its stored procedure code is run with the “dbNerd” value as a parameter. Any SQL calls inside the stored procedure invoke the local EE. Ultimately the stored procedure completes and the resulting data is handed off to the site code. Both sites send the resulting data back to the Initiator at node 1.
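For a sense of what runs at each site, here is a sketch of what a procedure like “GetUserProfile” might look like. The `VoltProcedure`/`SQLStmt` pattern is VoltDB's Java stored-procedure API, but the SQL and the exact procedure body here are guesses based on the column and table names in this post; compiling it requires the VoltDB jar, so it is shown for illustration only.

```java
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical sketch of Dan's procedure. The SQL plan for the
// statement below is pre-optimized when the catalog is built; at run
// time, queued statements execute against the local EE.
public class GetUserProfile extends VoltProcedure {

    // Parameterized statement against the partitioned "profiles" table.
    public final SQLStmt getProfile = new SQLStmt(
        "SELECT * FROM profiles WHERE userhandle = ?;");

    public VoltTable[] run(String userHandle) {
        voltQueueSQL(getProfile, userHandle);  // queue SQL for the local EE
        return voltExecuteSQL();               // execute and return results
    }
}
```

Since both replica sites run this same deterministic code with the same parameter, they produce the same result independently.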
The Initiator keeps track of all outstanding requests, and it knows that for this transaction it expects two responses. Since VoltDB did identical work at two sites, both should return the same data. VoltDB is paranoid about data consistency, so it verifies that the data returned from the two sites is identical. Once the data checks out, the Initiator returns the results to the client.
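That consistency check amounts to comparing the two replica responses before answering the client. A minimal sketch, with hypothetical names and raw byte arrays standing in for serialized result sets:

```java
import java.util.Arrays;

// Illustrative sketch of the replica-consistency check: the Initiator
// collects both replica responses and refuses to answer the client if
// they diverge. Not VoltDB's internal API.
class ResponseChecker {
    static byte[] reconcile(byte[] fromSiteA, byte[] fromSiteB) {
        if (!Arrays.equals(fromSiteA, fromSiteB)) {
            // Identical deterministic work should never diverge; if it
            // does, something is badly wrong and we must not return
            // either copy as the answer.
            throw new IllegalStateException("replica responses diverged");
        }
        return fromSiteA; // either copy is safe to return to the client
    }
}
```

The check is cheap relative to the transaction itself, and it is what lets a replicated partition serve as both a hot standby and a continuous correctness audit.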
The response is received by Dan’s page-generation code, and the result is used to generate a page.
Hopefully this post answers some questions about VoltDB architecture; I’m sure it raises others… What about transactions that require data from multiple partitions? How (and why) does VoltDB ensure a global ordering of transactions? What makes VoltDB faster than other in-memory databases? How well does this architecture scale? We plan to address as much as we can in coming posts, but if you’re impatient, there’s a lot of discussion going on in our forums, and we can sometimes be found in #voltdb on irc.freenode.net.