My name is Zheng Li, a UMass Lowell graduate student. I spent the summer as an internship at VoltDB. Over the summer, I primarily worked on two VoltDB features, both related to query plans. The features are:
- Explain plan command
- Planner testing tool
A query plan is an ordered set of steps used to access or modify information in a SQL relational database management system (see http://en.wikipedia.org/wiki/Query_plan). Understanding query plans is important because the first plan chosen to execute will directly affect the query execution time. Both of these features were developed so that performance considerations related to query execution could be understood more easily by VoltDB users. The following sections present an overview of using these two new features.
Explain Plan Command
There is a new system stored procedure Explain. As the name suggests, this VoltDB command enables a user to see the execution plan of a SQL query. Specifically, it takes a SQL query as input, plans it in VoltDB’s planner and then returns the explain plan to the client.
Let’s take the Voter application as an example. When running the voter application, a developer might be interested to know how VoltDB will execute a certain SQL query before it is actually executed. For instance, to display query plan of this SQL statement: SELECT COUNT(*) FROM VOTES WHERE STATE = ‘MA’ simply issue the following command in sqlcmd:
exec @Explain ‘SELECT COUNT(*) FROM VOTES WHERE STATE = ‘MA’ ’;
The result table contains the explain plan for the query in the following form:
RETURN RESULTS TO STORED PROCEDURE
AGGREGATION ops: sum
RECEIVE FROM ALL PARTITIONS
SEND PARTITION RESULTS TO COORDINATOR
AGGREGATION ops: count(*)
SEQUENTIAL SCAN of "VOTES"
For details of how to interpret this plan, please refer to our documentation. This query results in a sequential scan of the table, which may be expensive if the table contains a lot of data. If the query is executed often, it is recommended that an index on the STATE column be created on the VOTES table to speed up execution.
This new feature can also be applied to user stored procedures. This is useful because stored procedures are the transactional unit of work in VoltDB. To use the Explain command to view the plans of all queries in the “Vote” stored procedure, simply type in the following command in sqlcmd:
exec @ExplainProc Vote;
The resultant output is a map between SQL statements and the corresponding explain plans. For brevity, only one row out of four rows is shown here:
SELECT contestant_number FROM contestants WHERE contestant_number = ?;
RETURN RESULTS TO STORED PROCEDURE
INDEX SCAN of “CONTESTANTS” using “SYS_IDX_PK_CONTESTANTS_10019″
Planner Testing Tool
The second tool is a Planner Testing Tool. The planner testing tool can be used for finding differences between query plans. Developers/DBAs can use this tool to identify the differences between query plans generated by different versions of VoltDB. You can think of it as the diff command for query plans. This tool is especially useful when the number of queries is large. VoltDB developers use this tool as part of the VoltDB regression test system.
This tool catches three categories of differences between two query plans:
- Position diff: changes of position for all plan nodes (including inline plan nodes).
- Scan diff : changes in scan node type, target table and indexes being used if there’s any.
- Join diff : changes in join node type and join order.
Here’s an example of how to use this tool: Change directory to voltdb/tests/scripts/plannertester. There’s a shell script called plannertest.sh in this directory. Type in the following:
./plannertest.sh -C='config/voltcache, config/voter, config/voltkv' -cd
This will run the tool with the config files voltcache, voter and voltkv (these are configuration files corresponding to the examples). It will compile the catalogs with the current version of VoltDB, save all SQL statements (written in the config file as well) and corresponding query plans into files, then find differences between these newly generated files and a reference version generated by an older version of VoltDB. A report file called plannerTester.report will be generated in /tmp, which contains the diff summary. For more information on usage, you can use the ‘-h’ command line argument to see the full list.
Please refer to our documentation page for more information about this tool if you’re interested.
Have fun using these tools and hope they’ll be helpful!