Testing at VoltDB: SQLCoverage
A previous blog post by Ruth Morgenstein, VoltDB Director of QA, gave an overview of how we test at VoltDB. Read that for brief descriptions of our robust processes and culture of quality, including Agile development, continuous integration using Jenkins, Java and C++ unit tests, system integration tests, performance tests, and related topics.
This blog post, the first of two, focuses on one aspect of our testing: verifying correct SQL query results by running queries using randomized data against both VoltDB and a reference database, and comparing the results, using a VoltDB-developed Python tool, SQLCoverage. SQLCoverage includes 42 separate test suites and runs over five million SQL statements, some against every code branch, others against the “master” branch only. The second post will describe the recent expansion of SQLCoverage to test VoltDB against PostgreSQL, including Geospatial queries using PostGIS.
This post will set aside performance and features such as table partitioning that make VoltDB unique, to focus instead on its similarity to other databases in correct results returned by its SQL queries. You may take VoltDB’s capabilities here for granted, because we take correct results very seriously.
Testing VoltDB against HSQLDB
HSQLDB, aka HyperSQL Database or HSQL, is an open source relational database, parts of which are used by VoltDB’s SQL parser. The SQLCoverage test program works by generating SQL queries, running them against both VoltDB and HSQLDB version 1.9b3, and comparing the results. This version of SQLCoverage has been running tests for many years. HSQLDB was an ideal database to use for this purpose initially since the code, with modifications, is included in VoltDB.
SQLCoverage tests are divided into 28 test suites that focus on different types of columns and queries. For example, the basic-strings test suite includes comparatively simple queries running mostly against columns of type VARCHAR (i.e., strings). The advanced-strings test suite includes more complex or unusual queries, also running mostly against VARCHAR columns.
Similarly, test suites called basic-timestamp and advanced-timestamp run mostly against columns of type TIMESTAMP, while basic-ints and advanced-ints run mostly against columns of type TINYINT, SMALLINT, INTEGER, and BIGINT. Other test suites focus mainly on testing joins, e.g. basic-joins and basic-int-joins; indexes, e.g. basic-index and advanced-index; materialized views, e.g. regression-basic-matview and advanced-matview-nonjoin; and combinations, e.g. advanced-matview-join.
SQLCoverage test suites run over a million queries in Jenkins against every VoltDB branch. Tests must pass before each branch can be merged to the “master” branch.
Fourteen additional test suites containing more than four million queries are part of the “extended” SQLCoverage tests. These tests run once daily against “master” since they typically take around five hours to run. Daily testing also includes running the 28 main SQLCoverage test suites with just one partition and with three partitions. These “multiconfig” SQLCoverage tests ensure that the correctness of SQL queries does not depend on the number of partitions being equal to the usual default of eight.
How the tests work: generating queries
It would have been time consuming for our engineers to manually write the more than five million queries executed by SQLCoverage. Fortunately, most queries are generated automatically by taking templates of queries we want to test and substituting column names and constant values into those templates to create queries for test.
Each test suite is defined by four files. The two most important are a DDL file, which defines the tables and columns to be used, and a “template” file, which defines the SQL statements to be tested against the two databases. A “schema” file used in randomizing values will not be discussed in this post; a “normalizer” will be discussed later.
Template files may be defined to include other files. For example, the template file for the basic-ints test suite consists of two lines, one including the configure-for-ints.sql file, used by both the basic-ints and advanced-ints test suites to define certain values; and one including the basic-template.sql file, used by basic-ints, basic-strings, and other test suites to define the SQL statements to be tested.
Similarly, the basic-strings test suite includes the configure-for-strings.sql and basic-template.sql files. The advanced-ints includes configure-for-ints.sql and advanced-template.sql; advanced-strings includes configure-for-strings.sql and advanced-template.sql. Thus different test suites may test SQL statements of similar form using different column names, column types, and constant values.
An example is the simple query template:
SELECT _variable FROM @fromtables
The idea is simple: in place of @fromtables a table name used in the current test suite is substituted. This typically includes P1, a partitioned table, and R1, a replicated table. Similarly, in place of _variable, a column name from that table will be substituted. For example, you might end up with queries such as these:
SELECT ID FROM P1
SELECT VCHAR FROM R1
The query template example above is a simplified version of one taken from basic-select.sql which is included in basic-template.sql and used in multiple test suites. The real template looks like this:
SELECT _variable[#arg @comparabletype] B1 FROM @fromtables
You’ll notice three additions. First, working from the right end, B1 is a column alias. Some SQL users may be more familiar with column aliases such as “AS B1”, but the AS keyword is optional. This serves both to test that column aliases work and, in case a query generated from the template ever fails – i.e. the results from VoltDB and HSQLDB do not match – it is easy to track which template the query came from since a different alias is used in each query. The templates in basic-select.sql are numbered B1, B2, B3 and so on.
Second, @comparabletype defines which type of column should be used; it has different values in different test suites. Third is “#arg”, which we’ll ignore for now since it has no effect in this simple query template, where it appears only once. When this template is used in the basic-strings test suite, @comparabletype will be set to string (VARCHAR). Among the queries generated from this template will be:
SELECT VCHAR B1 FROM P1
SELECT VCHAR B1 FROM R1
SELECT VCHAR_INLINE B1 FROM P1
SELECT VCHAR_INLINE_MAX B1 FROM R1
The VCHAR, VCHAR_INLINE, and VCHAR_INLINE_MAX columns are all of type VARCHAR but have different sizes, i.e., different maximum numbers of characters or bytes. When this template is used in the basic-ints test suite, @comparabletype will instead be set to numeric. Among the queries generated will be:
SELECT ID B1 FROM P1
SELECT TINY B1 FROM R1
SELECT SMALL B1 FROM P1
SELECT BIG B1 FROM R1
The TINY, SMALL, ID and BIG columns are of VoltDB types TINYINT, SMALLINT, INTEGER and BIGINT. You can imagine the equivalent queries generated by this same query template for the basic-timestamp test suite, which includes columns PAST and FUTURE, both of type TIMESTAMP, or for other test suites using other data types, including new Geospatial types to be discussed in a future blog.
Another example of a query template, also from basic-select.sql, is below:
The @agg represents an aggregate function, e.g. COUNT, MIN, MAX, SUM, or AVG. In the basic-strings test suite the last two of these are omitted, since they do not make sense for VARCHAR columns. The #arg, which we ignored earlier, here means that the same column name should be used in both places where it occurs, in the SELECT column list and the WHERE clause. The _maybe is either an empty string or the keyword NOT. The @cmp is one of the comparison operators: =, <>, !=, <, >, <=, or >=. And @comparableconstant is a constant value: in the basic-ints test suite it is 42, an integer value, but in the basic-strings test suite it is ’42’, a string value. A few queries generated from this query template for the basic-ints test suite are:
SELECT SUM(ID) B9 FROM P1 A WHERE A.ID = 42
SELECT AVG(TINY) B9 FROM P1 A WHERE NOT A.TINY <> 42
SELECT MIN(SMALL) B9 FROM P1 A WHERE A.SMALL < 42
SELECT MAX(BIG) B9 FROM R1 A WHERE NOT A.BIG > 42
SELECT COUNT(ID) B9 FROM R1 A WHERE A.ID >= 42
These queries are for the basic-strings test suite:
A final example of a query template is taken from advanced-template.sql using a CASE statement:
SELECT __[#numone] Q36, CASE
WHEN A._variable[#arg @columntype] @cmp @comparableconstant
THEN A._variable[#numone @columntype]
ELSE A.__[#arg] @aftermath END
FROM @fromtables A WHERE @columnpredicate
The __[#numone] is shorthand for the same column name referenced later as _variable[#numone @columntype]; similarly, __[#arg] is shorthand for the same column name referenced earlier as _variable[#arg @columntype]. The @columntype is similar to @comparabletype: they differ slightly in some test suites, but not here. Sometimes the former is defined more narrowly, e.g., @comparabletype may refer to any numeric column while @columntype is restricted to integer columns.
Further, @aftermath is either an empty string, e.g., in advanced-strings, where math does not apply, or a mathematical operator (+, -, ∗, or /) followed by the integer 1, 2, or 3, e.g., in advanced-ints. @columnpredicate typically compares a column with a random constant. Variables such as @cmp and @comparableconstant are defined as before. A typical query generated from this template in the advanced-ints test suite is:
SELECT ID Q36, CASE WHEN A.BIG <= 42
THEN A.ID ELSE A.BIG + 3 END
FROM P1 A WHERE SMALL >= -6366282630321067393
A typical query generated from the same template in the advanced-strings test suite is:
SELECT VCHAR Q36, CASE WHEN A.VCHAR_INLINE <= '42'
THEN A.VCHAR ELSE A.VCHAR_INLINE END
FROM P1 A WHERE VCHAR_INLINE_MAX >= 'v蛙yWBaMRITuKMC'
Generating random data
All example query templates in the previous section involved SELECT queries, which are the most numerous, complicated, and interesting queries we test . We also test INSERT, UPDATE, and DELETE SQL statements. We generally begin with INSERT since none of the others yield interesting results on an empty table. Both basic-template.sql and advanced-template.sql, and therefore most test suites, begin with these lines:
DELETE FROM @dmltable
INSERT INTO @dmltable VALUES(@insertvals)
The first line deletes any data in relevant tables – typically P1 and R1 – in case any data is leftover from previous test suites. The second is used to insert new data into the table.
The value of @insertvals varies in each test suite. In basic-ints and advanced-ints, it is defined as:
This inserts four values for each row, in the ID, TINY, SMALL, and BIG columns. The ID column will contain an integer value that starts with 0 for the first row, and then increments for each new row. The other columns where _value was specified will contain pseudo-random values. The second column, with [int:-10,10], will contain an integer value between -10 and 10, inclusive. The third and fourth columns, with [byte], will contain a byte value, hence between -127 and 127. The third column, since it also has null30, will be NULL 30% of the time; the others, without this, are never NULL. So, a typical INSERT statement generated from all this is:
INSERT INTO P1 VALUES (0, 7, NULL, -48)
In basic-strings and advanced-strings, @insertvals is defined as:
This will insert five values for each row in the ID, VCHAR, VCHAR_INLINE_MAX, VCHAR_INLINE and RATIO columns. These are of types INTEGER, VARCHAR(5000), VARCHAR(63 BYTES), VARCHAR(14), and FLOAT, respectively, in table R1. They are slightly different in P1 to test a larger variety of VARCHAR sizes. The _id will increment for each new row, and _value indicates pseudo-random values. The second, third, and fourth columns with [string null20] will contain a randomly-generated string with a 20% chance of being NULL. The fifth column with [float] will contain a number between 0 and 1. So, a typical INSERT statement generated is:
INSERT INTO P1 VALUES (0, 'zwjカVxMPprCяSU', NULL,
Notice that the randomly-generated strings include both ASCII and non-ASCII characters.
Comparing query results
After running two SQL statements in both VoltDB and HSQLDB, it is necessary to compare results to see if they match. There are two issues with this process, which we treat by using a “normalizer” to put the results into a format where they may be easily compared.
The main issue is the sorting of results. When no ORDER BY clause is present results can be returned in any order, so before comparing results the normalizer sorts them to make it easy to check whether they match. Even when an ORDER BY clause is present some sorting may be necessary; for example, if there is an ORDER BY A clause where A is a column or expression, the results will already be sorted according to their values of A. Rows with equal values of A will be returned in an arbitrary order so a subset of the results must be sorted by the normalizer, according to the values of other columns or expressions returned by the query.
There is a further complication with the sorting of NULL values. VoltDB treats NULL values as if they were the smallest values. They are always sorted first unless the DESC keyword is used, in which case they are sorted last. HSQLDB always puts NULL values first, with or without the DESC keyword. Both behaviors are valid; the relevant part of the SQL standard says: “If <null ordering> [that is, “NULLS FIRST” or “NULLS LAST”] is not specified, then an implementation-defined <null ordering> is implicit,” so it is up to the database vendor to determine the behavior in this case.
Thus, for HSQLDB results to match VoltDB results, the normalizer does additional sorting of NULL values.
The second, comparatively minor, issue is round-off discrepancies, which only apply to columns or expressions of type FLOAT, or to Geospatial data. Floating-point calculations can yield slightly different, yet still correct results, depending on the order of operations. For example, if you sum three FLOAT values, you may get a slightly different answer, depending on the order in which they were summed. Consequently, the normalizer rounds off all FLOAT values to 12 significant digits, so tiny round-off discrepancies are ignored.
We take the correctness of the SQL query results returned by VoltDB very seriously. Using the SQLCoverage test program, with other tests, ensures results are correct, as confirmed by comparison with results returned by HSQLDB – and with PostgreSQL, as we’ll see in the next blog post. Differences are taken very seriously and, unless they can be explained by known differences between the databases – such as the sorting of NULL values or tiny round-off discrepancies – we do not release VoltDB until they agree.
N.B.: The SQLCoverage test program referred to in this blog is part of VoltDB’s community version, and therefore open source software, which you can find here, with the test suite definition files here.