An intern’s experience with VoltDB Materialized Views
My name is Yiqun Zhang and I worked as a software engineering intern on the SQL core team at VoltDB during the summer. The experience of working with so many smart and talented people was so much fun. I was proud to be part of the team.
My work focused mainly on the materialized view feature in VoltDB. In database systems, ‘materialized views’ refers to a database object that stores the result of a query, which can otherwise be very expensive to evaluate. The materialized view can be refreshed from its source table upon data update from time to time. Refreshing is performed in an incremental manner only on the updated data, so it is far more efficient than directly evaluating a complex query.
When I joined the group, a materialized view in VoltDB could be created from an aggregation query on a single table with the GROUP-BY clause. Our goal for enhancement of the feature was to gradually remove those limitations and to optimize performance as much as possible.
Initially I added support for table-wide aggregation queries in the materialized view, removing the limitation that a query creating a materialized view must have a GROUP-BY clause. The absence of the GROUP-BY columns, and the fact that table-wide aggregation queries always returned one single row, enabled a different way to locate the row to be updated in the materialized view. In addition, the enhancement created an optimization, using smaller table block size. It was very exciting to see my patch for this task included in VoltDB’s v5.4 release; customers who had the need for materialized views on table-wide aggregations didn’t need to create an artificial GROUP-BY column as a workaround any more.
The next project was to optimize performance when refreshing a materialized view. Invertible aggregation functions, such as count() and sum(), are easy and straightforward to refresh. The costly part in refreshing is on non-invertible aggregation functions like min() or max(); in order to refresh them it’s necessary to have knowledge of the source data. We looked at two approaches to a solution: doing a sequential scan on the referenced column over all data, or using an appropriate index to search in the referenced column for the new value. Maximizing the use of indexes was the best answer, and provided improved performance.
Historical implementations of materialized views use only one index that covers all GROUP-BY columns in the source table to do an index range scan. I added support for selecting and using multiple indexes, each covering both the GROUP-BY columns and one aggregated column referenced in min() or max(). This type of index is more efficient for refreshing because fewer records need to be scanned, compared with an index range scan (for most use cases). I found refreshing the view with min() and max() can gain eight to 10 times more throughput for deletion and update than before this kind of index existed.
The final work I did at VoltDB was to develop something like a database trigger. I came up with this idea after finishing the optimization of view refreshing. The refreshing function was hard-coded in VoltDB; its behavior was much like the execution of a compiled query plan. This made me think that if I could compile the plan for certain queries and cache them, I could save the time invested in developing and maintaining those hard-coded routines. It may not have offered much improvement in performance (actually it introduced a little performance degradation), but it was highly extensible. More importantly, by using query plans, users benefit from all the effort the team put into our query planner and executors.
I refactored several APIs that we use to compile, cache and execute query plans internally and rewrote the refreshing function, crossing out the hardcoded part we had been using before. Although this work was not complete when I left, we believe it is a good way to go not only for a more general and more powerful materialized view, but also for a potential fully functional database trigger for VoltDB in the future.
If someone asked me, ‘How does it feel working at VoltDB?’, my answer would be ‘unforgettable’. It’s unforgettable not just because of the work I finished this summer, or the technical experience I got in database systems; it’s the people I worked with who made this memory bright and shiny. I am grateful for the help, care, encouragement and friendship that all my colleagues gave me. It is a gift to be treasured and carried on. Thanks, Volties!