Modern Disk Persistence
While many data sets fit in memory, there is an impermanence to memory that makes belt-and-suspenders engineering types nervous. VoltDB needed disk persistence to be ready for the enterprise. But the team had to determine how to use disks without crippling performance. They asked: What does a modern approach look like?
First, VoltDB barely reads from disk at all, so much of the workload of a traditional system is removed. Second, VoltDB disk IO consists almost exclusively of append-only streaming writes. Even spinning disks can sustain high write throughput when used this way. Third, disk IO runs almost entirely in parallel with the operational workload. The system is designed to almost never block on disk synchronization.
This is achieved through two mechanisms: background snapshots and inter-snapshot logical logging. VoltDB background snapshots are transactional: they serialize data to disk as of a single logical point in time, cluster-wide. They proceed at the speed of the disk, so a cluster with slower disks simply completes fewer snapshots per hour. They also don’t block ongoing operational work.
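To make the idea of a non-blocking, point-in-time snapshot concrete, here is a minimal single-process sketch in Python. It uses copy-on-write, which is one common way to get this behavior; the text above does not say which mechanism VoltDB uses, so treat the technique, the class name `SnapshottableTable`, and all of its methods as illustrative assumptions rather than VoltDB's API.

```python
_MISSING = object()  # marks "key did not exist when the snapshot began"


class SnapshottableTable:
    """Toy table (hypothetical) whose snapshot does not block writers."""

    def __init__(self):
        self.rows = {}
        self.undo = None          # pre-snapshot values, only while a snapshot runs
        self._snapshot_keys = []

    def _preserve(self, key):
        # Copy-on-write: the first change to a key during a snapshot
        # saves the value that the snapshot is supposed to see.
        if self.undo is not None and key not in self.undo:
            self.undo[key] = self.rows.get(key, _MISSING)

    def put(self, key, value):
        self._preserve(key)
        self.rows[key] = value

    def delete(self, key):
        self._preserve(key)
        self.rows.pop(key, None)

    def begin_snapshot(self):
        # Fix the logical point in time. Copying the key list is a
        # simplification that keeps the sketch short.
        self.undo = {}
        self._snapshot_keys = list(self.rows)

    def snapshot_rows(self):
        # Yield rows one at a time, as of begin_snapshot(). Writers may
        # run between any two yields without changing what is emitted.
        for key in self._snapshot_keys:
            if key in self.undo:
                yield key, self.undo[key]
            else:
                yield key, self.rows[key]
        self.undo = None  # snapshot finished; stop preserving old values


if __name__ == "__main__":
    table = SnapshottableTable()
    table.put("a", 1)
    table.put("b", 2)
    table.begin_snapshot()
    rows = table.snapshot_rows()
    first = next(rows)       # the snapshot has started streaming
    table.put("b", 20)       # operational work continues meanwhile
    snapshot = dict([first] + list(rows))
    assert snapshot == {"a": 1, "b": 2}
    assert table.rows == {"a": 1, "b": 20}
```

The snapshot consumer can drain `snapshot_rows()` as slowly as the disk requires. A slow consumer only lengthens the period during which old values are preserved; it never delays a `put` or `delete`.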
Logical logging protects data that mutates between snapshots. A logical log of all write operations is streamed to disk; it records each operation itself rather than the data changes the operation produces. If an entire cluster fails, the most recent snapshot is reloaded, and the logical log is then replayed to restore the cluster to its state at the point of failure. This logical log has a huge throughput advantage over binary logs, partly because it is bounded in size, but mostly because disk IO can begin before the operational work starts. A binary log, by contrast, records the results of an operation, so it cannot be written until the operation has completed. Combined with a group commit mechanism, VoltDB’s logging is remarkably low impact, allowing millions of writes per second per node with synchronous disk persistence.
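The recovery path described above (reload the latest snapshot, then replay the logical log) can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not VoltDB's implementation: the class `TinyStore`, the file names, the JSON format, and the sequence-number scheme are all hypothetical. The sketch also truncates the log after each snapshot, which is one plausible way a logical log stays bounded in size. For simplicity it writes the batch and then applies it, whereas the text describes log IO overlapping with the operational work.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key/value store: a snapshot file plus an append-only logical log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "command.log")
        self.data = {}
        self.seq = 0        # sequence number of the last applied command
        self.pending = []   # commands waiting for the next group commit

    def submit(self, op, key, value=None):
        # Queue the command itself (operation + parameters), not its result.
        self.pending.append({"op": op, "key": key, "value": value})

    def group_commit(self):
        # Group commit: one append and one fsync make the whole batch durable.
        batch = []
        with open(self.log_path, "a") as log:
            for cmd in self.pending:
                entry = dict(cmd, seq=self.seq + len(batch) + 1)
                log.write(json.dumps(entry) + "\n")
                batch.append(entry)
            log.flush()
            os.fsync(log.fileno())
        for entry in batch:
            self._apply(entry)
        self.pending = []
        return len(batch)

    def _apply(self, entry):
        op, key, value = entry["op"], entry["key"], entry["value"]
        if op == "put":
            self.data[key] = value
        elif op == "add":
            self.data[key] = self.data.get(key, 0) + value
        elif op == "delete":
            self.data.pop(key, None)
        self.seq = entry["seq"]

    def snapshot(self):
        # Write the full state to a temp file, then atomically swap it in.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"seq": self.seq, "data": self.data}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        # The snapshot now covers every logged command, so truncate the log.
        open(self.log_path, "w").close()

    @classmethod
    def recover(cls, directory):
        store = cls(directory)
        if os.path.exists(store.snapshot_path):
            with open(store.snapshot_path) as f:
                saved = json.load(f)
            store.data, store.seq = saved["data"], saved["seq"]
        if os.path.exists(store.log_path):
            with open(store.log_path) as f:
                for line in f:
                    entry = json.loads(line)
                    # Skip commands the snapshot already contains.
                    if entry["seq"] > store.seq:
                        store._apply(entry)
        return store


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = TinyStore(d)
        store.submit("put", "a", 1)
        store.submit("add", "a", 2)
        store.group_commit()
        store.snapshot()
        store.submit("add", "a", 5)   # logged after the snapshot
        store.group_commit()
        assert TinyStore.recover(d).data == store.data
```

Replaying a logical log only reproduces the original state if every command is deterministic, which is why the toy commands depend on nothing but their parameters and the current data. The sequence number stored in the snapshot guards against applying a command twice if a crash lands between writing the snapshot and truncating the log.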