Case Study: MaxCDN
Who is MaxCDN?
MaxCDN (now part of StackPath) is a global content delivery network, or CDN. A CDN caches content for its clients in datacenters all over the world and, through various clever strategies, serves that content to end-users from servers near them. CDNs also play an important role in protecting content providers from DDoS attacks and other network-based threats.
See https://www.maxcdn.com for more.
What Problems Did They Have?
MaxCDN bills their customers a tiny amount for each piece of content downloaded. In the simplest sense, they keep accurate statistics on how their system is used, and they turn that into billing information for their customers.
At its core, it’s a counting problem.
Two main things make this particular counting problem hard. First, the counts need to stay accurate at tremendous scale and speed, which is difficult to guarantee when any failure can drop or duplicate an event. Second, CDNs are fiercely competitive, so cost is a huge issue.
In a nutshell: Accuracy at scale for the least amount of computing resources.
Why Did They Choose VoltDB?
VoltDB had all of the features they needed:
- Fully ACID, Multi-Statement Transactions
- High Throughput
- Stored Procedures
- Low Operational Cost
MaxCDN processes 32TB of log data through their VoltDB cluster daily, which is about 300,000 log lines per second. They estimate they use 1/10th of the computing resources to compute billing and reporting statistics compared with other solutions they explored.
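As a rough sanity check on those figures, the short Python sketch below derives the implied number of log lines per day and the average size of a log line. It assumes decimal terabytes (1 TB = 10^12 bytes) and a constant rate across the whole day; neither assumption comes from the case study, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the quoted volume figures.
# Assumptions (not from the case study): decimal terabytes, constant rate.
BYTES_PER_DAY = 32 * 10**12      # 32 TB of log data per day
LINES_PER_SECOND = 300_000       # quoted log-line rate
SECONDS_PER_DAY = 24 * 60 * 60

lines_per_day = LINES_PER_SECOND * SECONDS_PER_DAY
avg_bytes_per_line = BYTES_PER_DAY / lines_per_day

print(f"{lines_per_day:,} lines per day")
print(f"{avg_bytes_per_line:.0f} bytes per line on average")
```

Under these assumptions the two figures are consistent with each other: roughly 26 billion log lines per day at a little over 1.2 KB per line.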
At the end of the day, VoltDB stood alone.
Other strongly consistent systems simply cost too much per event. Even systems with no licensing costs like Postgres or MySQL still required orders of magnitude more computing resources to keep accurate statistics.
They considered several streaming systems, like Spark Streaming and Storm+Trident, and a few NoSQL systems, like Cassandra and HBase. With these systems, the techniques for keeping accurate statistics and counters limit performance, dramatically increase memory usage, or both.
What Does Their VoltDB Solution Look Like?
MaxCDN wants to keep accurate statistics on huge volumes of events at the lowest cost possible.
Why is accuracy important? Many companies that deal in high-volume, low-value services, such as online ads or energy grids, have the same problem. Like those companies, MaxCDN could only afford to ensure at-least-once or at-most-once billing of events. With at-least-once billing, an event can be counted twice, so they could overcharge their customers and lose goodwill. With at-most-once billing, an event can be missed, so they could undercharge and leave money on the table.
MaxCDN leverages VoltDB’s strong transactions and stored procedures to implement effective exactly-once processing of events. Unlike many VoltDB users, MaxCDN breaks from the one-event, one-transaction pattern that serves latency-sensitive customers so well.
They take the logs from edge content servers and combine events into bundles of about 70,000 log lines each. They then feed each bundle into a single VoltDB ACID transaction that atomically updates hundreds of thousands of counters and aggregates.
Why do this?
- Bundling thousands of events together reduces network traffic and is the most efficient way to get throughput, even if it comes at a cost of latency. Millisecond latency isn’t MaxCDN’s priority; accuracy at scale for the least amount of computing resources is.
- Making transactions idempotent, a key part of effective exactly-once processing, has some overhead per transaction. By reducing the number of database transactions by four orders of magnitude through bundling, the cost of idempotence becomes trivial.
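The bundling and idempotence ideas above can be sketched as follows. This is a minimal in-memory Python model, not VoltDB code and not MaxCDN's actual implementation: the `CounterStore` class, the `bundle_id` naming scheme, and the `(customer, bytes)` event shape are all assumptions made for illustration. In a real deployment, the duplicate check and the counter updates would presumably run together inside one ACID stored procedure.

```python
from collections import Counter


class CounterStore:
    """Toy stand-in for a transactional store (hypothetical, not a VoltDB API)."""

    def __init__(self):
        self.bytes_by_customer = Counter()
        self.hits_by_customer = Counter()
        self.applied_bundles = set()  # IDs of bundles already committed

    def apply_bundle(self, bundle_id, events):
        """Apply a whole bundle once; replays of the same bundle are no-ops.

        Returns True if the bundle was applied, False if it was a duplicate.
        """
        if bundle_id in self.applied_bundles:
            return False  # one idempotence check per bundle, not per event

        # Pre-aggregate so each counter is touched once per bundle.
        bytes_delta = Counter()
        hits_delta = Counter()
        for customer, size in events:
            bytes_delta[customer] += size
            hits_delta[customer] += 1

        # In a real database these writes would commit atomically together.
        self.bytes_by_customer.update(bytes_delta)
        self.hits_by_customer.update(hits_delta)
        self.applied_bundles.add(bundle_id)
        return True


store = CounterStore()
bundle = [("acme", 1200), ("acme", 800), ("globex", 500)]  # made-up events

first = store.apply_bundle("edge-7:000123", bundle)
retry = store.apply_bundle("edge-7:000123", bundle)  # e.g. resend after a timeout

print(first, retry)
print(dict(store.bytes_by_customer))
```

Because the duplicate check happens once per bundle, its cost is spread across every event in that bundle, and a sender can safely resend a bundle whose acknowledgment was lost. At about 70,000 log lines per bundle and 300,000 log lines per second, that works out to roughly four transactions per second.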
Thanks to VoltDB, MaxCDN can provide accurate billing to their customers at a lower cost than their competition, with live updates within 15 seconds. In practice, this accurate billing allows them to charge their customers slightly more while providing a better service. Multiply that by a high-volume business and we’re talking about a lot of money.
MaxCDN is also able to provide all kinds of real-time internal metrics and aggregations to understand the health of their systems and to drive their business forward. Dealing with faulty servers and anticipating load imbalances faster has a huge effect on their bottom line.