Key Considerations for a Modern Database to Operate at Scale
Modern applications have two primary needs, both at scale:
- performance, and
- ease of operations.
Performance has two aspects: throughput and latency. Both matter more as applications become machine-driven, i.e., API-driven. A human will wait much longer than an API will, because APIs have strict latency expectations enforced by timeouts.
When you take a multi-tier approach to a data-related problem, every layer you add to your solution bakes in a certain amount of latency. When your business or API latency expectation is larger than that baked-in latency, you have headroom for coding cleverness to make the path faster. In most modern applications, however, the latency expectation is far lower than the baked-in latency.
- vEPC (virtualized Evolved Packet Core) in telco has a strict 1 ms latency requirement.
- Ad targeting has about 2 to 3 ms to decide whether a user should be shown a creative from a campaign, based on near-past context.
- In a real-time slippage calculation, the slippage needs to be determined as soon as the execution event is submitted.
- In anomalous behavior prevention, detection needs to happen a priori, without degrading the end-user experience, rather than as a post-event reconciliation measure. The anomalous behavior can be credit card fraud, as in the detection Huawei is performing, or ad bots, as in the Bot or Not detection WhiteOps is performing (The Trade Desk recently partnered with WhiteOps to become the world’s first ad tech platform that BLOCKS a fraudulent impression before it is purchased).
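To make the latency-budget argument concrete, here is a minimal sketch that adds up the latency each layer bakes into a request path and compares the total to a budget. It assumes the layers are traversed one after another, so their latencies simply add. The layer names and per-layer numbers in LAYER_LATENCY_MS are hypothetical placeholders, not measurements of any product; the 1 ms budget mirrors the vEPC requirement above.

```python
# Hypothetical per-layer latencies in milliseconds (illustrative only, not measurements).
LAYER_LATENCY_MS = {
    "message queue": 0.5,
    "stream processor": 0.4,
    "cache": 0.2,
    "database": 0.6,
}


def baked_in_latency(layers):
    """Total latency of a request path whose layers are traversed serially."""
    return sum(layers.values())


def headroom(budget_ms, layers):
    """Time left for application logic; a negative value means the budget is already blown."""
    return budget_ms - baked_in_latency(layers)


if __name__ == "__main__":
    budget = 1.0  # a 1 ms budget, as in the vEPC example
    print(f"baked-in latency: {baked_in_latency(LAYER_LATENCY_MS):.1f} ms")
    print(f"headroom:         {headroom(budget, LAYER_LATENCY_MS):+.1f} ms")
```

With these placeholder numbers, four layers already consume 1.7 ms, so the 1 ms budget is exceeded before any application logic runs and there is no headroom left for coding cleverness to recover.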
Ease of Operations
Now comes the second modern application requirement: ease of operations. Each layer you add is another potential point of failure. You then need to manage its resiliency, performance, delivery guarantees, and interoperability with the layers before and after it. This starts bloating infrastructure needs by 50% to 90%, depending on the number of layers added. On top of this, add the people needed to manage such a large infrastructure and the data center costs of such a large footprint, i.e., power, cooling, real estate, etc.
When you take all these elements into consideration, it quickly becomes apparent that free open source software is not free after all.
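One way to see why each extra layer is another potential point of failure: if a request needs every layer to be up, the availability of the whole chain is the product of the individual availabilities, so it can only shrink as layers are added. The sketch below assumes layers fail independently and are all required for a request to succeed; the 99.9% per-layer figure is a hypothetical example, not a measured value.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24


def serial_availability(layer_availabilities):
    """Availability of a chain in which every layer must be up for a request to succeed.

    Assumes independent failures, so the combined availability is the product.
    """
    return prod(layer_availabilities)


def annual_downtime_hours(availability):
    """Expected hours per year during which the chain is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_layer = 0.999  # hypothetical: each layer is up 99.9% of the time
    for n_layers in (1, 2, 4):
        a = serial_availability([per_layer] * n_layers)
        print(f"{n_layers} layer(s): availability {a:.4%}, "
              f"~{annual_downtime_hours(a):.1f} h/yr down")
```

Going from one to four layers at this hypothetical 99.9% each roughly quadruples expected downtime, from about 8.8 to about 35 hours a year, before counting the operational effort each layer adds.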
At VoltDB, we have consolidated everything that needs to be done with fast data into a single layer, so the complexity of system resiliency is distilled down to that one layer. We have put special focus on implementing strategies not just for high availability but also for zero downtime during planned activities such as software or OS upgrades. All of this is scriptable via our command line interface, reducing human intervention and dependence on staff for middle-of-the-night operations.
For a deep dive into this subject, check out our recorded webinar “Architectural Considerations for a Modern Database to Operate at Scale”, hosted in conjunction with DBTA. Or, feel free to check out our technical overview for a more detailed look at the VoltDB architecture.