There’s a very simple distinction in the database world that tends to be ignored. Your database either:
- Defines the World
- Reacts to the World
If your database ‘Defines the World’, then anything that doesn’t happen in the database doesn’t happen at all. Amazon.com is a classic example of this - if you didn’t buy it on the web site, then you haven’t bought it. Payroll and accounting systems also fall into this category. Most early database applications were like this, and as a result APIs and developer mindsets still tend to assume that ‘defining the world’ is the task at hand.
But increasingly we are seeing systems that fall into the second category - those that react to an external reality that will keep on happening, with or without the database. It’s the difference between playing at being an Air Traffic Controller and having the screen go dark, and being a *real* controller and having the screen go dark. For the player the game is over; for the real controller it has only just started.
Another example would be the phone system in the US - if the computer system that takes credit off your prepaid phone doesn’t respond in under 100ms, the network will connect your call regardless of whether you have credit or not. The telcos don’t want to antagonize you, and would rather accept losing money at the rate of a few million dollars an hour.
This brings me nicely to my point - systems that react to the world usually struggle to escape from the consequences of either degraded service or a full outage. So while airplane pilots trust the controller, they also keep their landing lights on and use other gadgetry when near airfields and other airplanes, because research has shown that replacing one light bulb is consistently cheaper than replacing two airplanes.
‘Reacts to the World’ systems include pretty much all IoT and Connected Car applications, as well as any other situation in which you have devices ‘in the wild’ that require some form of real-time back end processing to provide value to end users. But what does this all mean for developers? Does your database ‘Define the World’ or ‘React to the World’? Use our checklist below to determine which type of database you have - and which you need.
Data Modeling is different in the ‘Reacts to the World’ (R2W) scenario because you can no longer assume you will have full knowledge of all the data you are managing at any moment in time. You have to assume that the system may start up at any moment and be expected to act sensibly when confronted by situations it does not fully understand.
Question: Does your data model cope with situations where you don’t actually have full information?
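One way to make a data model tolerate missing information is to treat every field that comes from the outside world as optional, and to make ‘unknown’ an explicit answer. The sketch below is only an illustration: the `DeviceState` record, its fields, the thresholds and the `should_alert` rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceState:
    """Last known state of a device; any externally supplied field may be missing."""
    device_id: str
    last_seen_epoch_s: Optional[float] = None   # None: never heard from this device
    temperature_c: Optional[float] = None       # None: no reading received yet


def should_alert(state: DeviceState, now_epoch_s: float,
                 max_silence_s: float = 60.0, max_temp_c: float = 8.0) -> str:
    """Return 'alert', 'ok' or 'unknown' instead of assuming complete data."""
    if state.last_seen_epoch_s is None:
        return "unknown"                        # first contact: nothing to judge yet
    if now_epoch_s - state.last_seen_epoch_s > max_silence_s:
        return "alert"                          # device went quiet; reality carried on
    if state.temperature_c is None:
        return "unknown"                        # device is alive, but sent no reading
    return "alert" if state.temperature_c > max_temp_c else "ok"
```

The point of the three-way answer is that callers are forced to decide what to do when the system simply does not know, rather than silently treating a missing value as zero or as ‘fine’.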
Error handling is a big issue - normal development practice is to send utterly cryptic errors back to the application, which is akin to walking into a McDonald’s, ordering a milkshake, and being told that a filter in the shake machine needs cleaning instead of ‘we’ve got no shakes’. Sending such an error message back makes sense if you can stop reality while the user gets the problem fixed, but makes no sense in a world where reality continues with or without you.
Question: Will error handling allow you to provide a degraded service when things go wrong?
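The prepaid phone example earlier can be sketched as a call with a deadline and a fallback decision. This is a minimal sketch, not how any telco actually implements it: `authorize_call`, the 100ms default and the `slow_credit_check` stand-in are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def authorize_call(check_credit, subscriber_id, deadline_s=0.1):
    """Return (allowed, reason); fall back to 'allow' if the check is late or broken."""
    future = _pool.submit(check_credit, subscriber_id)
    try:
        has_credit = future.result(timeout=deadline_s)
    except FutureTimeout:
        return True, "degraded: credit check timed out"   # reality will not wait
    except Exception:
        return True, "degraded: credit check failed"      # cryptic errors stay internal
    return (True, "ok") if has_credit else (False, "no credit")


def slow_credit_check(subscriber_id):
    """Hypothetical stand-in for a billing lookup that is overloaded."""
    time.sleep(0.3)
    return False
```

Calling `authorize_call(slow_credit_check, "subscriber-1")` returns an ‘allow’ decision tagged as degraded, instead of passing a timeout error up to a caller that can do nothing useful with it.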
In a ‘Defines the World’ system you might have an agreed SLA, and an outage might have commercial consequences, but the damage is often limited to forcing human customers to try again later. If you are interacting with other computers and you don’t respond quickly enough, there may be far more significant consequences - imagine, for example, an air traffic controller whose screen is one minute behind reality.
For many IoT applications, millisecond latency is not just a matter of convenience - it can be the key to effective monetization. If you are trying to influence events while they are still happening, the lower your latency, the better your chances. For this reason ‘Reacts to the World’ systems may need to be in-memory, as any kind of delay caused by disk access may serve to break your hold over events.
Question: Is your ability to function critically dependent on low latency?
Returning to our air traffic controller example - imagine if one controller tried to hand off your airplane to another controller, but your airplane showed up at different physical locations on each of their screens. The database industry refers to this phenomenon of the same question receiving different answers for a while as ‘eventual consistency’. In a ‘Reacts to the World’ scenario ‘eventual consistency’ is a major problem; if you are acting within milliseconds you will presumably have made decisions that influence real-world events before the database finally settles on a single answer. Some of those decisions will be wrong.
Question: Does your application need ‘immediate consistency’?
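The two-screens problem can be reproduced with a toy model in which a primary copy replicates to a second copy asynchronously. Everything here is hypothetical (the `Primary` and `Replica` classes, the flight identifier and the coordinates); it is a sketch of the phenomenon, not of any particular database.

```python
class Replica:
    """Toy replica: writes arrive in a queue and only become visible once applied."""

    def __init__(self):
        self.positions = {}
        self._pending = []

    def enqueue(self, flight, position):
        self._pending.append((flight, position))

    def apply_pending(self):
        for flight, position in self._pending:
            self.positions[flight] = position
        self._pending.clear()


class Primary:
    """Toy primary: applies a write locally, then replicates it asynchronously."""

    def __init__(self, replica):
        self.positions = {}
        self._replica = replica

    def update(self, flight, position):
        self.positions[flight] = position
        self._replica.enqueue(flight, position)   # not yet visible on the replica


replica = Replica()
primary = Primary(replica)

primary.update("FLT1", (51.47, -0.45))
replica.apply_pending()                            # both screens agree at this point
primary.update("FLT1", (51.50, -0.40))             # the airplane has moved on

# Same question, two different answers, until replication catches up.
before_sync = (primary.positions["FLT1"], replica.positions["FLT1"])
replica.apply_pending()
after_sync = (primary.positions["FLT1"], replica.positions["FLT1"])
```

Any decision taken from the replica between the second update and the second `apply_pending()` is based on a position the airplane no longer occupies.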
Client behavior when ignored
Humans have an attention span of about seven seconds, but will sometimes wait for minutes for a computer to respond. Humans will also get bored and find something else to do when confronted by a slow ‘Defines the World’ system. Machines, on the other hand, either stop functioning or re-send the same message when they don’t get a response. If the back-end system has poor concurrency capabilities, you can get into a situation where the same machine has multiple requests trying to use the same resources to solve the same problem, and has effectively deadlocked itself. Regardless of the architecture chosen, sooner or later there will be a network spike of some kind that will provoke this situation.
Question: Does your application need to cope with surges of activity and rapid, identical, cloned requests?
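One common defence against re-sent messages is to make request handling idempotent: the work runs once per request ID, and every cloned copy gets the first result. The sketch below assumes that clients attach a stable request ID to each message; `RequestDeduplicator` is a hypothetical helper, and a real system would also need to expire old entries.

```python
import threading
from concurrent.futures import Future


class RequestDeduplicator:
    """Runs the handler once per request ID; re-sent copies share the first result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}   # request_id -> Future holding the one real outcome

    def handle(self, request_id, handler, *args):
        with self._lock:
            future = self._results.get(request_id)
            owner = future is None
            if owner:                              # first copy of this request
                future = Future()
                self._results[request_id] = future
        if owner:
            try:
                future.set_result(handler(*args))  # do the real work exactly once
            except Exception as exc:
                future.set_exception(exc)          # duplicates see the same failure
        return future.result()                     # duplicates wait here, not in the handler
```

Because the lock is only held while looking up the request ID, duplicates never compete with the original for the resources the handler uses; they simply wait for its answer.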
Surges and odd bursts of activity are a fact of life in the IoT and Telco spaces. These can either affect the IoT devices you are managing or other systems you are interacting with. Most legacy database platforms - and many newer ones! - have APIs which are synchronous - once you start a call to the database you are 100% dependent on the API returning control to your program quickly. This is obviously problematic, as your only way to cope with surges is to spawn more and more threads of activity. An asynchronous API allows you to carry a large number of simultaneous transactions without a correspondingly large number of threads.
Question: Does your application need an asynchronous API?
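To make the thread-count point concrete, here is a small sketch using Python’s `asyncio`. The `fake_db_call` coroutine is a hypothetical stand-in that just sleeps; it is not the API of any real database client.

```python
import asyncio
import threading


async def fake_db_call(txn_id, latency_s=0.05):
    """Hypothetical stand-in for an asynchronous database call."""
    await asyncio.sleep(latency_s)          # yields the thread while 'waiting on the database'
    return txn_id, threading.get_ident()    # record which thread completed the call


async def run_surge(n_transactions):
    """Start n transactions at once and wait for all of them to finish."""
    return await asyncio.gather(*(fake_db_call(i) for i in range(n_transactions)))


results = asyncio.run(run_surge(1000))
threads_used = {thread_id for _, thread_id in results}
```

All 1,000 simulated transactions are in flight at the same time, yet `threads_used` contains a single thread ID. A synchronous API would need either 1,000 threads or a long queue to cope with the same surge.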
Most traditional ‘Defines the World’ systems have pre-announced periods of downtime. In the ‘Reacts to the World’ space, the outside world can’t be turned off, so concepts such as ‘five nines’ (99.999% availability) go from being ‘nice to have’ to being commercially and practically important.
Question: Is downtime really, really expensive as opposed to merely being inconvenient?
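It helps to turn an availability target into a yearly downtime budget. The arithmetic below assumes a 365-day year and counts all downtime, planned or not; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years


def downtime_budget_minutes(nines):
    """Minutes of downtime per year permitted by an availability of N nines."""
    unavailable_fraction = 10.0 ** -nines      # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailable_fraction


budgets = {n: round(downtime_budget_minutes(n), 2) for n in (3, 4, 5)}
```

Five nines works out to a little over five minutes a year, which leaves no room for a pre-announced maintenance window of the traditional kind.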
A frequently-ignored aspect of the IoT is that we are now building ‘Reacts to the World’ systems that will need to run for years and years, as they include hardware such as fridges and cars that consumers expect to last a long time. This in turn means the architectural choices need to be equally long term. Open source stacks can be problematic here, as the frenetic pace of development means that either you suffer the expense of continually upgrading and changing running, tested code to use the latest technology, or you run the risk of finding yourself on an old, unfashionable and unsupported stack.
Question: Do you have a good plan for supporting your platform over a long period?
If you find yourself answering ‘Yes’ to most of these questions then you are probably in a ‘Reacts to the World’ situation. Of the questions posed above, ‘immediate consistency’ and ‘latency’ are arguably the two most important. The question you now need to ask is whether your current plans are going to allow you to succeed, or whether you need to step back and review your technology choices.
by David Rolfe
David Rolfe is VoltDB’s senior technologist in EMEA. His 30-year career has been spent working with data, especially in the Telecoms industry.