Streaming Events: Are you Storing Decisions or Taking Them?

streaming data processing

There’s a lot of talk right now about how 5G and the IIoT will increase volumes of ‘transactions’ driven by streaming events, but equally, there’s a lot of hand waving and vague definitions about what a ‘transaction’ actually is. As a consequence, I would argue people are overestimating what can be done with clever queues and underestimating how much high-volume transactional stream processing will be needed.

The issue is this: It’s really easy to concoct examples or even find real world use cases which involve collecting data about completed events, loading it into a queue in a transactional manner and then using a post-processing technology such as KSQL to do simple transformations on the input stream and spit out results.

We would argue that this is not what was promised for stream processing:

  1. Real world data is complicated and full of exception conditions, special cases and weird edge cases. So I might want the input stream aggregated every hour, unless an individual customer’s usage is greater than ‘x’, in which case we send a record downstream immediately and start counting from zero again. Or to use a real world example: A mobile phone user might technically be roaming because they are connected to a cell tower 1 mile away in a foreign country, but we won’t charge it as roaming because they weren’t actually in the foreign country when they made the call.
  2. The real value in stream processing is the capability to influence ongoing events, which means taking decisions. This in turn means remembering state, unless you want to take the same decision dozens of times a second. A simple queue aggregation model won’t work.

Just because a simple single item transaction can be scaled in a queue doesn’t mean your complicated business transaction can be scaled in a queue.

Not all transactions are easy to scale using queueing technologies:

  • “Preordained” transactions where the outcome is never in doubt are easily scalable. Non-preordained transactions with unknown outcomes can be tough to scale, especially if they involve shared, finite resources. In this scenario, transactions will compete for these resources.
  • Being ACID-compliant doesn’t just encompass data storage but also needs to include the decisions driven by the data.

Right now the database industry is looking at 5G and the IIoT. I’ve had conversations with people in the telco business who casually stated they expect volumes to go up tenfold, and this is in a space where 500K TPS is already not unusual. There will be a whole series of new high volume transactional problems to solve. Some of these may be “Preordained Transactions” – where you’re effectively recording events that have already happened. But a lot of them will be the kind of transactions where decisions driving responses to events at high volumes must complete within milliseconds if they are to be relevant. As this ability will directly impact your bottom line, it calls for some very careful design decisions.

Streaming events and handling hundreds of thousands of impactful, complex transactions per second?

You have four choices:

  1. Try to solve this with legacy RDBMS. This approach won’t work unless you somehow create a farm of legacy RDBMS databases, shard the work across the farm, write a layer on top of it and hire lots of DBAs to manage it.
  2. Try to solve this with NoSQL. While NoSQL products are beginning to offer multi object/document ACID transactions, they still do not encompass the decision making logic in these transactions. Approaches that involve overwriting entire copies of records with new ones will be vulnerable to scaling problems, especially when low SLAs are involved, as conflicts result in multiple time-consuming retries.
  3. Try to solve this with a smart queueing product. Queuing product vendors have started to put SQL layers on top of their base functionality, and have implemented basic GROUP BY and time-based aggregation. This approach works well for simple scenarios like rendering a dashboard. But, when actions need to be taken when these aggregates indicate deviation from the norm, it will not be sufficient to just store aggregates for interested parties to query whenever they choose to. Ultimately this approach runs the risk of breaking totally when the complexity of your use-case changes and involves conditional logic or any other additional complexity.
  4. Use VoltDB. We don’t claim to solve every problem, but handling huge numbers of dynamic transactions with a predictable latency without compromising on data and decision veracity is an area we have an established track record in.

Conclusion

We know that the number of transactions will skyrocket over the next decade as machine-to-machine communications increase in volume. You need to fully understand what vendors mean when they claim to support high transaction volumes. Transactions that simply record state changes are much easier than ones that need to encapsulate the entire work spectrum of ingest-store-aggregate-measure-detect-decide-act.

Do you have preordained transactions or complicated business transactions that need to be scaled in a queue? Get in touch and let’s chat about how VoltDB can help.

  • 184/A, Newman, Main Street Victor
  • info@examplehigh.com
  • 889 787 685 6