A Question of Degree: Thoughts from the In-Memory Computing Summit
VoltDB was a sponsor of the recent In-Memory Computing Summit in San Francisco. I always enjoy going back to San Francisco. I stayed at a cool flat on Nob Hill, and the show was enjoyable and productive – we had some great conversations with attendees and even with other sponsors. The frustration came when I heard other vendors saying things that were either misleading or just plain wrong. I don’t know whether those statements were intentional, but I thought I would try to achieve some inner peace by ranting here.
One of the great things about the IMC Summit was the high level of technical sophistication of the attendees. We were at the Strata Hadoop World show in San Jose in March and it seemed like we were spending a lot of time explaining very basic in-memory, database, streaming, real-time, etc. concepts. At the IMC Summit, people got all that – they approached us with rather deep and meaningful questions about architecture, implementation, and the sometimes subtle differences between various vendors and products.
One thing you need to be very careful about when comparing different products is the definition of terms. I know I fall into this trap myself – I make assumptions about what terms mean and, unless the numbers look absurd, I usually accept them, along with the stated definition, unquestioned. Also be aware that sometimes more is not better, just as sometimes less is not better. That is easy to understand for things like latency – in the data management game, if something takes less time, that is generally a good thing. But you should also be fully aware of why it takes less time: is it doing everything faster, or is it leaving things out? Simplifying a process by skipping most of the process is usually not a good thing. I’ll touch on my top three terms here, and may have more to tackle in a future blog.
The term “real-time” is probably the most disagreed upon, because it is very subjective – a bit like “beauty”, it is in the eye of the beholder. Real-time for any particular use case is as fast as is needed to do the job. Practically, it seems to have come to mean significantly faster than whatever the current process delivers. I’ve spoken with customers who wanted some analytics done within two hours, which represented a significant improvement over their then-current SLA of overnight (about eight hours). To them, such a short period of time would seem to be real-time. But with all the talk about streaming systems, it seems like real-time is becoming more, well, real-time…within milliseconds rather than seconds.
The term “transaction” does not mean “something that happens”. I have seen and heard some vendors using the term to describe events as simple as query responses and database lookups. Those are not transactions. A transaction has a rather strict and rigid definition within data management – it represents a group of actions that execute together under a defined set of guarantees, typically ACID (Atomicity, Consistency, Isolation, and Durability), that allow the data management platform to remain accurate, meaningful, and reliable. Each of the individual ACID properties comes in different degrees, except perhaps Consistency – but the ‘C’ most people are actually concerned about is the CAP Theorem’s ‘C’, which does come in degrees. It is the relaxing of that consistency to “eventual consistency” that is the primary way many NoSQL solutions achieve their high degree of availability. For that one, just as for “real-time”, you will need to decide how long you are willing to wait. For high-value transactional data, most of our customers can’t wait at all – they want all nodes to agree on this data as soon as it is there.
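To make the distinction concrete, here is a minimal sketch of what atomicity buys you, using Python’s built-in sqlite3 rather than any vendor’s product; the account table and the transfer amounts are purely illustrative. A lookup is a single read; a transaction is a group of actions that succeed or fail together:

```python
import sqlite3

# In-memory database; schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

# Try to transfer 150 from alice to bob -- more than alice has.
# Atomicity means the credit and the debit stand or fall together:
# the CHECK constraint aborts the debit, and the rollback undoes
# the already-applied credit as well.
try:
    with conn:  # sqlite3 commits this block on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back; neither row changed

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both rows unchanged: {'alice': 100, 'bob': 0}
```

A system that only ever does single-row lookups never has to make this all-or-nothing promise – which is exactly why calling those lookups “transactions” blurs a distinction that matters.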
The term “operational” baffled and, frankly, irritated me the most. Another vendor was asked about the difference between their product and VoltDB. One of the first differences he cited was that their product was “operational”.
This other vendor offers an OLAP product. I come from the data warehousing space, so I understand the concept of operational business intelligence. I spent a considerable amount of time trying to convince customers to make analytics a key and core part of their business, and that OLAP systems (like data warehouses) should be more operational – but frankly, most companies are not there yet. Transactional systems are more traditionally seen as being operational. When a system is responsible for keeping track of customer billing, revenue recognition, shipment confirmations, debiting a credit card account, or any of the many other things transactional systems do, it is easy to see that it is critically important to the “operation” of the business, that it needs to “operate” at the speed of the business “operations”, and that it is, thus, “operational”. A simple dashboard that gets data by doing a straightforward lookup on an OLAP system, and that can crash without significantly affecting business “operations”, isn’t really operational, is it?
The bottom line is you should not assume the definitions of terms are universal or even widely accepted. Make sure the vendors you are considering are in sync with your definitions, particularly for those requirements that are critically important to you.