In-Memory Databases

High performance, low latency

If you've got a serious need for speed, in-memory databases deliver the fastest data access available today. They are gaining popularity among organizations struggling to keep up with fast transactions and other forms of high-velocity streaming data.

When SQL databases were initially designed, memory was expensive, so rotating magnetic disks in the form of mechanical disk drives became the primary storage device for databases. While disk-based storage is still used and relied upon, the steadily decreasing price of RAM has the potential to make mechanical disks obsolete as the primary data storage layer for operational systems.

However, not all in-memory databases are the same. Some in-memory RDBMS technologies deliver far greater gains over traditional disk-based products than others do. How can that be? It depends on how the in-memory product was designed: was it built from the ground up to run in memory, or is it a disk-based product simply migrated to memory?

Building a RDBMS for a New Generation

In 2007, the research team behind the Aurora Complex Event Processing system (commercialized as Streambase) and the C-Store analytics (OLAP) system (commercialized as Vertica) set about building an operational database aligned with the needs of the 21st-century enterprise.

They took an open source RDBMS that followed traditional RDBMS architecture, ran it on a memory-based filesystem, and measured where it spent its time. Over 80% of the database software's time was spent on page buffer management, index management, and concurrency management. Only 12% of its time was spent actually doing the real work the database was supposed to do.

The research team also discovered that the concurrency problem could actually be made worse by speeding up individual components within the database software architecture -- the whole performed much worse than the sum of its parts would suggest.

The original research behind VoltDB was led by Dr. Michael Stonebraker and a team of senior computer scientists from MIT, Yale University, and Brown University, and was published as the H-Store research paper.

An in-memory RDBMS built for today's requirements needed a radical approach to solving concurrency issues. To deliver significant advantages, a modern in-memory database must offer:

  • Horizontal scale-out on commodity hardware with linear scalability
  • Full and strong ACID compliance
  • High concurrency
  • Reliable disk persistence
  • High availability

Is an In-Memory Database Worth It?

How do the advantages of in-memory databases stack up against their disk-based counterparts?

Advantage                 In-Memory    Disk-based
------------------------  -----------  -----------
Ease of Implementation
Cost (Hardware)
Write Performance
Read Performance

Myths About In-Memory Database Risks

While the inherent structure of in-memory databases does introduce the potential for data loss if a node loses power, a product properly designed for guaranteed durability easily addresses that risk. But be careful to fully understand what each vendor claims -- and, more importantly, what they actually deliver -- for data durability in their products.

There is a design trade-off between the ACID (atomicity, consistency, isolation, and durability) levels supported by default and the performance of the product -- in general, the higher the level of any of those components, the slower the performance (higher latency).
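The cost of the durability component in particular is easy to demonstrate: forcing every write to disk before acknowledging it is far slower than syncing a whole batch of writes at once. A minimal Python sketch of this trade-off, using a toy log writer (not any vendor's implementation):

```python
import os
import tempfile
import time

def write_log(entries, fsync_every_write):
    """Append entries to a log file, optionally forcing each one to disk."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as log:
            for entry in entries:
                log.write(entry)
                if fsync_every_write:
                    log.flush()
                    os.fsync(log.fileno())  # strict: each write survives power loss
            log.flush()
            os.fsync(log.fileno())          # relaxed: one sync for the whole batch
    finally:
        os.remove(path)

entries = [b"txn\n"] * 200

t0 = time.perf_counter()
write_log(entries, fsync_every_write=True)
strict = time.perf_counter() - t0

t0 = time.perf_counter()
write_log(entries, fsync_every_write=False)
relaxed = time.perf_counter() - t0
# On most storage, the strict (per-write fsync) run is dramatically slower.
```

The same lever appears in real products as synchronous vs. asynchronous commit settings: relaxing the durability guarantee lowers latency at the cost of a window of possible data loss.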

There is also a compromise between availability and consistency (CAP Theorem) for distributed systems; with some high performance systems, high performance can come at the high cost of inconsistent or inaccurate data, or the possibility of data loss.

In addition, cost-effectiveness will vary with the architecture chosen, with some products providing you with the ability to make effective and efficient use of commodity servers and others needing much more hardware to provide similar performance.

In-Memory Data Grid and In-Memory Fabric Technologies

Technological advances in computer architecture have made in-memory operations more practical and more affordable than ever before. The cost of DRAM has dropped significantly, so systems with several terabytes of DRAM are now feasible for many users.

VoltDB was made possible due to these advances -- we can run our database in-memory for very high performance on relatively inexpensive commodity servers. Other vendors have taken different approaches to capitalize on the availability of more memory. One of these different approaches is the in-memory data grid.

What Is an In-Memory Data Grid (IMDG)?

An in-memory data grid (IMDG) is a data structure that resides entirely in RAM (random access memory), and is distributed among multiple servers. It is sometimes called an in-memory data fabric (IMDF).

IMDGs can span a very large number of nodes (scale-out) and claim to be able to support hundreds of thousands of in-memory data updates per second. They can be clustered and scaled in ways that support large quantities of data.

IMDGs usually use a key/value data structure rather than a relational one. This allows flexibility in what data is cached, but may require changing applications to use the IMDG instead of the underlying database or other data store.
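The core idea -- a key/value store partitioned across many nodes -- can be sketched in a few lines of Python. This toy class hashes each key to pick the owning node; the class and node names are illustrative, not any vendor's API:

```python
import hashlib

class InMemoryGrid:
    """Toy IMDG: a key/value store partitioned across nodes by hashing the key."""

    def __init__(self, nodes):
        self.nodes = {name: {} for name in nodes}  # one in-memory dict per node
        self.names = sorted(nodes)

    def _owner(self, key):
        # Hash the key to deterministically pick the node that owns it.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)

grid = InMemoryGrid(["node-a", "node-b", "node-c"])
grid.put("user:42", {"name": "Ada"})
assert grid.get("user:42") == {"name": "Ada"}
```

Real IMDGs add replication, rebalancing, and consistent hashing so that adding or removing a node moves as few keys as possible, but the routing principle is the same.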

What Is an In-Memory Data Grid Used For?

The in-memory data grid is proposed as a way to help improve the performance of existing systems without the need to replace those systems. It attempts to speed up operations by caching the data from those existing systems in memory for faster access.
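This caching pattern is often called cache-aside: the application checks the grid first and falls back to the underlying store only on a miss. A minimal sketch, using plain dicts to stand in for the grid and the disk-based database:

```python
class CacheAside:
    """Cache-aside reads: check the grid first, fall back to the slow store."""

    def __init__(self, grid, backing_store):
        self.grid = grid          # dict standing in for an IMDG
        self.db = backing_store   # dict standing in for a disk-based database

    def read(self, key):
        value = self.grid.get(key)
        if value is None:                 # cache miss
            value = self.db.get(key)      # slow path: hit the underlying store
            if value is not None:
                self.grid[key] = value    # populate the cache for next time
        return value

grid, db = {}, {"user:42": "Ada"}
cache = CacheAside(grid, db)
assert cache.read("user:42") == "Ada"     # first read misses, fetches from db
assert grid["user:42"] == "Ada"           # now cached; later reads skip the db
```

Note the application-side change the text mentions: reads go through the cache object rather than directly to the database, and writes need a matching invalidation or update strategy.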

Some vendors have taken this concept a step further by using the in-memory data grid not just for in-memory storage of data but also for in-memory compute operations on the data. Vendors that offer in-memory data grids include Hazelcast, GridGain, Pivotal (GemFire), Oracle (Coherence), Tibco (ActiveSpaces), and GigaSpaces.

In-memory data grid technology is a separate breed from most in-memory relational database technologies. These products are distinguished by the data fabric, or distribution of data assets.

In-memory data grid products are defined by a NoSQL (key-value store) cache that is layered in between applications and data management platforms to speed up the data access.

Common "Alternatives" to In-Memory Databases

Apache Storm

Storm is a real-time streaming data framework for processing fast streams of data. Incoming data sources are connected to back-end data stores, with processing code running on the path between them. This solution is perhaps the best known of the in-memory alternatives, since it was created and used by Twitter.

It may be falling out of favor, however -- even Twitter has stopped using it (replacing it with Heron). The main difference from in-memory databases is that Storm does not maintain state by itself, so it needs additions such as the Trident API and other Apache components, such as Cassandra.


Lambda Architecture

The Lambda Architecture is designed to provide analytics on enormous amounts of data using batch and stream processing methods in different "layers". The goal is to separate data processing into two distinct branches to maximize the efficiency of both the fast processing of recent data as well as the slower deep processing of long-term, historical data. Since Lambda is really focused on analytics, it doesn't organically support the ability to build responsive, event-oriented transactional applications.

The Lambda Architecture really isn't an alternative to in-memory databases – in fact, in-memory databases can be used to implement part of the Lambda Architecture, particularly the speed layer used for fast processing of recent short-term data.
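At query time, the Lambda Architecture merges the precomputed batch view with the incremental speed-layer view. A minimal sketch of that merge for a running counter (the key names and numbers are illustrative):

```python
def query(batch_view, speed_view, key):
    """Lambda-style read: combine the batch view with recent stream results."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

batch_view = {"clicks:page1": 10_000}  # recomputed periodically from full history
speed_view = {"clicks:page1": 37}      # incremental counts since the last batch run

total = query(batch_view, speed_view, "clicks:page1")  # 10_037
```

When the next batch run completes, its view absorbs the recent data and the speed layer's entries for that window are discarded -- which is exactly the role an in-memory database can play in the speed layer.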


Other common alternatives to in-memory databases include In-Memory OLAP and Amazon Kinesis.


Is an In-Memory Database the Right Solution for Your Needs?

VoltDB is faster, smarter and simpler than traditional databases. Its in-memory scale-out architecture enables 100x faster performance than traditional alternatives. This enables our customers to convert live data into business value, analyze and act on streaming data, and use real-time intelligence.

With VoltDB, data is always consistent, correct, and never lost. This means seamless ecosystem integration, simpler applications, easier testing, and better maintenance. VoltDB customers report they have experienced:

  • 1/10 the compute resources of competitive products
  • 100% data correctness and completeness
  • 253% increase in offer purchases through use of VoltDB for personalization
  • 3ms or less response latency (99.999% of the time)
  • 100% billing accuracy in billing management applications

VoltDB Database Comparison vs Alternatives

To discover how VoltDB stacks up against the competition, including Cassandra, Vertica, TimesTen, and Aerospike, take this two-minute survey.
