Cross Data Center Replication (XDCR) is an essential technology for data failover and disaster recovery. By replicating data across multiple data centers in different geographical locations, XDCR keeps services running for the end user: if one data center is adversely affected, another still holds a current copy of the data.
XDCR is a robust feature that effectively addresses the need to maintain separate, active copies of the database in separate locations. XDCR thereby provides organizations with essential standards of data availability while also bringing data closer to users.
XDCR technology is particularly critical for organizations in the telecommunications industry, where data availability, consistency, resiliency, and ultra-low latency are vital for success in today’s age of 5G.
A basic cross data center replication system.
However, not all XDCR solutions can provide the sub-10 millisecond latency required by telcos in today’s age of 5G, while also ensuring data availability, consistency and resiliency.
Read on to discover the benefits you should expect from XDCR as well as an overview of how XDCR works, critical XDCR challenges and why some efforts to overcome them still fall short, and what to look for in a modern XDCR platform.
Benefits of Cross Data Center Replication
XDCR is absolutely vital for telecommunications service providers and other types of enterprises that rely heavily on low latency and data center uptime to keep their services afloat and their customers happy.
5G’s promise of ultra-low data latency presents exciting opportunities to provide new services that require nearly instant response times. Seizing these new opportunities, however, requires data to be immediately available, resilient, and consistent, regardless of a user’s geographic location. Data center failure and data loss caused by unresolved data conflicts and cyber attacks simply can’t happen.
That’s why XDCR exists: to keep things running. But that’s the highest-level benefit.
Here’s a more specific breakdown of the benefits of XDCR:
1. “Five-Nines” Availability
It’s widely accepted in the telco industry that systems supporting mission-critical telco functions, such as charging or policy, need to provide 99.999% (“five nines”) data availability each year, which equates to just over five minutes of unscheduled downtime per year.
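The downtime budget follows directly from the availability percentage. Here is a minimal sketch of the arithmetic, assuming an average year of 365.25 days; the function name is illustrative only.

```python
# Assumption: an average year of 365.25 days (leap years included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
        print(f"{label}: {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Under these assumptions, five nines leaves a budget of roughly 5.26 minutes per year, and each additional nine cuts the budget by a factor of ten.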
2. Physical Survivability
At the same time, there’s also a requirement for physical survivability, which means an application and its data must run in more than one data center at once. These data centers must be several miles apart so that they can’t both be destroyed by the same catastrophic event.
3. Low Latency
Operators in large markets, such as the US, need to maintain industry-standard SLAs of just a few milliseconds, regardless of where the data is stored. Due to 5G’s aggressive latency requirements, it is physically impossible to serve a market like the US from a single data center. A large number of data centers are required. These multiple data centers must also be capable of resolving data conflicts, in order to maintain data resiliency and consistency, and avoid data loss.
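To see why a single site cannot meet these latencies, consider signal propagation alone. The sketch below is a back-of-the-envelope estimate, assuming light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum). The 4,000 km coast-to-coast distance and the 10 ms budget are illustrative figures, not measured routes or a stated SLA.

```python
# Assumption: signal speed in optical fiber, roughly 2/3 of the speed of light in vacuum.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Best-case round trip over fiber, ignoring routing, queuing, and processing time."""
    return 2.0 * distance_km / FIBER_SPEED_KM_PER_S * 1000.0


def max_radius_km(latency_budget_ms: float) -> float:
    """Largest one-way distance whose round trip still fits in the latency budget."""
    return latency_budget_ms / 1000.0 * FIBER_SPEED_KM_PER_S / 2.0


if __name__ == "__main__":
    # Illustrative distance, on the order of a US coast-to-coast fiber path.
    print(f"4,000 km round trip: {round_trip_ms(4000):.1f} ms")
    # Illustrative budget: how far away can a data center be at 10 ms round trip?
    print(f"10 ms budget reaches: {max_radius_km(10):.0f} km")
```

Even in this best case, a coast-to-coast round trip costs tens of milliseconds before any processing happens, which is why users must be served from a data center near them.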
How Cross Data Center Replication Works
Passive data replication involves duplicating the contents of selected tables between two database clusters, in only one direction: from the master database to the replica. In contrast, XDCR copies database changes in both directions.
XDCR can be set up on multiple (i.e., more than two) clusters. Client applications can then perform read/write operations on any of the participating clusters. Changes in one database are then copied and applied to all the other databases. Therefore, XDCR can support client applications attached to each database instance.
XDCR uses memory-to-memory data replication: each write is first saved in memory and then placed in a replication queue, which sends the queued changes over the network in parallel across multiple threads. This way, the performance of replication is limited only by your network speed.
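The write path described above can be sketched as a toy, in-process model. This is not any product's implementation: the class and method names are hypothetical, the "network send" is a local function call, and a real system would also handle ordering, batching, retries, and acknowledgements.

```python
import queue
import threading
from typing import Dict, List, Tuple


class ToyCluster:
    """In-memory key-value store standing in for one database cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def apply(self, key: str, value: str) -> None:
        with self._lock:
            self.data[key] = value


class ToyReplicator:
    """Saves each write in local memory, then queues it for the remote clusters."""

    def __init__(self, local: ToyCluster, remotes: List[ToyCluster], workers: int = 4) -> None:
        self.local = local
        self.remotes = remotes
        self.outbox: "queue.Queue[Tuple[str, str]]" = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._send_loop, daemon=True).start()

    def write(self, key: str, value: str) -> None:
        self.local.apply(key, value)   # 1. save the write in local memory
        self.outbox.put((key, value))  # 2. place it in the replication queue

    def _send_loop(self) -> None:
        # Several of these loops run in parallel. Note: this toy does not
        # preserve per-key ordering across threads, which a real system must.
        while True:
            key, value = self.outbox.get()
            for remote in self.remotes:
                remote.apply(key, value)  # stand-in for a network send
            self.outbox.task_done()

    def flush(self) -> None:
        """Block until every queued change has been applied remotely."""
        self.outbox.join()


if __name__ == "__main__":
    east, west = ToyCluster("east"), ToyCluster("west")
    replicator = ToyReplicator(east, [west])
    replicator.write("subscriber:42", "balance=10")
    replicator.flush()
    print(west.data)
```

The client's write returns as soon as the local copy is updated, while the queue absorbs the cost of shipping changes to other sites.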
XDCR generates data changes using Database Change Protocol (DCP), a high-performance streaming protocol that communicates the state of data using an ordered changelog. This way, DCP guarantees that the same data will be replicated among all clusters regardless of connectivity problems.
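The value of an ordered changelog is that a consumer can resume exactly where it left off after a disconnection. The sketch below illustrates only that idea, using simple sequence numbers. It is not the DCP wire protocol, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Change:
    seqno: int
    key: str
    value: str


@dataclass
class ChangeLog:
    """Append-only log; every change receives the next sequence number."""

    entries: List[Change] = field(default_factory=list)

    def append(self, key: str, value: str) -> Change:
        change = Change(len(self.entries) + 1, key, value)
        self.entries.append(change)
        return change

    def stream_from(self, last_seen_seqno: int) -> Iterator[Change]:
        """Yield, in order, every change after the given sequence number."""
        return iter(self.entries[last_seen_seqno:])


@dataclass
class Consumer:
    """A remote cluster that remembers the last sequence number it applied."""

    data: Dict[str, str] = field(default_factory=dict)
    last_seqno: int = 0

    def catch_up(self, log: ChangeLog, limit: Optional[int] = None) -> None:
        for n, change in enumerate(log.stream_from(self.last_seqno)):
            if limit is not None and n >= limit:
                break  # simulate the connection dropping mid-stream
            self.data[change.key] = change.value
            self.last_seqno = change.seqno


if __name__ == "__main__":
    log = ChangeLog()
    for key, value in [("a", "1"), ("b", "2"), ("a", "3")]:
        log.append(key, value)
    remote = Consumer()
    remote.catch_up(log, limit=1)  # connection drops after one change
    remote.catch_up(log)           # reconnect and resume from last_seqno
    print(remote.data, remote.last_seqno)
```

Because the consumer tracks its own position, a network interruption delays replication but does not cause changes to be skipped or applied twice.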
The Evolution of Cross Data Center Replication: Active-Active (and Active-Active-Active) XDCR
XDCR technology has advanced significantly since its introduction decades ago to enable disaster recovery for legacy enterprise databases. By reviewing recent key evolutions in XDCR, we can more fully understand the challenges that still remain, particularly for telcos.
Active-active XDCR is an XDCR deployment consisting of two databases that are both active and fully functional: each can be changed in real time, and each propagates its changes to the other. Active-active XDCR marks a substantial improvement in functionality over active-passive XDCR, in which only one active, changeable database exists, alongside one or more passive (aka standby or backup) databases that are continuously updated to match the active database. If the active database becomes damaged or corrupted, business interruption can be avoided by promoting one of the passive/standby databases to be the new “active” database. Doing so, however, generally requires human intervention, a key shortcoming that no longer exists with active-active XDCR.
Active-active-active XDCR moves beyond active-active functionality to provide the vital benefit of “five nines” of data availability. This is achieved by deploying a third active database cluster that provides geo-redundancy of data (physical separation of data centers spanning multiple geographic locations) as well as enabling upgrade testing and deployment without impacting constant, ongoing data processing.
However, both of these advances in XDCR technology still leave a critical challenge unresolved: conflict resolution. Data conflicts are not only inevitable with XDCR deployments but can become an even more prevalent issue given 5G’s ultra-low latency. The ways in which most data platforms presently handle conflicts are increasingly insufficient to keep up with the demands of telcos.
There are two primary methods for conflict resolution:
The first method uses conflict-free replicated data types (CRDTs) to merge the numerical changes between conflicting transactions; however, this partial strategy only works for simple numerical events.
The second method, timestamp-based reconciliation, can be used to resolve any conflict. However, it simply chooses which transaction in a data conflict will be processed, generally the last one timewise. The other conflicting transaction or transactions simply vanish, a clearly unacceptable outcome, particularly if the deleted transaction is for your flight reservation to Hawaii!
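The difference between the two methods is easier to see side by side. The sketch below uses a grow-only counter (one of the simplest CRDTs) and a last-write-wins rule. Both are textbook illustrations with hypothetical names, not the implementation of any particular data platform.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> None:
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys})

    @property
    def value(self) -> int:
        return sum(self.counts.values())


@dataclass(frozen=True)
class Write:
    timestamp: float
    replica_id: str
    value: str


def last_write_wins(a: Write, b: Write) -> Write:
    """Keep the later write; ties are broken by replica id so all sites agree."""
    return max(a, b, key=lambda w: (w.timestamp, w.replica_id))


if __name__ == "__main__":
    # Numeric case: both sites' increments survive the merge.
    east, west = GCounter(), GCounter()
    east.increment("east", 3)
    west.increment("west", 2)
    print("merged counter:", east.merge(west).value)

    # General case: two sites book the same seat; one booking is discarded.
    alice = Write(100.0, "east", "seat 12A -> alice")
    bob = Write(100.5, "west", "seat 12A -> bob")
    print("surviving write:", last_write_wins(alice, bob).value)
```

The counter merge keeps every site's contribution but only makes sense for numeric values, while last-write-wins handles any value at the cost of silently dropping the losing write.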
The New Age of Cross Data Center Replication: VoltDB’s Active(N)™ Lossless XDCR
A new type of XDCR is needed to provide critical and complete conflict resolution at the database level and application level, allowing enterprise-grade networks to avoid losing data when their data centers go down, even in the face of massive data volume, velocity, and variety, while still providing ultra-reliable low latency.
To put it more simply: A new XDCR system is needed to make the new revenue opportunities of 5G a reality.
VoltDB’s Active(N) Lossless XDCR is the latest evolution of modern XDCR, designed to ensure data availability and prevent data loss while still allowing telco networks to run at sub-millisecond latency across widely distributed locations.
Like other XDCR technologies, Active(N) Lossless XDCR uses timestamp-based reconciliation. Unlike them, it also uses Kafka to tell the application how each conflict was resolved (see figure 1 below), so that no transaction is deleted under the guise of “conflict resolution”.
VoltDB’s Active(N) Lossless Data Center Replication uses three (or more) data centers with high-availability clusters and enhanced data observability via Kubernetes to achieve active-active-active XDCR with application-level conflict resolution.
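The general idea of pairing timestamp-based reconciliation with a conflict report can be sketched as follows. This is a hypothetical illustration, not VoltDB's API: a plain Python list stands in for the Kafka topic, and every class, function, and field name is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    timestamp: float
    cluster: str
    key: str
    value: str


@dataclass(frozen=True)
class ConflictRecord:
    """What the application is told: which write won and which one lost."""

    key: str
    winner: Write
    loser: Write


def resolve_and_report(a: Write, b: Write, conflict_log: List[ConflictRecord]) -> Write:
    """Pick the later write, but record the losing write instead of dropping it.

    conflict_log is a stand-in for a message stream such as a Kafka topic.
    """
    a_wins = (a.timestamp, a.cluster) >= (b.timestamp, b.cluster)
    winner, loser = (a, b) if a_wins else (b, a)
    conflict_log.append(ConflictRecord(a.key, winner, loser))
    return winner


def application_handler(record: ConflictRecord) -> str:
    """Hypothetical application-level follow-up for the losing write."""
    return f"follow up on '{record.loser.value}' (lost {record.key} to '{record.winner.value}')"


if __name__ == "__main__":
    log: List[ConflictRecord] = []
    first = Write(100.0, "dc-east", "seat:12A", "reserved for alice")
    second = Write(100.5, "dc-west", "seat:12A", "reserved for bob")
    stored = resolve_and_report(first, second, log)
    print("stored:", stored.value)
    for record in log:
        print(application_handler(record))
```

The database still converges on a single value, but because the losing write is reported rather than discarded, the application can decide what to do about it, for example by rebooking or notifying the affected customer.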
Using Active(N) Lossless XDCR, telcos and other organizations can finally achieve true, error-free functioning of enterprise-grade networks in the age of 5G, preventing data and revenue loss and empowering companies to take full advantage of 5G to successfully deliver new services yielding new revenue streams.
Active(N) Lossless XDCR provides telcos and enterprises with the unique, revenue-building power to:
- resolve conflicts at both the application level and the database level.
- ensure data resiliency and consistency at single-millisecond latencies regardless of where the data is stored, whether in a single data center or across multiple data centers.