In last week’s blog I questioned the need for DBAs, concluding that yes, DBAs are still valuable in the current data platform environment for two reasons: (1) the need for people who understand the pre-RDBMS database world, and (2) the need for data ‘curators’ to enable rational sharing of data between microservices and to ensure compliance with regulatory standards such as GDPR.
Now I’d like to take a look at the same question, the need for DBAs, in light of NoSQL databases.
Why DBAs are here in the first place
The DBA role was invented when the RDBMS came of age. It came into existence because prior to the RDBMS every team of developers had its own plan and format for storing data, and the complexity increased geometrically as the number of teams involved increased.
So, while a lot of the ‘noise’ around traditional DBAs revolved around elaborate data modeling exercises that attempted to find a sort of ‘universal truth’ about the enterprise’s data, the real value was that there was somebody whose job it was to nail down a single corporate standard for representing a given piece of data and how it related to other pieces of data. This inevitably led to conflicts with developers, who found the oversight intrusive, time-consuming, and annoying.
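To put rough numbers on why a single standard matters, here is a minimal sketch. It assumes a deliberately simple model that is not from any real system: without a standard, every pair of teams has to reconcile their formats with each other, whereas with a standard each team maps its data once. The function names are hypothetical, and the pairwise count is only one way to model the growth in complexity.

```python
def pairwise_mappings(teams: int) -> int:
    """Format reconciliations needed if every pair of teams must agree directly.

    Assumed model: one reconciliation per unordered pair of teams.
    """
    return teams * (teams - 1) // 2


def standard_mappings(teams: int) -> int:
    """Mappings needed if each team maps once to a shared corporate standard."""
    return teams


if __name__ == "__main__":
    # Compare the two approaches as the number of teams grows.
    for n in (2, 5, 10, 20):
        print(f"{n:>2} teams: {pairwise_mappings(n):>3} pairwise "
              f"vs {standard_mappings(n):>2} to a standard")
```

Under this model the pairwise count grows much faster than the number of teams, which is the gap the original DBA role was there to close.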
Sadly, we can see that history is repeating itself with the transient popularity of microservices that manage their own state using NoSQL stores. In terms of data modeling and management, a lot of companies are now behaving the same way they did in 1985: they have teams of developers creating siloed and disconnected data structures, but they don’t see a linear increase in productivity as they add more teams writing code, because data sharing and duplication are becoming a nightmare.
Now, we don’t directly see this in VoltDB, partly because we’re a SQL database and partly because we’ve never claimed to be a single enterprise data store for all your needs. Instead, we solve high-performance OLTP problems that are clearly defined but technically challenging.
NoSQL database complexity
The complexity of NoSQL products has dramatically increased, and now they need DBAs to run them.
One of the main selling points of NoSQL, compared to a legacy RDBMS, was that it was simple to own and operate.
There were two reasons for this:
- SQL and ACID were generally not implemented, which meant that the underlying product could be a lot simpler. Document databases in particular were promoted as not needing professional data modeling by a ‘use case neutral’ DBA.
- A limited number of use cases were envisaged. A traditional SQL database was expected to do OLTP and OLAP, possibly at the same time, so there tended to be far more settings and flags.
While most new databases fizzled, a few became outstandingly successful, and are now trying to position themselves as enterprise database platforms. This means adding SQL and ACID as well as making many internal changes to make diverse use cases run well.
Unfortunately, all of this creates new layers of complexity and leads to a proliferation of options, switches, and flags. These options create management overhead, especially when multiple groups of users with different goals are trying to work on the same platform. Sadly, the industry seems to be repeating the same mistakes the RDBMS vendors made by allowing operational complexity to explode. We’re even hearing that people are now implementing cost-based optimizers for their new SQL layers. As somebody with decades of DBA experience, let me assure you that a cost-based optimizer is a ‘Full Employment Act’ for DBAs.
The one-size-fits-all myth
The bottom line is that a database can’t be ‘all things to all people’ without making aspects of the product tunable and configurable, which in an enterprise environment creates a requirement for a DBA to manage it all, as a change that helps one application will hinder another.
You can’t be optimized for everything. Over the next two to three years, we can expect this to become more apparent, and I won’t be surprised if the major NoSQL players start following the playbook of a well-known Redwood City company by offering expensive ‘Enterprise Assistants’ that will use AI to try to fix the problems created by supporting multiple conflicting use cases on the same platform at the same time.
Here at VoltDB we’re lucky because our focus on OLTP and decisioning use cases relieves us of the obligation to attempt to offer functionality that runs against the grain of what the product was built to do.
But wait! I’m using a database hosted in the cloud by a hyperscaler. I don’t need DBAs, right?
Just because your database is in a cloud doesn’t mean it can magically resolve the design challenges created by fifty different use cases trying to access the same data at the same time. And since the hyperscaler’s staff will have zero knowledge relating to your specific business, they aren’t going to be able to help.
From the perspective of getting acceptable performance, it’s slightly more complicated. In my experience, about 95% of the databases in any organization could be hosted in a cloud and would need minimal care and feeding, but they never needed attention anyway.
The other 5%, however, tend to be both performance-sensitive and directly linked to generating revenue for a given enterprise. They will be getting at least 90% of developer/DBA time when it comes to keeping them running in an acceptable manner.
Using a cloud-hosted version for such mission-critical systems is outwardly appealing, but it comes with drawbacks:
- You will have multiple layers of abstraction and networking gear between you and the server, which can make optimization hard.
- You may not have the capability of tuning all the new knobs and levers that have been added, or of monitoring their effects at a low level.
- While you will be able to access DBA resources from the hyperscaler, they won’t know your business, and more importantly, will have a conflict of interest. A traditional ‘in-house’ DBA gets kudos and bonuses for speeding things up and doing more with fewer resources. A hyperscaler DBA will be incentivized to make you use as much of their product as possible, so the first solution to every performance problem is going to be ‘use more stuff’.
Conclusion: History is repeating itself, and DBAs are making a comeback
The bottom line is that a DBA’s role traditionally had two aspects. The first was managing and curating the enterprise’s data, to avoid duplication and errors. The second was making systems run at optimal speed to minimize use of resources. Neither of those two things is unique to enterprises that use legacy RDBMS products.
In fact, these needs exist at all large enterprises, and have for years. What we’re now seeing is that as new database technologies move from single use cases and start to enter the world of enterprise computing, they will end up operating under the constraints inherent to enterprise computing, which means they will need DBAs.