The End of Scalability?
If you received any sort of liberal arts education in the past twenty years, you’ve probably read, or at least been assigned to read, Francis Fukuyama’s 1989 essay “The End of History?” If you haven’t read the article or the follow-on book, Fukuyama argues that Western liberal democracy is the final form of human government and that its advent therefore marks the end point of humanity’s sociocultural evolution. He isn’t arguing that events will stop happening in the future, but rather that democracy will become more and more prevalent in the long term, despite possible setbacks such as periods of totalitarian government.
I have been involved, in some form or another, in scaling technology systems for nearly two decades, not counting the decade before that, when I was hacking on Commodore PETs and Apple IIs while learning how to program. Over that time there have been noticeable trends, such as the centralization/decentralization cycle within both technology and organizations. With regard to technology, think about the transitions from mainframe (centralized) to client/server (decentralized) to web 1.0 (centralized) to web 2.0/Ajax (decentralized) as an example of the cycle. The trend that has lately attracted my attention concerns scaling. I’m proposing that we’re approaching the end of scalability.
As a scalability consultant who travels almost every week to clients in order to help them scale their systems, I don’t make this end-of-scalability statement lightly. However, before we jump into my reasoning, we first need to define scalability. To some, scalability means that a system needs to scale infinitely no matter what the load over any period of time. While certainly ideal, the challenge with this definition is that it doesn’t take into account the underlying business needs. Investing too much in scalability before it’s necessary isn’t wise for a business when there are other great projects in which to invest, such as more customer-facing features. A definition that takes this into account defines scalability as “the ability of a system to maintain the satisfaction of its quality goals to levels that are acceptable to its stakeholders when characteristics of the application domain and the system design vary over expected operational ranges.” [2:119]
The most difficult part of scaling a system is typically the database or persistent data storage. AKF Partners teaches general database and application scaling theory in terms of a three-dimensional cube, where the X-axis represents replication of identical code or data, the Y-axis represents a split by dissimilar functions or services, and the Z-axis represents a split across similar transactions or data. Having taught this scaling technique and seen it implemented in hundreds of systems, we know that by combining all three axes a system can scale infinitely. However, the cost of this scalability is increased complexity in developing, deploying, and managing the system. But is this really the only option?
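To make the three axes concrete, here is a minimal sketch of how a request router might combine them. The service names, host names, and shard counts are hypothetical and exist only for illustration; a real deployment would normally get this routing from a load balancer or the data store itself.

```python
import hashlib
from itertools import cycle

# Hypothetical topology: service -> customer shards -> identical replicas.
TOPOLOGY = {
    "search": [["search-s0-a", "search-s0-b"], ["search-s1-a", "search-s1-b"]],
    "checkout": [["checkout-s0-a", "checkout-s0-b"], ["checkout-s1-a", "checkout-s1-b"]],
}

# One round-robin iterator per (service, shard) pair, used for the X-axis.
_replica_cycles = {
    (service, index): cycle(replicas)
    for service, shards in TOPOLOGY.items()
    for index, replicas in enumerate(shards)
}


def shard_for(customer_id: str, shard_count: int) -> int:
    """Z-axis: a stable hash so a given customer always maps to the same shard."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(service: str, customer_id: str) -> str:
    """Pick a host for one request by applying the Y, Z, and X axes in turn."""
    shards = TOPOLOGY[service]                     # Y-axis: split by function
    index = shard_for(customer_id, len(shards))    # Z-axis: split by customer
    return next(_replica_cycles[(service, index)])  # X-axis: rotate over clones
```

Repeated calls such as `route("search", "customer-42")` stay on one shard while alternating between that shard's replicas. The sketch also hints at the complexity cost mentioned above: every axis adds a lookup that someone has to build, deploy, and keep consistent.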
The NoSQL and NewSQL movements have produced a host of new persistent storage solutions that attempt to solve the scalability challenges without increased complexity. Solutions such as MongoDB, a self-proclaimed “scalable, high-performance, open source NoSQL database”, attempt to solve scaling by combining replica data sets (X-axis splits) with sharded clusters (Y & Z-axis splits) to provide high levels of redundancy for large data sets transparently to applications. Undoubtedly, these technologies have advanced many systems’ scalability and reduced complexity by relieving developers of the need to manage replica sets and sharding themselves.
But the problem is that hosting MongoDB, or any other persistent storage solution, requires keeping enough hardware capacity on hand for any expected increase in traffic. The obvious solution is to host it in the cloud, where we can use someone else’s hardware capacity to satisfy our demand. However, unless you are using a hybrid cloud with physical hardware, you are not getting direct-attached storage, and I/O in the cloud is very unpredictable, primarily because it requires traversing the cloud provider’s network. Enter Solid-State Drives (SSD).
Chris Lalonde, CEO of ObjectRocket, a MongoDB cloud provider hosted entirely on SSDs, states that “Developers have been thinking that they need to keep their data set size the same size as memory because of poor I/O in the cloud and prior to 2.2.x MongoDB had a single lock, both making it unfeasible to go to disk in systems that require high performance. With SSDs the I/O performance gains are so large that it effectively negates this and people need to re-examine how their apps/platforms are architected.”
Many forward-thinking technology organizations are moving towards SSDs. Facebook’s appetite for solid-state storage has made it the largest customer for Fusion-io, putting NAND Flash memory products in its new data centers in Oregon and North Carolina. Lalonde says “When I/O becomes cheap and fast it drastically changes how developers think about architecting their application e.g. a flat file might be just fine for storing some data types vs. the heavy over head of any kind of structured data.” ObjectRocket’s service offering also provides some other nice features, such as “instant sharding”, which provides an entire 3-node shard on demand at the click of a button.
Besides the advances being made in leveraging NoSQL and SSDs to allow applications to scale using Infrastructure as a Service (IaaS), there are advances in Platform as a Service (PaaS) offerings, such as Google App Engine (GAE), that are also helping systems scale with little to no burden on developers. GAE allows applications to take advantage of the same scalable technologies, like BigTable, that Google’s own applications use, which lets Google claim that, “Automatic scaling is built in with App Engine, all you have to do is write your application code and we’ll do the rest. No matter how many users you have or how much data your application stores, App Engine can scale to meet your needs.” While GAE doesn’t have customers as large as Netflix, which runs exclusively on Amazon Web Services (AWS), its customers do include companies like the Khan Academy, which has over 3.8 million monthly unique visitors to its growing collection of over 2,000 videos.
So with solutions like ObjectRocket, GAE, and the myriad of others that make it easier to scale to significant numbers of users and customers without having to worry about data set replication (X-axis splits) or sharding (Y & Z-axis splits), are we at the end of scalability? If we’re not there yet, we soon will be. “But hold on,” you say, “our systems are producing more and consuming more data…much more.” No doubt the amount of data that we process is rapidly expanding. In fact, according to the EMC-sponsored IDC study in 2009, the amount of digital data in the world was almost 500 exabytes and doubling every 18 months. But when we combine the benefits we achieve from such advances as transistor density on circuits (Moore’s Law), NoSQL technologies, cheaper and faster storage (e.g. SSD), IaaS, and PaaS offerings, we are likely to see the end of the need for most application developers to care about manually scaling their applications. At some point in the future this will all be done for them in “the cloud”.
Where does this leave us as experts in scalability? Do we close up shop and go home? Fortunately, no. There are still reasons that application developers and technologists need to be concerned with splitting data into replica sets and sharding data across nodes. Two of these reasons are 1) to reduce risk and 2) to improve developer efficiency.
As we’ve written about before, risk has several high-level components (probability of an incident, duration, and % of customers impacted). Google “GAE outages” or “AWS outages”, or the name of any other IaaS or PaaS provider along with the word “outage”, and see what you find. All hosting providers that I’m aware of have had outages in their not-so-distant past. GAE had a major outage on October 26, 2012 that lasted 4 hours. GAE proudly states at the bottom of their outage post “Since launching the High Replication Datastore in January 2011, App Engine has not experienced a widespread system outage.” That sounds impressive until you do the math and realize that this one outage caused their availability to drop to 99.975% for the nearly two years (January 2011 to October 2012) that the service has been available. Not to mention that they have much more frequent local outages or issues that affect some percentage of their customers. We have been at clients when they’ve experienced incidents caused by GAE.
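The availability figure can be checked with a few lines of arithmetic. This sketch assumes the measurement window runs from January 1, 2011 (the post gives only the month, so the first of the month is a placeholder) to the October 26, 2012 outage, and that the 4-hour outage was the only downtime in that window.

```python
from datetime import datetime


def availability(downtime_hours: float, period_hours: float) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_hours / period_hours


# Assumed window: start of January 2011 (placeholder day) to the outage date.
window = datetime(2012, 10, 26) - datetime(2011, 1, 1)
window_hours = window.total_seconds() / 3600

outage_hours = 4.0  # duration of the outage described above
observed = availability(outage_hours, window_hours)

print(f"window: {window_hours:.0f} hours, availability: {observed:.3%}")
```

Under these assumptions the result rounds to 99.975%. A shorter window, or any additional downtime, would push the figure lower.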
The point here is not to call out GAE. Trust me, all other providers have the exact same issue. The point is that when you rely on a 3rd party for 100% of your availability, you by definition have their availability as your ceiling. Now add on your own availability issues from maintenance, code releases, bugs in your code, etc. Why is this? Incidents almost always have multiple root causes that include architecture, people, and process. Everything eventually fails, including our people and processes.
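The ceiling follows from how availabilities combine when your service depends entirely on the provider. As a rough sketch, assuming failures are independent and that either side failing takes the service down, the availabilities multiply. Both percentages below are illustrative values, not measurements.

```python
def composite_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, which is a simplification.
    """
    result = 1.0
    for component in components:
        result *= component
    return result


provider = 0.9995  # illustrative provider availability
own = 0.999        # illustrative availability of your own code and processes

total = composite_availability(provider, own)
print(f"combined availability: {total:.4%}")
```

Because every factor is at most 1, the product can never exceed the provider's own figure, and each additional dependency can only lower it.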
Given that you cannot reduce the probability of an incident to 0%, no matter whether you run the datacenter or a 3rd party provider does, you must focus on the other risk factors (reduce the duration and reduce the % of customers impacted). The way you achieve this is by splitting services (Y-axis splits) and by separating customers (Z-axis splits). While leveraging AWS’s RDS or GAE’s HRD provides cross availability zone / datacenter redundancy, in order to have your application take advantage of these you still have to do the work to split it. And if you want even higher redundancy (across vendors) you definitely have to do the work to split applications or customers between IaaS or PaaS providers.
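The effect of separating customers can be sketched as a simple impact calculation for a single incident. The customer count, incident duration, and shard count are hypothetical, and the sketch assumes customers are spread evenly across shards and that an incident takes down only the shards it hits.

```python
def customer_hours_impacted(
    duration_hours: float,
    customers: int,
    shards: int = 1,
    shards_down: int = 1,
) -> float:
    """Customer-hours lost in one incident, given how many shards it hits."""
    if not 1 <= shards_down <= shards:
        raise ValueError("shards_down must be between 1 and shards")
    fraction_impacted = shards_down / shards
    return duration_hours * customers * fraction_impacted


# Hypothetical numbers: a 4-hour incident and 100,000 customers.
monolith = customer_hours_impacted(4.0, 100_000)
sharded = customer_hours_impacted(4.0, 100_000, shards=10)

print(f"unsplit: {monolith:,.0f} customer-hours")
print(f"10 customer shards, 1 down: {sharded:,.0f} customer-hours")
```

With these inputs the same incident costs a tenth as many customer-hours once customers are split ten ways. The sketch deliberately ignores the probability of an incident, which splitting does not reduce.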
Let’s say you’re happy with GAE’s 99.95% SLA, which no doubt is pretty good, especially when you don’t have to worry about scaling. But don’t throw away the AKF Scalability Cube just yet. One of the major reasons we get called in to clients is that their board of directors or CEO isn’t happy with how the development team is delivering new products. They recall the days when there were 10 developers and features flew out of the door. Now that they have 100 developers, everything seems to take forever. The reason for this loss of efficiency is that with a monolithic code base (no splits for different services), all 100 developers are trying to make changes and add functionality in the same place, and they are stepping all over each other. There needs to be much more coordination, more communication, more integration, etc. By splitting the application into separate services (Y-axis splits) with separate code bases, the developers can split into independent teams or pods, which makes them much more efficient.
We are likely to see continued improvement in IaaS and PaaS solutions that auto-scale and perform to such a degree that most applications will not need to worry about scaling because of user traffic. However, this does not obviate the need to consider scaling for greater availability or vendor independence, or to improve a development team’s efficiency. All great technologists, CTOs, and application developers will continue to care about scalability for these and other reasons.
1. Fukuyama, F., The End of History? The National Interest, 1989. 16(4).
2. Duboc, L., E. Leiter, and D. Rosenblum, Systematic Elaboration of Scalability Requirements through Goal-Obstacle Analysis. 2012.
3. Abbott, M.L. and M.T. Fisher, The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise. 2009: Addison-Wesley Professional.