Over the past few years the NoSQL movement has been gaining popularity. It is a movement away from traditional relational database management systems (RDBMS) such as Oracle or MySQL, which use Structured Query Language (SQL), in favor of high-performance data storage systems that typically use call-level interfaces for data storage and retrieval. There are dozens of products, open source and vendor provided, vying for adoption in this space. These products have arisen from the need for technology teams to find an easier way to scale their data management while maintaining high availability, fast performance, and simple administration.
Supporters of the NoSQL movement point out that scaling relational databases is fraught with difficulties and problems; even RDBMS supporters at times promote this view. These new technologies are believed to hold many advantages over relational databases: they are much easier to scale across multiple nodes, they are faster, and they require far less administration. Many companies are currently using these data storage systems, with great success, as either extended memory caches for the application or object caches to offload the database. We, in fact, often recommend this as a first step in scaling.
Data storage systems are numerous and there currently is no standard way of classifying all of them. Three often cited categories are key-value stores, extensible record stores, and document stores. My goal is not to classify or review all of the data storage systems available, rather I’d like to focus on some of the key attributes of each of these three classifications and provide some balanced thought about each.
ACID, CAP, BASE, and Cube
Before we begin we need to cover a few topics. The first is ACID, which is an acronym that stands for:
- Atomicity – All of the operations in the transaction will complete, or none will.
- Consistency – The database will be in a consistent state when the transaction begins and ends.
- Isolation – The transaction will behave as if it is the only operation being performed on the database.
- Durability – Upon completion of the transaction, the operation will not be reversed.
Most RDBMS guarantee ACID properties, and it is largely because of these guarantees that they can be difficult to scale. When you guarantee consistency of data across multiple nodes in an RDBMS cluster, as with MySQL NDB, synchronous replication is used to guarantee data is written to multiple nodes upon commit. With Oracle RAC there is a central database, but ownership of areas of the database is shared among the nodes, so write requests must transfer ownership to the appropriate node, and reads must hop from requestor to master to owner and back. Whatever the strategy for solving the multiple-node-plus-consistency challenge, the application must wait on it.
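The cost of guaranteed consistency can be sketched in a few lines. In this toy model (all class and node names are hypothetical, not any real product's API) a commit blocks until every replica acknowledges the write, so commit latency grows with the cluster rather than staying constant:

```python
import time

class Replica:
    """A single storage node with simulated write latency."""
    def __init__(self, name, latency_s):
        self.name = name
        self.latency_s = latency_s
        self.data = {}

    def write(self, key, value):
        time.sleep(self.latency_s)  # simulated network + disk round trip
        self.data[key] = value

def synchronous_commit(replicas, key, value):
    """The caller blocks until every replica has acknowledged the write.
    All nodes agree when this returns (consistency), but the application
    waited on every one of them."""
    for r in replicas:
        r.write(key, value)

replicas = [Replica(f"node{i}", 0.01) for i in range(3)]
start = time.time()
synchronous_commit(replicas, "user:42", {"name": "Ada"})
elapsed = time.time() - start  # grows with the number of replicas
```

Adding a fourth node makes every commit slower still, which is exactly why synchronous replication limits how far an ACID cluster can scale.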
According to the CAP theorem, developed by Eric Brewer, there are three core requirements when designing and deploying applications in a distributed environment, and a distributed system can strictly guarantee only two of them at any one time. These requirements are expressed in the acronym CAP:
- Consistency – The client perceives that a set of operations has occurred all at once.
- Availability – Every operation must terminate in an intended response.
- Partition Tolerance – Operations will complete, even if individual components are unavailable.
From this we get BASE, an acronym describing architectures that trade off the CAP requirements; it stands for Basically Available, Soft State, Eventually Consistent. By relaxing the ACID property of consistency we gain greater flexibility in how we scale.
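Eventual consistency is easy to demonstrate. In this minimal sketch (the class and method names are illustrative, not a real product's API), writes land on a primary immediately while replication to a read replica is deferred, so a read from the replica can briefly return stale data until replication catches up:

```python
from collections import deque

class EventuallyConsistentStore:
    """Writes hit the primary at once; replication to the replica is
    deferred to a background step, so replica reads may be stale."""
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))  # replicate later, not now

    def read_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        # Stand-in for the asynchronous replication process.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value

store = EventuallyConsistentStore()
store.write("price", 100)
stale = store.read_replica("price")  # None: replica has not caught up yet
store.replicate()                    # "eventually"...
fresh = store.read_replica("price")  # 100: now consistent
```

The application never waited on the replica, which is the availability win BASE buys by giving up guaranteed consistency.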
If you are reading this you are probably familiar with the AKF Scale Cube, but to quickly refresh your memory: the cube consists of X, Y, and Z axes. The X-axis represents read-write splitting of databases, such as through master-slave or standby databases. The Y-axis represents splitting tables onto different database nodes based on the application service or module to which they pertain. The Z-axis represents splitting the data within tables based on a modulus or lookup.
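A Z-axis split by modulus can be sketched in a few lines. Here customer records are routed to one of four shards purely by key (the shard count and record shape are made up for illustration):

```python
NUM_SHARDS = 4

def shard_for(customer_id: int) -> int:
    """Z-axis split: the shard is chosen by a modulus of the key."""
    return customer_id % NUM_SHARDS

# One dict per shard; in a real deployment each would be its own node.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(customer_id, record):
    shards[shard_for(customer_id)][customer_id] = record

def get(customer_id):
    return shards[shard_for(customer_id)].get(customer_id)

put(42, {"plan": "pro"})
record = get(42)        # routed to shard 42 % 4 == 2
```

A lookup-based split works the same way, except `shard_for` consults a directory table instead of computing a modulus, which makes rebalancing easier at the cost of an extra hop.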
There are two primary problems you face when attempting to scale an RDBMS. As discussed above, one is the requirement of maintaining consistency across nodes; eventually you are limited by the number of nodes data can be synchronously replicated to, or by their physical geographic locations. The second is that data in an RDBMS has a relational structure. The tables of most OLTP systems are normalized to 3rd Normal Form, where all records of a table have the same fields, no non-key field can be described by only part of a composite key, and every non-key field must be described by the key. Within a table each piece of data is closely related to the others, and between tables there are often relationships, enforced as foreign keys. Most applications depend on the database to support and enforce these relationships, which makes it difficult to split the database without significant engineering intervention. The simpler the data model, the easier it is to scale.
Key-Value Stores
I would include in this group products such as Memcached, Tokyo Tyrant, and Voldemort. These products keep a single key-value index for all data, stored in memory; some have the ability to write to disk for persistent storage. Across nodes, some products use synchronous replication while others are asynchronous. They offer significant scaling and performance by using a very simple data model, the key-value pair. This is obviously somewhat limiting when the application needs to store and relate large amounts of diverse data. Additionally, the key-value stores that rely on synchronous replication still face the same limitations RDBMS clusters do: a limit on the number of nodes and their geographic locations. Asynchronous replication provides the same eventually consistent model that an X-axis split in the AKF Scale Cube provides using master-slave or standby databases.
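The cache-offload pattern mentioned earlier is the most common use of these stores. This sketch shows the read-through idiom against a toy in-memory cache (the `KVCache` class and `slow_db_lookup` function are stand-ins, not any real client library):

```python
class KVCache:
    """Minimal in-memory key-value store in the spirit of Memcached:
    a single key -> value index with get/set only."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

def slow_db_lookup(user_id):
    # Stand-in for the expensive RDBMS query being offloaded.
    return {"id": user_id, "name": f"user-{user_id}"}

cache = KVCache()

def get_user(user_id):
    """Read-through pattern: check the cache first, fall back to the
    database on a miss, and populate the cache for next time."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = slow_db_lookup(user_id)
        cache.set(key, user)
    return user

first = get_user(7)   # miss: goes to the database, fills the cache
second = get_user(7)  # hit: served from the key-value store
```

Every cache hit is a query the database never sees, which is why this is often the cheapest first step in scaling.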
Extensible Record Stores
In this group I would place Google’s proprietary BigTable and Facebook’s, now open source, Cassandra. These products use a row-and-column data model that can be split across nodes: rows are sharded on primary keys, and columns are broken into groups and placed on different nodes. This method of scaling is similar to the X and Y axes of our Scale Cube, where the X-axis split is read replicas and the Y-axis separates tables by the services they support. In these products row sharding is done automatically, but column splitting requires user definitions, similar to how it is performed in an RDBMS. These products also use asynchronous replication, providing eventual consistency. Even though similar scaling can be accomplished with an RDBMS, there are still some desirable innovations, such as Cassandra’s ability to automatically bring nodes in and out of the cluster. By the way, Cassandra in Greek mythology had the gift of prophecy but was cursed so that no one would believe her.
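The row-and-column-family model can be sketched as nested maps. In this toy version (class and family names are invented for illustration, not BigTable or Cassandra APIs), each row is addressed by a key and its columns are grouped into user-defined families, each of which could live on a different node:

```python
class ColumnFamilyStore:
    """Toy extensible record store: rows addressed by key, columns
    grouped into user-defined column families. In a real system each
    family, and each row range, could be placed on a different node."""
    def __init__(self, families):
        self.families = {f: {} for f in families}

    def put(self, row_key, family, column, value):
        self.families[family].setdefault(row_key, {})[column] = value

    def get(self, row_key, family):
        """Fetch all columns of one family for one row."""
        return self.families[family].get(row_key, {})

# The family split is user-defined, much like a Y-axis table split.
users = ColumnFamilyStore(families=["profile", "activity"])
users.put("u1", "profile", "name", "Ada")
users.put("u1", "activity", "last_login", "2010-05-01")
profile = users.get("u1", "profile")
```

Row sharding would then apply a Z-axis-style split over `row_key` ranges automatically, which is the part these products handle for you.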
Document Stores
I would put CouchDB, Amazon’s SimpleDB, and Yahoo’s PNUTS in this category. The data model used is called a “document” but is more accurately described as a multi-indexed object model. These systems group documents into domains or collections that can be queried on multiple attributes. They do not support ACID properties; instead they use asynchronous replication, providing an eventually consistent model. This is similar to read replication for X-axis scaling of an RDBMS.
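The multi-indexed object model amounts to schemaless records queryable on any combination of attributes. This sketch shows the idea with a toy collection (the class, field names, and query style are invented for illustration, not any real product's API):

```python
class DocumentCollection:
    """Toy document store: a collection of schemaless documents that
    can be queried on any combination of attributes."""
    def __init__(self):
        self.docs = []

    def insert(self, doc):
        # Documents need not share a schema: any dict is accepted.
        self.docs.append(doc)

    def find(self, **criteria):
        """Return documents matching every given attribute-value pair."""
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

books = DocumentCollection()
books.insert({"title": "Scalability", "year": 2010, "lang": "en"})
books.insert({"title": "Cassandra", "year": 2010, "lang": "en",
              "topic": "nosql"})
matches = books.find(year=2010, topic="nosql")  # multi-attribute query
```

Note that the second document carries a field the first lacks; the collection doesn't care, which is what distinguishes this model from a fixed-schema RDBMS table.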
In summary, these data storage systems are very popular, and rightly so; they have helped some very large companies scale quite effectively. But just as cloud computing doesn’t automatically solve your application’s scaling problems, neither do they. The constraints of 1) the number and location of nodes under synchronous replication and 2) the binding of nodes together when relating complex objects to one another still remain. NoSQL data stores address these problems by either relaxing the constraint (eventual consistency vs. guaranteed consistency) or restricting the data model (key-value pairs vs. RDBMS tables). As Ted Dziuba points out, even Google, which owns the most proven highly scalable data storage system in BigTable, still runs AdWords on MySQL. Don’t misconstrue my intent: many of these solutions have a place in highly scalable architectures, but make sure you understand the limitations of each before planning their implementation.
In Other News….
We were recently informed that The Art of Scalability has been selected to be translated into Japanese. We’re very excited about this and look forward to, hopefully, many other translations. If you haven’t picked up a copy, here are links to a few online stores: Amazon, Barnes and Noble, Borders, and InformIT.
If you are interested in a preview of the kinds of topics covered in the book, check out our latest posts on GigaOM and VentureBeat. We’ve also been very busy on the blog with posts such as 97 Things – Book Review, Ethical Concerns of China, a leadership example from the Battle of Shiloh, and one of our most popular posts, 4 Things I Wish I’d Learned As An Undergraduate.