AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth


Defining Pods, Shards and Swim Lanes

In the course of our engagements we often have to pause for a few minutes to acquaint everyone with a few terms that we use. Often the people we work with have heard, or even use, terms that are common in the industry. Three that are frequently used and frequently confused are pods, shards, and swim lanes. Let’s start by defining each one and then explain the differences.

According to Merriam-Webster a shard is a small piece or part. Wikipedia defines a database shard as “…a method of horizontal partitioning in a database or search engine.” The term horizontal partitioning refers to a database design principle whereby rows of a database table are separated possibly onto physically distinct database servers.

A shard to AKF is a Z-axis split on the AKF Scale Cube. This involves splitting the tables in the database between two or more database servers based on some appropriate key such as customer ID or sales items. An X-axis split involves replicas, such as read-only slaves or standbys, that are complete copies of the primary database. Y-axis splits are ones done by service, which usually aligns to a subset of tables. An example of this would be pulling session data off the primary database and onto its own database server.
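As a rough illustration of a Z-axis split, the sketch below routes each customer ID to one of several database servers. The host names, the function names, and the choice of a CRC32 hash with a modulus are all assumptions made for the example; they are not a description of how any particular system does it.

```python
import zlib

# Hypothetical database hosts, one per shard.
SHARD_HOSTS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for_key(key, shard_count):
    """Map a sharding key (e.g. a customer ID) to an index in [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def host_for_customer(customer_id):
    """Return the (hypothetical) database host that owns this customer's rows."""
    return SHARD_HOSTS[shard_for_key(customer_id, len(SHARD_HOSTS))]


if __name__ == "__main__":
    for customer_id in (1001, 1002, 1003):
        print(customer_id, "->", host_for_customer(customer_id))
```

A plain modulus is the simplest mapping, but changing the number of shards moves many keys to a different server, so a lookup table or consistent hashing may be a better fit when shards are added often.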

One of our clients, Salesforce.com, uses the term pods, especially for its Force.com software-as-a-service platform. Pods are self-contained sets of functionality that can consist of an app server or database. If a pod goes down because the platform isn’t running it, only the customers on that pod will be affected. Salesforce executives claimed that it delivered 99.95 percent uptime last year.
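To make the "only the customers on that pod" point concrete, here is a small sketch of a customer-to-pod directory and the set of customers affected when one pod fails. The customer names, pod names, and function names are invented for illustration and say nothing about how Salesforce actually assigns customers.

```python
from collections import defaultdict

# Hypothetical directory assigning each customer to exactly one pod.
CUSTOMER_POD = {
    "acme": "pod-1",
    "globex": "pod-1",
    "initech": "pod-2",
    "umbrella": "pod-3",
}


def customers_by_pod(directory):
    """Invert the directory: pod name -> set of customers living on it."""
    pods = defaultdict(set)
    for customer, pod in directory.items():
        pods[pod].add(customer)
    return dict(pods)


def affected_customers(directory, failed_pod):
    """Customers who lose service when failed_pod goes down."""
    return customers_by_pod(directory).get(failed_pod, set())


if __name__ == "__main__":
    print(sorted(affected_customers(CUSTOMER_POD, "pod-1")))
```

Customers on the other pods never appear in the result, which is the whole point of the design.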

Swim Lanes
AKF uses the term “swim lane” to describe a failure domain or fault isolation architecture. A failure domain is a group of services within a boundary such that any failure within that boundary is contained and failures do not propagate outside. The benefit of such a failure domain is two-fold:

  1. Fault Detection: Given a granular enough approach, the component of availability associated with the time to identify the failure is significantly reduced. This is because all effort to find the root cause or failed component is isolated to the section of the product or platform associated with the failure domain.
  2. Fault Isolation: As stated previously, the failure does not propagate or cause a deterioration of other services within the platform. As such, and depending upon the approach, only a portion of users or a portion of functionality of the product is affected.

Between swim lanes, synchronous calls are absolutely forbidden because any synchronous call between failure domains, even with appropriate timeout and detection mechanisms, is very likely to cause a cascading series of failures. An example of how this happens is in your database, when one long-running query slows down all the other queries competing for locks or resources.
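One way to picture the alternative to a synchronous call is a fire-and-forget handoff: the sending lane drops an event into a bounded buffer and moves on, and the receiving lane works through the buffer at its own pace. The sketch below uses an in-process queue as a stand-in for whatever message bus a real system would use; the class name, method names, and buffer size are hypothetical.

```python
import queue


class SwimLaneOutbox:
    """Hypothetical one-way channel from one swim lane to another."""

    def __init__(self, max_pending=1000):
        # Bounded so a stalled receiver cannot exhaust the sender's memory.
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def publish(self, event):
        """Hand off an event without waiting; return False if the buffer is full."""
        try:
            self._pending.put_nowait(event)
            return True
        except queue.Full:
            # The sender keeps running; the event is counted and dropped.
            self.dropped += 1
            return False

    def drain(self, handler):
        """Called by the receiving lane; returns the number of events handled."""
        handled = 0
        while True:
            try:
                event = self._pending.get_nowait()
            except queue.Empty:
                return handled
            handler(event)
            handled += 1


if __name__ == "__main__":
    outbox = SwimLaneOutbox(max_pending=2)
    print([outbox.publish(n) for n in range(3)])
    received = []
    print(outbox.drain(received.append), received, outbox.dropped)
```

The sender never blocks on the receiver, so a slow or dead receiving lane costs dropped or delayed events rather than a stalled caller. Whether dropping, spilling to disk, or alerting is the right reaction to a full buffer depends on the product.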

Similarities and Differences
All of these terms describe similar architectures (splitting by customer or a similar key) but they are done for different purposes. Shards are specific to databases and don’t imply anything about whether the application tier is also split. The purpose of shards is to scale an RDBMS across many servers instead of onto larger hardware. Pods and swim lanes aim to achieve both scalability of the overall system (application and database) and fault isolation.


RAC Rant

We’ve written about trying to use vendor features to scale but given how often we run across companies that have been convinced by vendors to rely on them, this topic is worth revisiting. To state it as directly as possible, every major SaaS company that has relied on a vendor, software or hardware, to scale them through hyper-growth has failed and had to solve the scale problem themselves.

Since Oracle OpenWorld took place recently I’ve decided to use Oracle RDBMS as an example of failing to scale with vendor features. We have nothing against using Oracle as an RDBMS, even though there are open source options that can scale just as well, but let’s use one of their scalability features, Real Application Clusters (RAC), as an example. In Oracle’s own words RAC “…enables a single database to run across a cluster of servers, providing unbeatable fault tolerance, performance, and scalability with no application changes necessary.” Scaling with “no application changes” is a nice concept, but it isn’t realistic for hyper-growth companies. One large reason is that RAC does not scale across multiple data centers, which is a requirement for hyper-growth companies since everything fails eventually, including data centers. Even with “Extended Distance Clusters” for RAC nodes, the nodes can only be separated by up to 25 kilometers using dark fiber (DWDM or CWDM) technology.

The use of RAC for increased availability is fine, but you should review our post on the downside of using vendor features and how to negotiate with vendors. In particular you should be aware that by using this feature you have weakened your position during renewal negotiations. If you think your salesperson is being nice by throwing in the RAC feature for a low price, think again. As soon as you start using this feature they have the upper hand in negotiations.

Enough of the RAC rant, especially since this is just one example of many that are out there. Hardware vendors, both servers and storage, are just as guilty of trying to convince SaaS companies to rely on them for scalability. Keep your destiny in your own hands and resist relying on short-term solutions to long-term problems.


Why We Hate Stored Procedures

Here's a little color on our love-hate relationship with stored procedures.

Okay, we really don’t hate stored procedures and have actually used them extensively on past projects, but we so often tell people to remove them that we’ve been accused of hating them. Here’s the deal: as we mentioned in a past post, two scalability best practices are 1) put as little business logic in the database layer as possible and 2) use as few features of the RDBMS as possible. Stored procedures violate both if they contain any business logic. If the stored procedures are simply instantiations of SQL data statements then you’re probably okay.

Expanding on the two reasons for avoiding stored procedures mentioned above, we’ll first discuss why business logic doesn’t belong in the database layer. The biggest reason is that the database server is likely the most expensive server in your system and the most difficult to scale. Adding more computational processing to it means that work runs on the most expensive layer available, and it forces you to upgrade the hardware or split the database sooner than necessary. It is much cheaper, from a cost per computation perspective, to have this business logic processed on a much cheaper application server.
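A minimal sketch of that division of labor follows: the pricing rule runs in application code and the database only executes a plain data statement. The discount rule, table, and function names are made up for the example, and an in-memory SQLite database stands in for whatever RDBMS you actually run.

```python
import sqlite3


def order_total_cents(unit_price_cents, quantity, loyalty_years):
    """Business rule in application code (hypothetical rule: 2% off per
    loyalty year, capped at 10%)."""
    subtotal = unit_price_cents * quantity
    discount_pct = min(loyalty_years * 2, 10)
    return subtotal - subtotal * discount_pct // 100


def save_order(conn, customer_id, total_cents):
    """The database only stores the result; no logic runs inside it."""
    conn.execute(
        "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
        (customer_id, total_cents),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total_cents INTEGER)")

total = order_total_cents(unit_price_cents=1999, quantity=3, loyalty_years=4)
save_order(conn, customer_id=42, total_cents=total)
print(conn.execute("SELECT customer_id, total_cents FROM orders").fetchall())
```

Because the rule is ordinary application code, it scales out with cheap application servers, and the SQL that remains is generic enough to run on most databases.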

Avoiding the use of RDBMS features is important when it comes to remaining vendor agnostic and getting the best possible price during negotiations. Using features specific to certain database vendors locks you in to that vendor. While you may never want to switch database vendors, it does reduce your negotiating strength. When you are locked in to a vendor your best alternative to a negotiated agreement, BATNA, is not good and their sales staff knows this.

While we don’t hate stored procedures, we do see them as an impediment to scaling. If you already have them in place you may be hesitant to spend the money moving them into the application logic. One method of doing so at minimal incremental cost is to start today with the policy that if any engineer touches a stored procedure they must move it. This way the cost is small, because you’re already touching the code, and spread out, because it’s not done all at once.