Availability as a Feature
It doesn’t matter whether you run a commerce site, a services product (such as a SaaS offering) or simply use your homepage to distribute information: high availability is table stakes for playing online. Many companies simply take for granted that they will be highly available because they have multiple instances of their systems and multiple copies of their data. This assumption will, at the very least, cause you significant pain and, in the extreme, cost you significant market share or even force you to close your doors. Customers expect the unachievable: 100% availability. At the very least you need to give them something close to that. What will happen to you if you have a data center failure? What if a DBA accidentally drops a critical table in your production database? What will you do when a marketing campaign triggers a near-overnight doubling of traffic? What happens when a new feature has a significant performance bug and is adopted so quickly that it brings your entire site to its knees?
We often tell our clients that they should treat high availability as a feature. Unfortunately, it is a somewhat expensive feature that requires constant investment over time to achieve and maintain. It is a must-have feature that will differentiate your firm only if you have competitors who do not make similar investments; where competition exists, customers are more likely to leave a site for a competitor because of availability and performance issues than for nearly any other reason. If you don’t believe us on this topic, just go ask the folks at Friendster.
Treating availability as a feature means measuring availability from a customer perspective rather than a systems or device perspective. How many times did customer requests not complete? In this view, availability becomes the share of expected transactions that did not fail: one minus the number of failed transactions divided by the expected number of transactions. We define an approach to accomplish this in our first book, “The Art of Scalability”. Every executive in the company should “own” the availability metric and understand its implications for the business. You should also track how much you invest in availability over time. Significant decreases in engineering effort or capital spending should be questioned, as they may be an early indicator that you are underinvesting and a harbinger of hard times to come.
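As a minimal sketch of the arithmetic only (not the method described in “The Art of Scalability”), assume you can count expected and failed customer transactions for a period. Availability is then one minus the failure fraction, expressed as a percentage. The function names are hypothetical. Using a historical baseline (for example, the same hour in the previous week) as the expected count is also an illustrative assumption, useful during outages where failed requests never reach your logs at all.

```python
def customer_availability(expected: int, failed: int) -> float:
    """Percentage of expected customer transactions that completed (hypothetical helper)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if not 0 <= failed <= expected:
        raise ValueError("failed must be between 0 and expected")
    return 100.0 * (1 - failed / expected)


def availability_from_baseline(baseline: int, completed: int) -> float:
    """Assumption: during an outage, failed requests may never be logged, so the
    expected count comes from a historical baseline (e.g. same hour last week)."""
    # Traffic above the baseline counts as fully available, never as negative failures.
    failed = max(baseline - completed, 0)
    return customer_availability(baseline, failed)


if __name__ == "__main__":
    # Illustrative numbers only: 250 failed out of 1,000,000 expected transactions.
    print(f"{customer_availability(1_000_000, 250):.3f}%")  # 99.975%
    # A period where only 9,500 of an expected 10,000 transactions completed.
    print(f"{availability_from_baseline(10_000, 9_500):.1f}%")  # 95.0%
```

Because the metric counts customer transactions rather than server uptime, a partial outage that fails 5% of requests shows up as lost availability even if every host reports itself as healthy.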
One of the most common failures we see is the assumption that disaster recovery (DR) is something only big companies need. Make no mistake about it: disasters do happen, and given enough time they will happen to you. Data centers catch fire, suffer water (sprinkler) discharges that ruin equipment, have complete power equipment failures that take hours to fix, and are prone to damage from vehicles, earthquakes, employees and tornados. In our past lives as executives and our current roles as advisors, we’ve seen no fewer than four data center fires, two data centers incapacitated by earthquakes and tornados, and one data center leveled by a truck running into it. You are never too young to invest in DR.
And DR need not break the bank. The dials of RTO (recovery time objective) and RPO (recovery point objective) allow you to determine how much you will invest. Perhaps you simply replicate your databases to a smaller set of databases at a remote data center, keep a copy of each of your systems there, and keep an additional copy ready “in the cloud”. While you won’t be able to run all of production from that smaller data center alone, you may be able to add capacity at relatively low cost by cloning the cloud-based systems. Such a solution has a short recovery point objective (you lose very little data) and a moderate recovery time objective (several hours) at a very low comparative cost. Of course, you would need to test the solution from time to time to show that it is viable, but it’s a cheap and effective insurance policy for the business.
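The RTO and RPO “dials” can be made concrete with a small check. This is a hypothetical sketch: the `DrPlan` fields, the `meets_objectives` function and every number in the usage lines are illustrative assumptions, not measurements. It treats worst-case replication lag as the data you could lose (compared against the RPO) and the estimated time to bring the remote site and cloud clones into service as the downtime (compared against the RTO).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrPlan:
    """Hypothetical summary of a disaster recovery setup (all figures are assumptions)."""
    name: str
    replication_lag_minutes: float  # worst-case data not yet copied off-site
    recovery_minutes: float         # estimated time to restore service at the DR site


def meets_objectives(plan: DrPlan, rpo_minutes: float, rto_minutes: float) -> bool:
    """True if the plan's potential data loss fits the RPO and its downtime fits the RTO."""
    within_rpo = plan.replication_lag_minutes <= rpo_minutes
    within_rto = plan.recovery_minutes <= rto_minutes
    return within_rpo and within_rto


if __name__ == "__main__":
    # Illustrative: replicated databases plus cloud systems cloned for extra capacity.
    warm_site = DrPlan("replicated DBs + cloud clones",
                       replication_lag_minutes=5, recovery_minutes=240)
    print(meets_objectives(warm_site, rpo_minutes=15, rto_minutes=480))  # True
    print(meets_objectives(warm_site, rpo_minutes=15, rto_minutes=60))   # False
```

Tightening either dial (a smaller RPO or RTO) generally means more frequent replication or more capacity on standby, which is where the additional cost comes from.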
So remember: availability is your most important feature. Customers always expect it and will leave you for your competitors if you do not deliver it. Create an availability metric and ensure that everyone understands it as a critical KPI. Review the company’s spending on availability quarterly or annually as an additional indicator of potential problems. Assume that disasters will happen, and have a DR plan regardless of your company’s size.