[Image: tape measure graphic measuring the word “availability”]

As a company matures from a startup to a growing business, there are a number of measurables that become table stakes – basic tools for managing a business. These measurables include financial reporting statements, departmental budgets, KPIs, and OKRs. Another key measurable is the availability of your product or service, and the technology team should own it.

When we ask clients about availability goals or SLAs, some do not have them documented and say something along the lines of “we want our service to always be available”. While a nice sentiment, unblemished availability is virtually impossible to achieve and prohibitively expensive to pursue. Availability goals must be relevant to the shared business outcomes of the company.

If you are not measuring availability, start. If nothing else, the data will inform what your architecture and process can do today, providing a starting point if the business chooses to pursue availability improvements.

Some clients who do have an availability measurable use a percentage of clock time – 99.95% for example. This is certainly better than no measurable at all, but still leaves a lot to be desired.
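To put a clock-time target in concrete terms, the short sketch below converts an availability percentage into the downtime it permits. The function name `allowed_downtime_minutes`, the 365-day year, and the 30-day month are illustrative assumptions.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, period_days: float = 365) -> float:
    """Minutes of downtime a clock-time availability target permits over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * period_days * MINUTES_PER_DAY


yearly = allowed_downtime_minutes(99.95)
monthly = allowed_downtime_minutes(99.95, period_days=30)
print(f"99.95% allows about {yearly:.0f} minutes of downtime per year")
print(f"99.95% allows about {monthly:.1f} minutes of downtime per 30-day month")
```

Under these assumptions, 99.95% works out to roughly 263 minutes (about 4.4 hours) per year. The figure says nothing about when those minutes fall.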

Reasons why clock time is not the best measure for availability:

  • Units of time are not equal in terms of business impact – a disruption during the busiest part of the day is worse than one during a slow period. Many companies already recognize this implicitly: they schedule maintenance windows for late at night or early in the morning, when the impact of a disruption is smaller.
  • The business communicates in business terms (revenue, cost, margin, return on investment) and these terms are measured in dollars, not clock time.
  • Using the uptime figure from a server or other infrastructure component as an availability measure is inaccurate because it does not capture software bugs or other issues rendering your service inoperative despite the server uptime status.

[Figure: slide showing outages of equal duration plotted against company revenue at the time of each outage]
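The point about unequal units of time can be sketched numerically. The hourly revenue figures below are invented purely for illustration, and `revenue_weighted_availability` is a hypothetical helper rather than an established formula. It weights each hour by the revenue normally earned in it, so two outages of equal length produce different results.

```python
from typing import Sequence

# Hypothetical typical revenue (dollars) for each hour of a day, index 0 = midnight.
# These numbers are made up for illustration only.
HOURLY_REVENUE = [
    100, 50, 50, 50, 100, 200,            # 00:00-05:59, overnight lull
    500, 1000, 2000, 3000, 4000, 5000,    # 06:00-11:59, morning ramp
    6000, 5000, 4000, 4000, 5000, 6000,   # 12:00-17:59, daytime peak
    7000, 8000, 6000, 3000, 1000, 400,    # 18:00-23:59, evening peak and decline
]


def clock_time_availability(outage_hours: Sequence[int], hours_in_period: int = 24) -> float:
    """Fraction of hours in which the service was up, every hour counted equally."""
    return 1 - len(set(outage_hours)) / hours_in_period


def revenue_weighted_availability(
    outage_hours: Sequence[int], hourly_revenue: Sequence[float]
) -> float:
    """Fraction of expected revenue that was not exposed to an outage."""
    lost = sum(hourly_revenue[h] for h in set(outage_hours))
    return 1 - lost / sum(hourly_revenue)


for label, outage in [("03:00 outage", [3]), ("19:00 outage", [19])]:
    clock = clock_time_availability(outage)
    weighted = revenue_weighted_availability(outage, HOURLY_REVENUE)
    print(f"{label}: clock-time {clock:.2%}, revenue-weighted {weighted:.2%}")
```

Both one-hour outages yield the same clock-time figure, while the revenue-weighted figure drops far more for the evening outage than for the overnight one.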

Now that we’ve established that availability should be measured and that clock time is not the best unit of measure, what is a better choice? Transactional metrics aligned to the desired business outcome are the better choice.

Transactional Metrics

  • Rates – log transactional rates such as logins, add to cart, registration, downloads, orders, etc. Apply Statistical Process Control or other analysis methods to establish thresholds indicating an unusual deviation in the transaction rate.
  • Ratios – the proportion of undesired or failed outcomes such as failed logins, abandoned shopping carts, and HTTP 4xx and 5xx error responses can be useful for measuring the quality of service. As with rates, analyze these ratios to establish the levels that indicate an unusual deviation.
  • Patterns – transaction patterns can identify expected activity, such as order rates increasing when an item is first available for sale or download rates increasing in response to a viral social media video. The absence of an expected pattern change can signal an availability issue with your product or service.
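As a minimal sketch of the Rates bullet, the code below derives Shewhart-style control limits (the mean plus or minus three standard deviations) from a baseline of per-minute order counts and flags readings that fall outside them. The baseline numbers, the three-sigma choice, and the function names are assumptions for illustration. Real traffic usually needs separate baselines by time of day or day of week.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], sigmas: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits from a baseline sample of transaction rates."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation; needs at least two points
    # A transaction rate cannot be negative, so clamp the lower limit at zero.
    return max(0.0, center - sigmas * spread), center + sigmas * spread


def out_of_control(observations: Sequence[float], limits: Tuple[float, float]) -> List[int]:
    """Indexes of observations that fall outside the control limits."""
    lower, upper = limits
    return [i for i, rate in enumerate(observations) if rate < lower or rate > upper]


# Hypothetical orders per minute during a normal period (made-up data).
baseline = [98, 102, 101, 97, 103, 99, 100, 104, 96, 100]
limits = control_limits(baseline)

# Hypothetical live readings: the drop to 40 suggests customers cannot place orders.
live = [101, 99, 40, 102]
print("control limits:", limits)
print("suspicious readings at indexes:", out_of_control(live, limits))
```

The same limits could be applied to failure ratios, although proportions are often better served by a chart designed for them, such as a p-chart.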

Alignment with Desired Outcomes

What are the goals of your business? What is your value proposition? Choose metrics that comprehensively measure the availability of your product or service. Examples include the ability of a customer to buy a product from your website (login, search, add to cart, and check out), the proportion of file downloads successfully completed in less than 4 seconds, and the success rate of posting a message to a social media platform together with the ability of others to view it. Measuring availability with metrics aligned with the desired outcomes keeps the big picture at the forefront and helps business colleagues understand how the technology team contributes to value creation.
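One of the examples above, the proportion of file downloads successfully completed in less than 4 seconds, could be computed roughly as follows. The `Download` record and its field names are hypothetical; only the 4-second threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Download:
    """Hypothetical log record for one download attempt."""
    succeeded: bool
    duration_seconds: float


def download_availability(attempts: Iterable[Download], max_seconds: float = 4.0) -> float:
    """Share of attempts that succeeded in under max_seconds."""
    attempts = list(attempts)
    if not attempts:
        raise ValueError("no download attempts to measure")
    good = sum(1 for a in attempts if a.succeeded and a.duration_seconds < max_seconds)
    return good / len(attempts)


# Made-up sample: one slow success and one failure both count against availability.
sample = [
    Download(True, 1.2),
    Download(True, 3.9),
    Download(True, 6.5),   # completed, but too slowly to count as available
    Download(False, 0.8),  # failed quickly
]
print(f"download availability: {download_availability(sample):.0%}")
```

A server uptime check would likely report this period as fully available, whereas the outcome-aligned metric shows that only half of the attempts met the customer-facing goal.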

Summary

Not measuring availability is bad. Measuring it in clock time is better, but still leaves something to be desired. Measuring availability with transactional metrics tied to the desired business outcome is best. Don’t settle for better when you can be best.

Interested in learning more? Struggling with analyzing data? Unsure of how to apply architectural principles to achieve higher availability? Contact us; we’ve been in your shoes.


(Image Credit: Sarah Pflug from Burst)