AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth

Monitoring Strategies

What question does each of your system monitors answer? You probably think they answer questions such as “Is there a problem?” and, if so, “Where is the problem?” Most likely this is not the case: instead of telling you whether there is a problem, a monitor really only tells you “Where” or “What” the problem might be. Before we continue, a quick detour to discuss metrics, which, while different from monitoring, are similar in many ways.

Eric Ries, co-founder and CTO of IMVU, posted an article about the difference between vanity metrics and actionable metrics. The entire article and accompanying video are worth a read and a listen, but the takeaway is that most people use and look for metrics that make great soundbites but do not point to any definable action. One example is the total number of hits to a website. Eric asks the questions “Now what? Do you really know what actions you took in the past that drove those visitors to you, and do you really know which actions to take next?” This makes total sense to me, as we often see teams misusing monitoring in an attempt to determine what actions to take with their systems.

Back to our discussion of what question your monitoring is attempting to answer. We think there are five evolutionary questions that monitoring should answer:

  1. Is there a problem?
  2. Where is the problem?
  3. What is the problem?
  4. Why is there a problem?
  5. Will there be a problem?

Where most people fail is in taking a monitoring tool designed to answer “Where” or “What” and trying to use it to answer “Is”. For example, if you are monitoring all of your servers' vitals, such as CPU, memory, and I/O, what is the appropriate action for your team to take when CPU utilization goes to 100%? That might be a tough question because you are missing the vital piece of information: “Is this affecting my customers?” The “Is there a problem?” monitor is intended to be a proxy for customer impact, in order to help determine the degree and speed of escalation of the issue.
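To make the CPU example concrete, here is a minimal sketch of how an “Is there a problem?” signal might drive escalation while a host-level signal only prompts investigation. Everything in it is an assumption for illustration: the `Signals` fields, the use of a transaction success rate as the customer-impact proxy, and the threshold values are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from enum import Enum


class Escalation(Enum):
    NONE = "none"
    INVESTIGATE = "investigate"  # look into it, but nobody gets woken up
    PAGE = "page"                # customers are affected: escalate now


@dataclass
class Signals:
    # "Where"/"What" signal from a host monitor, 0.0 to 1.0 (hypothetical field).
    cpu_utilization: float
    # "Is" signal: share of customer transactions succeeding, 0.0 to 1.0
    # (a hypothetical proxy for customer impact).
    success_rate: float
    # What we would normally expect the success rate to be right now.
    expected_success_rate: float


def decide_escalation(s: Signals, tolerance: float = 0.05,
                      cpu_hot: float = 0.95) -> Escalation:
    """Let the customer-impact proxy, not the host vitals, set the urgency."""
    customer_impact = s.success_rate < s.expected_success_rate - tolerance
    if customer_impact:
        return Escalation.PAGE
    if s.cpu_utilization >= cpu_hot:
        # A pegged CPU with no visible customer impact is worth a look,
        # but it does not by itself justify a rapid escalation.
        return Escalation.INVESTIGATE
    return Escalation.NONE


if __name__ == "__main__":
    pegged_but_fine = Signals(cpu_utilization=1.0, success_rate=0.99,
                              expected_success_rate=0.99)
    print(decide_escalation(pegged_but_fine).value)
```

In this sketch, 100% CPU with a normal success rate results in an investigation, while a drop in the success rate pages someone regardless of what the CPU graph shows.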

If you have monitoring services in place now, it is worthwhile to determine what question each one answers. If you are missing a monitor for a particular question, the time to remedy it is before you need that question answered.
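One way to run that exercise is to write the inventory down and check it against the five questions. The following is a rough sketch only; the monitor names in the example inventory and the question each is assigned to are hypothetical, and your own mapping will differ.

```python
from typing import Dict, List

# The five evolutionary questions, in order.
QUESTIONS: List[str] = [
    "Is there a problem?",
    "Where is the problem?",
    "What is the problem?",
    "Why is there a problem?",
    "Will there be a problem?",
]


def missing_questions(monitors: Dict[str, str]) -> List[str]:
    """Return the questions that no monitor in the inventory answers.

    `monitors` maps a monitor name to the one question it answers.
    """
    for name, question in monitors.items():
        if question not in QUESTIONS:
            raise ValueError(f"{name!r} is mapped to an unknown question: {question!r}")
    answered = set(monitors.values())
    return [q for q in QUESTIONS if q not in answered]


if __name__ == "__main__":
    # Hypothetical inventory for illustration.
    inventory = {
        "host CPU/memory/I/O checks": "Where is the problem?",
        "application error log alerts": "What is the problem?",
    }
    for question in missing_questions(inventory):
        print("No monitor answers:", question)
```

With this example inventory the gaps are “Is there a problem?”, “Why is there a problem?”, and “Will there be a problem?”, which is the pattern we described above: plenty of “Where” and “What”, and nothing that stands in for customer impact.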

