Posts Tagged ‘application monitoring software’

Fix Your Bugs

Thursday, April 23rd, 2009

Most of you should be familiar with the Microsoft Error Reporting service. If you are not, it is a service that, when an error occurs in an application running on a Microsoft operating system such as Vista, offers to report the problem to Microsoft so that they can “improve” your experience. What’s interesting about this service is the data. They have undoubtedly gathered millions of errors over the years and have some pretty interesting insight into application errors. What I found most compelling is that if you fix only 1% of the bugs you will improve the experience for 50% of the users.

 

I’m not sure if this error / customer impact rate extends perfectly to Web 2.0 or Software as a Service applications, but I suspect it is not off by much. If you don’t mine your application’s error logs, you’re missing out on a plethora of insight into not only your application but, more importantly, your users’ experience. Unless the error is coming from an offline process, each error or set of errors results in a frustrated user.

We’ve talked in the past about monitoring your application, how much logging is necessary, and not relying on your customers to find problems. Custom application monitoring, such as with SCAMP, is ideal. However, unless you’ve turned off all logging, you should still have web and app server logs that you can start parsing today. There are lots of open source log parsers such as AWStats or Webalizer, or if you’re the NIH-type, there is always the option of building something custom using MySQL or Hadoop.

Start today: look through your log files for the top five errors and file bugs to have them fixed before the next release goes out the door. Make investigating log files part of your process, especially after releases. The number of errors logged should by itself give you some indication of the application’s performance compared to previous versions. Your customers will thank you for it.
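
As a rough sketch of what that first pass might look like, the Python below counts the most frequent ERROR messages in a log. The line format (date, time, level, message), the sample lines, and the names top_errors and normalize are all assumptions made for illustration; adjust the pattern to whatever your web or app server actually writes.

```python
import re
from collections import Counter

# Assumed line shape: "<date> <time> <LEVEL> <message>". Real logs will differ.
LINE_PATTERN = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def normalize(message):
    """Replace runs of digits with N so the same error with different ids groups together."""
    return re.sub(r"\d+", "N", message).strip()


def top_errors(lines, limit=5):
    """Return up to `limit` (message, count) pairs for ERROR lines, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            counts[normalize(match.group("message"))] += 1
    return counts.most_common(limit)


# Hypothetical sample lines standing in for a real log file.
SAMPLE_LOG = [
    "2009-04-23 10:00:01 ERROR Timeout calling payment gateway after 30 s",
    "2009-04-23 10:00:02 INFO User 17 logged in",
    "2009-04-23 10:00:03 ERROR Timeout calling payment gateway after 31 s",
    "2009-04-23 10:00:04 ERROR Cart 991 not found",
    "2009-04-23 10:00:05 ERROR Timeout calling payment gateway after 30 s",
]

if __name__ == "__main__":
    for message, count in top_errors(SAMPLE_LOG):
        print(count, message)
```

Since top_errors accepts any iterable of lines, an open file object can be passed in place of the sample list. Running the same count against the logs from before and after a release gives a crude version-to-version comparison.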

To Log or Not To Log?

Monday, December 8th, 2008

That is the question that has caused debate for many years among operations and engineering staffs. We’ve recently read a couple of very well written and well thought out articles on this topic and wanted to offer our ideas on the debate. The first article is by Todd Hoff from HighScalability.com, who advocates in Log Everything All the Time, as the title implies, that everything should be logged for potential use. Todd has another article describing Facebook’s open source Scribe, Product: Scribe – Facebook’s Scalable Logging System, where he observes that Facebook must agree with his logging approach by virtue of their development of this product. The other article, The Problem With Logging by Jeff Atwood of CodingHorror.com, argues for a more tempered approach. Jeff summarizes his position as “Start small and simple, logging only the most obvious and critical of errors.”

 

Our position is squarely in the camp of log everything, but with a few caveats. These ignore-at-your-application’s-peril cautions are: 1) logging must not impede the performance of the application, 2) use a common framework, and 3) look at the data. Let’s go through these one at a time.

 

1) Logging must not impede the performance of the application – As Jeff points out, “logging isn’t free,” and we agree with that, but we would add that the potential benefit of the data outweighs the resource cost unless it negatively affects performance. Get ready for one of our repeating themes: do it if the BUSINESS benefit of logging outweighs the cost of logging. Most web / application servers are not utilized completely because most teams don’t know precisely the performance parameters and resource constraints of their application, especially as it changes with each release. If you are fortunate enough to be in an organization that really understands the bottlenecks and performance of the application on specific hardware, more than likely there is a single resource that is the bottleneck, such as memory, I/O, or CPU. Your logging service should not put further demand on that constrained resource; all surpluses are fair game. And it should go without saying that all logging must be done asynchronously. Losing a log event is acceptable, but impacting a transactional event is not.

2) Use a common framework – Choose or build a common framework that is used throughout the application and that includes common definitions. Just as Priority and Severity are defined for bugs, logging definitions must be determined and adhered to. Code reviews are a way to ensure common usage. Data being sent to five different files in different formats defeats the purpose of logging; common usage, format, gathering, and analysis are where the payoff is realized.

3) Look at the data – When you log tons of data, and when we say tons think of Scribe, which claims to handle tens of billions of messages each day, looking at this data is completely overwhelming. But looking at it through some mechanism, automated or manual, is mandatory for the benefit to be gained. As Todd points out, there are products like Hadoop to help process the data into viewable and actionable information. Jeff makes the point that “the more you log, the less you find”, but our point is that by the time you know you have a problem and need to inject logging, you’re too late. Proper logging and analysis of the data will identify the problems early and make diagnosis easier. We think products such as SCAMP application monitoring software are excellent for creating an easy way of seeing inside the application.
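
To make the first two caveats a little more concrete, here is a minimal sketch in Python, using only the standard library's logging and queue modules. It is one possible shape, not a recommendation of a particular product: the names DroppingQueueHandler, configure_logging, and COMMON_FORMAT, the logger name "app", and the queue size are all our own assumptions. Application threads hand records to a bounded in-memory queue and return immediately, a background listener does the slow writing, and when the queue is full the record is dropped and counted rather than blocking the request.

```python
import logging
import logging.handlers
import queue

# One shared format for every logger in the application (an assumed convention).
COMMON_FORMAT = "%(asctime)s %(levelname)s %(name)s %(message)s"


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Hand records to a bounded queue; drop them instead of blocking the caller."""

    def __init__(self, log_queue):
        super().__init__(log_queue)
        self.dropped = 0  # how many records were discarded because the queue was full

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            # Losing a log event is acceptable; slowing a transaction is not.
            self.dropped += 1


def configure_logging(logger_name, target_handler, max_pending=1000):
    """Attach an asynchronous, commonly formatted handler to the named logger."""
    log_queue = queue.Queue(maxsize=max_pending)
    target_handler.setFormatter(logging.Formatter(COMMON_FORMAT))
    # The listener drains the queue on its own thread and writes to target_handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    queue_handler = DroppingQueueHandler(log_queue)
    logger = logging.getLogger(logger_name)
    logger.addHandler(queue_handler)
    logger.setLevel(logging.INFO)
    listener.start()
    return listener, queue_handler


if __name__ == "__main__":
    listener, handler = configure_logging("app", logging.StreamHandler())
    logging.getLogger("app.checkout").error("payment gateway timed out")
    listener.stop()  # flushes records still in the queue, then stops the thread
    print("dropped records:", handler.dropped)
```

Because every module logs through a child of the same named logger, the format and destination are decided in one place. The dropped counter is worth watching: if it climbs, logging is competing for a resource that is already constrained.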

 

As long as you avoid the pitfalls stated above, we feel that logging can be a very beneficial addition to your quality assurance, scalability, and availability initiatives. We highly encourage you to read all the articles cited; both HighScalability and CodingHorror are on our must-read list of blogs that we subscribe to. As always, let us know what you think. I’m sure we have not heard the last of this great debate.