The Bug is in the Code!
We are engineers by training and vocation, so we understand what it is like to be a software developer. Too often during a site or product problem we hear developers say, “It can’t be the code.” In our experience, the code is most often the problem. That is not to say that we have not seen our share of operating system, database, web server and application server bugs, but statistically you are going to be right far more often by suspecting the code first. Here is why.
As we mentioned, operating systems, databases and any other piece of third-party or open source software, including firmware, have bugs. But these pieces of software change far less frequently than your SaaS application code, and the testing performed before each release is more often than not an order of magnitude or more greater than what you perform. And that is okay, as you are working in two completely different worlds where the cost of a defect and the opportunity cost of a delay resulting from testing are very different. A bug in your code that slows your application from a 2-second response to a 5-second response is terrible, but you should be able to recover from it quickly, assuming that you have designed for rollback and have processes to quickly “fix forward” any release. A bug in a database that causes a loss of data integrity is disastrous because hundreds of thousands of organizations rely on that database to keep their data safe. So, given the likely differences in code quality, defect density and change frequency, you are better off always suspecting your code first. But there is another reason as well.
A simple but golden rule is that whatever changed last caused the problem. This is one reason we harp so much on a rigorous change management process. Since you likely update the code between ten and twenty times more often than you update a piece of infrastructure, it is reasonable to suspect that your frequently changing code is the culprit.
Even with this overwhelming evidence, the argument that engineers typically use is that the one place in the code responsible for the broken feature has been checked and is fine. The number of times we have seen a fourth, fifth or sixth attempt to find a defect in the code yield a bug would astound you, further supporting our point that “the defect is in the code”. There are two reasons the earlier attempts fail. First, if you do not read the code with a critical eye, convinced that the bug is there waiting for you to find it, you all but guarantee that you will not find the defect. Second, most code bases have a pretty high cyclomatic complexity. This is a fancy term for how many independent paths run through the code, usually measured per class and method. If something has 50 to 100 logical paths, most of us cannot keep them straight in our heads and thus should be using unit tests to verify them, but that is for a different post.
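To make the idea concrete, here is a minimal sketch of how a cyclomatic complexity count might be approximated for Python functions using the standard library ast module. The function name cyclomatic_complexity, the set of node types it counts and the ship sample are our own assumptions for illustration. Dedicated tools apply more complete rules, so treat this as a rough count rather than a reference implementation.

```python
import ast

# Node types that each add one decision point.
# This is a simplified, assumed set; real tools count more constructs.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Return an approximate cyclomatic complexity per function name."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # a function with no branches has one path
            # Note: this walk also visits nested functions, so their
            # branches are counted against the enclosing function too.
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds two short-circuit decisions
                    complexity += len(child.values) - 1
            results[node.name] = complexity
    return results


# Hypothetical sample function, used only to exercise the counter.
SAMPLE = '''
def ship(order, user):
    if order is None:
        return "no order"
    for item in order:
        if item.out_of_stock and not user.allows_backorder:
            return "blocked"
    return "ok"
'''

print(cyclomatic_complexity(SAMPLE))  # prints {'ship': 5}
```

The sample scores 5: one base path, two if statements, one for loop and one "and". A function at that level is easy to hold in your head. One with 50 to 100 paths is not, which is exactly where a defect survives several readings.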
The bottom line is to have every engineering discipline look in earnest for the possible cause. The bug is in your code more often than not. As our childhood friend Dr. Seuss would say, it is 98 and 3/4% guaranteed.
*Image courtesy of krelic from flickr creative commons