Archive for the ‘Engineering’ Category

Tuning versus Scaling

Wednesday, July 1st, 2009

We often talk to clients about the differences between tuning an application or platform and scaling it.  Clients often intuitively see them as different things, but oddly still refer to them interchangeably, as in “We scaled the database last year by tuning the top SQL queries and getting 20% better performance”.

Tuning is something that you absolutely must do, but we stress that it’s more of an operational endeavor than an architectural endeavor.  In the ideal world, as a typical part of your operation, you would focus some time monthly or quarterly on identifying the slowest portions of your product or the most costly in terms of hardware, engineering time or clock cycles.  These would then be prioritized based on cost to fix and expected return, with changes being made to them as engineering allocations allow.  This process is just good housecleaning: throwing out old clothes and books, or getting rid of the motorcycle and lawn mower that no longer work, to make room in a garage bay for other “stuff”.

And that’s just what you’re doing with tuning – throwing out old stuff to make room for new stuff.  Just as with the house analogy, you aren’t really making your house “bigger”; you are making the current house capable of holding different and ideally more valuable “stuff”.  If the number of occupants in your house isn’t growing, as it would with a new addition to the family or several new roommates (i.e. your business isn’t growing quickly), then trading old, less valuable stuff (old code that is seldom executed but costly, or often executed code that is very costly) for newer, more valuable stuff (new functionality or more efficient code) may be all you ever need to do.

Scaling, however, is an architectural effort that implies significant modifications to your current structure.  In the house analogy, it might mean adding a second floor or expanding the house.  Structurally the house is different and is capable of holding more “stuff” because the total floor space has increased, as compared to the tuning example where you just freed up existing floor space.  Scaling involves architectural effort to redesign the underlying fundamentals of your system, such as adding replicated read databases or splitting databases along data boundaries, while tuning is just “spring cleaning” of your platform.
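
To make the two scaling examples concrete, here is a minimal Python sketch, not a production design. It assumes one primary database with read replicas behind it, and it assumes the customer id is the data boundary. The class, function, and host names are all hypothetical.

```python
import itertools
import zlib


class ReadWriteRouter:
    """Sends writes to one primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def for_write(self):
        return self.primary

    def for_read(self):
        # Simple round-robin over the read replicas.
        return next(self._read_cycle)


def shard_for(customer_id, shard_names):
    """Picks a shard from a stable hash of the customer id (assumed boundary)."""
    digest = zlib.crc32(str(customer_id).encode("utf-8"))
    return shard_names[digest % len(shard_names)]


# Hypothetical host and shard names, for illustration only.
router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
reads = [router.for_read() for _ in range(4)]
shards = ["customers-a", "customers-b", "customers-c"]
home_shard = shard_for(42, shards)
```

Neither function makes an existing query faster. Both change where the work lands, which is the structural change that separates scaling from tuning. Note that a simple modulo hash like this one forces most data to move when the number of shards changes, so the choice of boundary deserves real design time.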

If you are experiencing rapid or hyper growth, you need to both tune your system for the sake of efficiency and performance and scale it for future growth.  Scaling is what allows you to achieve orders of magnitude of improvement (10x, 100x, etc), while tuning should afford linear growth and more efficient operation over time.

Continuous Deployment

Monday, June 22nd, 2009

You have probably heard of continuous integration, which is the practice of checking code into the source code repository early and often.  The goal is to ease the often very painful process of integrating multiple developers’ code after weeks of independent work. If you have never had the pleasure of experiencing this pain, let me give you another example that we have experienced recently. In the process of writing The Art of Scalability, we have seven editors, including an acquisition editor, a development editor, and five technical editors, who all provide feedback on each chapter. Our job is to take all of this separate input and merge it back into a single document, which at times can be challenging when editors have different opinions on the direction of certain parts of the chapter. The upside is that the manuscript is much better for having gone through the process. Luckily, software engineering has developed the process of continuous integration, which is designed to reduce wasted engineering effort. To make this process most effective, automating builds and smoke tests is highly recommended. For more information on continuous integration, there are a lot of resources such as books and articles.

The topic of this post is taking continuous integration to an extreme and performing continuous deployment. It is exactly what it sounds like: all code that is written for an application is immediately deployed into production. If you haven’t heard of this before, your first thought is probably that this is the ultimate in Cowboy Coding, but it is in use by some household technology names like Flickr and IMVU. If you don’t believe this, check out code.flickr.com and look at the bottom of the page. Last time I checked it said:

Flickr was last deployed 20 hours ago, including 1 change by 1 person.

In the last week there were 34 deploys of 385 changes by 17 people.

Eric Ries, co-founder and former CTO of IMVU, is a huge proponent of continuous deployment as a method of improving software quality because of the discipline, automation, and rigorous standards that are required to accomplish it. Other folks at IMVU also seem to be fans of the methodology, judging from the post by Timothy Fitz. Eric suggests a five-step approach for moving to a continuous deployment environment.

  1. Continuous Integration – Obviously before moving beyond integration into full deployment, this is a prerequisite that must be in place.
  2. Source Code Commit Checks – This feature, which is available in almost all modern source code control systems, allows the process of checking in code to halt if one of the tests fails.
  3. Simple Deployment Script – Deployment must be automated and have the ability to roll back, which we wholeheartedly agree with here and here.
  4. Real-time alerting – Bugs will slip through, so you must be monitoring for problems and have the processes in place to react quickly.
  5. Root Cause Analysis – Eric recommends the Five Whys approach to find root cause. Whatever the approach, finding and fixing the root cause of problems is critical to stop repeating them.
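
Steps 3 and 4 can be sketched together. The following is a minimal Python sketch under our own assumptions, not Eric’s or IMVU’s actual script. Here `activate` and `health_check` are hypothetical callables that you would supply: one switches the running version and the other probes the site after the switch.

```python
def deploy(new_version, current_version, activate, health_check):
    """Activates new_version and reverts to current_version if the check fails.

    Returns the version that is left running.
    """
    activate(new_version)
    try:
        healthy = health_check()
    except Exception:
        # A probe that blows up is treated the same as a failed probe.
        healthy = False
    if healthy:
        return new_version
    activate(current_version)  # automated rollback, no human in the loop
    return current_version


# Illustration with stand-ins: record activations instead of touching servers.
activations = []
running = deploy("v2", "v1", activations.append, lambda: False)
```

In this illustration the health check fails, so `activations` ends up recording the push of `v2` followed by the rollback to `v1`, and `running` is `v1`.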

Admittedly, this concept of developers pushing code straight to production scares me quite a bit, since I’ve seen the types of horrific bugs that can make their way into pre-production environments. However, I think Eric and the other continuous deployment proponents are onto something: perhaps the reason so many bugs are found by weeks of testing is a self-fulfilling prophecy. If engineers know their code is moving straight into production upon check-in, they might be a lot more vigilant about their code. I know I would be. How about you? What do you think about this development model?

Monitoring Strategies

Monday, June 15th, 2009

What question does each of your system monitors answer? You probably think they answer questions such as “Is there a problem?” and, if so, “Where is the problem?” Most likely this is not the case: instead of telling you whether there is a problem, a monitor really only tells you “Where” or “What” the problem might be. Before we continue, a quick detour to discuss metrics, which, while different from monitoring, are similar in many ways.

Eric Ries, co-founder and CTO of IMVU, posted an article about the difference between vanity metrics and actionable metrics.  The entire article and accompanying video are worth a read and listen, but the takeaway is that most people are using and looking for metrics that make great soundbites but do not suggest any definable actions.  One example is the total number of hits to a website.  Eric asks, “Now what? Do you really know what actions you took in the past that drove those visitors to you, and do you really know which actions to take next?” This makes total sense to me, as we often see teams misusing monitoring in an attempt to determine what actions to take with their systems.

Back to our discussion of what question your monitoring is attempting to answer. We think there are five evolutionary questions that monitoring should answer:

  1. Is there a problem?
  2. Where is the problem?
  3. What is the problem?
  4. Why is there a problem?
  5. Will there be a problem?

Where most people fail is in taking a monitoring tool that is designed to answer “Where” or “What” and trying to use it to answer “Is”. For example, if you are monitoring all of your servers’ vitals such as CPU, memory, and I/O, what is the appropriate action for your team to take when CPU utilization goes to 100%? The reason that might be a tough question is that you are missing the vital piece of information: “Is this affecting my customers?”  The “Is there a problem?” monitor is intended to be a proxy for customer impact in order to help determine the degree and speed of escalation of the issue.
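
As a rough illustration of the difference, the Python sketch below lets a customer-facing rate decide how fast to escalate, while the CPU reading only adds context. Everything in it is an assumption for the sake of the example: the metric (say, logins per minute), the 20% tolerance, and the escalation labels are hypothetical and would need to be chosen for your own business.

```python
def is_there_a_problem(current_rate, baseline_rate, tolerance=0.2):
    """True when a customer-facing rate drops more than `tolerance` below baseline."""
    if baseline_rate <= 0:
        return False  # no usable baseline to compare against
    drop = (baseline_rate - current_rate) / baseline_rate
    return drop > tolerance


def escalation(cpu_percent, current_rate, baseline_rate):
    """Lets customer impact, not server vitals, set the urgency."""
    if is_there_a_problem(current_rate, baseline_rate):
        return "page on-call now"
    if cpu_percent >= 100:
        # A "Where/What" signal without customer impact: worth a look, not a page.
        return "open a ticket"
    return "no action"


busy_but_fine = escalation(cpu_percent=100, current_rate=98, baseline_rate=100)
customers_hurting = escalation(cpu_percent=100, current_rate=50, baseline_rate=100)
```

With the same 100% CPU reading, the first call yields a ticket and the second a page, because only the second shows customers being affected.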

If you have monitoring services in place now it is worthwhile to determine what question each one answers. If you are missing a monitor for a particular question, the time to remedy it is before you need that question answered.

Interviewing Engineers Continued – Part III (The Cultural Interview)

Sunday, March 29th, 2009

The most often overlooked aspect of interviewing a candidate is interviewing the candidate for a cultural “fit” within your team.   An employee who doesn’t fit in with the company culture can ruin team dynamics and actually reduce your engineering throughput rather than raising it.  The radio show This American Life did a show about bad apples in December 2008, Ruining it for the Rest of Us, which does a great job covering the downside of employees with bad attitudes.  

The biggest failure most managers have in the area of interviewing for cultural fit is that they far too often spend somewhere between 1 and 3 total hours with a candidate before making a hiring decision.  On top of this, they may have 2 to 5 people spend an hour each with the candidate. None of this is sufficient to get a good feel for the candidate and whether they will work well within the team.  You and your team probably spend more time researching major purchases such as a car, a house or even a flat screen TV set.  You and your team are probably going to spend 9 to 12 hours a day, 5 days a week with the newly hired employee.  There’s a good chance you don’t spend that much time with your significant other, your family or your friends within a week.

The most obvious aspect of this interview is to ensure that the candidate has the right work ethic, shares similar goals and aspirations, and works and plays well with others.  This is honestly easier said than done, but it is nevertheless an absolute necessity if you are to be happy with your hiring decision.  Here are some suggestions to get you started:

  1. Make a list of your current cultural aspects and those to which you aspire.  Is your team a bunch of workaholics?  Do they often get together after work and “blow off steam” or are they largely socially introverted?  Do you have a culture where people talk openly about issues?  Do people work mostly at work, or do they take work home with them?  Does the team joke around a lot, or are they “buttoned up”?   Is this person a “grandstander” or does he or she have a “hero mentality” or are they all about making the team perform better?
  2. Identify the folks on your team who exemplify the culture you have or the culture to which you are attempting to move the team and have them be part of the interview team.  Be careful only to select “A” players as you don’t want “B” and “C” players kicking out a potential “A” player for the wrong reason.
  3. Distribute a set of cultural questions that everyone should ask as a baseline so that you are reviewing the same questions.  You should encourage people to create their own as well, since you want both innovation in the process and a good set of “control” questions.

Be careful not to confuse the need for diversity with the need for cultural fit.  You absolutely want different perspectives and approaches in your team.  This diversity leads to the abolition of “group think” and creates a higher performing team.  But you don’t want to bring a person who is constantly pointing fingers into a team that looks to solve problems together.

Before you hire anyone, each member of the interview team should have spent enough time with the individual to be able to answer the question of whether they “like” the person.  You aren’t really interviewing the person to see whether they could be a friend as you are still their boss, but you had better be able to answer the following questions at a minimum before offering someone a job:

  1. Can I be with this person 9 to 12 hours a day and still enjoy working with them?
  2. Can this person offer something to the team as a contributor and will the team accept the person given the behaviors I’ve perceived during my hours of interview and discussion?
  3. Can I learn from this person?  Note – this doesn’t mean that the person is smarter or has more experience than you so much as it is a question of whether you can form a relationship with this person such that when they have valuable information or experience, you will be willing to accept it from them.
  4. Can I coach this person?

We propose that if the answer to any of the above questions is “No”, then the person is not a fit for you or your organization.

Interviewing Engineers Continued – Part II (The Technical Interview)

Tuesday, March 24th, 2009

The technical interview is important because, besides the code test and the phone screen, both of which can be misleading, it is the only chance you get to see what the engineer brings to the table in terms of technical prowess.  First we should probably address why the code test and phone screen can be misleading.  Depending on how you conduct each of these, there is a very real possibility that unsavory candidates will cheat to some degree.  During the phone screen you may hear frantic typing in the background, a possible hint that they are googling your questions.  If the coding test is done remotely or if the same one is given to everyone, it’s likely, at least according to this article and this one, that a high percentage of candidates will cheat on it.  So that leaves us with the in-person technical interview as the one remaining check we have of how good they are technically.

To start with we think there are at least seven areas of knowledge in which every engineer should have at least a solid understanding if not proficiency.   These areas are: 

  1. Algorithms, big-O or asymptotic notation used to describe upper and lower bounds for algorithms and how growth can be constant, linear, or logarithmic.  
  2. Object oriented programming, the fundamentals of inheritance, encapsulation, abstraction, and polymorphism and how these are implemented or not in their particular language of choice. 
  3. Syntax, data structures, and types of a primary language, be able to write a method on the board without glaring syntactic errors.
  4. Design patterns, why they exist and at least a couple of the more common ones such as factory and bridge. 
  5. PDLC, know the difference between Waterfall and Iterative lifecycles and be able to argue for the appropriate uses of each.
  6. Source Code Control, should believe in the concept and have knowledge of at least one platform.
  7. Compiler / Interpreter, should have an understanding of the debugger and be able to read and step through the compiler output (assembly equivalent) in the debugger.

How do you put these into a technical interview?  We recommend starting off by picking your favorite sorting algorithm, such as a bubble sort.  Ask the candidate something similar to the following:

  • What are the steps involved in a bubble sort?
  • What are the worst-case and average-case complexities (big-O notation) of a bubble sort?
  • In the expected language that will be used every day, write a method implementing a bubble sort on a white board.
  • What OOP properties are displayed in the code?
  • If the sort was written in C++ with an iterator what design pattern is being followed?  {or some language appropriate design pattern question}
  • What PDLC methodology did you most closely demonstrate when writing the bubble sort?
  • How would you check in this piece of code?
  • What are the steps that the compiler goes through when compiling the bubble sort code?
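
For reference, here is one answer a candidate might put on the white board for the first three questions. It is a sketch in Python, chosen only for illustration; in a real interview you would ask for the language your team uses every day.

```python
def bubble_sort(items):
    """Returns a new sorted list; the input is left untouched."""
    result = list(items)
    # After each pass the largest remaining value has "bubbled" to index `end`.
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                # Swap adjacent elements that are out of order.
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            break  # a pass with no swaps means the list is already sorted
    return result


unsorted = [5, 1, 4, 2, 8]
ordered = bubble_sort(unsorted)
```

The nested passes make the worst and average cases O(n^2). The early exit on a pass with no swaps is what makes the best case, an already sorted list, O(n).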

In a few quick questions you have covered in a concrete fashion the seven areas of knowledge that an engineer should possess.  Do you have any favorite engineering interview questions?