One of the processes we often recommend to clients is the Scalability Summit. Its purpose is to identify which component in your application is most likely to prevent you from scaling. This practice of finding and fixing the next bottleneck, the next thing that will prevent you from scaling, is how YouTube.com scaled. You can see this in a presentation by Cuong Do at a Google Tech Talk. About three minutes into the video, Cuong expresses his algorithm for “Handling Rapid Growth” as:
YouTube’s growth was so rapid that each cycle of identifying the next bottleneck and fixing it often took only weeks or even days. For more “normal” rates of growth, this bottleneck identification is usually done on a quarterly basis, and at that interval we refer to the sessions as Scalability Summits. We recommend inviting a select group of individuals to discuss what they believe will be the next set of issues the platform will experience. The participants should include people representing architecture, operations, and engineering.
When we run Scalability Summits, we generally go through this exercise twice: once for the expected growth rate of the business and once for the expected growth rate multiplied by 10. So if you plan on growing by 200,000 users over the next quarter, use that number first, then use 2,000,000 users, and identify which components would fail at each of those usage numbers.
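The two-pass exercise can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names, their capacity estimates, and the current user count are all hypothetical, and a single "maximum users" figure is a simplification of whatever capacity measure each team actually uses. Only the 200,000-user growth figure and the multiplier of 10 come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    max_users: int  # hypothetical estimate of the users this component can serve


def components_at_risk(components, current_users, expected_growth, multiplier=1):
    """Return names of components whose capacity is below the projected user count."""
    projected_users = current_users + expected_growth * multiplier
    return [c.name for c in components if c.max_users < projected_users]


# Hypothetical inventory; real numbers would come from the summit participants.
components = [
    Component("session store", 1_100_000),
    Component("primary database", 1_500_000),
    Component("search index", 5_000_000),
]
current_users = 1_000_000   # assumption for illustration
expected_growth = 200_000   # growth planned for the next quarter

# Pass 1: expected growth. Pass 2: expected growth multiplied by 10.
baseline = components_at_risk(components, current_users, expected_growth)
stress = components_at_risk(components, current_users, expected_growth, multiplier=10)

print("At expected growth:", baseline)
print("At 10x growth:", stress)
```

Running both passes side by side shows which components are an immediate concern and which only surface if growth far exceeds the plan.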
Once these potential bottlenecks are identified, they are prioritized by a return-on-investment analysis that takes into account factors such as how expensive the fix is (in terms of both capital expenditure and personnel), the component’s Time To Break (how much growth it can sustain before failing), and the severity of the impact if it does break.
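One way to make this prioritization concrete is a simple score per bottleneck. The sketch below is an assumption, not a prescribed formula: it expresses Time To Break in quarters, rates severity on a 1 to 5 scale, lumps capital and personnel cost into one number, and ranks by severity divided by cost and time remaining. The bottleneck names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bottleneck:
    name: str
    fix_cost: float           # capital plus personnel cost, arbitrary units (assumption)
    quarters_to_break: float  # Time To Break, expressed here in quarters (assumption)
    severity: int             # impact if it breaks, 1 (minor) to 5 (outage)


def priority(b):
    """Illustrative score: severe, cheap-to-fix, soon-to-break items rank highest."""
    return b.severity / (b.fix_cost * b.quarters_to_break)


def prioritize(bottlenecks):
    """Return bottlenecks sorted from highest to lowest priority score."""
    return sorted(bottlenecks, key=priority, reverse=True)


# Hypothetical output of a summit.
candidates = [
    Bottleneck("search index", fix_cost=3, quarters_to_break=6, severity=2),
    Bottleneck("primary database", fix_cost=8, quarters_to_break=2, severity=5),
    Bottleneck("session store", fix_cost=2, quarters_to_break=1, severity=3),
]

ranked = prioritize(candidates)
for b in ranked:
    print(f"{b.name}: {priority(b):.3f}")
```

The exact weighting matters less than applying the same factors to every candidate, so the team can defend why one fix is scheduled ahead of another.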
The most important step comes after the Scalability Summit. A set amount of labor from each team must be set aside for the scalability-related issues that come out of the summit. If a team spends several hours identifying bottlenecks that are then ignored, no one is going to participate again. As an organization you must take action on these issues, or 1) you will likely experience the very problems that hamper your growth and 2) participants will lose interest in the process.