Posts Tagged ‘Scalability Blogs and Reviews’

The Art of Scalability Update

Thursday, October 1st, 2009

Our book is still on track for an early- to mid-January 2010 hard launch. We have completed roughly one-third of the copy editing process, having gone through the final round of editing on 11 chapters so far. The book is available for pre-order at Amazon, Barnes and Noble and Borders. You can also pre-order the book from the publisher, Addison-Wesley, and get early web or PDF access to the pre-copy-edited book.

Fish and I are discussing different options for signing books that clients and readers of our blog and newsletter purchase. If you are interested, simply comment on this post and we’ll work out the logistics.


Scalability Best Practices

Tuesday, August 11th, 2009

Here is a baker’s dozen of items that we feel are Best Practices for Scalability:

  1. Asynchronous – Use asynchronous communication when possible. Synchronous calls tie the availability of the two services together: if one fails or slows down, the other is affected too.
  2. Swim Lanes – Create fault-isolated “swim lanes” of hardware by customer segmentation. This prevents problems with one customer from causing issues for all customers. It also helps with diagnosing issues and rolling out code.
  3. Cache – Use caches at multiple layers, including object caches in front of databases (such as memcached), page or item caches for content (such as Squid) and edge caches (such as Akamai).
  4. Monitoring – Understand your application’s performance from a customer’s perspective. Monitor from outside your network and have tests that simulate a real user’s experience. Also monitor the internal workings of the application in terms of query and transaction execution counts and timing.
  5. Replication – Replicate databases for recovery as well as to offload reads to multiple instances.
  6. Sharding – Split the application and databases by service and/or by customer using a modulus. While this requires slightly more complicated logic in the application, it allows for massive scaling.
  7. Use Few RDBMS Features – Use the OLTP database as a persistent storage device as much as possible. The more you rely on the features offered in most RDBMSs for your transactions, the greater the load you put on the hardest item in your system to scale. Remove all business logic, such as stored procedures, from the database and move it into the application. When significant scaling is required, perform joins in the application rather than in SQL.
  8. Slow Roll – Roll out new code versions slowly, to a small subset of your servers at a time, without bringing the entire site down. This requires that all code be backward compatible, because two versions of code will be running in production during the rollout. This method lets you find problems that your quality and load and performance (L&P) testing missed while having minimal impact on customers.
  9. Load & Performance Testing – Test the performance of each application version before it goes into production. This will not catch every issue, which is why you need the ability to roll back, but it is very worthwhile.
  10. Capacity Planning / Scalability Summits – Know how much capacity you have on all tiers and services in your system. Use Scalability Summits to plan for increased capacity demands.
  11. Rollback – Always have the ability to roll back a code release.
  12. Root Cause Analysis – Build a learning culture, evident in the use of Root Cause Analysis to find and fix the real causes of issues.
  13. Quality From The Beginning – Quality can’t be tested into a product; it must be designed in from the beginning.
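Item 6’s modulus-based split by customer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production router: customers are assumed to have non-negative integer ids, and the shard names are hypothetical placeholders for database hosts or swim lanes.

```python
from typing import Dict, List

# Hypothetical shard identifiers; in practice each might map to a database host or swim lane.
SHARDS: List[str] = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for_customer(customer_id: int, shards: List[str] = SHARDS) -> str:
    """Pick a shard by taking the customer id modulo the number of shards."""
    if customer_id < 0:
        raise ValueError("customer_id must be non-negative")
    return shards[customer_id % len(shards)]


def group_by_shard(customer_ids: List[int], shards: List[str] = SHARDS) -> Dict[str, List[int]]:
    """Group customer ids by the shard that owns them (e.g. to batch queries per shard)."""
    groups: Dict[str, List[int]] = {s: [] for s in shards}
    for cid in customer_ids:
        groups[shard_for_customer(cid, shards)].append(cid)
    return groups


if __name__ == "__main__":
    print(shard_for_customer(10))          # 10 % 4 == 2 -> db-shard-2
    print(group_by_shard([1, 2, 3, 4, 5]))
```

One design caveat: with a plain modulus, changing the number of shards remaps most customers to a different shard, so adding capacity usually means migrating data or choosing the shard count with growth in mind.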

To Succeed Big, Think Small

Wednesday, July 29th, 2009

I, like a lot of you, read lots of blogs and articles each week; some of my favorites include Joel on Software, Coding Horror, High Scalability, Seth Godin, and Tim Ferriss. I also read the IEEE publications that I receive almost cover to cover. I use a variety of methods to keep track of the ones I like so that I can find them easily when I want to reference them. Two of my favorite tools for this are Evernote, where you can clip tagged web pages into your notebook, and ShareThis, where I can drop something in my sharebox and send it to someone at the same time.

I was having a discussion the other day about small services as business opportunities and recalled several threads that seemed tangentially related. I went through my clipped, starred, and shared items and found these pearls. Whether you are already part of a start-up or considering one, here are some ideas worth considering.

And since one of the technical editors of our book has pounded into my brain the demand to “explain how this relates to scalability”, let me explain. If you’ve read a few of our posts, you know that we believe scalability is about more than technology. In fact, done correctly, it is technology agnostic and depends greatly on people, process, and architecture. This starts even before the business is founded. The right business plan, one that balances people and investment with products that earn real revenue, is critical to scaling. If your costs are too high to service a given number of customers, you are losing money. You can argue that you’ll get economies of scale at some point, but you need to survive long enough for that to happen. Without further ado, on to our amalgamation of advice:

Seth Godin says the way to make money on the Internet is by connecting people with what they need. His examples include the following, among many others:

Connect advertisers to people who want to be advertised to.
Connect job hunters with jobs.
Connect information seekers with information.
Connect teams to each other.

Kevin Kelly says the solution for inventors who are up against giant aggregators like Amazon is to find 1,000 “true fans”. If each one is willing to pay $100 a year for something you invent or a service you provide, you can make a great living. This might be four CDs that you produce each year. For organizations made up of more than a solo artist, Kelly states that more fans are necessary and that the number needed grows linearly with team size. He continues that, because of the network effect (Metcalfe’s Law), the value of your fans likely increases in proportion to the square of their number, which means the number of true fans does not have to double to support a duet.
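To make this arithmetic concrete, here is a small Python sketch. The $100 price and 1,000 fans come from Kelly’s example; the linear model follows the paragraph above. The square-law model is our own illustrative reading of the Metcalfe argument, not Kelly’s formula: it assumes the value fans provide scales with the square of their count, so a team needing `k` times a solo artist’s value needs about √k times as many fans.

```python
import math

PRICE_PER_FAN = 100  # dollars per true fan per year, from Kelly's example
SOLO_FANS = 1_000    # true fans needed by a solo creator


def annual_revenue(fans: int, price: int = PRICE_PER_FAN) -> int:
    """Yearly revenue from fans who each pay `price`."""
    return fans * price


def fans_needed_linear(team_size: int, solo_fans: int = SOLO_FANS) -> int:
    """Linear model: fans needed grow in proportion to team size."""
    return solo_fans * team_size


def fans_needed_square_law(team_size: int, solo_fans: int = SOLO_FANS) -> int:
    """Assumed Metcalfe-style model: value ~ fans**2, so fans ~ solo_fans * sqrt(team_size)."""
    return math.ceil(solo_fans * math.sqrt(team_size))


if __name__ == "__main__":
    print(annual_revenue(SOLO_FANS))   # 100000
    print(fans_needed_linear(2))       # 2000
    print(fans_needed_square_law(2))   # 1415
```

Under the square-law assumption a duet needs roughly 1,415 true fans rather than 2,000.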

Paul Buchheit, the 23rd employee at Google and the creator of Gmail, says stick with it: overnight success takes a long time. They started Gmail in 2001 and launched it in 2004, and seven years after work began it is seen as a huge success, with annual growth rates of 40%.

Matt from 37Signals advises keeping your day job and working on your start-up on the side. Although he concedes that starting a business requires plenty of time and effort, quitting puts a shot clock on your idea. Coupled with Buchheit’s notion that success is a measure of endurance, not speed, this seems like sound advice when possible. For those already under the pressure of the shot clock, just remember that monetization is king and survival is a competitive advantage.

So far in our bucket of advice we have: 1) connect people with what they need, 2) find 1,000 fans to support your dream, and 3) don’t jump in until you are able to stick with it for the long haul. What I can add to this is: Think Small. Throw away the fifty-page business plan that requires $25 million of investment to sustain a profitless company for seven years. Instead, think of a single service that people or businesses need.

Internet companies need dozens of services as part of their product offering, but each is only a small part of it. As a result, they either lack the expertise or can’t dedicate the resources to build those services well. Search is an example: lots of websites need search functionality, but few will build it themselves when a site search tool built by experts is available.

Services that come to mind, and that either still need to be built or need to be done better, include contextual classification, yield optimization, micro-payments, recommendations, abstracted scaling, and application monitoring. Go give it a shot, and think about scaling from day one of your business.

Scalability Summit

Tuesday, July 14th, 2009

One of the processes we often recommend to clients is known as a Scalability Summit. Its purpose is to identify which component in your application is most likely to prevent you from scaling. This idea of fixing the next bottleneck, the next thing that is going to prevent you from scaling, is how YouTube.com scaled. You can see Cuong Do’s presentation at a Google Tech Talk; about three minutes into the video, Cuong expresses his algorithm for “Handling Rapid Growth” as:

while (true)
{
    identify_and_fix_bottlenecks();
    drink();
    sleep();
    notice_new_bottleneck();
}

YouTube’s growth was so rapid that this cycle of identifying and fixing the next bottleneck often repeated every few weeks or even days. For more “normal” scaling issues, this bottleneck identification is usually done quarterly, and when done at this interval we refer to it as a Scalability Summit. We recommend inviting a select group of individuals to participate and discuss what they believe will be the next set of issues the platform experiences. The participants should include people representing architecture, operations, and engineering.

When we run Scalability Summits, we generally go through this exercise twice: once for the expected growth rate of the business, and once again for the expected growth rate multiplied by 10. So if you plan on growing by 200,000 users over the next quarter, use that number first, then use 2,000,000 users, and identify which components would fail at those usage numbers.
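The two-pass exercise can be sketched in Python. This is a minimal illustration, not our actual summit tooling: the component names, the current user count of 1,000,000, and the per-component `max_users` capacity estimates are all hypothetical, and it assumes each component’s capacity can be summarized as a single user count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    max_users: int  # hypothetical estimate of how many users the component can serve


def failing_components(components: List[Component], current_users: int, growth: int) -> List[str]:
    """Return the components whose capacity is exceeded after adding `growth` users."""
    projected = current_users + growth
    return [c.name for c in components if projected > c.max_users]


def summit_passes(components: List[Component], current_users: int,
                  expected_growth: int, multiplier: int = 10) -> Dict[str, List[str]]:
    """Run the exercise twice: at the expected growth and at `multiplier` times it."""
    return {
        "expected": failing_components(components, current_users, expected_growth),
        f"{multiplier}x": failing_components(components, current_users, expected_growth * multiplier),
    }


# Hypothetical platform components and capacity estimates.
COMPONENTS = [
    Component("web-tier", 5_000_000),
    Component("session-db", 1_100_000),
    Component("search-index", 2_500_000),
    Component("payments-db", 3_500_000),
]

if __name__ == "__main__":
    # 1,000,000 current users; 200,000 expected new users, then 2,000,000 for the 10x pass.
    print(summit_passes(COMPONENTS, 1_000_000, 200_000))
    # {'expected': ['session-db'], '10x': ['session-db', 'search-index']}
```

The 10x pass surfaces the search index, which the expected-growth pass alone would have missed.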

Once these potential bottlenecks are identified, they are prioritized by a return-on-investment analysis that takes into account factors such as how expensive each is to fix (in terms of both capital expenditure and personnel), the component’s Time To Break (how much growth it can sustain), and the severity of the impact if it does break.
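One way to turn those factors into a ranked list is sketched below. The scoring formula is an assumption for illustration only, not a prescribed method: it favors severe, soon-to-break bottlenecks and penalizes expensive fixes. The bottleneck names, the engineer-week cost, and all figures are hypothetical, and costs and Time To Break are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical fully loaded cost of one engineer-week, in dollars.
COST_PER_ENGINEER_WEEK = 5_000


@dataclass
class Bottleneck:
    name: str
    capex: int           # capital expenditure to fix, in dollars
    engineer_weeks: int  # personnel effort to fix
    weeks_to_break: int  # Time To Break: how long current growth can be sustained
    severity: int        # impact if it breaks, 1 (minor) to 5 (site down)

    def fix_cost(self) -> int:
        return self.capex + self.engineer_weeks * COST_PER_ENGINEER_WEEK

    def priority(self) -> float:
        # Assumed weighting: higher severity and shorter Time To Break raise the score;
        # a more expensive fix lowers it.
        return self.severity / (self.fix_cost() * self.weeks_to_break)


def prioritize(bottlenecks: List[Bottleneck]) -> List[str]:
    """Order bottlenecks from highest to lowest priority score."""
    return [b.name for b in sorted(bottlenecks, key=lambda b: b.priority(), reverse=True)]


BOTTLENECKS = [
    Bottleneck("search-index", capex=0, engineer_weeks=6, weeks_to_break=20, severity=3),
    Bottleneck("session-db", capex=20_000, engineer_weeks=4, weeks_to_break=6, severity=5),
    Bottleneck("log-pipeline", capex=5_000, engineer_weeks=1, weeks_to_break=10, severity=1),
]

if __name__ == "__main__":
    print(prioritize(BOTTLENECKS))  # ['session-db', 'log-pipeline', 'search-index']
```

In practice the weights are a judgment call for the summit participants; the point of a formula like this is only to make the trade-offs explicit and comparable.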

The most important step comes after the Scalability Summit. A set amount of labor from each team must be reserved for the scalability-related issues that come out of the summit. If a team spends several hours identifying bottlenecks that are then ignored, no one is going to participate again. As an organization, you must act on these findings, or 1) you will likely experience the very issues that hamper your growth and 2) participants will lose interest in the process.

Twitter Posts

Thursday, June 25th, 2009

If you are like me, you probably read hundreds of articles and posts each week. I often come across an interesting article related to scalability and share it on Twitter. For those of our blog readers who don’t follow us on Twitter, I thought I’d share some links that I’ve recently posted:

If you have any favorite articles you’ve read in the last month share them with us in the comments.