Archive for the ‘CTO/CIO’ Category

The D-I-D Approach to Scalability

Monday, August 31st, 2009

Customers often ask us “When should we invest in scalability?”  The answer that we want to give, and that’s financially correct for your shareholders, is to deploy your scalability improvements the day before you need them.  If you could deploy scale improvements the day before the lack of those improvements would cause a problem, you would delay investments to be “just in time” and gain the benefits that Dell brought to the world with configure-to-order systems married with just-in-time manufacturing.

But most of us have a hard time predicting when we need to deploy that next scalability improvement and an even harder time getting the project to come in on exactly the right date.  To that end, we developed the “Design-Implement-Deploy” or “D-I-D” approach to thinking about scalability.  Because there are multiple phases to any product enhancement, including the phase in which you start to think about it and design it (even if it’s just in your head), the phase in which you actually “code it” and the phase in which you deploy it to production, we matched these phases to your needs for scalability.  Note that these phases don’t argue for a waterfall model.  If you are developing using Agile methods, you still hopefully “think” about how something might look or work before you start coding or developing it.

We start with the notion that discussing and designing something is significantly less expensive than actually implementing that design in code.  As such, we should be willing to spend more time discussing how we might scale something and sketching out a design for that scale well in advance of our need and to an extent significantly greater than our need.  We should, for instance, discuss how we might scale something to at least 20x greater than what we have now and ideally to nearly infinite capacity.  Many times, in SaaS environments, we could facilitate such scale along the Z axis by dedicating systems to individual customers, though this would cost us a great deal and would break the leverage achieved by multi-tenancy.  Nevertheless, discussions such as these help us figure out what we need to do in the future and to what degree we need to scale our systems.  The focus of the design phase of the D-I-D model, then, is on scaling to between 20x and infinity.  Intellectual costs are high, but engineering and asset costs are lower because we aren’t writing code and we aren’t deploying systems.  Scalability summits are a good way to identify the areas necessary to scale within the design phase of the D-I-D process.

We should then move to implementing our designs within our software, focusing on ensuring that we can scale to at least 3x our current size and up to 20x.  There might be cases where the cost of scaling to 100x (or greater) our current size is no different than the cost of scaling to 20x, and if this is the case we might as well make those changes once rather than going in and making them multiple times.  This might be the case if we are going to perform a modulus of our user base to spread it across multiple (N) systems and databases.  We might code a variable Cust_MOD that we can configure over time between 1 (today) and 1,000 (5 years from now).  The cost of such changes is high in terms of engineering time, medium in terms of intellectual time (we already discussed the designs earlier in our lifecycle) and low in terms of assets, as we don’t need to deploy 100x our systems today if we intend to deploy a modulus of 1 or 2 in our first phase.
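The modulus idea can be illustrated with a minimal sketch.  It assumes customers have integer IDs and that shard numbers run from 0 to Cust_MOD minus 1; the function name and the ID scheme are hypothetical and not taken from any particular system.

```python
# Hypothetical sketch: map a customer to one of CUST_MOD systems.
# CUST_MOD is a configuration value: 1 today, possibly up to 1,000 later.
CUST_MOD = 1


def shard_for_customer(customer_id: int, cust_mod: int = CUST_MOD) -> int:
    """Return the index of the system/database that holds this customer."""
    if cust_mod < 1:
        raise ValueError("cust_mod must be at least 1")
    # The remainder spreads customer IDs across cust_mod systems.
    return customer_id % cust_mod


if __name__ == "__main__":
    print(shard_for_customer(12345))      # single system today
    print(shard_for_customer(12345, 4))   # same customer with four systems
```

With a modulus of 1 every customer lands on shard 0, so the code path exists long before the extra systems do.  Note that raising the modulus changes where most customers map, so a real rollout would also need a plan for moving existing data.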

The final phase is deployment.  Using our modulus example above, we want to deploy our systems in a just-in-time fashion; there’s no reason to have idle assets sitting around diluting shareholder value.  Maybe we put 1.5x of our peak capacity in production if we are a moderately high growth company and 5x our peak capacity in production if we are a hyper growth company.  Asset costs are high, and other costs range from low to medium.  Total costs tend to be highest in this phase; deploying 100x your necessary capacity relative to demand would kill many companies.

The focus of your scale factor (20 to infinity, or 1.5x to 3x moving from design to deploy) varies with your rate of growth.  If you have 10x annual growth, you are going to want to be on the high end of our ranges whereas if you have 10% annual growth you can be on the lower end.  The chart below summarizes the discussion above.

[Figure: DID Matrix]

Scalability Summit

Tuesday, July 14th, 2009

One of the processes that we often recommend to clients is known as a Scalability Summit. The purpose of this summit is to identify which component in your application is most likely to prevent you from scaling. This idea of fixing the next bottleneck, the next thing that is going to prevent you from scaling, is how YouTube.com scaled. You can see a presentation by Cuong Do at a Google Tech Talk. About three minutes into the video Cuong expresses his algorithm for “Handling Rapid Growth” as:

    while (true)
    {
        identify_and_fix_bottlenecks();
        drink();
        sleep();
        notice_new_bottleneck();
    }

YouTube’s growth was so rapid that this cycle of identifying the next bottleneck and fixing it often took weeks or even days. For more “normal” scaling issues, this bottleneck identification is usually done on a quarterly basis. When done at this interval we refer to these sessions as Scalability Summits. We recommend that a select group of individuals be invited to participate and discuss what they believe to be the next set of issues the platform will experience. The participants should include people representing architecture, operations, and engineering.

When we run Scalability Summits we generally go through this exercise twice: once for the expected growth rate of the business and then once again for the expected growth rate multiplied by 10. So if you plan on growing by 200,000 users over the next quarter, use that number first, then use 2,000,000 users, and identify which components would fail at those usage numbers.
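The two-pass exercise can be sketched roughly as follows.  The component names, their capacity figures, and the current user count are invented for illustration; in practice the capacity estimates are exactly what the summit participants would debate.

```python
def components_at_risk(capacity_by_component, current_users,
                       expected_new_users, multiplier=10):
    """Return components that fail at expected growth and at multiplier x growth.

    capacity_by_component maps a component name to the estimated number of
    users it can support (hypothetical estimates supplied by the team).
    """
    expected_total = current_users + expected_new_users
    stressed_total = current_users + expected_new_users * multiplier
    at_expected = sorted(name for name, cap in capacity_by_component.items()
                         if cap < expected_total)
    at_stressed = sorted(name for name, cap in capacity_by_component.items()
                         if cap < stressed_total)
    return at_expected, at_stressed


if __name__ == "__main__":
    # Invented numbers for illustration only.
    capacities = {
        "primary_db": 1_500_000,
        "session_cache": 1_100_000,
        "search": 5_000_000,
    }
    first_pass, second_pass = components_at_risk(
        capacities, current_users=1_000_000, expected_new_users=200_000)
    print("Fail at expected growth:", first_pass)
    print("Fail at 10x growth:", second_pass)
```

The second list always includes the first, which makes it easy to see which bottlenecks only appear under the more aggressive growth assumption.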

Once these potential bottlenecks are identified they are prioritized by a return on investment analysis that takes into account factors such as how expensive it is to fix (in terms of both capital expenditure as well as personnel), the component’s Time To Break (how much growth it can sustain), and the severity in the event it does break.

The most important step comes after the Scalability Summit. A set amount of labor from each team must be set aside to focus on the scalability-related issues that come out of the summit. If a team spends several hours identifying bottlenecks that then get ignored, no one is going to participate again. As an organization you must take action on these issues or 1) you will likely experience them and they will hamper your growth and 2) participants will lose interest in the process.

How Technical Should The CTO Be?

Thursday, July 9th, 2009

One of our earliest posts was the Path To CTO/CIO, where we focused not only on the “path” but on the path that would make you successful once you arrived in that position. One of the necessary skills that we mentioned you must gather along the way is “great technical experience”. We promised to revisit this topic in a later post, so I thought I’d come back to the question of how technical the CTO/CIO needs to be. This is especially relevant for those individuals coming from a non-technical background, but I think it is a question often asked by technologists as well. Do you need to have engineering and operations experience? Can you come from QA and become a CTO? Do you have to know how to code?

CTO and CIO jobs come in all shapes and sizes. In some businesses the CTO is the chief architect and not a manager; in others the CTO is the VP or SVP of all technology teams. For the purpose of this discussion I’ll define the CTO/CIO as the role that has the technology organizations (such as engineering, quality assurance, operations, etc.) reporting to it.

To be upfront about answering the question in the title of this post, I think a CTO should be very technical. I don’t think there is a prescribed path to the top technology job in a company and you don’t necessarily have to come up the technology ranks. I do, however, believe that possessing certain technical skills and experiences is far more likely to land you in that role than not having them. More importantly, while these skills and experiences won’t guarantee your success in that role, the lack of them almost ensures problems or even failure. The skills and experiences that I mentioned fall into two categories, broad and deep.

Deep experiences and skills are ones that are most likely gained early in your career and should bring you proficiency in a subject area. For some this might be programming in a specific language, automating testing on a specific tool, or administering an operating system. If you believe Malcolm Gladwell in Outliers, this process takes about 10,000 hours. Thinking of this in terms of travel or language, these deep experiences and skills are the kind you gain by living in a foreign country and becoming immersed in the culture and able to speak the language fluently. Deep experiences and skills are important because they develop in you a strong knowledge foundation that can be built upon when broadening your experiences. This deep foundation allows you to learn other technologies more easily, similar to how proficiency in one foreign language makes the next one easier to learn. These deep experiences also give you a base of confidence that, when paired with other experts, provides credibility and, when faced with uncertainty, provides a history of solutions.

Broad experiences and skills are ones that are somewhat superficial but serve to give you a general understanding. Continuing our travel and language analogy, broad experiences and skills are the ones you acquire by spending a few weeks in another country and being able to get by asking for directions and food. The broad experiences that a CTO/CIO should have are working with multiple technology disciplines (engineering, quality assurance, architecture, operations, etc.) as well as business disciplines (marketing, finance, legal, etc.). These experiences should serve to give you an understanding of their responsibilities, their day-to-day jobs, and most importantly their perspectives on technology and product development. You don’t have to have a job in each of these departments to gain this experience. Other ways to gain it include acting as a liaison, serving on joint boards, working together on special projects, or volunteering to stay late to help the other teams accomplish their work.

Treat these not as prerequisites but as differentiators that will set you apart and prepare you well for the top technology role: first establish deep skills and experiences. Once that foundation is firmly built, begin to broaden them through interdisciplinary work. Don’t forget that this discussion focuses on technical skills and experiences. There are still other skills such as leadership, management, communication, and business that must be developed as well if you not only want the top technology job but want to keep it and do a great job while you are there. It’s not unusual for the most technical CTOs to be the ones who need the most management and business coaching.

Having thrown down the gauntlet that a CTO must be technical, it is only fair to address those who didn’t rise up through technology roles and are currently CTOs or desire to be CTOs.  We’ll save this for a future post, but in the meantime break out a coding book.

UC Berkeley’s take on cloud computing

Wednesday, February 25th, 2009

Researchers at UC Berkeley have outlined their take on cloud computing in a paper, “Above the Clouds: A Berkeley View of Cloud Computing.” They cover a lot of material in this paper and it’s well worth reading.  Section 7 was particularly interesting to us because it covers the top 10 obstacles that companies must overcome in order to utilize the cloud.  According to them these are:

  • Availability of service
  • Data lock-in
  • Data confidentiality and auditability
  • Data transfer bottlenecks
  • Performance unpredictability
  • Scalable storage
  • Bugs in large distributed systems
  • Scaling quickly
  • Reputation fate sharing
  • Software licensing

 

These look very similar to our top five concerns that we outlined in our article on Venturebeat.com.  Our list was:

  • Security
  • Non-portability
  • Control (availability)
  • Limitations (non-persistent storage) 
  • Performance

Their article concludes with “Although Cloud Computing providers may run afoul of the obstacles …we believe that over the long run providers will successfully navigate these challenges…” They continue saying “Hence, developers would be wise to design their next generation of systems to be deployed into Cloud Computing.”  

We agree and reiterate our conclusion from “The Cloud Isn’t For Everyone”:

“Of course, most importantly, we should all keep an eye on how cloud computing evolves over the coming months and years. This technology has the potential to change the fundamental cost and organization structures of most SaaS companies. And as cloud providers mature, we’re sure they’ll address our top five concerns, becoming more viable companies in their own right.”

New Year’s Tech Resolutions

Friday, January 2nd, 2009

 

In the spirit of the New Year, we thought we would share our list of top things that you should consider putting on your technology roadmap for 2009.

  1. Develop the ability to roll back: If you can make only one change to your product and process in 2009 and you don’t currently have the ability to roll back, this should be at the top of your list.  Being able to push code changes and then pull them back from production in the event of a problem will save you more customers and more effort than any other single item.
  2. Break changes into smaller pieces: There is almost never a need to redesign the entire site or service at once.  Break it into parts and take it one piece at a time.  This will be lower risk and give you an opportunity to learn along the way.
  3. Remove SPOFs:  Commit to removing all single points of failure in your architecture.  Single servers, firewalls, load balancers, power supplies, etc. should all be listed and tackled one at a time until they are all eliminated.
  4. Remove synchronous calls:  Having one service call another service in a synchronous manner causes a multiplicative effect of failure.  Five synchronous calls on servers with five 9’s availability (99.999% uptime) lead to a maximum of 99.995% for the system.  Eliminate synchronous calls wherever possible and create fault-isolative architectures to help you identify problems quickly.
  5. Incent a culture of excellence:  Hire the right people and hold them to high standards. Set aggressive yet achievable goals and motivate them with your vision. Be a leader.  
  6. Develop a disaster recovery plan: Disasters happen.  No one expects an entire data center to be down, but things like that do occur.  Plan on it and start making changes today to keep your services up and running in the event of a disaster.
  7. Develop quality into the product from the start:  Don’t expect QA to ensure quality is built into the product.  We’ll post more about this in the New Year but quality starts much, much earlier in the product development life cycle.
  8. Split your application or database:  Start this year thinking about how to split your application and database.   We recommend our cube model for both because working on all three axes gives you unlimited scalability in both your app and database.
  9. Start Logging:  As we discussed in a recent post, start logging your application data but follow our three key guidelines: 1) logging must not impede the performance of the application, 2) use a common framework, and 3) look at the data.
  10. Celebrate your success:  Take time now and throughout the year to look back and congratulate your team and yourself.  If you want to foster creativity on your team, celebrating victories is a great way to keep the energy up on your team.  If you are getting ready to present your 2009 goals to your team, we recommend that you start by focusing on the amazing accomplishments that you have had in 2008.
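
The availability figure in resolution 4 can be checked with a short calculation.  This is a minimal sketch that assumes the five calls fail independently and that the system is up only when all of them are up; the function name is ours.

```python
def serial_availability(availabilities):
    """Availability of a chain of synchronous calls that must all succeed.

    Assumes failures are independent, so the availabilities multiply.
    """
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    # Five synchronous calls, each on a service with 99.999% availability.
    system = serial_availability([0.99999] * 5)
    print(f"System availability: {system:.3%}")
```

Each additional synchronous dependency multiplies in another factor below 1, which is why the chain can never be more available than its weakest member.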

Wishing you and your teams a great New Year!   

 

-The AKF Team