AKF Partners

Abbott, Keeven & Fisher Partners – Partners In Hyper Growth

Scalability Warning Signs

Is your system trying to tell you that you're going to have scalability problems? We like to think that we couldn't have predicted problems at 10x last year's traffic, but there are often warning signs that we can heed if we know what to look for.

Unless you're one of the incredibly lucky sites where traffic spikes 100x overnight, scalability problems don't sneak up on you. They give you warning signs that, if you recognize and react to them appropriately, allow you to stay ahead of the issues. However, we're often so heads-down getting the next release out the door that we don't take the time to recognize the warning signs until they become huge problems staring us in the face. Here are a few of the warnings that we've heard teams talk about in the past couple of months that were clearly signs of problems on the horizon.

Not wanting to make changes – If you find yourself denying requests for changes to certain parts of your system, this might be a warning sign that you have scalability issues with that component. A completely scalable system has components that can fail without catastrophic impact to customers. If you're avoiding changes to a component because of the risk of problems, that is a warning sign that you need to re-architect to eliminate, or at least mitigate, the risk.

Performance creep – If after each release you need to add hardware to a subsystem, or you accept a performance degradation in a service, you could have a scaling issue approaching quickly. Consistently increasing consumption of CPU or memory resources in a service with each release will lead you into an unsustainable situation. If today you're comfortably sitting at 40% CPU utilization and you allow a modest 10% degradation in each release, you have roughly ten releases before you exceed 100%, and in reality you won't get anywhere near that point without significant issues.
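To see how quickly this compounds, here is a small sketch; the 40% starting utilization and 10% per-release growth are the hypothetical numbers from the example above:

```python
# Sketch: how a modest per-release degradation compounds.
# The 40% starting CPU and 10% growth per release are illustrative numbers.

def releases_until_saturated(start_util=0.40, growth=0.10, limit=1.00):
    """Count releases until projected utilization exceeds the limit."""
    util, releases = start_util, 0
    while util <= limit:
        util *= 1 + growth
        releases += 1
    return releases, util

if __name__ == "__main__":
    n, util = releases_until_saturated()
    # A 10% creep per release saturates the box in about ten releases,
    # and real trouble starts well before 100% utilization.
    print(f"Saturated after {n} releases (projected {util:.0%} CPU)")
```

Run the same arithmetic against your own utilization numbers; the compounding is what makes a "modest" per-release allowance unsustainable.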

Investigating larger hardware – If you've started asking your vendors or VAR about bigger hardware, you're heading down the path of scalability problems. The cost of additional computational resources does not scale linearly; it's closer to cubic or even exponential. Purchasing more expensive hardware might seem like the economical way out when you compare the cost of the first hardware upgrade against developer time, but run the calculation out several iterations. When you get to a Sun Fire™ E25K Server with Oracle Database 10g at a $6M price tag you might feel differently about the decision.

Asking vendors for advanced features – When you start exploring the advanced options of your vendor's software, you're likely heading down the path of increased complexity, and this is a clear warning sign of scalability problems in your future. Besides potentially locking you into a vendor, which lowers your negotiating power, it puts the fate of your company in someone else's hands, which wouldn't make me sleep very well at night. See our post on using vendor features to scale for more information.

Watch out for these or similar warning signs that scalability problems are looming on the horizon. Dealing with the problems today, while you have time to plan properly, might not get you an award for firefighting, but you'll be more likely to deliver a quality product without costly interruption.

Agile Architects

If you think agile development methods and architecture are at odds, think again. They can not only coexist but can thrive together to build better products and platforms.

We recently posted what agile is not, where we outlined questions that we often hear about agile development. Another question that is often raised is how to combine the seemingly long-term process of architecture with the short-term nature of agile development. We believe that architecture is not at odds with agile development and that the two can not only coexist but complement each other. To ensure your architecture standards are integrated with each sprint, resulting in a scalable and available architecture, we rely on the Joint Architecture Development (JAD) process.

We've covered JAD before, but as a recap: this is the process by which features are designed in a series of meetings where developers, architects, and tech operations come together to create the design. This multi-function representation early in the development process ensures that individuals are aware of standards, that there is buy-in from all concerned parties, and that the design benefits from the knowledge that exists in different technical fields.

In the IEEE Software article “Architects as Service Providers,” Roland Faber says that “the architect role is to champion system qualities and stability, while the developer role is to champion the system functions and change.” The agile architect interacts frequently and flexibly with the developers, building a trust relationship with them.

The JAD is ideal for flexible interaction that can happen in short bursts of effort that correspond to sprints. The agile architect must understand that, because of the nature of agile development, architecture must be dynamic, not static. Architects must rely on personal interaction with developers, not documentation, to understand the requirements.

Faber continues in his article, describing two phases of the architecture process: preparation and support. During preparation the architect engages in activities such as preparing rules, frameworks, and structures. During support the architect helps resolve conflicts, engages in firefighting, and stimulates architecture communication. He makes the point that if developers don't believe the architects will provide support, they won't tell them when they are breaking the rules.

Why Can’t I Outsource Everything?

Our second article on outsourcing, which digs into the competitive considerations of outsourcing engineering and/or operations.

Since writing our view on outsourcing, we’ve received a number of questions.  While most people indicate that the article was useful, the most common question is “Why can’t I just outsource everything?”  Here’s your answer:

You absolutely CAN outsource (or even purchase) everything. The consideration of whether or not to outsource, as we've indicated earlier, is roughly the same as whether or not you should buy something. The two differences are cost and the ease with which someone else can do the same thing you do. "Buying" something (off-the-shelf packaged software, software as a service, etc.) means that it's available to just about everyone today, potentially with some small modifications to fit their business needs. As such it usually costs less than outsourcing but is also more easily accessible and implementable by your potential competition. "Outsourcing" something means that you are going to have someone else implement (code) your idea or run your servers (in a hosted rather than a SaaS model), which usually implies higher cost and a bit more difficulty in transferring the technology.

In either scenario, you must be willing to be like "everyone else". In other words, you are willing to give up the competitive differentiation that a homegrown solution might offer, such as creating a higher barrier to entry, lower barriers to exit, switching costs, etc. If an outsourcer develops your code, they will take that experience and apply it to someone else. They may not use the actual code they write for you, but they simply can't help but use the past experience. This means that the job of copying you just got a little easier, which in turn means that you lowered the barrier to entry for competition. And of course if you purchase a solution, then you are also making a decision that you will not differentiate yourself in that particular area.

None of this is bad. In fact, there are many cases where you SHOULD outsource or purchase software or services. Most companies and organizations tend toward isomorphism, which means that over time they all look (or should look) to leverage the best known practices to increase efficiencies and reduce costs. It's hard to imagine that you are going to differentiate yourself in your accounting systems, customer support systems, sales lead systems, etc. You might add a unique set of routing rules, but these systems are so standard that the best practices are built into most pieces of software.

From a product perspective, if your business objective is to be a "low price leader" rather than to compete on technology, or simply to "run with the pack" and use standard features while maintaining good margins, then it also makes sense to buy or outsource.

But what if you want to have the world’s best product, stock, or media recommendation engine?  By definition you can’t “buy” that as everyone else would have the same thing.  If you outsource it, everyone else might not have your code but the firm that develops it for you can’t help but add it to their experience; they might not copy it but it certainly will influence their future activities.

As we’ve described before – don’t outsource or buy those things that you feel should or will differentiate your business.

Revisiting the 1:10:100 Rule

Has the 1:10:100 rule changed? We think so, though the principles still hold true.

If you have any gray in your hair, you likely remember the 1:10:100 rule. Put simply, the rule indicates that the cost of defect identification goes up exponentially with each phase of development: it costs a factor of 1 in requirements, 10 in development, 100 in QA, and 1,000 in production. The increasing cost recognizes the need to go back through various phases, the lost opportunity associated with other work, the number of people and systems involved in identifying the problem, and the end user (or customer) impact in a production environment. In a 2002 study by the National Institute of Standards and Technology, the estimated cost of software bugs was $59.5 billion annually, with half of the cost borne by users and the other half by developers.

While there is an argument to be made that Agile development methods reduce this exponential rise in cost, Agile alone simply can't break the fact that the later you find defects, the more it costs you or your customers. But I also believe it's our job as managers and leaders to continue to reduce this cost between phases – especially in production environments. If the impact in the production environment is partially a function of 1) the duration of impact, 2) the degree of functionality impacted, and 3) the number of customers impacted, then reducing any of these should reduce the cost of defect identification in production. What can we do besides considering Agile methods?

There are at least three approaches that significantly reduce the cost of finding production problems: "swimlaning", having the ability to roll back code in XaaS environments (our term for anything as a service), and real-time monitoring of business metrics. Swimlaning reduces the number of customers impacted, while rollbacks and business-metric monitoring reduce the duration of the impact.

Swim Lanes

We think we might have coined the term “swimlaning” as it applies to technology architectures.  Swimlaning, as we’ve written about on this blog as well as in the book, is the extreme application of the “shard” or “pod” concept to create strict fault isolation within architectures.  Each service or customer segment gets its own dedicated set of systems from the point of accepting a request (usually the webserver) to the data storage subsystem tier that contains the data necessary to fulfill that request (a database, file system or other storage system).  No synchronous communication is allowed across the “swimlanes” that exist between these fault isolation zones.  If you swimlane by the Z axis of scale (customers) you can perform phased rollouts to subsets of your customers and minimize the percentage of your customer base that a rollout impacts.  An issue that would otherwise impact 100% of your customers now impacts 1%, 5% or whatever the smallest customer swimlane is.  If swimlaned by functionality, you only lose that functionality and the rest of your site remains functioning.  The 1000x impact might now be 1/10th or 1/100th the previous cost.  Obviously you can’t have less cost than the previous phase, as you still need to perform new work, but the cost must go down.
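As a sketch of the Z-axis flavor, a request can be pinned deterministically to one fault-isolation zone by hashing the customer ID; the lane names and the four-lane split below are illustrative, not from any particular implementation:

```python
import hashlib

# Each lane is a full, isolated stack from web server to data storage;
# the four-lane split is illustrative.
LANES = ["lane-0", "lane-1", "lane-2", "lane-3"]

def lane_for_customer(customer_id: str) -> str:
    """Deterministically map a customer to one fault-isolation zone."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return LANES[int(digest, 16) % len(LANES)]

# A phased rollout ships a release one lane at a time, so a bad build
# impacts at most 1/len(LANES) of the customer base.
```

In practice the mapping would usually live in a lookup service so lanes can be rebalanced without remapping every customer, but the invariant is the same: no synchronous call ever crosses a lane boundary.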


Rolling Back Code

Ensuring that you can always roll back recently released code reduces the duration of customer impact. While there is absolutely an upfront cost in developing code and schemas to be backwards compatible, you should consider it an insurance policy to help ensure that you never kill your customers. If asked, most customers will probably tell you they expect that you can always roll back from major issues. One thing is for certain: if you lose customers you have INCREASED rather than decreased the cost of production issue identification. If you can isolate issues to minutes or fractions of an hour, in many cases the impact becomes nearly imperceptible.
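One common way to make rollback cheap, sketched below with illustrative paths (real deployments also need backwards-compatible schemas), is to keep prior releases on disk and swap a symlink atomically:

```python
from pathlib import Path

# Sketch of a symlink-based release layout that keeps old releases on
# disk so rollback is instant. Directory names are illustrative.

def deploy(version: str, releases: Path, current: Path) -> None:
    """Atomically repoint the 'current' symlink at a release directory."""
    target = releases / version
    tmp = current.with_suffix(".tmp")
    if tmp.is_symlink():
        tmp.unlink()              # clear any leftover from a crashed deploy
    tmp.symlink_to(target)
    tmp.replace(current)          # atomic rename on POSIX filesystems

def rollback(previous_version: str, releases: Path, current: Path) -> None:
    """Rolling back is just another deploy of the previous version."""
    deploy(previous_version, releases, current)
```

Because the old release directory is still present, rollback is a second symlink swap measured in seconds rather than a rebuild or a restore.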

Monitoring Business Metrics

Monitoring the CPU, memory, and disk space on servers is important, but ensuring that you understand how the system is performing from the customer's perspective is crucial. It's not uncommon for a system to respond normally to an internal health check but be unresponsive to customers; network issues can often produce this type of failure. The way to ensure you catch these and other failures quickly is to monitor a business metric such as logins/sec or orders/min. Comparing these week-over-week (e.g., Monday at 3pm compared to last Monday at 3pm) will allow you to spot issues quickly and roll back or fix the problem, reducing the impact to customers.
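A week-over-week check like the one described can be sketched in a few lines; the logins-per-second metric and the 30% tolerance are illustrative assumptions you would tune to your own traffic:

```python
# Sketch of a week-over-week business-metric check. The metric (logins
# per second) and the 30% tolerance are illustrative assumptions.

def week_over_week_alert(current_rate: float, last_week_rate: float,
                         tolerance: float = 0.30) -> bool:
    """True if the metric deviates from the same time last week by more
    than the tolerance, e.g. Monday 3pm vs. last Monday 3pm."""
    if last_week_rate == 0:
        return current_rate > 0   # no baseline; flag unexpected traffic
    deviation = abs(current_rate - last_week_rate) / last_week_rate
    return deviation > tolerance

# 70 logins/sec now vs. 115 at this time last week is a ~39% drop,
# which should page someone even though every CPU graph looks healthy.
```

The same-weekday comparison sidesteps daily and weekly seasonality that would otherwise swamp a simple threshold.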

From Technician to Engineer

We've had a couple of posts on this topic of engineer vs. craftsman vs. technician, but I found myself discussing it in two different settings over the past couple of days and thought it would be fun to revisit it on the blog. I started the conversation with a post that quoted Tom DeMarco concluding that "software engineering is an idea whose time has come and gone." Also quoted was Jeff Atwood stating that "If you want to move your project forward, the only reliable way to do that is to cultivate a deep sense of software craftsmanship and professionalism around it." Marty picked up the conversation in another post stating that:

All of this goes to say that software developers rarely “engineer” anything – at least not anymore and not in the sense defined by other engineering disciplines. Most software developers are closer to the technicians within other engineering disciplines; they have a knowledge that certain approaches work but do not necessarily understand the “physics” behind the approach. In fact, such “physics” (or the equivalent) rarely exist. Many no longer even understand how compilers and interpreters do their jobs (or care to do so).

I think it is possible that we are seeing the evolution of our discipline as it struggles to determine what its final form will take. Computer science, information technology, software engineering, and other related disciplines are all relatively new fields of study. With a new discipline it should be expected that definitions and themes will need to be stretched or reconsidered.

Software engineers, like engineers in other disciplines who are taught more "true" laws such as Newton's or Faraday's, also undergo something of an apprenticeship once their degree is conferred. Mechanical and electrical engineers work beside senior engineers who help them transition from the theoretical to the practical. Software engineers are often apprenticed in the same manner by more senior engineers. If the practical implementation of one discipline is considered engineering because it is based upon laws and principles, I would argue that the principles of architecture for scalability are of a similar nature. This, in fact, I think is a strong differentiator between technicians and engineers within the software development discipline. A technician can write code, set up a database, or administer a server. An engineer can architect a database, system, or pool of servers such that it can scale. We've written several posts about the principles and patterns of scalability, and a large part of our book is dedicated to these principles. Are they sufficiently established to be called principles as defined in Wikipedia?

A principle is one of several things: (a) a descriptive comprehensive and fundamental law, doctrine, or assumption; (b) a normative rule or code of conduct, and (c) a law or fact of nature underlying the working of an artificial device

I still like the idea of software developers as craftsmen and craftswomen, but as I concluded in the other post, that discussion for me is as much about organizational size and control as anything else. The technician-vs-engineer discussion, I think, is best held in light of whether or not they are applying laws or principles. The American Engineers' Council for Professional Development defines engineering as "the creative application of scientific principles to design or develop…" Have we as a discipline, especially in terms of scalability, advanced enough to call what we use "principles"? Let us know what you think.


No Such Thing As a Software Engineer

Mike Fisher recently blogged about all the recent activity decrying the death of software engineering in his post “Engineering or Craftsmanship”.  The two terms should never have been stuck together in the first place.  Compared to the “true” engineering disciplines, the construct is as ridiculous as the term “sanitation engineer”.

Most other engineering disciplines require school-trained engineers with deep technical and scientific knowledge to accomplish their associated tasks. There probably aren't many groundbreaking airplanes designed by people who do not understand lift and drag, few groundbreaking electronic devices designed by people who don't understand the principles of electromotive force, and few skyscrapers designed by people who do not understand the principles of statics and dynamics. This isn't to say that such things haven't happened (e.g., the bicycle manufacturers turned airplane pioneers known as the Wright brothers), but rather that these exceptions are examples of contributions by geniuses and savants rather than the norm.

The development of software is simply different from the work performed within true engineering disciplines. With just a little bit of time and very little training or deep knowledge, one can create a website or product that is disruptive within any given market segment. You don't need to learn a great deal about science or technology to begin being successful, and you need not be a genius. The barrier to entry to develop a business-changing service on the internet simply isn't the same as the knowledge necessary to send a man to the moon. Software, as it turns out, simply isn't "rocket science". To develop it we don't need a great deal of scientific or technical experience, and it's really not the application of a "real science" (one with immutable laws, etc.) as, say, electrical engineering is.

Sure, there are some people who as a result of training are better than other people, and there is still incredible value in going to school to learn the classic components of computer science, such as asymptotic analysis. Experience increases one's ability to create efficient programs that reduce the cost of operations, increase scalability, and decrease the cost of development. But consider this: many people with classical engineering backgrounds simply walk into software development jobs and are incredibly successful. Seldom is it the case that a software engineer without an appropriate undergraduate engineering background will walk into a chemical, electrical, or mechanical engineering position and start kicking ass.

The "laws" that developers refer to (Brooks' Law, Moore's Law, etc.) aren't really laws so much as observations of things that have held true for some time. It's entirely possible that at some point Moore's Law won't even be a "law" anymore. They just aren't the same as Faraday's Law or Bernoulli's Principle. It's a heck of a lot easier to understand an observation than it is to understand, "prove", or derive the equations within the other engineering disciplines. Reading a Wikipedia page and applying the knowledge to your work is not as difficult as spending months learning calculus so that one can use differential equations.

All of this goes to say that software developers rarely “engineer” anything – at least not anymore and not in the sense defined by other engineering disciplines.  Most software developers are closer to the technicians within other engineering disciplines; they have a knowledge that certain approaches work but do not necessarily understand the “physics” behind the approach.  In fact, such “physics” (or the equivalent) rarely exist.  Many no longer even understand how compilers and interpreters do their jobs (or care to do so).

None of this goes to say that we should give up managing our people or projects.  Many articles decry the end of management in software, claiming that it just runs up costs.  I doubt this is the case as the articles I have read do not indicate the cost of developing software without attempting to manage its quality or cost.  Rather they point to the failures of past measurement and containment strategies as a reason to get rid of them.  To me, it’s a reason to refine them and get better.  Agile methods may be a better way to develop software over time, or it may be the next coming of the “iterative or cyclic method of software development”.  Either way, we owe it to ourselves to run the scientific experiment appropriately and measure it against previous models to determine if there are true benefits in our goal of maximizing shareholder value.


Engineering or Craftsmanship

Having gone through a computer science program at a school that required many engineering courses, such as mechanical engineering, fluid dynamics, and electrical engineering, as part of the core curriculum, I have a good appreciation of the differences between classic engineering work and computer science. One of the other AKF partners attended this same program along with me, and we often debate whether our field should be considered an engineering discipline or not.

Jeff Atwood posted recently about how floored he was reading Tom DeMarco's article in IEEE Software, where Tom stated that he has come to the conclusion that "software engineering is an idea whose time has come and gone." Tom DeMarco is one of the leading contributors on software engineering practices and has written such books as Controlling Software Projects: Management, Measurement, and Estimation, whose first line is the famously quoted "You can't control what you can't measure." Tom has come to the conclusion that:

For the past 40 years, for example, we’ve tortured ourselves over our inability to finish a software project on time and on budget. But as I hinted earlier, this never should have been the supreme goal. The more important goal is transformation, creating software that changes the world or that transforms a company or how it does business….Software development is and always will be somewhat experimental.

Jeff concludes his post with this statement, “…control is ultimately illusory on software development projects. If you want to move your project forward, the only reliable way to do that is to cultivate a deep sense of software craftsmanship and professionalism around it.”

All this reminded me of a post that Jeffrey Zeldman made about design management in which he states:

The trick to great projects, I have found, is (a.) landing clients with whom you are sympatico, and who understand language, time, and money the same way you do, and (b.) assembling teams you don’t have to manage, because everyone instinctively knows what to do.

There seems to be a theme among these thought leaders that you cannot manage your way into building great software; rather, you must hone software much like a brew-master does a micro-brew or a furniture-maker does a piece of furniture. I suspect the real driver behind this notion of software craftsmanship is that if you don't want to have to actively manage projects and people, you need to be highly selective about who joins the team and limit the size of the team. You must have management and process in larger organizations, no matter how professional and disciplined the team. There is likely some ratio of professionalism and team size where, if you fall below it, your projects break down without additional process and active management. As in the graph below, if all your engineers are apprentices or journeymen and not master craftsmen, they would be lower on the craftsmanship axis and you could have fewer of them on your team before you required some increased process or control.

Continuing with the microbrewery example, you cannot provide the volume and consistency of product for the entire beer-drinking population of the US with four brewers in a 1,000 sq ft shop. You need thousands of people with management and process. The same goes for large software projects: you eventually cannot develop and support the application with a small team.

But wait, you say, what about large open source projects? Let's take Wikipedia, perhaps the largest open project that exists. Jimbo Wales, the co-founder of Wikipedia, states "…it turns out over 50% of all the edits are done by just .7% of the users – 524 people. And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits." Considering there are almost 3 million English articles in Wikipedia, this means each team working on an article is very small, possibly a single person.

Speaking of Wikipedia, one of those 524 people defines a software engineer as "a person who applies the principles of software engineering to the design, development, testing, and evaluation of the software and systems that make computers or anything containing software, such as chips, work." To me this is too formulaic and doesn't accurately describe the application of style, aesthetics, and pride in one's work. I for one like the notion that software development is as much craftsmanship as it is engineering, and if that acknowledgement requires us to give up the coveted title of engineer, so be it. But I can't let this desire to be seen as a craftsman obscure the fact that the technology organization has to support the business as best as possible. If the business requires 750 engineers, then there is no amount of careful selection or craftsmanship that is going to replace control, management, measurement, and process.

Perhaps it's not much of a prophecy, but I predict a continuing divergence among software development organizations and professionals. Some will insist that the only way to make quality software is in small teams that require neither management nor measurement, while others will fall squarely in the camp of control, insisting that any sizable project requires too many developers to not also require measurement. The reality is yes to both: microbrews are great, but consumers still demand that a Michelob purchased in Wichita taste the same as it does in San Jose. Our technological world will be changed dramatically by a small team of developers, but at the same time we will continue to run software on our desktops created by armies of engineers.

Which side of the argument do you take: engineer or craftsman?


Continuous Deployment

You have probably heard of continuous integration, the practice of checking code into the source code repository early and often. Its goal is to ease the often very painful process of integrating multiple developers' code after weeks of independent work. If you have never had the pleasure of experiencing this pain, let me give you another example that we have experienced recently. In the process of writing The Art of Scalability, we have seven editors, including an acquisition editor, a development editor, and five technical editors, who all provide feedback on each chapter. Our job is to take all of this separate input and merge it back into a single document, which at times can be challenging when editors have different opinions about the direction of certain parts of the chapter. The upside is that the manuscript is much better for having gone through the process. Luckily, software engineering has developed the process of continuous integration to reduce wasted engineering effort. To make this process most effective, automated builds and smoke tests are highly recommended. For more information on continuous integration there are many resources, such as books and articles.

The topic of this post is taking continuous integration to an extreme and performing continuous deployment. It is exactly what it sounds like: all code that is written for an application is immediately deployed into production. If you haven't heard of this before, your first thought is probably that this is the ultimate in cowboy coding, but it is in use by some household technology names like Flickr and IMVU. If you don't believe this, check out code.flickr.com and look at the bottom of the page; the last time I checked it said:

Flickr was last deployed 20 hours ago, including 1 change by 1 person.

In the last week there were 34 deploys of 385 changes by 17 people.

Eric Ries, co-founder and former CTO of IMVU, is a huge proponent of continuous deployment as a method of improving software quality because of the discipline, automation, and rigorous standards required to accomplish it. Other folks at IMVU seem to be fans of the continuous deployment methodology as well, as seen in the post by Timothy Fitz. Eric suggests a five-step approach for moving to a continuous deployment environment:

  1. Continuous Integration – Obviously before moving beyond integration into full deployment, this is a prerequisite that must be in place.
  2. Source Code Commit Checks – This feature, available in almost all modern source code control systems, allows the check-in process to halt if one of the tests fails.
  3. Simple Deployment Script – Deployment must be automated and have the ability to roll back, which we wholeheartedly agree with here and here.
  4. Real-time alerting – Bugs will slip through, so you must monitor for problems and have the processes in place to react quickly.
  5. Root Cause Analysis – Eric recommends the "Five Whys" approach to find root cause; whatever the approach, finding and fixing the root cause of problems is critical to stop repeating them.
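Strung together, the five steps amount to a deployment gate along these lines; every callable here is a hypothetical stand-in for your own CI, deployment, monitoring, and alerting tooling:

```python
# Sketch of a continuous-deployment gate wiring the five steps together.
# Every callable passed in is a hypothetical stand-in for real tooling.

def continuous_deploy(changeset, run_tests, deploy, healthy, rollback, alert):
    """Ship a commit to production, rolling back on a failed health check."""
    if not run_tests(changeset):           # step 2: commit checks halt the push
        return "rejected"
    deploy(changeset)                      # step 3: automated deployment
    if not healthy():                      # step 4: real-time alerting
        rollback()                         # step 3 again: automated rollback
        alert(f"rolled back {changeset}")  # step 5: feed root cause analysis
        return "rolled-back"
    return "deployed"
```

The key property is that a failed health check triggers an automatic rollback and an alert that feeds the root cause analysis, rather than waiting on a human to notice.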

Admittedly, this concept of developers pushing code straight to production scares me quite a bit, since I've seen the types of horrific bugs that can make their way into pre-production environments. However, I think Eric and the other continuous deployment proponents are onto something: perhaps the reason so many bugs are found during weeks of testing is a self-fulfilling prophecy. If engineers know their code is moving straight into production upon check-in, they might be a lot more vigilant about their code; I know I would be. How about you: what do you think about this development model?