Posts Tagged ‘Technology/Internet’

Moving from Packaged Software to SaaS

Monday, July 19th, 2010

It’s probably no surprise to our readers that many old packaged software companies are attempting to take their software, and hence their business models, “online”.  And why not?  The model is attractive, and benefits accrue both to the providers of service through software and to those who outsource portions of what was once bothersome internally hosted software.  The providers benefit from economies of scale in hosting that generate attractive profits for the provider and savings for the customer, lower maintenance costs resulting from fewer custom customer deployments, predictable revenue streams fostered through closer customer contact, more frequent and smaller releases that reduce risk, and faster implementation times that result in faster profit recognition.  Customers benefit from outsourcing non-core IT functions, providers who specialize in delivering specific services, lower capital expenditures and faster deployment times.  SaaS is both a dessert topping and a floor wax!  It’s the cure for cancer and the answer to the riddle of life!

But what many of these companies don’t realize is that the way one architects a product and runs a company focused on service delivery is simply different from the approach of a company focused on delivering software.  Customers expect that you are going to give them higher availability and fewer headaches.  Software alone simply won’t meet this goal; it is imperative that one design SaaS systems holistically, which in turn requires skills in both infrastructure and software architecture (or “systems” architecture).  The cost leverage necessary to both increase profit margins and decrease customer cost typically requires multi-tenancy, which has its own share of headaches.  Fault isolation and rollback capabilities are a must to minimize customer impact and mitigate rapid deployment risks.

It is not enough to simply bundle up an application in a hosted fashion and label yourself a “SaaS” company.  If you don’t work aggressively to increase availability and decrease your cost of operations, someone with greater experience will come along and simply put you out of business.  After all, your business is now about SERVICE – not SOFTWARE.  This is a fundamental mind-shift that some companies simply can’t overcome or maybe simply don’t recognize.  This isn’t to say that a good engineer or product manager can’t be equally good at developing packaged and SaaS applications, but it does mean that the approach is completely different.

Stop trying to figure out how to leverage your existing assets with minimal work and start thinking about having two different products.  Or, determine which business you want and kill the other one off.  If you decide to keep both products alive, you can share services and code between these platforms, but you should not do so at the expense of optimizing your SaaS solution.  Attempting to satisfy both with a single architecture will likely result in you failing at both.

Scalability Warning Signs

Monday, June 7th, 2010

Unless you’re one of the incredibly lucky sites where the traffic spikes 100x overnight, scalability problems don’t sneak up on you. They give you warning signs that, if you recognize them and react appropriately, allow you to stay ahead of the issues. However, we’re often so head down getting the next release out the door that we don’t take the time to realize we’re experiencing warning signs until they become huge problems staring us in the face.  Here are a few of the warnings that we’ve heard teams talk about in the past couple of months that were clearly signs of problems on the horizon.

Not wanting to make changes – If you find yourself denying requests for changes to certain parts of your system, this might be a warning sign that you have scalability issues with that component. A completely scalable system has components that can fail without catastrophic impact to customers. If you’re avoiding changes to a component because of the risk of problems, this is a warning sign that you need to re-architect to eliminate or at least mitigate the risk.

Performance creep – If after each release you need to add hardware to a subsystem, or you accept a performance degradation in a service, you could have a scaling issue approaching quickly. Consistently increasing consumption of CPU or memory resources in a service with each release will lead you into an unsustainable situation. If today you’re comfortably sitting at 40% CPU utilization and you allow a modest 10% degradation in each release, you have only about ten releases before you are above 100% (40% × 1.1^10 ≈ 104%), but the reality is you won’t get close to that without significant issues.
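The headroom arithmetic above is easy to check. The following is a minimal sketch, not a capacity-planning tool: the function name `releases_until_exceeded` and its parameters are our own for illustration, and it assumes the degradation is either a relative 10% per release (compounding) or a flat 10 percentage points per release (additive).

```python
from fractions import Fraction


def releases_until_exceeded(start, step, compounding=True, limit=Fraction(1)):
    """Count releases until utilization first exceeds `limit`.

    `start` and `step` are fractions of total capacity (0.40 == 40%).
    Exact rational arithmetic avoids floating-point rounding at the boundary.
    """
    util = Fraction(start)
    step = Fraction(step)
    if step <= 0:
        raise ValueError("step must be positive, otherwise the limit is never reached")
    releases = 0
    while util <= limit:
        # Compounding: each release costs 10% more than the previous one.
        # Additive: each release adds a flat 10 percentage points.
        util = util * (1 + step) if compounding else util + step
        releases += 1
    return releases


print(releases_until_exceeded("0.40", "0.10", compounding=True))   # 10
print(releases_until_exceeded("0.40", "0.10", compounding=False))  # 7
```

Either way the runway is short, and in practice response times degrade well before utilization reaches 100%.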

Investigating larger hardware – If you’ve started asking your vendors or VAR about bigger hardware, you’re heading down the path of scalability problems. The cost of additional computational resources does not scale linearly; it’s closer to cubic or even exponential. Purchasing more expensive hardware might seem like the economical way out when you compare the cost of the first hardware upgrade versus developer time, but run the calculation out several iterations. When you get to a Sun Fire™ E25K Server with Oracle Database 10g at a $6M price tag you might feel differently about the decision.

Asking vendors for advanced features – When you start exploring advanced options of your vendor’s software, you’re likely heading down the path of increased complexity, and this is a clear warning sign of scalability problems in your future. Besides potentially locking you into a vendor, which lowers your negotiating power, it puts the fate of your company in someone else’s hands, which wouldn’t let me sleep very well at night. See our post on using vendor features to scale for more information.

Watch out for these or similar warning signs that scalability problems are looming on the horizon. Dealing with the problems today while you have time to plan properly might not get you an award for being a firefighter but you’ll more likely deliver a quality product without costly interruption.

Agile Architects

Wednesday, June 2nd, 2010

We recently posted what agile is not, where we outlined questions that we often hear about agile development. Another question that is often raised is how to combine the seemingly long-term process of architecture with the short-term nature of agile development. We believe that architecture is not at odds with agile development and that the two can not only coexist but complement each other. To ensure your architecture standards are being integrated with each sprint, resulting in a scalable and available architecture, we rely on the Joint Architecture Design (JAD) process.

We’ve covered JAD before, but as a recap, this is the process by which features are designed in a series of meetings where developers, architects, and tech operations come together to create the design. This multi-function representation early in the development process ensures that individuals are aware of standards, that there is buy-in from all concerned parties, and that the design benefits from the knowledge that exists in different technical fields.

In the IEEE Software article “Architects as Service Providers,” Roland Faber says that “the architect role is to champion system qualities and stability, while the developer role is to champion the system functions and change.” The agile architect interacts frequently and flexibly with the developers, building a trust relationship with them.

JAD is ideal for flexible interaction that can happen in short bursts of effort that correspond to sprints. The agile architect must understand that, because of the nature of agile development, architecture must be dynamic and not static. Architects must rely on personal interaction with developers, not documentation, to understand the requirements.

Faber continues in his article by describing two phases of the architecture process: preparation and support. During preparation the architect prepares rules, frameworks, and structures. During support the architect helps resolve conflicts, engages in firefighting, and stimulates architecture communication. He makes the point that if the developers don’t believe the architects will provide support, they won’t tell them when they are breaking the rules.

Why Can’t I Outsource Everything?

Wednesday, May 26th, 2010

Since writing our view on outsourcing, we’ve received a number of questions.  While most people indicate that the article was useful, the most common question is “Why can’t I just outsource everything?”  Here’s your answer:

You absolutely CAN outsource (or even purchase) everything.  The consideration of whether or not to outsource, as we’ve indicated earlier, is roughly the same as whether or not you should buy something.  The two differences are cost and the ease with which someone else can do the same thing you do.  “Buying” something (I mean off the shelf, packaged software, software as a service, etc) means that it’s available to just about everyone today – potentially with some small modifications to fit their business needs.  As such it usually costs less than outsourcing but is also more easily accessible and implementable by your potential competition.  “Outsourcing” something means that you are going to have someone else implement (code) your idea or run your servers (in a hosted rather than a SaaS model), which usually implies higher cost and a bit more difficulty in transferring technology.

In either scenario, you must be willing to be like “everyone else”.  In other words, you are willing to give up the competitive differentiation that a homegrown solution might offer such as creating a higher barrier to entry, lower barriers to exit, switching costs, etc.  If an outsourcer can develop your code they will take that experience and apply it to someone else.  They may not use the actual code they write for you, but they simply can’t help but use the past experience.  This means that the job of copying you just got a little easier, which in turn means that you lowered the barrier to entry for competition.  And of course if you purchase a solution, then you are also making a decision that you will not differentiate yourself in that particular area.

None of this is bad.  In fact, there are many cases where you SHOULD outsource or purchase software or services.  Most companies and organizations tend towards isomorphism, which means that over time they all look (or should look) to leverage the best known practices to increase efficiencies and reduce costs.  It’s hard to imagine that you are going to differentiate yourself in your accounting systems, customer support systems, sales lead systems, etc.  You might add a unique set of routing rules, etc – but these systems are so standard that the best practices are built in to most pieces of software.

From a product perspective, if your business objective is to be a “low price leader” rather than to compete on technology, or simply to “run with the pack” and use standard features while maintaining good margins, then it also makes sense to buy or outsource.

But what if you want to have the world’s best product, stock, or media recommendation engine?  By definition you can’t “buy” that as everyone else would have the same thing.  If you outsource it, everyone else might not have your code but the firm that develops it for you can’t help but add it to their experience; they might not copy it but it certainly will influence their future activities.

As we’ve described before – don’t outsource or buy those things that you feel should or will differentiate your business.

Revisiting the 1:10:100 Rule

Wednesday, April 28th, 2010

If you have any gray in your hair, you likely remember the 1:10:100 rule.  Put simply, the rule indicates that the cost of defect identification goes up exponentially with each phase of development.  It costs a factor of 1 in requirements, 10 in development, 100 in QA and 1000 in production. The increasing cost recognizes the need to go back through various phases, the lost opportunity associated with other work, the number of people and systems involved in identifying the problem, and end user (or customer) impact in a production environment. In a 2002 study by the National Institute of Standards and Technology the estimated cost of software bugs was $59.5 billion annually, with half the cost borne by the users and the other half by the developers.

While there is an argument to be made that Agile development methods reduce this exponential rise in cost, Agile alone simply can’t change the fact that the later you find defects, the more it costs you or your customers.  But I also believe it’s our job as managers and leaders to continue to reduce this cost between phases – especially in production environments.  If the impact in the production environment is partially a function of 1) the duration of impact, 2) the degree of functionality impacted, and 3) the number of customers impacted, then reducing any of these should reduce the cost of defect identification in production.  What can we do besides considering Agile methods?

There are at least three approaches that significantly reduce the cost of finding production problems.  These are “swimlaning”, having the ability to roll back code in XaaS environments (our term for anything as a service), and real time monitoring of business metrics.  Swimlaning reduces the number of customers or the amount of functionality impacted, while rollback and business metric monitoring reduce the duration of the impact.

Swim Lanes

We think we might have coined the term “swimlaning” as it applies to technology architectures.  Swimlaning, as we’ve written about on this blog as well as in the book, is the extreme application of the “shard” or “pod” concept to create strict fault isolation within architectures.  Each service or customer segment gets its own dedicated set of systems from the point of accepting a request (usually the webserver) to the data storage subsystem tier that contains the data necessary to fulfill that request (a database, file system or other storage system).  No synchronous communication is allowed across the “swimlanes” that exist between these fault isolation zones.  If you swimlane by the Z axis of scale (customers) you can perform phased rollouts to subsets of your customers and minimize the percentage of your customer base that a rollout impacts.  An issue that would otherwise impact 100% of your customers now impacts 1%, 5% or whatever the smallest customer swimlane is.  If swimlaned by functionality, you only lose that functionality and the rest of your site remains functioning.  The 1000x impact might now be 1/10th or 1/100th the previous cost.  Obviously you can’t have less cost than the previous phase, as you still need to perform new work, but the cost must go down.
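To make the Z-axis idea concrete, here is a minimal sketch of assigning customers to swimlanes and measuring how much of the customer base a phased rollout touches. Everything in it is an assumption for illustration: the function names, the hash-modulo assignment, the lane count of 20, and the synthetic customer ids. A real deployment would more likely keep an explicit customer-to-lane lookup so that lanes can be rebalanced without moving everyone.

```python
import hashlib


def swimlane_for(customer_id: str, lane_count: int) -> int:
    """Map a customer to one swimlane; the same id always lands in the same lane."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % lane_count


def impacted_fraction(customer_ids, lane_count, lanes_with_new_release):
    """Fraction of customers served by lanes that already run the new release."""
    impacted = sum(
        1 for cid in customer_ids
        if swimlane_for(cid, lane_count) in lanes_with_new_release
    )
    return impacted / len(customer_ids)


# Hypothetical customer base and lane layout.
customers = [f"customer-{n}" for n in range(10_000)]
LANES = 20

# Phase 1: release to a single lane. A bad release hurts roughly 1/20 of customers.
print(impacted_fraction(customers, LANES, {0}))

# Final phase: every lane has the release, so everyone is exposed.
print(impacted_fraction(customers, LANES, set(range(LANES))))
```

With 20 equally sized lanes, a defect found during the first phase reaches about 5% of customers instead of all of them, which is the reduction in production cost the paragraph above describes.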

Rollback

Ensuring that you can always roll back recently released code reduces the duration of customer impact.  While there is absolutely an upfront cost in developing code and schemas to be backwards compatible, you should consider it an insurance policy to help ensure that you never kill your customers.  If asked, most customers will probably tell you they expect that you can always roll back from major issues.  One thing is for certain – if you lose customers you have INCREASED rather than decreased the cost of production issue identification.  If you can limit the duration of issues to minutes or fractions of an hour, in many cases the impact becomes nearly imperceptible.

Monitoring Business Metrics

Monitoring the CPU, memory, and disk space on servers is important, but ensuring that you understand how the system is performing from the customer’s perspective is crucial. It’s not uncommon to have a system respond normally to an internal health check but be unresponsive to customers. Network issues can often cause this type of failure. The way to ensure you catch these and other failures quickly is to monitor a business metric such as logins/sec or orders/min. Comparing these week-over-week, e.g. Monday at 3pm compared to last Monday at 3pm, will allow you to spot issues quickly and roll back or fix the problem, reducing the impact to customers.
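A week-over-week check can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `is_anomalous`, the 30% drop threshold, and the sample orders/min values are all made up, and a production monitor would also need to handle holidays, missing samples, and noise.

```python
from datetime import datetime, timedelta


def is_anomalous(samples, at, max_drop=0.30):
    """Return True if the metric at `at` fell more than `max_drop` (a fraction)
    below the value recorded exactly one week earlier.

    `samples` maps a datetime to the metric value (e.g. orders/min) at that time.
    """
    current = samples.get(at)
    baseline = samples.get(at - timedelta(weeks=1))
    if current is None or not baseline:
        return False  # nothing to compare against, so do not alert
    drop = (baseline - current) / baseline
    return drop > max_drop


# Hypothetical orders/min readings: last Monday at 3pm and this Monday at 3pm.
last_monday_3pm = datetime(2010, 7, 12, 15, 0)
this_monday_3pm = datetime(2010, 7, 19, 15, 0)

healthy = {last_monday_3pm: 120, this_monday_3pm: 110}
broken = {last_monday_3pm: 120, this_monday_3pm: 45}

print(is_anomalous(healthy, this_monday_3pm))  # False
print(is_anomalous(broken, this_monday_3pm))   # True
```

Comparing against the same hour of the previous week, rather than the previous hour, keeps normal daily and weekly traffic cycles from triggering alerts.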