Archive for the ‘Engineering’ Category

Commitment-Based SOA

Thursday, December 17th, 2009

There is an interesting article in this month’s IEEE Computer Magazine on Commitment-Based Service Oriented Architecture. The authors, Munindar P. Singh, Amit K. Chopra, and Nirmit Desai, argue that existing implementations of service oriented architecture offer low-level abstractions rather than business services. They propose a new Commitment-Based SOA (CSOA) where the components are true business services and the connectors are patterns.

To illustrate this claim, the authors provide the hypothetical example of purchasing an item, which combines ordering, paying, and shipping business services. Current approaches such as Business Process Modeling Notation, Business Process Execution Language, and Choreography Description Language all emphasize control and data flow. In contrast, the CSOA “gives primacy to the business meanings of service engagements, which are captured through the participants’ commitments to one another.” Commitments include three agents: debtors, creditors, and context. The authors claim two benefits of the CSOA: 1) Services can be changed or modified and still be judged correct as long as the commitments are not violated. Commitments thus support business-level compliance and don’t dictate specific operationalizations. 2) Commitment-based specifications explicitly reflect business requirements. Without this business meaning, there is no basis to establish the validity of reuse or composition.
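
To make the first benefit concrete, here is a minimal sketch in Python. It is our own illustration, not the authors' formalism: the `Commitment` class, its `outcome` field, the `violated` helper, and the event names are all assumptions made for the example. It shows two different orderings of the same purchase both counting as correct, because neither violates a commitment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    debtor: str    # party that owes the outcome
    creditor: str  # party the outcome is owed to
    context: str   # setting in which the commitment holds
    outcome: str   # hypothetical label for the promised business result


def violated(commitments, trace):
    """Return the commitments whose outcome never occurs in the trace."""
    done = set(trace)
    return [c for c in commitments if c.outcome not in done]


# Commitments for the purchase example (names are illustrative).
purchase = [
    Commitment("buyer", "seller", "marketplace", "paid"),
    Commitment("seller", "buyer", "marketplace", "shipped"),
]

# Two different operationalizations of the same purchase.
pay_first = ["ordered", "paid", "shipped"]
ship_first = ["ordered", "shipped", "paid"]

# An execution in which the seller never ships.
never_ships = ["ordered", "paid"]
```

Calling `violated(purchase, pay_first)` and `violated(purchase, ship_first)` both return an empty list, while `violated(purchase, never_ships)` returns the seller's shipping commitment. The check looks only at what was promised, not at the order in which the steps ran.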

For more information on this topic, check out the authors’ paper on North Carolina State University’s site or in the November issue of IEEE’s Computer Magazine.

Practice, Practice, Practice

Wednesday, November 25th, 2009

We wrote a post back in July 2008 about how in order to get better you must practice. Since that time we’ve seen a lot of interest in the question of how much you must practice in order to master a skill. This interest is primarily due to Malcolm Gladwell’s book Outliers, in which he gave a number of examples of why you must have about 10 years or 10,000 hours of practice in order to achieve mastery. I’ve come across many variations of this recently and wanted to revisit the topic from a non-programming perspective to show that this is applicable to every aspect of your life and career. If you want to become a better parent, teacher, runner, programmer, leader, or technologist you must practice. If you want to master those skills you must practice them a lot.

One of the more interesting variations is from the 1976 book by William Zinsser, On Writing Well. If you are interested in writing I highly recommend this book, and everyone else can draw inspiration from his devotion to the craft of writing. In the first chapter Zinsser describes an interview about writing as a vocation in which he took part alongside a certain Dr. Brock, a surgeon who had recently taken up writing. Dr. Brock started by expressing how “The words just flowed.” When asked what he did when the writing wasn’t going well, he said he’d just stop writing. Zinsser countered that “writing is a craft, not an art, and that the man who runs away from his craft because he lacks inspiration is fooling himself.” He continues with “If your job is to write every day, you learn to do it like any other job.” Zinsser concludes the section with the thought that if writing was so easy for a surgeon, he’d consider taking up surgery on the side. This was a bit tongue-in-cheek, the point being that to get better you must persevere and do so with a critical eye for how to constantly improve.

An interesting theory on how to produce sustained and desirable change is Boyatzis’ Intentional Change Theory. This is a five-step process that enables individuals to achieve change and maintain it. If you have ever tried to stay on a diet or start an exercise regime and have found it difficult, you should be able to relate to this. One of the keys to this process is practicing the new behavior in order to build and strengthen neural pathways. This eventually leads to mastery of the skill and sustained change.

An MIT Computer Science grad student has an interesting list on his blog of posts that cover this subject of dedicated practice to master a skill.  In MJ fashion, he calls this list “Thoughts on living a remarkable life”.  Two of his most interesting from that list are book reviews On the Value of Hard Focus about Murakami’s book on distance running and The Steve Martin Method about the comedian’s life.

Another story comes from Dave Ramsey’s book More Than Enough. Dave tells the story/parable of a professional golfer being approached by a fan saying “I’d do anything to hit like you” and the golfer says “No, you wouldn’t”, to which the fan replies “Oh, yeah, I really would.”  The golfer goes on to explain his secret “Get up every morning and hit 500 golf balls. Hit them until your hands are so blistered they bleed. The next morning, tape over the blisters and do it again.”

A final example comes from Jason at 37Signals on making money. He postulates that as an entrepreneur you need to practice making money early and often, learning the skills of negotiating, pricing, and selling. This is why he recommends entrepreneurs not take outside money, so that they learn to make money quickly. Tim Ferriss would probably agree that negotiating can be practiced and Seth Godin would certainly agree that selling takes practice.

So now with all the overwhelming evidence that we need to focus and practice to master a skill, what will you do today to become a master at something?

Crisis Management – Normal Accident Theory and High Reliability Theory

Wednesday, November 18th, 2009

The partial meltdown of TMI-2 at Three Mile Island in 1979 is one of the best known crisis situations within the US and was the source of several books, and at least one movie.  It also generated two theories relevant to crisis management.

Charles Perrow’s Normal Accident Theory (NAT), described in his book Normal Accidents, states that the complexity inherent to tightly coupled technology systems makes accidents inevitable.  Perrow’s hypothesis is that the tight coupling causes interactions to escalate rapidly and without obstruction.  “Normal” is a nod to the inevitability of such accidents.

Todd LaPorte, who founded the Berkeley school of High Reliability Theory, believes that there are organizational strategies to achieve high reliability even in the face of such tight coupling. The two theories have been debated for quite some time. While the authors don’t completely agree as to how they can coexist (LaPorte believes that they are complementary and Perrow believes that they are useful for the purposes of comparison), we believe there is something to be gained from them.

One paradox from these debates is directly relevant to our pursuit of highly available and highly scalable systems: The better we are at building systems that avoid problems and crises, the less practice we have in solving problems and crises. As the practice of resolving failures is critical to our learning, we become more and more inept at rapidly resolving these failures as their frequency decreases. Therefore, as we get better at building fault-tolerant and scalable systems, we get worse at resolving crisis situations that are almost certain to happen at some point.

Weick and Sutcliffe have a solution to this paradox that we paraphrase as “organizational mindfulness”.  They identify 5 practices for developing this mindfulness:

1)      Preoccupation with failure.  This practice is all about monitoring IT systems and reporting errors in a timely fashion.  Success, they argue, narrows perceptions and breeds overconfidence.   To combat the resulting complacency, organizations need complete transparency into system faults and failures.  Reports should be widely distributed and discussed frequently such as in our oft recommended “operations review” process outlined within the Art of Scalability.

2)      Reluctance to simplify interpretations.  Take nothing for granted and seek input from diverse sources.  Don’t try to box failures into expected behavior and act with a healthy bit of paranoia.

3)      Sensitivity to operations.  Look at detailed data at the minute level, such as we’ve suggested in our posts on monitoring.  Include the usage of real time data and make ongoing assessments and continual updates of this data.  We think our book and our post on monitoring strategies have some good suggestions on this topic.

4)      Commitment to resilience.  Build excess capability by rotating positions and training your people in new skills.  Former employees of eBay operations can attest that DBAs, SAs and Network Engineers used to be rotated through the operations center to do just this.  Furthermore, once fixes are made the organization should be quickly returned to a sense of preparedness for the next situation.

5)      Deference to expertise.  During crisis events, shift the leadership role to the person possessing the greatest expertise to deal with the problem.  Our book also suggests creating a competency around crisis management such as a “technical duty officer” in the operations center.

We would add that every operations team should use every failure as a learning opportunity, especially in those environments in which failures are infrequent.  A good way to do this is to leverage the post mortem process.

From Technician to Engineer

Monday, October 5th, 2009

We’ve had a couple of posts on this topic of engineer vs craftsman vs technician, but I found myself discussing it in two different settings over the past couple of days and thought it would be fun to revisit on the blog. I started the conversation with a post that quoted Tom DeMarco concluding that “software engineering is an idea whose time has come and gone.” Also quoted was Jeff Atwood stating that “If you want to move your project forward, the only reliable way to do that is to cultivate a deep sense of software craftsmanship and professionalism around it.” Marty picked up the conversation in another post stating that:

All of this goes to say that software developers rarely “engineer” anything – at least not anymore and not in the sense defined by other engineering disciplines. Most software developers are closer to the technicians within other engineering disciplines; they have a knowledge that certain approaches work but do not necessarily understand the “physics” behind the approach. In fact, such “physics” (or the equivalent) rarely exist. Many no longer even understand how compilers and interpreters do their jobs (or care to do so).

I think it is possible that we are seeing the evolution of our discipline as it struggles to determine what its final form will take. Computer science, information technology, software engineering, and other related disciplines are all relatively new fields of study. With a new discipline it should be expected that definitions and themes will need to be stretched or reconsidered.

Software engineers, like engineers in other disciplines who are taught more “true” laws such as Newton’s or Faraday’s, also undergo something of an apprenticeship once their degree is conferred. Mechanical and electrical engineers work beside senior engineers who help them transition from the theoretical to the practical. Software engineers are often apprenticed in the same manner by more senior engineers. If the practical implementation of one discipline is considered engineering because it is based upon laws and principles, I would argue that the principles of architecture for scalability are of a similar nature. This in fact I think is a strong differentiator between technicians and engineers within the software development discipline. A technician can write code, set up a database, or administer a server. An engineer can architect a database or system or pool of servers such that it can scale. We’ve written several posts about the principles and patterns of scalability and a large part of our book is dedicated to these principles. Are these sufficiently established to be called principles as defined in Wikipedia?

A principle is one of several things: (a) a descriptive comprehensive and fundamental law, doctrine, or assumption; (b) a normative rule or code of conduct, and (c) a law or fact of nature underlying the working of an artificial device

I still like the idea of software developers as craftsmen and -women, but as I concluded in the other post, that discussion for me is as much about organizational size and control as it is anything else. The technician vs engineer discussion, I think, is best held in light of whether or not they are applying laws or principles. The American Engineers’ Council for Professional Development defines engineering as “The creative application of scientific principles to design or develop…” Have we as a discipline, especially in terms of scalability, advanced enough to call what we use “principles”? Let us know what you think.

Scalability Best Practices

Tuesday, August 11th, 2009

Here are a baker’s dozen of items that we feel are Best Practices for Scalability:

  1. Asynchronous - Use asynchronous communication when possible. Synchronous calls tie the availability of the two services together. If one has a failure or is slow the other one is affected.
  2. Swim Lanes – Create fault isolated “swim lanes” of hardware by customer segmentation. This prevents problems with one customer from causing issues across all customers. This also helps with diagnosis of issues and code roll outs.
  3. Cache - Make use of cache at multiple layers including object caches in front of databases (such as memcached), page or item caches for content (such as squid) and edge caches (such as Akamai).
  4. Monitoring - Understand your application’s performance from a customer’s perspective. Monitor outside of your network and have tests that simulate a real user’s experience. Also monitor the internal working of the application in terms of query and transaction execution count and timing.
  5. Replication - Replicate databases for recovery as well as to off load reads to multiple instances.
  6. Sharding - Split the application and databases by service and / or by customer using a modulus. While this requires slightly more complicated logic in the application it allows for massive scaling.
  7. Use Few RDBMS Features – Use the OLTP database as a persistent storage device as much as possible. The more you rely on the features offered in most RDBMS for your transactions, the greater load you are putting on the hardest item in your system to scale. Remove all business logic, such as stored procedures, from the database and move it into the application. When significant scaling is required, perform joins in the application rather than in SQL.
  8. Slow Roll – Roll out new code versions slowly, to a small subset of your servers without bringing the entire site down. This requires that all code be backwards compatible because you will have two versions of code running in production during the roll out. This method allows you to find problems that your quality and L&P testing missed while having minimal impact on customers.
  9. Load & Performance Testing - Test the performance of the application version before it goes into production. This will not catch all the issues, which is why you need the ability to roll back, but it is very worthwhile.
  10. Capacity Planning / Scalability Summits – Know how much capacity you have on all tiers and services in your system. Use Scalability Summits to plan for the increased capacity demands.
  11. Rollback – Always have the ability to roll back a code release.
  12. Root Cause Analysis - Build a learning culture by using Root Cause Analysis to find and fix the real cause of issues.
  13. Quality From The Beginning – Quality can’t be tested into a product, it must be designed in from the beginning.
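
As an illustration of item 6, splitting by customer with a modulus can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not a production design: the function names and the host names in `SHARDS` are hypothetical, and customer IDs are assumed to be integers.

```python
# Hypothetical database hosts, one per shard.
SHARDS = [
    "db-0.example.internal",
    "db-1.example.internal",
    "db-2.example.internal",
]


def shard_for(customer_id: int, shard_count: int) -> int:
    """Map a customer to a shard index using a modulus."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return customer_id % shard_count


def database_for(customer_id: int) -> str:
    """Pick the database host that holds this customer's rows."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]
```

With three shards, `database_for(7)` resolves to `db-1.example.internal` because 7 % 3 is 1. This is the "slightly more complicated logic in the application" that the item mentions. One caveat the sketch does not address: changing the number of shards remaps most customers, so growing the pool requires a data migration plan.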