Continuous Deployment

June 22nd, 2009 by Fish

You have probably heard of continuous integration, the practice of checking code into the source code repository early and often. The goal is to ease the often very painful process of integrating multiple developers’ code after weeks of independent work. If you have never had the pleasure of experiencing this pain, let me give you another example that we have experienced recently. In the process of writing The Art of Scalability, we have seven editors, including an acquisition editor, a development editor, and five technical editors, who all provide feedback on each chapter. Our job is to take all of this separate input and merge it back into a single document, which at times can be challenging when editors have different opinions about the direction of certain parts of the chapter. The upside is that the manuscript is much better for having gone through the process. Luckily, software engineering has developed continuous integration to reduce this kind of wasted engineering effort. To make the process most effective, automating builds and smoke tests is highly recommended. For more information on continuous integration, there are many resources such as books and articles.

The topic of this post is taking continuous integration to an extreme and performing continuous deployment. It is exactly what it sounds like: all code that is written for an application is immediately deployed into production. If you haven’t heard of this before, your first thought is probably that this is the ultimate in Cowboy Coding, but it is in use by some household technology names like Flickr and IMVU. If you don’t believe this, check out code.flickr.com and look at the bottom of the page. Last time I checked it said:

Flickr was last deployed 20 hours ago, including 1 change by 1 person.

In the last week there were 34 deploys of 385 changes by 17 people.

Eric Ries, co-founder and former CTO of IMVU, is a huge proponent of continuous deployment as a method of improving software quality because of the discipline, automation, and rigorous standards required to accomplish it. Other folks at IMVU also seem to be fans of the methodology, judging from the post by Timothy Fitz. Eric suggests a five-step approach for moving to a continuous deployment environment:

  1. Continuous Integration – Before moving beyond integration into full deployment, this prerequisite must be in place.
  2. Source Code Commit Checks – This feature, available in almost all modern source code control systems, halts the check-in if one of the tests fails.
  3. Simple Deployment Script – Deployment must be automated and have the ability to roll back, which we wholeheartedly agree with here and here.
  4. Real-time Alerting – Bugs will slip through, so you must monitor for problems and have the processes in place to react quickly.
  5. Root Cause Analysis – Eric recommends the Five Whys approach to find root cause. Whatever the approach, finding and fixing the root cause of problems is critical to stop repeating them.
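
To make steps 2 through 4 of the list more concrete, here is a minimal sketch in Python of a deployment routine that halts on a failed commit check and rolls back on a failed health check. It is not taken from Eric's or IMVU's tooling. The function name and the three callables (tests_pass, activate, is_healthy) are hypothetical stand-ins for whatever your test runner, release mechanism and monitoring provide.

```python
from typing import Callable


def commit_and_deploy(
    new_version: str,
    current_version: str,
    tests_pass: Callable[[], bool],
    activate: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> str:
    """Return the version that is live once the attempt has finished."""
    # Commit check: a failing test halts the process before anything ships.
    if not tests_pass():
        return current_version

    # Automated deployment of the new release.
    activate(new_version)

    # Alerting stand-in: ask monitoring whether the release is healthy.
    if is_healthy():
        return new_version

    # Rollback: restore the release that was live before the attempt.
    activate(current_version)
    return current_version
```

A real script would also record why a rollback happened, so that the root cause analysis in step 5 has something to work from.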

Admittedly, this concept of developers pushing code straight to production scares me quite a bit, since I’ve seen the types of horrific bugs that can make their way into pre-production environments. However, I think Eric and the other continuous deployment proponents are onto something: perhaps the reason so many bugs are found by weeks of testing is a self-fulfilling prophecy. If engineers know their code is moving straight into production upon check-in, they might be a lot more vigilant about their code. I know I would be. How about you? What do you think about this development model?

The Art of Scalability Update

June 19th, 2009 by Wabb

Chapters 1 through 10 of The Art of Scalability have been published online at Safari Books Online. The folks at Pearson Publishing (Addison-Wesley and Prentice-Hall) are working on getting chapters 11 through 20 online soon. We just completed editing through chapter 24 of 32 chapters.


We will have one more round of editing before the January 2010 launch of the book, so it’s not too late for you to sign up for Safari Books Online, review our book, and give us your valuable insights!

Newsletter – The Future of Relational Databases

June 17th, 2009 by Fish

For those readers who haven’t subscribed to our newsletter, here is a copy of what was sent out earlier this week. If you would like to receive future emails, about once every other month, sign up here.

We were intrigued by a question asked by Tony Bain of Read Write Web in his recent article “Is the Relational Database Dead?” Since relational database management systems (RDBMS) are a significant part of most SaaS or Web 2.0 architectures, we spend a lot of time helping people scale their databases. We also often find ourselves part of discussions about the future of relational databases.

Do Relational Databases Scale?
We believe that RDBMS scale very well, especially when they are not confined to a single server. This is the major principle behind the three axes of the AKF Scale Cube.

The AKF Database Scale Cube consists of X, Y and Z axes, each addressing a different approach to scaling transactions applied to a database. The lowest left point of the cube (coordinates X=0, Y=0 and Z=0) represents the worst case monolithic database: a case where all data is located in a single location and all accesses go to this single database.

The X Axis of the cube represents spreading read requests across multiple instances of the same data set. All writes get executed on a single database that asynchronously propagates to the read replicas. Caching can be implemented to further reduce database load. The Y Axis of the cube represents a split by function or service. The Z Axis represents ways to split transactions by performing a lookup, a modulus or other indiscriminate function (hash for instance).
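
As a rough illustration of the three axes, the Python sketch below routes requests along each one. All class, function and database names are hypothetical, the round-robin read policy is just one simple choice, and CRC32 stands in for whatever hash you prefer. The router assumes at least one replica is supplied.

```python
import itertools
import zlib


class XAxisRouter:
    """X axis: one writable primary, reads spread across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the read replicas (assumes the list is not empty).
        self._replicas = itertools.cycle(replicas)

    def for_write(self):
        return self.primary

    def for_read(self):
        return next(self._replicas)


# Y axis: each function or service gets its own database (made-up names).
Y_AXIS_DATABASES = {
    "login": "login_db",
    "search": "search_db",
    "checkout": "checkout_db",
}


def z_axis_by_modulus(customer_id, shard_count):
    """Z axis: pick a shard with a modulus on a numeric identifier."""
    return customer_id % shard_count


def z_axis_by_hash(key, shard_count):
    """Z axis: pick a shard by hashing a string key."""
    return zlib.crc32(key.encode("utf-8")) % shard_count
```

With four shards, z_axis_by_modulus(10, 4) sends customer 10 to shard 2. The axes combine: a Y-axis database can itself be split along Z, and each resulting shard can have X-axis read replicas.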

Key / Value Stores
We don’t believe in the imminent demise of the RDBMS, but we do expect more applications and services to use a relational database as a persistent data store and use key/value stores as part of the transactional system. We have seen, and expect to see more of, key/value stores such as memcached and redis used as a shared memory object cache that is the primary data source for the service or application. Writes work their way back to the relational database asynchronously, and reads into the object cache work their way forward either on a schedule or on demand.

This trend is possible not only because of the software being developed to support it but also because of cheaper hardware, including virtualization in clouds. As services become more specialized and service level agreements demand faster response times, in-memory data stores are very likely to be involved in more architectures as a layer between the application servers and the relational databases.
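
The pattern described in this section, where the object cache is the primary data source and the relational database trails behind it, can be sketched in a few lines of Python. This is only an illustration: plain dictionaries stand in for both memcached/redis and the relational database, the class and method names are hypothetical, and a real system would call flush from a background worker and handle failures.

```python
from collections import deque


class WriteBehindCache:
    """Serve reads and writes from memory; persist writes later."""

    def __init__(self, database):
        self._database = database  # stand-in for the relational database
        self._cache = {}           # stand-in for memcached or redis
        self._pending = deque()    # writes not yet persisted

    def write(self, key, value):
        # The cache is updated immediately; the database write is deferred.
        self._cache[key] = value
        self._pending.append((key, value))

    def read(self, key):
        # On-demand load: pull from the database only on a cache miss.
        # A key missing from both stores raises KeyError.
        if key not in self._cache:
            self._cache[key] = self._database[key]
        return self._cache[key]

    def flush(self):
        """Persist queued writes in order; return how many were written."""
        count = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._database[key] = value
            count += 1
        return count
```

The trade-off is durability: anything still in the pending queue is lost if the cache node fails before flush runs, which is why the relational database remains the persistent store.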

Specialized Databases
Another trend in RDBMS is the creation of specialized databases such as Hive, an open source, petabyte-scale data warehousing framework based on Hadoop that was developed by Facebook. Other databases include Amazon’s distributed web service SimpleDB, Google’s column-oriented BigTable, and the Apache JSON-based CouchDB, each with advantages in particular scenarios. Research papers like “One Size Fits All? – Part 2: Benchmarking Results” by Michael Stonebraker and Ugur Cetintemel have shown that major RDBMS vendors can be outperformed by 1 – 2 orders of magnitude by specialized database engines.

In “The End of an Architectural Era” they continue their argument that the current legacy RDBMS code, in attempting to be “one size fits all,” excels at nothing. In fact, they offer data showing that H-Store, a database written completely from scratch and built in collaboration with MIT, Brown and Yale Universities, can outperform these legacy RDBMS by two orders of magnitude on standard transactional benchmarks.

Conclusion
The demise of the RDBMS has been prophesied almost since its inception in E.F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks”. One of the most promising replacements, given the advent of object-oriented programming languages, was the object-oriented database, which never became mainstream but has resulted in the inclusion of object-relational features in the more popular RDBMS.

Even with the improved performance of specialized databases, the most popular relational databases, including Oracle, MySQL, and PostgreSQL, will continue to be an integral part of SaaS and Web 2.0 architectures. Learning how to scale them will continue to be important for many years to come.

Monitoring Strategies

June 15th, 2009 by Fish

What question does each of your system monitors answer? You probably think they answer questions such as “Is there a problem?” and, if so, “Where is the problem?” Most likely this is not the case: instead of telling you whether there is a problem, a monitor really only tells you “Where” or “What” the problem might be. Before we continue, a quick detour to discuss metrics, which, while different from monitoring, are similar in many ways.

Eric Ries, co-founder and CTO of IMVU, posted an article about the difference between vanity metrics and actionable metrics. The entire article and accompanying video are worth a read and a listen, but the takeaway is that most people are using and looking for metrics that make great soundbites but do not point to any definable action. One example is the total number of hits to a website. Eric asks the questions “Now what? Do you really know what actions you took in the past that drove those visitors to you, and do you really know which actions to take next?” This makes total sense to me, as we often see teams misusing monitoring in an attempt to determine what actions to take with their systems.

Back to our discussion of what question your monitoring is attempting to answer. We think there are five evolutionary questions that monitoring should answer:

  1. Is there a problem?
  2. Where is the problem?
  3. What is the problem?
  4. Why is there a problem?
  5. Will there be a problem?

Where most people fail is in using a monitoring tool that is designed to answer “Where” or “What” and trying to use it to answer “Is”. For example, if you are monitoring all of your servers’ vitals such as CPU, memory, and I/O, what is the appropriate action for your team to take when the CPU utilization goes to 100%? The reason that might be a tough question is that you are missing the vital piece of information: “Is this affecting my customers?” The “Is there a problem?” question is intended to be a proxy for customer impact in order to help determine the degree and speed of escalation of the issue.
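
One way to picture the difference is a check that decides escalation from customer impact first and treats server vitals as secondary. The Python sketch below is an assumption-laden illustration, not a recommendation of specific values: the Sample fields, the 1% error-rate threshold, the 95% CPU threshold and the escalation labels are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    failed_requests: int   # customer requests that failed in the window
    total_requests: int    # all customer requests in the window
    cpu_percent: float     # a server vital, answers "where" or "what"


def is_there_a_problem(sample, max_error_rate=0.01):
    """Proxy for customer impact: is the failure rate above the threshold?"""
    if sample.total_requests == 0:
        # Assumption: no traffic is treated as "no data", not as a problem.
        return False
    return sample.failed_requests / sample.total_requests > max_error_rate


def escalation(sample, max_error_rate=0.01, cpu_alert_percent=95.0):
    """Customer impact drives urgency; CPU alone only opens a ticket."""
    if is_there_a_problem(sample, max_error_rate):
        return "page on-call"
    if sample.cpu_percent >= cpu_alert_percent:
        return "open ticket"
    return "none"
```

Under these assumptions, a server at 100% CPU with no failing customer requests yields "open ticket", while a 5% failure rate on an otherwise idle server yields "page on-call".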

If you have monitoring services in place now, it is worthwhile to determine what question each one answers. If you are missing a monitor for a particular question, the time to remedy it is before you need that question answered.

Advertising Revenue

June 8th, 2009 by Fish

The Internet Advertising Bureau (IAB) just released Q1 revenue numbers for online advertising, which showed a 5% decline from the same period in 2008. At $5.5 billion in advertising revenue for Q1 2009, the amount is equal to 2007 revenue numbers. “Interactive advertising has taken its rightful place as a fixture on marketing plans across sectors, which means we aren’t immune to broader economic trends,” said Randall Rothenberg, President and CEO of the IAB.

 

David Silverman, PricewaterhouseCoopers Assurance partner, stated “Current economic conditions are clearly challenging … nonetheless, interactive media continues to consume a larger piece of the overall advertising pie.” According to the growth rates by advertising medium, internet display ads grew 7% year over year from 2007 to 2008, while all other media (radio, newspaper, magazine, outdoor, etc.) shrank, except for television, which grew at a modest 2%.

As we pointed out in our post about monetization, we don’t necessarily believe that people are resentful of advertising on free services or that this downward trend is anything more than online advertising being tied to the economy. However, if you were a startup planning to begin monetizing through advertising in 2009, this economy and the downward move in advertising spend might have caught you at a particularly bad time. Had your business model built in profitability from the start, you would not be immune from the economy, but you would be able to react more quickly and be impacted less. Amassing a following and then figuring out how to make it into a business is a great way to burn cash for a lot of years.