We have referred several times to the need for organizations to design for rollback in order to succeed as SaaS companies. Put simply, given the speed at which we want to ship releases, it is critical to limit the risk of any given release by being able to roll it back easily.
Here are some hints on how to develop systems such that they can be easily rolled back in the event of a problem in production.
- Database changes must only be additive – Columns or tables should only be added, not deleted, until a version of the code is released that removes the dependency on them. Once this standard is in place, every release should dedicate a portion of its work to cleaning up data from previous releases that is no longer needed.
- DDL & DML scripted and tested – Database changes for a release must be scripted ahead of time instead of applied by hand. This includes the script used to roll back those changes. There are two reasons for this:
  - The team needs to test the rollback process in QA or staging to validate that nothing has been missed that would prevent rolling back, and
  - The script needs to be tested under some amount of load to ensure it can be executed while the application is using the database.
- Restricted SQL queries in the application – The development team needs to disambiguate all SQL by removing all SELECT * queries and adding explicit column names to all INSERT statements, so that adding a column does not change what the existing code reads or writes.
- Semantic changes of data – The development team must not change the definition of data within a release. An example would be a column in a ticket table that is currently used as a status semaphore with three values: assigned, fixed, or closed. The new version of the application cannot start writing a fourth status until code that can handle the new status has been released; only then can a later release begin to use it.
- Wire On / Wire Off – The application should have a framework that allows code paths and features to be accessed by some users and not by others, based on an external configuration. This setting can live in a configuration file or a database table and should support both role-based access and random percentage-based access. This framework allows features to be beta tested with a limited set of users and allows a code path to be removed quickly in the event of a major bug in the feature, without rolling back the entire code base.
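
To make the Wire On / Wire Off idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a definitive framework: the feature names, the `FEATURE_CONFIG` dictionary, and the functions `user_bucket` and `is_wired_on` are all hypothetical, and the dictionary stands in for whatever configuration file or database table you would actually use. Hashing the user ID is one possible way to get a percentage-based rollout that stays stable for each user between requests.

```python
import hashlib

# Hypothetical external configuration. In practice this might be loaded from
# a configuration file or a database table rather than defined in code.
FEATURE_CONFIG = {
    "new_checkout": {"enabled": True, "roles": {"beta_tester"}, "percentage": 10},
    "bulk_export": {"enabled": False, "roles": set(), "percentage": 100},
}


def user_bucket(feature, user_id):
    """Map a user to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_wired_on(feature, user_id, role, config=FEATURE_CONFIG):
    """Return True if this user should see the feature."""
    settings = config.get(feature)
    # Unknown or disabled features are wired off for everyone.
    if settings is None or not settings["enabled"]:
        return False
    # Role-based access: listed roles always get the feature.
    if role in settings["roles"]:
        return True
    # Percentage-based access: the same user always lands in the same bucket.
    return user_bucket(feature, user_id) < settings["percentage"]


if __name__ == "__main__":
    if is_wired_on("new_checkout", "user-42", "customer"):
        print("serving new checkout flow")
    else:
        print("serving existing checkout flow")
```

In this sketch, wiring a feature off means setting `enabled` to `False` in the external configuration, which takes the code path out of service without redeploying or rolling back the release.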
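
The additive-change and restricted-SQL rules work together, and a small sketch can show why. The example below uses Python's built-in `sqlite3` module with an in-memory database; the `ticket` table, the `priority` column, and the function names are assumptions made up for illustration. It shows that code from the previous release, which names its columns explicitly, keeps working after a column is added, while a `SELECT *` query silently changes shape.

```python
import sqlite3


def old_release_insert(conn, title):
    # Explicit column list: unaffected by columns added in a later release.
    conn.execute(
        "INSERT INTO ticket (title, status) VALUES (?, ?)", (title, "assigned")
    )


def old_release_read(conn):
    # Explicit column list: the result shape does not change when columns are added.
    return conn.execute("SELECT title, status FROM ticket ORDER BY id").fetchall()


def apply_additive_change(conn):
    # Additive, nullable column: existing rows and existing code remain valid.
    conn.execute("ALTER TABLE ticket ADD COLUMN priority TEXT")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticket ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, status TEXT NOT NULL)"
    )
    old_release_insert(conn, "before change")
    apply_additive_change(conn)
    # Simulate rolling the application back: the old code runs on the new schema.
    old_release_insert(conn, "after change")
    rows = old_release_read(conn)
    # SELECT * now returns one more column than the old code was written for.
    star_width = len(conn.execute("SELECT * FROM ticket").fetchone())
    conn.close()
    return rows, star_width


if __name__ == "__main__":
    print(demo())
```

Because the schema change only adds a nullable column, rolling the application code back does not require rolling the database back at the same moment; the unused column can be cleaned up in a later release.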