AKF Partners

Abbott, Keeven & Fisher Partners | Partners In Hyper Growth


Designing for Rollback

We’ve said several times that a successful SaaS company needs to design for rollback.  Put simply, given how quickly we want to ship releases, it is critical to limit the risk of any given release by being able to roll it back easily.

Here are some hints on how to develop systems such that they can be easily rolled back in the event of a problem in production.

  • Database changes must only be additive – Columns or tables should only be added, not deleted, until a version of code is released that removes the dependency on those columns or tables.  Once this standard is in place, every release should set aside some effort for cleaning up columns, tables, and data from previous releases that are no longer needed.
  • DDL & DML scripted and tested – DBMS changes for a release must be scripted ahead of time instead of applied by hand.  This includes the script used to roll back the changes.  There are two reasons for this:
  1. The team needs to test the rollback process in QA or staging to validate that nothing has been missed that would prevent rolling back, and
  2. The scripts need to be tested under some amount of load to ensure they can be executed while the application is using the database.
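  • Restricted SQL queries in the application – The development team needs to disambiguate all SQL by replacing every SELECT * query with an explicit column list and adding column names to all INSERT statements, so that a newly added column does not change what existing code reads or writes.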
  • Semantic changes of data – The development team must not change the definition of data within a release.  An example would be a column in a ticket table that is currently used as a status semaphore with three values: assigned, fixed, or closed.  The new version of the application cannot add a fourth status in a single step.  First, code must be released that can handle the new status; only then can a later release start using it.  That way, rolling back the later release never leaves the older code facing a value it does not understand.
  • Wire On / Wire Off – The application should have a framework that allows code paths and features to be accessed by some users and not by others, based on external configuration.  This setting can live in a configuration file or a database table and should support both role-based access and random percentage-based access.  This framework allows features to be beta tested with a limited set of users and allows a code path to be removed quickly in the event of a major bug in the feature, without rolling the entire code base back.
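
As an illustration of the Wire On / Wire Off idea, here is a minimal Python sketch.  It is not a framework we ship; the names `FeatureRule`, `bucket`, `is_wired_on` and the `new_checkout` feature are all hypothetical, and the rules are held in a dictionary standing in for a configuration file or database table.  One assumption worth calling out: rather than a purely random draw, the sketch hashes the user id so that a given user consistently lands inside or outside the percentage rollout.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureRule:
    """Hypothetical rule for one feature, as loaded from a config file or table."""
    enabled: bool = True            # master switch: False "wires off" the feature
    roles: frozenset = frozenset()  # roles that always get the feature
    percentage: int = 0             # share (0-100) of all other users who get it


def bucket(feature: str, user_id: str) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_wired_on(rules: dict, feature: str, user_id: str, role: str) -> bool:
    """Decide whether this user should take the new code path."""
    rule = rules.get(feature)
    if rule is None or not rule.enabled:
        return False  # unknown or wired-off features fall back to the old path
    if role in rule.roles:
        return True   # role-based access
    return bucket(feature, user_id) < rule.percentage  # percentage-based access


if __name__ == "__main__":
    rules = {
        "new_checkout": FeatureRule(roles=frozenset({"beta_tester"}), percentage=10),
    }
    print(is_wired_on(rules, "new_checkout", "user-42", "beta_tester"))  # True

    # Major bug found: flip the switch in configuration, no code rollback needed.
    rules["new_checkout"].enabled = False
    print(is_wired_on(rules, "new_checkout", "user-42", "beta_tester"))  # False
```

Because the decision is read from configuration at call time, wiring a feature off is a data change rather than a deployment.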


Matching Data Value and Data Storage Costs

Not every piece of data is created equal. So why do you treat them all the same?

We often consult with clients who have storage cost overruns.  These aren’t just the actual cost of the storage itself, which as we all know has been dropping fairly drastically.  Rather, as we outline in our book, it’s all the other costs of storage:  space for the storage, power for the storage, the cost to traverse huge indices and the resulting response time, etc.  Most often these clients have a single solution for their storage needs.  Regardless of the actual value to the business of the data being stored, it goes into a cookie-cutter high-speed SAN: easy to access, but costly to maintain relative to what the data is worth.

Our solution to this is simple:  Not all data is created equal.  No, we aren’t going to tell you to “enslave it” but we are going to violate certain constitutional rights and tell you that you should absolutely “profile” it.  TSA be damned.  We are going to borrow a technique from our marketing friends and apply something known as an RFM cube to the data.  RFM stands for recency, frequency and monetization.  Marketing gurus use this technique to make recommendations to people or send special offers to keep high value customers happy or to “re-activate” those who haven’t been active recently.

“Recency” accounts for how recently the data item in question has been accessed.  This might be a file in your storage system or rows within a database.  Frequency speaks to how frequently the data is accessed.  Monetization is the value that specific piece of data has to your business in general.  The three of them help us calculate overall business value and access speeds.  By matching the type of storage to the value of the data in an RFM-like approach, we might have a cube that has very high cost storage mapped to high value data in the upper right and an approach to delete and/or archive data in the bottom left:

The product of our RFM analysis might yield a score for the overall value of the data.  Maybe it’s as simple as multiplying the three ratings together, or maybe you’ll add some magic of your own.  If we plot this on a cost and value curve, we can start matching storage needs to it.  Low value data goes away or goes on low cost systems with slow access times.  If you need to access it, you can always do it offline and email the report or whatever to the requester.  High value data might go on very fast but relatively costly SSDs or some storage area network equivalent.  We’ve made some mappings below, though the escalation isn’t meant to represent today’s market prices:
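
To make the scoring idea concrete, here is a small Python sketch.  It is an illustration only and does not reproduce the mappings from the original post: the 1 to 5 rating scale, the score thresholds, the tier names, and the names `DataItem`, `rfm_score`, and `storage_tier` are all assumptions chosen for the example.  The score is the simple product described above.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """A file, table, or set of rows, rated 1 (low) to 5 (high) on each axis."""
    name: str
    recency: int       # how recently it was last accessed
    frequency: int     # how often it is accessed
    monetization: int  # how much it is worth to the business


def rfm_score(item: DataItem) -> int:
    """Simple product of the three ratings, ranging from 1 to 125."""
    return item.recency * item.frequency * item.monetization


# Hypothetical thresholds, highest first: (minimum score, storage tier).
TIERS = [
    (64, "fast"),              # e.g. SSD or SAN: quick but relatively costly
    (8, "low_cost"),           # cheaper systems with slow access times
    (1, "archive_or_delete"),  # offline archive, or removal altogether
]


def storage_tier(score: int) -> str:
    """Return the first tier whose minimum score the item meets."""
    for minimum, tier in TIERS:
        if score >= minimum:
            return tier
    return TIERS[-1][1]


if __name__ == "__main__":
    items = [
        DataItem("open orders", recency=5, frequency=5, monetization=5),
        DataItem("last year's invoices", recency=2, frequency=2, monetization=3),
        DataItem("old debug logs", recency=1, frequency=1, monetization=1),
    ]
    for item in items:
        score = rfm_score(item)
        print(f"{item.name}: score {score} -> {storage_tier(score)}")
```

In practice the thresholds would be tuned against your own cost and value curve rather than fixed numbers like these.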

Now go save your company some cash!

