Posts Tagged ‘SAN’

Matching Data Value and Data Storage Costs

Wednesday, May 12th, 2010

We often consult with clients who have storage cost overruns.  These aren’t just the actual cost of storage, which as we all know has been dropping fairly drastically.  Rather, as we outline in our book, it’s all the other costs of storage:  space for the storage, power for the storage, the cost to traverse huge indices and the resulting response time, etc.  Most often these clients have a single solution for their storage needs.  Regardless of the actual value to the business of the data being stored, it goes into a cookie cutter high speed SAN: easy to access, but costly to maintain on a relative basis.

Our solution to this is simple:  Not all data is created equal.  No, we aren’t going to tell you to “enslave it” but we are going to violate certain constitutional rights and tell you that you should absolutely “profile” it.  TSA be damned.  We are going to borrow a technique from our marketing friends and apply something known as an RFM cube to the data.  RFM stands for recency, frequency and monetization.  Marketing gurus use this technique to make recommendations to people or send special offers to keep high value customers happy or to “re-activate” those who haven’t been active recently.

“Recency” accounts for how recently the data item in question has been accessed.  This might be a file in your storage system or rows within a database.  Frequency speaks to how frequently the data is accessed.  Monetization is the value that specific piece of data has to your business in general.  The three of them help us calculate overall business value and access speeds.  By matching the type of storage to the value of the data in an RFM-like approach, we might have a cube that has very high cost storage mapped to high value data in the upper right and an approach to delete and/or archive data in the bottom left:

The product of our RFM analysis might yield a score for the value of the data overall.  Maybe it’s as simple as a product or maybe you’ll add some magic of your own.  If we plot this on a cost and value curve, we can start matching storage needs to it.  Low value data goes away or goes on low cost systems with slow access times.  If you need to access it, you can always do it offline and email the report or whatever to the requester.  High value data might go on very fast but relatively costly SSDs or some storage area network equivalent.  We’ve made some mappings below, though the escalation isn’t meant to represent today’s market prices:
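
To make the scoring concrete, here is a minimal Python sketch of the idea, assuming each RFM dimension is rated on a 1 to 5 scale and the overall score is the simple product.  The scale, the thresholds, the tier labels, and the names `DataItem`, `rfm_score` and `storage_tier` are our own illustrative assumptions, not market data or a prescribed method; tune them to your own cost and value curve.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    recency: int       # 1 = not accessed in a long time, 5 = accessed very recently
    frequency: int     # 1 = rarely accessed, 5 = accessed constantly
    monetization: int  # 1 = little business value, 5 = critical to the business


def rfm_score(item: DataItem) -> int:
    """Overall value score: the plain product of the three ratings."""
    return item.recency * item.frequency * item.monetization


# Hypothetical thresholds (minimum score, tier), highest tier first.
TIERS = [
    (64, "fast SSD or SAN"),
    (27, "mid-tier NAS"),
    (8, "low-cost SATA or tape archive"),
]


def storage_tier(item: DataItem) -> str:
    """Map an item's RFM score to a storage tier."""
    score = rfm_score(item)
    for minimum, tier in TIERS:
        if score >= minimum:
            return tier
    # Lowest-value data: delete it, or keep it offline only.
    return "delete or offline archive"


orders = DataItem("current_orders", recency=5, frequency=5, monetization=5)
old_logs = DataItem("old_debug_logs", recency=1, frequency=1, monetization=2)
print(orders.name, rfm_score(orders), storage_tier(orders))
print(old_logs.name, rfm_score(old_logs), storage_tier(old_logs))
```

If a plain product weights one dimension too heavily for your business, swapping `rfm_score` for a weighted formula leaves the tier mapping untouched.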

Now go save your company some cash!

Storage Headaches

Saturday, February 21st, 2009

Numerous companies decided a year or two ago to provide storage of user data as part of their product offering.  Usually this occurred with no foresight or cost calculations, and so these companies decided that the storage was either unlimited in amount, perpetual in duration, or worse, both.  Fast forward to the present and these companies are scrambling to figure out ways to lower the storage cost or charge customers for this service.  Of course, hindsight is 20/20, but in our opinion this should be taken as a lesson by all companies: product roadmaps that do not consider the revenue versus cost equation are more than likely to result in future problems, with features either not being used by customers or not generating enough revenue to cover their cost.

For companies with data storage problems, our recommendations are very dependent on their business model, user agreements, customer contracts, etc. So unfortunately there is no panacea or one size fits all solution. In general we usually walk through the following steps in an attempt to achieve an acceptable solution:

  1. Delete what data you can
  2. Archive to very low cost storage data that is not being accessed
  3. Establish tiers of storage based on speed, reliability, and availability
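
The order of these steps can be sketched as a small decision function.  This is only an illustration: the one-year idle cutoff, the `can_delete` and `needs_fast_access` flags, and the tier labels are hypothetical.  In practice they come from your business model, user agreements, and customer contracts.

```python
from datetime import date, timedelta

# Hypothetical cutoff: data untouched for a year is treated as "not being accessed".
ARCHIVE_AFTER = timedelta(days=365)


def placement(last_accessed: date, can_delete: bool,
              needs_fast_access: bool, today: date) -> str:
    """Apply the three steps in order: delete, then archive, then tier."""
    # Step 1: delete what you can (contracts and policy decide this flag).
    if can_delete:
        return "delete"
    # Step 2: archive data that is no longer being accessed.
    if today - last_accessed >= ARCHIVE_AFTER:
        return "archive to low-cost storage"
    # Step 3: place what remains on a tier that matches its requirements.
    return "fast tier" if needs_fast_access else "standard tier"


today = date(2009, 2, 21)
print(placement(date(2007, 1, 1), can_delete=False, needs_fast_access=False, today=today))
print(placement(date(2009, 2, 1), can_delete=False, needs_fast_access=True, today=today))
```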

Consider situations in which you have a significant amount of archival data such as former employees or customers who are no longer active.  The cost of keeping this on your primary storage is not only the space on your fastest and most expensive storage but also the backup and archiving of this data that occurs every day even though it never changes.  Incremental backups help with this, but more than likely you have full backups periodically as well.  If this data is in a primary database, you are likely to have one or more standby databases as well as a tape backup.  All of that unchanging and rarely accessed data continues to take up storage and bandwidth to move it around.
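
A quick back-of-the-envelope calculation shows how this multiplies.  The figures below (500 GB of inactive data, two standby databases, four retained full backups) are made-up inputs for illustration only, and `stored_copies_gb` is a hypothetical helper name; plug in your own numbers.

```python
def stored_copies_gb(static_gb: int, standby_databases: int,
                     retained_full_backups: int) -> int:
    """Total space held by unchanging data across the primary and all of its copies."""
    primary = static_gb
    standbys = static_gb * standby_databases
    backups = static_gb * retained_full_backups
    return primary + standbys + backups


# Hypothetical example: 500 GB of inactive data, 2 standbys, 4 retained full backups.
total = stored_copies_gb(500, standby_databases=2, retained_full_backups=4)
print(f"{total} GB stored for 500 GB of data that never changes")
```

Moving that data to an archive tier that is backed up once removes it from every one of those recurring copies.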

Possible storage alternatives include the myriad of SAN offerings, NAS devices, open source storage, SATA drive farms, tape, and cloud storage.  We recommend that you implement one or more of these in your solution depending upon your particular needs.  We also encourage you to consider ahead of time your need for scalability and availability.  For a sample architecture of a scalable read or search subsystem check out our previous article.