Matching Data Value and Data Storage Costs
We often consult with clients who have storage cost overruns. These aren’t just the actual cost of storage, which as we all know has been dropping fairly drastically. Rather, as we outline in our book, it’s all the other costs of storage: space for the storage, power for the storage, the cost to traverse huge indices and the resulting response time, etc. Most often these clients have a single solution for their storage needs. Regardless of the actual value to the business of the data being stored, it goes into a cookie-cutter high-speed SAN: easy to access, but costly to maintain on a relative basis.
Our solution to this is simple: Not all data is created equal. No, we aren’t going to tell you to “enslave it” but we are going to violate certain constitutional rights and tell you that you should absolutely “profile” it. TSA be damned. We are going to borrow a technique from our marketing friends and apply something known as an RFM cube to the data. RFM stands for recency, frequency and monetization. Marketing gurus use this technique to make recommendations to people, to send special offers that keep high value customers happy, or to “re-activate” those who haven’t been active recently.
Recency accounts for how recently the data item in question has been accessed. This might be a file in your storage system or rows within a database. Frequency speaks to how often the data is accessed. Monetization is the value that specific piece of data has to your business in general. Together, the three help us estimate the overall business value of the data and the access speed it needs. By matching the type of storage to the value of the data in an RFM-like approach, we might have a cube that has very high cost storage mapped to high value data in the upper right and an approach to delete and/or archive data in the bottom left:
Our RFM analysis might yield a single score for the overall value of the data. Maybe it’s as simple as the product of the three ratings, or maybe you’ll add some magic of your own. If we plot this score on a cost and value curve, we can start matching storage needs to it. Low value data goes away or goes on low cost systems with slow access times. If you need to access it, you can always do it offline and email the report or whatever to the requester. High value data might go on very fast but relatively costly SSDs or some storage area network equivalent. We’ve made some mappings below, though the escalation isn’t meant to represent today’s market prices:
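If it helps to see the idea in code, here is a minimal Python sketch of scoring a data item and picking a storage tier. Everything specific in it is our assumption for illustration: the `DataItem` fields, the 1-to-5 rating scale, the bucket thresholds, the score cutoffs and the tier names are all hypothetical, and you would tune them to your own data and price points.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataItem:
    name: str
    last_access: date          # when the item was last read
    accesses_per_month: int    # how often it is read
    monetization: int          # 1 (low) to 5 (high), assigned by the business


def bucket(value, thresholds):
    """Return 1 plus the number of thresholds that value meets or exceeds."""
    return 1 + sum(value >= t for t in thresholds)


def rfm_score(item, today):
    """Product of three 1-to-5 ratings (hypothetical thresholds)."""
    days_idle = (today - item.last_access).days
    # Fewer idle days means a higher recency rating, so invert the bucket.
    recency = 6 - bucket(days_idle, [7, 30, 90, 365])
    frequency = bucket(item.accesses_per_month, [1, 10, 100, 1000])
    return recency * frequency * item.monetization


def storage_tier(score):
    """Map a score (1 to 125) to a tier; cutoffs are assumptions, not prices."""
    if score >= 60:
        return "ssd"
    if score >= 20:
        return "san"
    if score >= 5:
        return "low-cost disk"
    return "archive-or-delete"


if __name__ == "__main__":
    today = date(2024, 1, 31)
    items = [
        DataItem("orders", date(2024, 1, 30), 5000, 5),
        DataItem("old_logs", date(2021, 1, 1), 0, 1),
    ]
    for item in items:
        score = rfm_score(item, today)
        print(item.name, score, storage_tier(score))
```

A plain product is the simplest way to combine the ratings; if monetization should count for more than access patterns in your business, weight it before you multiply.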