Following on from Martin English’s article about data storage, I thought I would report a very interesting discussion I had at a dinner hosted by the NVT Group a couple of weeks ago.
The main topics of conversation were data de-duplication and data profiling.
Data de-duplication is a solution that makes sure you are not holding the same data in multiple places, and therefore wasting disk space. While this sounds sensible there is (of course) a cost to the solution. The question is whether that solution will cost less than the total cost of ownership of the disk space used up by the duplicate data (including its power consumption). I also had a slight concern about proprietary lock-in. That is, if you purchase a data de-dup solution from one supplier you may be “tied” to that supplier, or at least moving between suppliers will involve some hassle (apparently you need to “re-hydrate” the de-duped data).
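To make the idea a little more concrete, here is a minimal Python sketch of block-level de-duplication. It is only an illustration under my own assumptions (fixed-size blocks, a SHA-256 digest per block, and the hypothetical function names `dedupe_blocks` and `rehydrate`); commercial products will do this differently and I am not describing any particular supplier's method.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Illustrative only: block_size and the use of SHA-256 are assumptions.
    """
    store = {}   # digest -> block bytes (each unique block held once)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only stored the first time it is seen
        recipe.append(digest)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Three identical blocks plus one different block.
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = dedupe_blocks(data)
stored_bytes = sum(len(block) for block in store.values())
```

In this toy case four blocks of original data are held as two unique blocks plus a short recipe. The recipe is also where the hassle factor comes from: the data is only usable once it has been “re-hydrated”, and the store and recipe formats would be specific to whoever built the tool.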
The other technology was data profiling (or at least that is what I call it). This is a SAN solution that identifies how often data is accessed. Data that is accessed often is stored on faster/more expensive storage, while data that is rarely accessed is stored on cheaper/slower storage. Again this makes sense as long as there is a saving, and there is no proprietary lock-in.
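A rough sketch of the profiling idea, again in Python and again resting on my own assumptions: the function name `assign_tiers`, the tier labels and the `hot_threshold` value are all hypothetical, and a real SAN would apply its own (much richer) policy.

```python
from collections import Counter


def assign_tiers(access_log, all_items, hot_threshold=3):
    """Place each item on a 'fast' or 'slow' tier by how often it was accessed.

    access_log: iterable of item names, one entry per access.
    all_items: every item held, including ones never accessed.
    hot_threshold: assumed cut-off; at or above it an item counts as 'hot'.
    """
    counts = Counter(access_log)  # items missing from the log count as 0
    return {
        item: "fast" if counts[item] >= hot_threshold else "slow"
        for item in all_items
    }


tiers = assign_tiers(
    access_log=["orders.db", "orders.db", "orders.db", "archive.zip"],
    all_items=["orders.db", "archive.zip", "old_report.pdf"],
)
```

Here `orders.db` lands on the fast tier, while `archive.zip` and the never-touched `old_report.pdf` go to the slow one. Whether that saves money depends on the price gap between the tiers and how much data really is rarely accessed.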
I came away from the dinner with two distinct thoughts.
1 The volume of data being stored is increasing, and will continue to increase despite data protection laws, and other measures. Put simply, people don’t like deleting data.
2 In my experience, where large ICT solutions are being purchased the hardware component (or “tin” as people in the business invariably call it) tends to be a bit of an afterthought. It is the unglamorous end of the deal, and is seen as being a low-cost commodity. However, with the massive increase in data, it may be that these priorities should be reassessed.
What do the people out there think?
