Data deduplication

It has been a while, but here's something I have been trying to figure out for some time now.

We bought a Diligent VTF with ProtecTIER (data deduplication on virtual tapes) in the first quarter of this year. The main reason was that nice feature where we could store up to 25 times as much data as the actual disk space we had available for virtual tape.
Since then, almost all virtual tape vendors have added some form of data deduplication. I guess this is a flaming hot feature. The big-iron vendors are lagging behind a bit, but no doubt they will soon step up and try to dominate this field too.
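For anyone wondering where a number like 25 times can come from, here is a minimal sketch of the general idea. It is not a description of what ProtecTIER does internally; it assumes the simplest possible approach (fixed-size blocks identified by a SHA-256 hash), and the `DedupStore` class and its methods are made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplicating store (illustration only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}        # digest -> block bytes, each unique block kept once
        self.logical_bytes = 0  # total bytes the clients think they have written

    def write(self, data):
        """Store data and return its recipe: the ordered list of block digests."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only keep the block if we have never seen this content before.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def physical_bytes(self):
        """Bytes actually occupied by unique blocks."""
        return sum(len(block) for block in self.blocks.values())

    def ratio(self):
        """Logical bytes written divided by physical bytes stored."""
        return self.logical_bytes / self.physical_bytes()
```

Writing the same 8 KB backup image 25 nights in a row with this sketch stores the two unique blocks only once, so `ratio()` comes out at 25. Real backup streams are of course never identical from night to night, which is why the vendors say "up to".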

The one thing I wonder about is, since storage is this hot, why aren't the disk vendors incorporating data deduplication in their subsystems?

  • Is it the fact that they will lose revenue on the disk sales? (my best guess).
  • Is it because it is too hard to write the code? I don't think this is a problem though.

Computer room cooling, energy costs and floor space should all be factors to take into consideration. These factors would justify data deduplication on disk storage from an end-user point of view. Vendors always find ways to increase their profits, even though the hardware prices are dropping constantly. I see no obstacle here.

Up until 2003 we were using IBM's RVA (StorageTek OEM) on mainframe systems. The log-structured volumes were perfect. Compression was done in the RVA, without the host knowing it. We were able to store about three times as much data as the physical capacity of the box. Need more volumes? No problem. We'd carve out another model 9 volume (9GB), and no actual physical disk space was consumed until some data was stored on it.
When StorageTek came out with the successor of the RVA, called the V2X, we were eager to use it because it also supported open systems LUNs.
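To make that behaviour concrete, here is a rough sketch of a volume that consumes no space until it is written to and compresses whatever it stores. It only illustrates the principle. The `ThinVolume` class, the 64 KB track size and the use of zlib are my own assumptions and say nothing about how the RVA microcode actually worked.

```python
import zlib


class ThinVolume:
    """Toy thin-provisioned, compressing volume (illustration only)."""

    def __init__(self, size_bytes, track_size=64 * 1024):
        self.size_bytes = size_bytes   # capacity the host sees
        self.track_size = track_size   # hypothetical allocation unit
        self.tracks = {}               # track index -> compressed bytes

    def write_track(self, index, data):
        """Compress and store one full track; space is allocated only now."""
        if len(data) != self.track_size:
            raise ValueError("data must be exactly one track long")
        if not 0 <= index < self.size_bytes // self.track_size:
            raise IndexError("track index outside the volume")
        self.tracks[index] = zlib.compress(data)

    def read_track(self, index):
        """Return the track contents; never-written tracks read back as zeros."""
        if index in self.tracks:
            return zlib.decompress(self.tracks[index])
        return bytes(self.track_size)

    def physical_bytes(self):
        """Space actually consumed in the box."""
        return sum(len(chunk) for chunk in self.tracks.values())
```

A freshly created `ThinVolume(9 * 10**9)` reports `physical_bytes()` of 0, and the number only grows as tracks are written, by the compressed size rather than the size the host sent.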

Unfortunately the design of the V2X had some flaws, especially in the microcode. The V2X wasn't stable enough for us to run open systems volumes on it. The compression and snapshot mechanisms worked fine, but the paths from the host to the V2X kept dropping connections. We decided to send it back to STK, and we continued our storage services on the IBM Sharks.

So doing compression and deduplication in the storage box is just a heartbeat away. Now we just need to wait for the first vendor to pick up the gauntlet and start shipping the storage box including dedupe code. The rest will soon follow.

 Need someone to test for you? Just give me a call ;-)