Of EMC, RAID 6, mud and Dragons

So, a while back I apparently showed my baboon ass while slinging mud at EMC over their lack of support for RAID 6.

Shortly after I threw the mud, Storagezilla got out a dirty old rag and attempted to wipe some of it away.  Can’t blame the guy for trying, but he only succeeded in smearing the mud – there was plenty of mud still there after his clean-up attempt.

My issue back then, and might I add the reason that I threw the mud in the first place, was that it annoyed me that EMC weren’t giving their customers the choice.  Heck, I’m aware that opinion is divided over the need for and benefits of RAID 6, and I wasn’t attempting to preach and convert anyone – I know a lot of people can be religious about stuff like this.  People can make their own decisions – if they don’t have EMC kit that is ;-)

Anyway, in all honesty I was interested in the post that Storagezilla wrote attempting to clean up the mud that I threw, especially the part where he (I’m assuming it’s a “he”) talked about “EMC being the only vendor spending a lot of time and effort working to mitigate the chances of double disk failure related data loss… regardless of RAID type”.  Of course EMC are not the only vendor spending a lot of time and effort on this front; I’m sure most vendors are.  Still, it’s good to hear that EMC are working hard at resolving the root causes of “the double-disk failure dragon” and not simply providing recovery mechanisms.

Soo…… in a comment on Storagezilla’s post cleaning up my mud I asked if he/she/it could tell me more about what EMC are doing and what significant resources they are putting to the task.  But of course he/she/it wouldn’t say more, as that is for others at EMC to do……. And I wasn’t about to hold my breath for any further clarification from the Beast of Hopkinton.

However, since then, Chuck Hollis, VP of technology alliances at EMC, has been to the local supermarket and purchased a bucket and a bunch of new cloths, rolled up his sleeves and had a go at removing some of my mud.  In seriousness though, it’s a good article and an excellent and rare insight into how a huge market-leading company goes about improving its core products and competencies.  Obviously it’s written to make EMC look good, but I wouldn’t expect anything else.

I just want to make one comment though – both Storagezilla and Chuck talk a lot about RAID 6 not solving problems, only providing fixes, essentially papering over the cracks, and say that EMC don’t do that.  Instead EMC invest their efforts in solving the underlying problems.  Well of course that’s great (honestly), but what about in the meantime?  As Storagezilla calls it the “double-disk failure dragon”, I will make an analogy around that –
(Warning! I’m about to get a little carried away here)

Imagine a village, let’s call it Hopkintonfieldvilleshire, that is terrorized by a fearsome dragon that steals people in the night who are never seen again.  The village folk meet to discuss how to stop the dragon coming into the village and stealing their loved ones.  After much deliberation they decide that the only way to keep the dragon out is to build a huge moat around the village.  But it will take a whole year to build the moat, even if everyone in the village helps.  There is also a quick-fix option: build a large bell that will frighten the dragon away when it attacks the village.  The bell will take a month to build but will delay the finishing of the moat by a month.  The village folk decide to have a vote –
  1. Should they put everyone in the village to task building the moat, all the while allowing the dragon to roam freely into the village and steal their loved ones for another whole year?
  2. Or should they take some people off the moat project and build a big bell to scare the dragon away and save the lives of the friends and families here and now, BUT delay the completion of the moat?

Not sure if you’re still with me after that random piece of fiction, but if I were in the village I would vote for option number two.

Chuck also mentions that we may very well be seeing a RAID 6 offering from EMC before long – although I think we all expected that anyway.

He also mentions that there is always more to do and more to improve on, and hints at aiming for six nines.  Of course it’s always good to aim high – but why not go all out and aim for 100%  ;-)
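
For anyone wondering what six nines actually buys you, here is a quick back-of-the-envelope sketch in Python.  The function name and the 365-day year are my own assumptions for illustration; none of this comes from Chuck’s article.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a plain 365-day year


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines.

    Three nines means 99.9% available, six nines means 99.9999%.
    """
    unavailability = 10 ** -nines  # e.g. 6 nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(3, 7):
        budget = allowed_downtime_seconds(n)
        print(f"{n} nines: {budget:,.1f} seconds of downtime per year")
```

Six nines works out to roughly half a minute of downtime a year, whereas 100% leaves no budget at all.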

All in all, good banter and very insightful.


AUTHOR’S NOTE: Something that Jesse mentioned a while ago in response to one of my other posts was that when he runs RAID 5 on his EMC Symms he often doesn’t bother to provision spare spindles.  Instead, when a disk fails he allows the RAID set to run degraded until an engineer arrives with a replacement.  His thinking behind this is that, especially with larger disks, the engineer often arrives on site with the replacement before the sparing process has completed, or at worst shortly thereafter.  If the engineer arrives before the sparing completes, he has to wait before he can tell the box to spare back to the new disk.  All of this obviously consumes processor cycles.  And who am I to argue with Sangod!?

He also mentions that he knows of other people who do the same.  I have my reservations and would be interested to know if anybody else does this????
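
To make the trade-off a bit more concrete, here is a rough Python sketch that compares how long sparing might take against the engineer’s arrival time, and estimates the chance of a second disk failing while the set is exposed.  Every figure in it (disk size, rebuild rate, engineer ETA, MTBF, set size) is hypothetical, and the function names are mine.  It also assumes disks fail independently at a constant rate, which is probably optimistic for drives of the same age in the same box, and it ignores unrecoverable read errors altogether.

```python
import math


def sparing_hours(capacity_gb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rebuild a whole disk at a sustained rate (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / rebuild_mb_per_s / 3600


def second_failure_probability(surviving_disks: int,
                               window_hours: float,
                               mtbf_hours: float) -> float:
    """Chance that at least one surviving disk fails within the window.

    Assumes independent failures at a constant rate (exponential model).
    """
    return 1 - math.exp(-surviving_disks * window_hours / mtbf_hours)


if __name__ == "__main__":
    # All of these numbers are made up for illustration.
    capacity_gb = 500
    rebuild_mb_per_s = 20
    engineer_eta_hours = 4
    mtbf_hours = 1_000_000
    surviving_disks = 7  # e.g. a 7+1 RAID 5 set with one disk down

    spare_time = sparing_hours(capacity_gb, rebuild_mb_per_s)
    print(f"Sparing would take about {spare_time:.1f} hours")
    print(f"Engineer arrives after about {engineer_eta_hours} hours")

    # With no spare, the set stays degraded until the engineer arrives
    # AND the rebuild onto the replacement disk finishes.
    no_spare_window = engineer_eta_hours + spare_time
    for label, window in (("with hot spare", spare_time),
                          ("without spare", no_spare_window)):
        p = second_failure_probability(surviving_disks, window, mtbf_hours)
        print(f"Exposure {label}: {window:.1f} h, second failure chance {p:.6%}")
```

Plug in your own numbers; the point is only that skipping the spare stretches the degraded window by however long the engineer takes to turn up.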