Self-tuning storage?

Well, it's 1:30am and the sheer anticipation of the expected Hitachi announcement is keeping me awake!  Actually it's the jet lag, but I'm awake so I'll post.

I've been impressed with Barry Burke's contributions to the storage blogosphere.  I'm a techie at heart, so I appreciate the angles that he comes from, even if he does work for the Evil Machine Company ;-)

A while ago I asked Hitachi if they could get some of their techie product guys to blog, and Barry is exactly the type I was looking for from Hitachi.  At the time, I said I thought something "quite like but not exactly like" Tom Treadway and his gang over at the Adaptec Storage Advisors would be good.

While Hu and the rest of the blogging crew at Hitachi are great bloggers, they're just not quite what I thought was missing.  Hu certainly has the passion and the know-how, but from what I read on his blog, he spends much of his time travelling and entertaining customers and is a little further from the guys at the factory than someone like Barry.

Anyway, if I slap Barry's back any more I risk being arrested on charges of assault, so I'll move on to the reason for this post.  In the absence of a blogging firmware engineer from Hitachi, I will don my Hitachi hat and say a few words in response to Barry's recent post titled "self tuning storage – today & tomorrow".

Barry is actually posting in response to a post by Chris Evans that refers to self-tuning storage.  In his post, Barry gives a few nice insights into the inner workings of the Symmetrix and puts it up there as the only array that tunes itself without human intervention.  A grand claim, even though he backtracks a little at the end of the post by admitting that the Symm is not yet self-optimizing, but assures us that the nice people at EMC are working on it.  Anyway, I thought it was an interesting post, but I had a couple of thoughts on it.

Barry starts out by referring to what he calls the "walls of fame" in the EMC office he works at.  These so-called walls of fame are littered with patents that EMC owns, implying, I think, that more patents = more impressive?  Well, it doesn't do it for me.  It reminds me of a TV advert (commercial) that ran recently in the UK where Audi bragged that while engineering the current A6 they filed more patents than NASA did during the space race, as if, like Barry, they thought that might impress me.  However, as much as I'd love a nice Audi (the RS6 to be exact), for me they are not in the same league as NASA – no matter how many patents they own!  In fact, I'm wondering if the certificate trademarking the terms "Recovery Point" and "Recover Point" is on the same wall of shame in Barry's office.

Interestingly though, one of the patents Barry refers to covers what he calls "free reads".  Basically, he is saying that any time a cache miss occurs, the Symm will position the r/w heads over the required track and stage into cache every sector on the track up to and including the required sectors.  He also says that if the disk in question is not so busy, it will read the rest of the track into cache as well.  The result is improved cache hit ratios for very little cost.  Fine, I like it.  <on with my Hitachi hat> But to my knowledge the USP does exactly the same – well… my understanding is that the USP will always stage an entire track into cache when a read miss occurs.  It will also only stage read data into one side of cache for efficient use of the duplex cache architecture – no need to duplex read data as it's already there on RAID-protected disk.  So no real difference there.
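For what it's worth, here's a rough little sketch of the idea in Python – purely my own toy model of track-level staging, not anything lifted from Symmetrix or USP microcode.  The Cache and FakeDisk classes, the sector count and the disk_is_busy flag are all invented for illustration.

```python
SECTORS_PER_TRACK = 128

class FakeDisk:
    """Stand-in for a physical disk: reading a sector just returns a label."""
    def read(self, track, sector):
        return f"data@{track}:{sector}"

class Cache:
    def __init__(self):
        self.slots = {}                      # (track, sector) -> data

    def has(self, track, sector):
        return (track, sector) in self.slots

def read_sector(cache, disk, track, sector, disk_is_busy):
    """Return one sector, staging neighbouring sectors of the track on a miss."""
    if cache.has(track, sector):
        return cache.slots[(track, sector)]  # cache hit, no disk I/O needed

    # Cache miss: the heads end up over the track anyway, so stage every
    # sector up to and including the requested one "for free"; if the disk
    # is quiet, stage the rest of the track too.
    last = sector if disk_is_busy else SECTORS_PER_TRACK - 1
    for s in range(last + 1):
        cache.slots[(track, s)] = disk.read(track, s)
    return cache.slots[(track, sector)]

cache, disk = Cache(), FakeDisk()
read_sector(cache, disk, track=7, sector=40, disk_is_busy=False)
print(cache.has(7, 100))   # True: a later read of sector 100 is now a hit
```

The point of the sketch is just that once the heads have done the seek, pulling in the rest of the track costs next to nothing, and the next sequential read becomes a hit.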

On a side note – this reminds me a little of an old post from Storagezilla (another one of those guys from Chuck's ramshackle mob ;-) ) where he claimed that EMC was the only vendor working on mitigating disk failures, while everyone else was just banging the RAID 6 drum and not bothering to address the issue of the disk failures themselves.  Of course that's blatantly not true, so I asked what exactly EMC had been doing, that the rest hadn't, to avoid disk failures (I was, and still am, genuinely interested).  However, I was politely told that he wasn't the guy to tell me about stuff that goes on in the microcode, and it was a shame that there wasn't an EMC microcode guy out there blogging.  Well, fast forward to today… is Barry the microcode guy that old Zilla was hoping for?  And if so, can he tell us what great things EMC have been doing to reduce disk failures?  I'm still interested.

Anyway, back to Barry's post.  He goes on to talk about Symmetrix Optimizer, which sounds to me like Cruise Control/Volume Migrator on a Hitachi box – basically, software that allows you to move an LDEV/hyper from one set of spindles to another, non-disruptively.

Firstly, I don't totally agree with the "non-disruptive" claims of products like these.  That's what the marketing materials say (BTW my Hitachi hat is off now).  Don't get me wrong, the technologies that I've used are really good and have built-in rules that make them as non-disruptive as possible.  But in the real world (or maybe just my world), I try not to use them on my most important production volumes during critical business hours.  I have always tried to wait until quiet times to use them.  The simple reason being that these technologies have to read and copy an entire LDEV/hyper from one set of spindles, through cache and across the internal interconnects, to another set of spindles.  This has to have an effect on the performance of the array.
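To make that concrete, here's a very rough sketch of the kind of throttled copy loop I'm picturing – again, just my own illustration, not the actual Cruise Control or Symmetrix Optimizer logic.  The extent size, the rate cap and the read/write callables are all made up.

```python
import time

EXTENT_MB = 64   # invented copy granularity

def migrate_ldev(read_extent, write_extent, num_extents, max_mb_per_sec):
    """Copy an LDEV one extent at a time, sleeping to cap the copy rate."""
    for i in range(num_extents):
        data = read_extent(i)                     # pull from the source spindles into cache
        write_extent(i, data)                     # destage to the target spindles
        time.sleep(EXTENT_MB / max_mb_per_sec)    # throttle: trade elapsed time for impact

# Example with dummy I/O callables standing in for the real extent reads/writes:
migrate_ldev(read_extent=lambda i: b"x" * 1024,
             write_extent=lambda i, d: None,
             num_extents=4,
             max_mb_per_sec=200)
```

The pacing spreads the load out over time, but every extent still goes through cache and across the interconnects, which is why I'd rather run it in a quiet window.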

Maybe I'm just not visionary enough (I admit I can barely see past my next meal), but I can only see this being possible with major architectural changes, which would only happen over a long period of time.  For example, the USP has dedicated RAID calculation chips that are there to take the strain of RAID calculation (especially in the event of a rebuild from parity) off the other processors, making RAID calculations less disruptive.  So I can't see tools like Symmetrix Optimizer being non-disruptive without their own dedicated processing power internally.  I'm thinking along the lines of a separate internal bus (crossbar switch etc – oops, a reference to the mighty crossbar switch, my Hitachi hat must be back on) and disks with more heads/actuators, or more mirrors dedicated to management rather than normal user I/O.  Hmm, I'm starting to rant now and talk about things that are way over my head, and it's far too late, or is that too early, for me to speak even semi-intelligently on a subject like this.
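Just to illustrate the kind of arithmetic those dedicated RAID chips are offloading, here's a toy XOR parity example – nothing vendor-specific, and the stripe data is invented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Three data blocks on three disks, plus their parity (all values invented).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)                          # written at stripe time

# The disk holding data[1] fails: rebuild its block from the survivors + parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt block:", rebuilt.hex())
```

Trivial on four bytes, but doing it across every stripe of a failed drive while still serving production I/O is exactly the sort of grunt work you want dedicated silicon for.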

My other point on these technologies is that they are not clever enough to be left fully alone in automatic mode.  For example, I might have an LDEV and associated spindles that get very hot while a month-end report is running but are cool for the remainder of the month.  Then there is also the risk of flip-flopping, where the box starts spending so much time moving things to and fro and back and forth in a reactive way.  To me, the technologies are great and getting better, but they still require too much management to be classed under the banner of self-tuning.  Just my opinion, and maybe the EMC stuff is more intelligent than the tools I've used.
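If I were writing the automatic mode myself, I'd at least want some hysteresis and a decent observation window before anything moves – something along these lines (a made-up sketch with invented thresholds, nothing to do with any vendor's actual algorithm):

```python
HOT_IOPS = 5000      # promote only above this level...
COOL_IOPS = 1000     # ...demote only below this one (a hysteresis band between them)
MIN_SAMPLES = 7      # ...and only when the trend holds across a week of daily samples

def decide(history, currently_on_fast_tier):
    """Return 'promote', 'demote' or 'stay' for one LDEV given its daily IOPS history."""
    recent = history[-MIN_SAMPLES:]
    if len(recent) < MIN_SAMPLES:
        return "stay"                              # not enough evidence to act yet
    if not currently_on_fast_tier and min(recent) > HOT_IOPS:
        return "promote"
    if currently_on_fast_tier and max(recent) < COOL_IOPS:
        return "demote"
    return "stay"                                  # inside the hysteresis band

# A short month-end spike on its own shouldn't trigger a move:
print(decide([800, 900, 750, 820, 9000, 8800, 700], currently_on_fast_tier=False))  # -> stay
```

Even then, a rule like this would happily leave my month-end LDEV on slow disk right when it needs to be fast, which is rather the point – the array doesn't know the report is coming, but I do.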

I'm with Chris, I just don't think that today's storage arrays are truly self-tuning.  If the vendors say they are, then this is just as misleading as the "storage virtualisation" naming faux pas that we have recently been talking about.

Today's enterprise storage is great and is getting more self-optimising and intelligent all the time.  But there's still no substitute for the human touch.

Nigel

Disclaimer: I am not a Hitachi/HDS employee, never mind one of their firmware gurus.