When bigger isn't better
I've recently been reviewing the design of an enterprise backup environment that uses LTO-3 as its tape technology. When looking at the strategy for the internal database backup, approximately 2GB, I noticed that they were backing it up daily to a dedicated, non-appendable pool of tapes and having the tapes shipped offsite each day. All of which is fairly standard practice. The thing that bothered me was how long they were keeping these backups for – 1 month! I'm still struggling to think of a scenario where you might want to recover your backup environment to a month ago! But the point I'd like to make is how much space this wastes on a 400/800GB tape (in a 31-day month this will store 62GB on 12,400GB worth of tape, and that's without compression). Smaller capacity tapes would be ideal for this situation but of course, as is the way with everything in the storage industry, everything is getting bigger!
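To put some numbers on that waste, here's a quick back-of-the-envelope sketch, assuming one dedicated 400GB (native) tape consumed per daily backup and no compression:

```python
# Tape utilisation over a 31-day retention period:
# a 2GB backup lands on its own non-appendable 400GB tape every day.
backup_gb = 2
tape_native_gb = 400
days_retained = 31

data_stored = backup_gb * days_retained          # 62GB of actual backup data
tape_consumed = tape_native_gb * days_retained   # 12,400GB of native tape capacity
utilisation = data_stored / tape_consumed * 100  # fraction of the tape actually used

print(f"{data_stored}GB stored on {tape_consumed}GB of tape "
      f"({utilisation:.1f}% utilised)")
```

That works out to half of one percent of the purchased tape capacity actually holding data.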
There is a similar, seemingly unstoppable, trend with disks too. To the uninitiated, bigger disks might seem all good. However, to the more initiated, several issues become apparent, and I'd like to address one of them here.
I recently did some performance measuring on a storage array for a company that was being forced into using larger disk drives. Although, on the spec sheet, the larger disks performed near enough the same as the existing smaller disks, the problem would arise from the potential to create more LUNs on these larger disks. For example (WARNING: oversimplified example) –
Imagine this company was currently running with 50GB disks and a standard LUN size of 10GB. The existing disks are therefore divided into 5 x 10GB LUNs. At the moment the disk performance is fine. However, despite the fact that the larger 100GB disks perform very similarly, each one will be divided not into 5, but into 10 x 10GB LUNs and therefore potentially receive twice as much “work”. The matter is further complicated by the fact that as more and more LUNs are carved on a single disk it becomes more and more difficult to predict the type of workload that the underlying disks will be subjected to.
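The arithmetic above can be sketched in a few lines. The per-LUN I/O figure below is purely hypothetical, just to illustrate how the load on a single spindle scales with the number of LUNs carved on it:

```python
# If each LUN receives roughly the same I/O load, the work hitting one
# spindle scales with the number of LUNs carved on it.
lun_size_gb = 10
io_per_lun = 50  # hypothetical average IOPS per LUN (illustrative only)

loads = {}
for disk_gb in (50, 100):
    luns = disk_gb // lun_size_gb
    loads[disk_gb] = luns * io_per_lun
    print(f"{disk_gb}GB disk -> {luns} x {lun_size_gb}GB LUNs "
          f"-> ~{loads[disk_gb]} IOPS on one spindle")
```

Same LUN size, same per-LUN workload, but the bigger disk ends up doing twice the work.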
And then consider this scenario… what do you do when your Oracle DBAs come to you asking for storage for the new database that's going in? Of course they want their own dedicated spindles! And as usual they know exactly what they want – 18GB 15K drives. For a start you have a hard enough time convincing the DBAs to go with larger disks because the new array doesn't support 18GB disks; the smallest it will take is 73GB. Then you find out they only want 60GB for their database files and 10GB for their log files – not both on the same set of disks, as performance is crucial and they can't afford to mix different workloads on the same spindles. The next problem is that your array only lets you install disks in groups of 8, so… when it's all translated, what the DBAs are asking for is two groups of 8 disks –
- DISK_GROUP1 for database = 8 x 73GB in RAID 10 = 292GB useable
- DISK_GROUP2 for log files = 8 x 73GB in RAID 5 = 511GB useable
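Those useable figures fall straight out of the standard RAID capacity arithmetic – a sketch, assuming RAID 10 gives up half the raw space to mirroring and RAID 5 gives up one disk's worth to parity:

```python
# Usable capacity for the two 8-disk groups of 73GB drives.
disk_gb = 73
group_size = 8

raid10_usable = group_size * disk_gb // 2   # mirroring halves raw capacity
raid5_usable = (group_size - 1) * disk_gb   # one disk's worth lost to parity

total_usable = raid10_usable + raid5_usable
print(f"RAID 10: {raid10_usable}GB, RAID 5: {raid5_usable}GB, "
      f"total: {total_usable}GB useable")
```

So roughly 800GB useable bought to satisfy a 70GB requirement.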
All a bit overkill, don't you think? So after convincing your DBAs that the bigger disks won't bring down the performance of their database, you now have to convince management into buying in the region of 800GB of useable space when you only need 70GB of it – good luck!
As a trade-off, maybe you could put some other sequential workloads on DISK_GROUP2, but the question then becomes "how many sequential workloads can a disk group take before the combined workload becomes more random than your random workloads?"
This also reminds me of speaking with an EVA guy a while ago (I like the EVA) who was evangelising about how great the EVA was because the backend never needed to be a bottleneck anymore. If you had a bottleneck on the backend you simply added more disks, and more disks, and more disks until the problem went away. The problem with this approach, though, is that it's a two-edged sword – every time you add disks you add capacity, and the first law of storage dynamics states that "wherever free capacity exists it will be used" (BTW I just made that law up so don't go quoting it in any meetings).
My theory behind it is this –
more/bigger disks = more space = more LUNs = more hosts = more applications = more workloads = more contention = more random = DIRE PERFORMANCE
To sum up, disk manufacturers seem obsessed with larger capacities – a bit of "our disks are bigger than your disks". To a point I understand why, given the current rates of data growth. However, as we've seen, larger capacities bring their own problems, especially with the disk drive already being by far the slowest part of a computer system.