Galaxy and ProtecTIER issues.

Today I ran into some problems while configuring a Commvault Galaxy server to use Diligent's Virtual Tape Facility (VTF), including ProtecTIER. I thought you might be interested.

As far as I know, this applies to the Commvault Galaxy Media Agents for Windows.
Another criterion is that you have chosen to spread the drives across two or more Fibre Channel connections.

When you run a standard detection in Galaxy, the library device and tape units are detected properly, and the units can be configured automatically into the corresponding drive slots in the library. This is the preferred method among Windows/Galaxy users (clickity click :-) ), but it does not result in the desired configuration.

The symptoms are as follows:
During tape I/O, Galaxy expects cartridges in certain tape units, but they are mounted in a different unit, which results in SCSI errors such as LOAD_OR_UNLOAD_FAILURES. In some cases, exhaustive detection and validation also fail.

The reason:
When you define a new virtual library in the ProtecTIER GUI and choose to spread the drives across two or more channels, Diligent uses a naming scheme that puts the even drive numbers on one path and the odd drive numbers on the other. I have not been able to check the naming scheme with more than two paths, but my best guess is that the problems will be similar.
The LUN IDs are numbered consecutively, from 0 (or 1) to n on one path, and from 0 (or 1) to n on the other. Whether the drive LUNs start at zero depends on which port carries the library controller device: the library device apparently always takes LUN 0, so the drives on that port start at LUN 1.

For the example, let's create 6 drives, spread across two paths (controllers).

Port 0 holds:

  • Library Device, LUN 0
  • Drive 0, LUN 1
  • Drive 2, LUN 2
  • Drive 4, LUN 3

Port 1 holds:

  • Drive 1, LUN 0
  • Drive 3, LUN 1
  • Drive 5, LUN 2 
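The layout above can be reproduced with a few lines of Python. This is only a sketch of the pattern I observed, not ProtecTIER's actual logic: the function name `build_layout` and its parameters are my own, and the round-robin assignment for more than two ports is an untested assumption.

```python
def build_layout(num_drives, num_ports=2, library_port=0):
    """Return (drive, port, lun) tuples following the observed pattern.

    Assumptions (not confirmed by Diligent): drives alternate across
    ports in drive-number order, and the library device takes LUN 0 on
    its own port, so drives on that port start at LUN 1.
    """
    next_lun = [0] * num_ports
    next_lun[library_port] = 1  # LUN 0 is taken by the library device
    layout = []
    for drive in range(num_drives):
        port = drive % num_ports
        layout.append((drive, port, next_lun[port]))
        next_lun[port] += 1
    return layout


if __name__ == "__main__":
    for drive, port, lun in build_layout(6):
        print(f"Drive {drive}: port {port}, LUN {lun}")
```

For 6 drives on two ports this yields the same drive, port and LUN combinations as the two lists above.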

You could say this is a nice config, because most of the load will be spread across the two paths. In our experience this does indeed work well. I should point out that this experience was with TSM on AIX, which automatically detects which library drive slot each tape drive actually sits in. Galaxy gets this wrong, though.

In the ProtecTIER screenshot of the drives, the element address of the drive slot (called "Address" in the GUI) is listed in the last column.
If you look closely at this column, you will see that the list is also consecutive, but always starts at Address 16. You will also notice that the list hops between paths. Sorting on the "Port" column groups all the even-numbered Addresses together, and likewise the odd-numbered ones. This tells me the programmers at Diligent were at least consistent in their design and naming schemes.

This is also what apparently causes the problem for Galaxy.
A normal SCSI discovery scans one bus at a time, which yields a list of tape drives (LUNs) that is consecutive per port. During discovery of the library device, the inventory of that library also tells us, or Galaxy in this case, that the library holds n tape slots, n drive slots, n drives and so on.

According to Galaxy the drive slots are consecutive, because Galaxy takes that information from the library device and not from SCSI inquiry commands. When you use the auto-config feature, Galaxy matches the drives to the drive slots in the order in which they were detected.

Thus,

  • \\.\tape0 (LUN 1 port 0) is matched to Drive Slot 1 (Address 16)
  • \\.\tape1 (LUN 2 port 0) is matched to Drive Slot 2 (Address 17)
  • \\.\tape2 (LUN 3 port 0) is matched to Drive Slot 3 (Address 18)
  • \\.\tape3 (LUN 0 port 1) is matched to Drive Slot 4 (Address 19)
  • \\.\tape4 (LUN 1 port 1) is matched to Drive Slot 5 (Address 20)
  • \\.\tape5 (LUN 2 port 1) is matched to Drive Slot 6 (Address 21)

But according to ProtecTIER, the drives and drive slots (Addresses) alternate between ports.
So this is what it should look like:

  • \\.\tape0 (LUN 1 port 0) is matched to Drive Slot 1 (Address 16)
  • \\.\tape1 (LUN 2 port 0) is matched to Drive Slot 3 (Address 18)
  • \\.\tape2 (LUN 3 port 0) is matched to Drive Slot 5 (Address 20)
  • \\.\tape3 (LUN 0 port 1) is matched to Drive Slot 2 (Address 17)
  • \\.\tape4 (LUN 1 port 1) is matched to Drive Slot 4 (Address 19)
  • \\.\tape5 (LUN 2 port 1) is matched to Drive Slot 6 (Address 21)
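To see the difference between the two lists at a glance, here is a small Python sketch that computes both mappings for the 6-drive example and reports which tape devices end up in the wrong slot. All names in it (`DRIVES`, `detection_order`, `auto_config`, `correct_config`) are made up for illustration, and the assumption that Windows numbers the tape devices in port-then-LUN order is taken from the behaviour described in this post, not from any Galaxy documentation.

```python
FIRST_ADDRESS = 16  # first drive element address seen in the ProtecTIER GUI

# (drive, port, lun) for the example; the element address is
# FIRST_ADDRESS + drive, because Addresses follow the drive number.
DRIVES = [(0, 0, 1), (1, 1, 0), (2, 0, 2), (3, 1, 1), (4, 0, 3), (5, 1, 2)]


def detection_order(drives):
    """Drives in the order a bus-by-bus scan finds them (port, then LUN)."""
    return sorted(drives, key=lambda d: (d[1], d[2]))


def auto_config(drives):
    """What auto-config does: the n-th detected device gets the n-th slot."""
    return {f"tape{i}": FIRST_ADDRESS + i
            for i, _ in enumerate(detection_order(drives))}


def correct_config(drives):
    """What ProtecTIER expects: each device maps to its own drive's address."""
    return {f"tape{i}": FIRST_ADDRESS + drive
            for i, (drive, _, _) in enumerate(detection_order(drives))}


if __name__ == "__main__":
    auto, correct = auto_config(DRIVES), correct_config(DRIVES)
    for dev in auto:
        flag = "" if auto[dev] == correct[dev] else "  <-- wrong slot"
        print(f"{dev}: auto={auto[dev]} expected={correct[dev]}{flag}")
```

In this example only the first and last device happen to land in the right slot; the four in between need to be moved.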

Below is a Galaxy screenshot of a partially configured tape library. Here you can also see the odd/even numbering. In this screenshot I renamed the Drive aliases to match the ProtecTIER naming of the drives. Elm is short for element, and indicates the Address column in the ProtecTIER GUI.

So you need to match the drives to the drive slots manually, by moving them in the Galaxy Library and Drive configuration tool.
Use the ProtecTIER GUI (the Drives tree within the Library) to trace the port number, LUN number and Drive Slot (Address) of each drive.
Once you've tackled this problem, everything should work like a charm.
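If you have many drives, it can help to type the rows from the ProtecTIER GUI into a small table and look up the slot per port and LUN. The sketch below does that in Python. The `GUI_ROWS` data simply repeats the 6-drive example from this post, and `slot_for` is a hypothetical helper of mine, not part of any Galaxy or ProtecTIER tooling.

```python
# Rows copied by hand from the ProtecTIER GUI: (drive, port, lun, address).
# These values are the example from this post; replace them with your own.
GUI_ROWS = [
    (0, 0, 1, 16),
    (1, 1, 0, 17),
    (2, 0, 2, 18),
    (3, 1, 1, 19),
    (4, 0, 3, 20),
    (5, 1, 2, 21),
]


def slot_for(port, lun, rows=GUI_ROWS):
    """Return (drive_slot, address) for the drive at the given port and LUN.

    Drive slots are counted from 1, with slot 1 at the lowest address,
    which is how they are numbered in the lists in this post.
    """
    first_address = min(row[3] for row in rows)
    for _drive, row_port, row_lun, address in rows:
        if (row_port, row_lun) == (port, lun):
            return address - first_address + 1, address
    raise LookupError(f"no drive at port {port}, LUN {lun}")


if __name__ == "__main__":
    slot, address = slot_for(port=1, lun=0)
    print(f"port 1, LUN 0 belongs in Drive Slot {slot} (Address {address})")
```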

This post isn't intended as a manual, but merely as a reference for problems you might run into when configuring or troubleshooting Galaxy in combination with Diligent's VTF/ProtecTIER. As most virtual libraries use the most common devices and methods, this post could possibly apply to other virtual libraries in combination with Galaxy as well.

I hope it is of some use to you.