vSAN 3 Node Data Resiliency

I've received a few requests from the field recently regarding FUD being spread by a competitor about how vSAN handles data resiliency for 3-node clusters, so I thought I'd write about it just to clear the air a bit.

Note: We will be referencing the resiliency policies FTT=1 (vSAN's Failures to Tolerate = 1) and RF2 (Replication Factor 2) throughout the rest of this post. Each of these methods keeps two copies of the data within a cluster.

So the statement (or lie, as most folks call it) that has been going around is this: "Our resiliency is better with only three nodes, so you don't need a fourth node like vSAN does." When I first read it I laughed and thought to myself, "There's no way a competitor is stupid enough to tell this lie to a customer if they know anything about how clustering technologies work," but it turns out they did.

As you know, with any clustering technology you must have a quorum of devices to provide full availability for whatever you are clustering. For MSCS and its successors, you need at least two nodes plus a shared quorum device that manages split brain. For vSAN, you need two copies of the data and a witness to have a full quorum within a cluster. Since vSAN utilizes RAID 1 in FTT=1 configurations, you require at least three nodes in your cluster. (Note: we aren't talking about two-node clusters here, where the witness sits outside the cluster, but the premise is the same. See this post from Jase.) In the case of our FUD friends, they need three nodes as well: one for the local copy of the data, one for the mirrored secondary copy, and one for a witness component. That is three nodes for FTT=1 and three nodes for RF2. Good so far?
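If it helps to see the quorum idea concretely, here's a tiny Python sketch of the vote-counting concept. It's purely illustrative, the component names and the one-vote-each assumption are mine rather than vSAN internals, but it shows why two data copies plus a witness spread across three nodes lets you lose any one of them and keep a majority.

```python
# A toy model of quorum voting for a single FTT=1 object.
# Purely illustrative: component names, one vote per component, and the
# strict-majority rule are simplifying assumptions, not vSAN internals.

components = {
    "replica-1": {"node": "esx-01", "votes": 1, "online": True},
    "replica-2": {"node": "esx-02", "votes": 1, "online": True},
    "witness":   {"node": "esx-03", "votes": 1, "online": True},
}

def object_accessible(comps):
    """The object stays accessible only while the reachable components
    hold a strict majority (more than half) of the total votes."""
    total = sum(c["votes"] for c in comps.values())
    reachable = sum(c["votes"] for c in comps.values() if c["online"])
    return 2 * reachable > total

print(object_accessible(components))        # True  -- all 3 of 3 votes reachable

components["replica-1"]["online"] = False   # lose one node, disk, or component
print(object_accessible(components))        # True  -- 2 of 3 votes, still a majority

components["witness"]["online"] = False     # lose a second component
print(object_accessible(components))        # False -- majority gone, object offline
```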

[Diagram borrowed from CormacHogan.com]

Let's talk about failures. If either of these configurations loses a device, let's use a disk in this case, they both handle things similarly. vSAN will immediately begin rebuilding the components affected by the disk failure on a surviving device within the cluster, either on another disk within the same node or on a disk on another node. Once the components are rebuilt you are back to full availability. The RF2 folks handle things in a similar fashion, so I won't go into detail here.
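To make the rebuild behavior a bit more concrete, here's a small Python sketch of the placement constraint involved. The only rule it models is that two components of the same object can't share a host; capacity checks, repair timers, and everything else vSAN actually evaluates are left out, and the host names are made up.

```python
# Sketch: where a failed component can be rebuilt. The one rule modelled is
# that two components of the same object never share a host; everything else
# is simplified, and the host names are hypothetical.

def rebuild_candidates(object_components, failed, cluster_hosts):
    """Hosts that could receive the rebuilt copy of `failed`: any host not
    already holding another component of this object (the failed host still
    qualifies, since the new copy can land on one of its other disks)."""
    occupied = {host for comp, host in object_components.items() if comp != failed}
    return [h for h in cluster_hosts if h not in occupied]

layout = {"replica-1": "esx-01", "replica-2": "esx-02", "witness": "esx-03"}

print(rebuild_candidates(layout, "replica-1", ["esx-01", "esx-02", "esx-03"]))
# ['esx-01'] -- in a 3-node cluster the only option is another disk in esx-01

print(rebuild_candidates(layout, "replica-1", ["esx-01", "esx-02", "esx-03", "esx-04"]))
# ['esx-01', 'esx-04'] -- a fourth node adds a second place to re-protect the data
```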

Another possible scenario is maintenance mode on a node, and in this example we're going to upgrade RAM. The first step is to put the node being upgraded into maintenance mode. In a three-node cluster, doing this essentially takes one of the three devices in your cluster offline, so you can't achieve full quorum for that device (the data, in this case). This is what is known as a majority node scenario. You put yourself at risk should there be a failure while you're performing the tasks of that maintenance window. It doesn't matter whose technology you're using: when you lose quorum, the cluster is considered at risk. Period. No one can do some sort of cluster magic with only three nodes in the cluster when two nodes are up and one is offline. "That's not how any of this works."
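Here's the same majority-of-votes idea applied to the maintenance scenario, again as a hypothetical sketch rather than anything pulled from vSAN's code: one node in maintenance mode still leaves a majority online, but any additional failure during that window drops the object below quorum.

```python
# Sketch: a 3-node cluster during maintenance. Votes and the majority rule
# are simplified assumptions used only to illustrate the risk window.

votes = {"replica-1": 1, "replica-2": 1, "witness": 1}

def accessible(online_components):
    """Accessible only while the online components hold a strict majority."""
    return 2 * sum(votes[c] for c in online_components) > sum(votes.values())

# The node holding replica-2 enters maintenance mode: 2 of 3 votes remain.
print(accessible({"replica-1", "witness"}))   # True  -- but zero redundancy left

# A disk failure takes out replica-1 during the maintenance window.
print(accessible({"witness"}))                # False -- the object goes offline
```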

When your maintenance is complete, you bring the host out of maintenance mode, the data re-syncs, and you're back to a full quorum.

Now if you look at VMware's history of over 20 years in business, you'll see that we are a company that puts our customers first and always wants to ensure that they are well taken care of. We, like any other software company, have minimum specifications and recommended specifications for our solutions. If you're a PC or Mac gamer, you know all about minimum specs and recommended specs. Minimum is simply what you need to make things work correctly (three nodes in this case), and recommended is what you need for optimal performance and resiliency (four nodes in this case). With four nodes in the cluster you have a place for your cluster objects to be re-protected in the event of a failure or maintenance activity. That removes the risk of losing access to your data, or the data itself, while one node is offline. Wouldn't you rather always be fully protected against a failure when you're upgrading RAM or replacing a bad HW part in one of your servers? I know I would.
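If you want that sizing argument as plain arithmetic, here it is as a quick Python sketch. The 2*FTT+1 rule for mirrored objects is the standard one; the spare-host parameter is just my shorthand for "somewhere to re-protect the data while a host is down," not an official sizer.

```python
# Rule-of-thumb sizing for mirrored (RAID-1) vSAN objects: 2*FTT+1 fault
# domains to store the object, plus however many spare hosts you want so the
# data can be re-protected while a host is down or in maintenance.

def hosts_needed(ftt, spare_hosts=0):
    return 2 * ftt + 1 + spare_hosts

print(hosts_needed(ftt=1))                 # 3 -- the minimum to run FTT=1 at all
print(hosts_needed(ftt=1, spare_hosts=1))  # 4 -- the recommendation in this post
```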

In the end, you can run vSAN with only three nodes, and we have lots and lots of customers doing that, but we recommend you run four nodes so that you always have resiliency of your data no matter what you're doing within your vSAN cluster.