When Is a Backup Not a Backup (and When Is It)?
There’s been a lot of debate regarding backups vs. snapshots lately, and this is a common conversation we have with customers and prospects. Those making the argument usually claim, directly or obliquely, that SimpliVity doesn’t have “true” backups. I’m going to spend a bit of time discussing SimpliVity’s own technology, which is what matters to our customers, rather than any of our competitors’ technology.
Let’s start by defining why backups are necessary in every IT shop. This one should be fairly obvious, but backups make it possible to replace lost or damaged data. This loss or damage can happen at multiple levels:
- the loss of one or more files due to a careless deletion, malware, or ransomware;
- the loss of one or more VMs;
- the loss of an individual component like a hard drive;
- the loss of a system node like a disk shelf or hyperconverged infrastructure node;
- the loss of an entire system or array;
- or the loss of an entire physical site.
Beyond recovering from data loss, a secondary reason is the ability to return to different points in time. This helps in situations where loss or corruption may have gone unnoticed for an undetermined amount of time. It can also help when a previous copy of data is useful for testing or verification.
So what does any vendor need to provide in a backup product or feature? Obviously, based on the above details, the ability to keep multiple points in time for a well-defined retention period, and to restore that data quickly, is critical. To do this, you need to ensure that the backup is safely stored outside the fault domain you are aiming to protect. For example, RAID works great for handling the loss of a single hard disk drive, whereas storing the full set of VM data on two separate nodes is critical for surviving the loss of an entire node. If you’re worried about the loss of an entire site, then storing your data at a second physical location is the proper strategy. In this last case, having an efficient way to replicate the backups to the remote site in a reasonable period of time can be a major concern. While not a core requirement, having the data as close as possible for quick restoration is a very nice-to-have that reduces recovery time.
This was the mindset SimpliVity had when architecting the native data protection in the OmniStack Data Virtualization Platform: protect data at multiple levels to help customers guard their most valuable asset against failures at multiple levels. Starting at the most granular level, all data (both VMs and backups) is protected against hard disk drive failures by hardware-accelerated RAID (which, by the way, is far from old, antiquated, or kludgy). All data (again, both VMs and backups) is written to two nodes to protect against the loss of a single node. Maintaining data across two nodes ensures that a complete copy is kept off the VM’s primary storage as a normal course of operation.
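The two-node placement described above can be illustrated with a toy sketch. This is a hypothetical model, not the actual OmniStack write path: the point is simply that every write lands on two independent failure domains, so losing one node leaves a full copy intact.

```python
# Toy sketch of two-node write placement (hypothetical, not OmniStack's protocol).
class Node:
    """A storage node: an independent failure domain."""
    def __init__(self, name: str):
        self.name = name
        self.storage = {}  # key -> data

def write_replicated(nodes: list, key: str, data: bytes):
    """Write each piece of data to two distinct nodes."""
    primary, secondary = nodes[0], nodes[1]
    for n in (primary, secondary):
        n.storage[key] = data

nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
write_replicated(nodes, "vm1/block0", b"data")

# Simulate losing node-a: a complete copy survives on another node.
survivors = [n for n in nodes if n.name != "node-a"]
assert any("vm1/block0" in n.storage for n in survivors)
```

The same principle applies to backups: because they are written across two nodes as well, a backup never depends on the survival of any single piece of hardware.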
SimpliVity maintains all data fully deduplicated, which allows backups to be instant, fully independent metadata objects that reference the same set of blocks as the VM at that moment in time. Since they are created in metadata and fully deduplicated, no I/O is generated to create them, which means these backups are extremely fast and no longer need to be limited to the traditional backup window. Customers love this latter advantage, because it means they can back up far more frequently than with 3rd party backup products, increasing the number of recovery points available to them and decreasing the amount of data potentially lost if they need to restore. As mentioned before, but worth repeating, these backups are protected by RAID and created on two nodes (assuming at least two nodes in a SimpliVity Data Center). Each of the two nodes independently manages its own data, so the data is stored in at least two locations representing different failure domains, and all data can be recovered in the event the primary node is lost or unavailable for any reason.
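A toy sketch can show why a metadata-only backup in a deduplicated store is instant and independent (this is a hypothetical model for illustration, not SimpliVity's actual implementation): a VM's data is a map from logical offsets to block hashes, and a "backup" is just a frozen copy of that map, so no data blocks are read or written when one is created, and later changes to the VM never touch the blocks the backup references.

```python
import copy
import hashlib

class DedupStore:
    """Toy content-addressed store: each unique block is stored once, keyed by hash."""
    def __init__(self):
        self.blocks = {}  # hash -> data, shared by VMs and backups alike

    def write(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(h, data)  # dedup: never store the same block twice
        return h

class VM:
    def __init__(self, store: DedupStore):
        self.store = store
        self.block_map = {}  # logical offset -> block hash (metadata only)

    def write_block(self, offset: int, data: bytes):
        self.block_map[offset] = self.store.write(data)

    def backup(self) -> dict:
        # A "backup" is an independent copy of the metadata map.
        # No block data is read or copied, so creation is effectively instant.
        return copy.deepcopy(self.block_map)

store = DedupStore()
vm = VM(store)
vm.write_block(0, b"boot sector")
vm.write_block(1, b"app data v1")

bkp = vm.backup()                   # instant: metadata only, no I/O on blocks
vm.write_block(1, b"app data v2")   # the VM diverges after the backup

# The backup still resolves to the original block, independent of the VM:
assert store.blocks[bkp[1]] == b"app data v1"
assert store.blocks[vm.block_map[1]] == b"app data v2"
```

Because the backup holds its own references rather than a dependency on the VM, deleting the VM would not invalidate it, which is the key difference from chained snapshots discussed later.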
As a side note, we have a lot of information written and recorded about how easy it is to manage data protection in a SimpliVity environment, so I will simply link to a few of those: HyperGuarantee (see HyperProtected, HyperSimple, and HyperManageable guarantees), 3rd Party Review, video
To protect data from a site-level disaster, a customer merely needs to create a second set of rules to create the backup in another SimpliVity Data Center. When creating a manual backup, it’s as simple as changing a dropdown selection from the default value of “Local” to the destination remote Data Center. These remote Data Centers can be another set (one or more) of customer-owned SimpliVity nodes, a 3rd party service provider offering SimpliVity backup as a service, or a SimpliVity instance running in AWS. The data efficiencies SimpliVity brings to moving these backups between sites are another customer favorite. See: blog part 1/part 2, Register article, customer testimonial for more details.
Another side note: the use of "SimpliVity Data Center" and "Data Center" are the same, indicating a logical construct where all the nodes contained within it work together to share out and protect data. They are capitalized to distinguish them from the physical location known as a "data center." A SimpliVity Data Center can exist across two physical data centers (aka a stretched cluster) and multiple Data Centers can exist within a single physical data center.
Are SimpliVity backups perfect, and do they work in every scenario? Of course not. I’d get accused of flinging FUD if I claimed they worked in every scenario. (Side note: no technology works in every scenario.) If a customer absolutely must have a 3rd party vendor managing their backup data, then of course SimpliVity won’t be the be-all and end-all data protection product. We have plenty of customers who run Veeam or Avamar “just in case” SimpliVity were to have some major issue. That doesn’t stop them from using SimpliVity’s native data protection for their daily and very short RPO needs. It just stops them from relying solely on SimpliVity. And that’s okay.
In other cases, customers may want to have every backup immediately restored into a sandbox environment so the backup can be fully tested to ensure it has full integrity. This can be done in a SimpliVity environment, but it does require some scripting/orchestration today. If a customer needs this level of functionality out of the box, then they may want to consider Veeam (which works perfectly well on SimpliVity).
Do customers really trust SimpliVity to be their only data protection solution? The answer is definitely yes. Do all of these customers blindly trust the SimpliVity (or partner) sales rep and SA? Believe it or not, they don’t. Anyone who is in IT sales at any level knows that customers ask A LOT of questions when considering new technology, and that’s a good thing! As someone who’s been on all sides of this situation, I advise that no one should trust anyone in sales blindly. SimpliVity’s SAs spend a lot of time educating our customers on the way the technology works to ensure they know what they’re getting. Hopefully articles like this make that job easier.
One of my favorite stories is of a customer who implemented SimpliVity one year after a brand new backup deployment was put in. With this shiny new backup infrastructure in place, the customer had no interest in utilizing SimpliVity backups. No problem. As part of the knowledge transfer after implementation, the SimpliVity Solutions Architect showed the customer how the backups worked, created some policies to take regular backups, and quickly moved on knowing there was little interest. A year later, the customer informed the SA that they were pulling out that two-year-old infrastructure and were going to rely solely on SimpliVity backups, because of how easy, fast, and well-protected they were.
Another story involves a customer who was running both Unitrends and SimpliVity backups. At one point they needed to restore a 4.5 TB VM, so they headed to their Unitrends interface, since that was their first level of comfort for restoring data. Unfortunately, the VM was corrupted upon restore, so they instead restored the SimpliVity backup and had the VM ready to power on in 5 seconds.
So in summary, can SimpliVity protect data in the ways listed at the beginning?
- The loss of one or more files due to a careless deletion, malware, or ransomware — utilizing file-level restore functionality, the files can be restored from a local backup with no I/O.
- The loss of one or more VMs — easily restore an entire VM from a local or remote backup.
- The loss of an individual component like a hard drive — RAID is a tried and true mechanism for protecting data from multiple hard disk drive failures.
- The loss of a system node like a disk shelf or hyperconverged infrastructure node — by storing each VM across two different nodes, the loss of one node will not affect the ability to access the VM’s data, or any of its backups.
- The loss of an entire system or array — with the addition of a single node in the same or a different physical site, backups can be created to protect against the unlikely loss of an entire SimpliVity Data Center.
- The loss of an entire physical site — really no different from the previous item; just ensure that whatever disaster (power outage, flood, tornado, hurricane, etc.) would take out the primary site won’t affect the secondary site.
Terminology can be tough when doing things so radically different from what most people have seen before. Honestly, for most of the major hyperconverged players, this is a common problem. Hyperconvergence really is a game changer! It’s pretty amazing to witness the “a-ha” moment from long-term storage practitioners/analysts during a deep dive into SimpliVity technology. They have never seen anything like this before, and agree that “snapshot” definitely isn’t the right way to describe a SimpliVity backup, especially given all the baggage the term “snapshot” usually brings. For example, most storage arrays maintain a dependency chain between the original volume and its point-in-time copies, which would render the snapshot worthless if the original VM is deleted. This is not the way SimpliVity backups work.
Is “backup” even the right phrase? Maybe not, but quibbling over phrases instead of truly discussing the underpinnings of the technology and how they meet customer requirements is the land of FUD. When considering hyperconverged infrastructure vendors, I would encourage everyone, especially the industry analysts, to consider what level of data protection is necessary, how easy it is to protect to those levels, and the trade-offs in capacity and performance required to achieve the desired levels of data protection.
P.S. Thanks to my esteemed colleague Ron Singler for allowing me to post this on his blog while I work out some issues on my own site.