In my last post about VSAN I went through the setup of my VSAN lab without spending too much time on details. This time I want to take a closer look at some of them.
Let’s start easy. The first thing I noticed in a lab with limited physical resources is that the overhead caused by VSAN is considerable. The following picture depicts how many resources are in use on an ESXi host with VSAN enabled when NO virtual machine is running:
Currently I’m running a single disk group with 1 SSD + 1 HDD. So for a real-world VSAN deployment with some more disks, make sure to add some extra RAM. To be able to support the maximum number of disks and disk groups, a host requires at least 32 GB of RAM. VMware also states that VSAN won’t consume more than 10% of the computing resources of a single host.
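As a quick sanity check, the two sizing rules above (the 32 GB RAM minimum for the maximum disk/disk-group count, and the 10% ceiling on host resources) can be sketched in a few lines of Python. The helper names are my own; treat this as rough guidance, not an official sizing formula:

```python
# Rough VSAN host sizing checks, based on the two rules mentioned above:
# - at least 32 GB RAM to support the maximum number of disks and disk groups
# - VSAN itself should consume at most 10% of a host's resources

def vsan_overhead_budget_gb(host_ram_gb: float) -> float:
    """Upper bound of RAM VSAN should consume on this host (10% rule)."""
    return host_ram_gb * 0.10

def supports_max_disk_groups(host_ram_gb: float) -> bool:
    """True if the host meets the 32 GB minimum for the maximum disk-group count."""
    return host_ram_gb >= 32

# A 32 GB host just meets the minimum and leaves VSAN roughly 3.2 GB to work with.
print(supports_max_disk_groups(32), vsan_overhead_budget_gb(32))
```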
More details can be found in the latest Design and Sizing Guide
When you want to put a host into maintenance mode there are now some options you can choose from:
Quoted right from the information pop-up:
Full data migration: Virtual SAN migrates all data that resides on this host. This option results in the largest amount of data transfer and consumes the most time and resources.
Ensure accessibility: Virtual SAN ensures that all virtual machines on this host will remain accessible if the host is shut down or removed from the cluster. Only partial data migration is needed. This is the default option.
No data migration: Virtual SAN will not migrate any data from this host. Some virtual machines might become inaccessible if the host is shut down or removed from the cluster.
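To make the trade-off between the three options explicit, here is a small, purely conceptual Python sketch. The option names mirror the vSphere UI; the attribute names and the boolean summary are my own reading of the descriptions above:

```python
# Conceptual model of the three VSAN maintenance-mode evacuation options.
from dataclasses import dataclass

@dataclass(frozen=True)
class EvacuationMode:
    name: str
    data_moved: str            # relative amount of data VSAN migrates
    vms_stay_accessible: bool  # are all VMs guaranteed to remain accessible?
    policy_compliant: bool     # can the configured storage policy stay satisfied?

MODES = [
    EvacuationMode("Full data migration", "all", True, True),
    EvacuationMode("Ensure accessibility", "partial", True, False),  # default in the UI
    EvacuationMode("No data migration", "none", False, False),
]

def safest_mode() -> EvacuationMode:
    """The only mode that keeps objects both accessible and policy compliant."""
    return next(m for m in MODES if m.vms_stay_accessible and m.policy_compliant)

print(safest_mode().name)  # Full data migration
```

The flip side, of course, is that the safest mode is also the one that moves the most data and takes the most time.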
The last two options may violate your configured storage policies. Assuming you are using an N+1 policy, you usually have two copies of all protected VMs. If you shut down a host which stores one of the copies, you end up with just a single copy of the protected VMs’ data. This violates the N+1 policy and the VMs will be listed as not compliant.
If the number of hosts within the cluster is sufficient, VSAN is able to automatically make the VMs compliant again by creating a new copy of the VM data. By default, VSAN waits 60 minutes before it starts to copy any data, because the host may come back online. In my case, with just three hosts, I’m not able to test this. An N+1 policy requires at least three hosts: two which host the VM data and one for the witness. Accordingly, an N+2 policy requires five hosts (three data copies plus two witnesses).
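The host-count requirement generalizes: a policy that tolerates n host failures (N+n) needs n + 1 copies of the data plus n witness components, each on a different host, i.e. 2n + 1 hosts in total. A quick sketch:

```python
# Minimum cluster size for a VSAN policy tolerating `ftt` host failures:
# VSAN stores ftt + 1 replicas of the data plus ftt witness components,
# and every component must live on a different host.

def replicas(ftt: int) -> int:
    return ftt + 1

def witnesses(ftt: int) -> int:
    return ftt

def min_hosts(ftt: int) -> int:
    return replicas(ftt) + witnesses(ftt)  # = 2 * ftt + 1

for n in (1, 2, 3):
    print(f"FTT={n}: {replicas(n)} replicas + {witnesses(n)} witness(es) "
          f"-> at least {min_hosts(n)} hosts")
```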
A question I was not able to answer: if I put a host into maintenance mode without full data migration, will VSAN treat it like a host failure? If so, VSAN would be able to bring the VMs back into compliance automatically.
I would say the following update actually answers this question: even with two hosts in maintenance mode and all VMs not compliant, the VSAN cluster seemed to be healthy and showed three eligible hosts.
A colleague just asked if it would be possible to put two of three hosts into maintenance mode. My first thought was “No”, but I decided to test it, and my first thought was wrong!
Storage DRS, SIOC, FT, DPM & large VMDKs (62 TB) are NOT supported!
OK, Storage DRS wouldn’t make much sense anyway, since you get just a single aggregated datastore for your VSAN cluster. SIOC was designed to ensure fairness between virtual machines when it comes to I/O queues, and I would assume VSAN has its own mechanisms to deal with that. DPM? Honestly, I have never seen a customer really using it, and it would cause unnecessary data movement to ensure data accessibility. It’s similar with FT: at least our customers rarely have the need for it. So the only disappointing point is the missing support for 62 TB VMDKs, which would have been cool. Not to mention that some of those features require an Enterprise (Plus) license, while I indeed see VSAN also fitting small environments that may just be licensed with Standard licenses.
VMware vMotion & DRS
There is not much to say about that, it simply works as usual.
That’s it for Part I, find out more about VSAN & VMware HA in Part II.