VSAN


Final thoughts about my VSAN experiences

To wrap up my VSAN series I want to share my final thoughts with the community. Please feel free to comment and share yours!

All my VSAN experience is based on my lab with the minimum deployment of just three hosts and without any real-world workload, but I would still say I’m able to share the overall impression I got.

In my case the setup itself was quite simple because I already had a vCenter Server running, but for a greenfield deployment the provided bootstrap process is maybe a bit cumbersome. Still, no big deal.

The policy-based management in general is really pleasant and offers the flexibility to assign different policies to different workloads or even to different VMDKs of a single VM.
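To make the policy math a bit more concrete, here is a minimal Python sketch (not VMware tooling, just back-of-the-envelope arithmetic; the VMDK names and sizes are made up) showing how a per-VMDK Failures to Tolerate setting translates into raw capacity consumed on the VSAN datastore, since every object is mirrored FTT+1 times:

```python
# Back-of-the-envelope math: how per-VMDK storage policies translate into raw
# capacity on the VSAN datastore. Purely illustrative; names and sizes are made up.

def raw_capacity_gb(vmdk_size_gb: float, ftt: int) -> float:
    """With VSAN's RAID-1 mirroring, each object is stored FTT+1 times
    (witness components are tiny and ignored here)."""
    return vmdk_size_gb * (ftt + 1)

# One VM, two VMDKs with different policies assigned:
vmdks = [
    {"name": "db-vm_os.vmdk",   "size_gb": 40,  "ftt": 1},  # mirrored once
    {"name": "db-vm_temp.vmdk", "size_gb": 200, "ftt": 0},  # no redundancy
]

for d in vmdks:
    print(f"{d['name']}: FTT={d['ftt']} -> ~{raw_capacity_gb(d['size_gb'], d['ftt']):.0f} GB raw")
```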

The way VSAN handles problems/outages is good but it also has the potential to cause some trouble if you don’t follow the recommendation to set a proper host isolation response. Please see my “Working with VSAN – Part II” post for details.

The lack of support for stretched cluster deployments and large 62 TB VMDKs is a bit disappointing, but I hope it won’t take too long until these features make it into the product.

From a performance perspective I won’t rate it without any real experience from a production environment, but I can rate the way it scales, which is quite nice. I would always recommend selecting a chassis that allows future SSD/HDD expansion. Personally I favor the Dell PowerEdge R720xd, which offers support for up to 24 HDDs, redundant SD cards to install the hypervisor, sufficient compute resources and enough slots to add HBAs, RAID controllers or flash cards. Really important, in my opinion, is the ability to add hosts which do not contribute storage to the VSAN cluster. In my lab I was not able to feel the difference between a VM running on a host with or without a local copy of the VM data.

But please be realistic: if your “working set” doesn’t fit into the SSD cache and I/Os need to be served from the spinning disk(s), this can impact application performance. Many people I’ve talked to were wondering why VMware doesn’t give customers the choice to use RAID sets instead of single disk drives to speed up disk operations. I don’t know if there is a technical reason behind this requirement or just the vision of using storage in a more efficient way.
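To put a rough number on the “working set” question: the commonly cited VSAN 5.5 sizing rule of thumb is to provide flash capacity of about 10% of the anticipated consumed capacity (before FTT copies), with the SSD being used roughly 70% as read cache and 30% as write buffer. A quick sketch of that math, with invented capacity figures:

```python
# Rule-of-thumb flash sizing for VSAN 5.5: ~10% of the anticipated consumed
# capacity (before FTT copies); the SSD acts ~70% as read cache, ~30% as write
# buffer. Illustrative only -- the capacity figure below is invented.

def flash_recommendation_gb(consumed_capacity_gb: float) -> dict:
    flash_total = consumed_capacity_gb * 0.10
    return {
        "flash_total_gb": flash_total,
        "read_cache_gb": flash_total * 0.7,
        "write_buffer_gb": flash_total * 0.3,
    }

# Example: ~4 TB of anticipated consumed VM data
print(flash_recommendation_gb(4000))
# -> {'flash_total_gb': 400.0, 'read_cache_gb': 280.0, 'write_buffer_gb': 120.0}
```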

When it comes to networking, 1 GbE will probably be sufficient for smaller deployments, and you also have the ability to mitigate potential bottlenecks by using multiple network adapters to handle the VSAN workload.

I’ve also talked to some VMware folks who don’t see VSAN as a 1:1 replacement for classic SANs yet. In the end we agreed that it heavily depends on the planned use case and the expected workload. A huge I/O-hungry monster database with hundreds of GB or even TB of data is probably not the best use case for VSAN; just keep that in mind.

However, I do see it as a way for customers running smaller environments with reasonable workloads to “replace” entry-level SAN solutions. The huge benefit is the simplified management, which lets admins work in environments they already know well, like Ethernet networking and vSphere.

But all that glitters is not gold. What really annoyed me was a problem with the VM Storage Policies, or actually with the VSAN storage providers. There is a known issue with vSphere 5.5 Update 1. In my opinion this is not supposed to happen when releasing an update and making such a hyped solution GA. To cut some corners and speed up fixing the issue I moved all my hosts to a new VMware vCenter Server Appliance, which was no problem for VSAN itself.

So overall I really enjoyed working with VSAN, and I now feel comfortable recommending it to customers if it fits into the environment and matches the expected workloads. This is important for me personally because I think you should always stand up for a solution you sell to a customer.


Working with VSAN – Part IV

To continue my “Working with VSAN” series, this time I want to challenge the scalability (at least as far as that was possible within my lab). But see for yourself.

Performance scaling by adding disk groups

To see how VSAN scales when adding disks I did the following tests:

IOMeter @ 32 QD, 75/25 RND/SEQ, 70/30 R/W, 8 KB, in combination with different disk group and VM storage policy settings. But it’s actually not about the absolute values or settings, it’s about showing the scalability. Not to mention that the SSDs used are pretty old (1st-gen OCZ Vertex) and differ in performance!

RUN1

Failures to tolerate (FTT): 0

VMDK Stripe: 1

[Screenshot: FTT=0, stripe width 1]

[Screenshot: IOMeter results with FTT=0, stripe width 1]

RUN2

Failures to tolerate (FTT): 0 – So still on one host…

VMDK Stripe: 2 – … but on two disk groups!

[Screenshots: FTT=0, stripe width 2 and the corresponding IOMeter results]

To be able to combine multiple stripes as shown above with FTT > 0, you will need multiple disk groups in each host to get the performance benefit. In my case only a single host had two disk groups, so I was not able to perform the same test with an FTT = 1 policy.
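To make the disk group requirement more tangible, here is a small sketch of the component math: with RAID-1 mirroring each object gets FTT+1 replicas, each replica is split into stripe-width components, and the stripes of a replica have to land on different disks. This is just illustrative arithmetic, not VSAN’s actual placement logic:

```python
# Rough component math for a VSAN object (VSAN 5.5, RAID-1 mirroring only).
# Not the real placement algorithm -- just the counts that explain why a
# stripe width > 1 wants multiple disks / disk groups behind it.

def object_layout(ftt: int, stripe_width: int) -> dict:
    replicas = ftt + 1                               # full copies of the data
    return {
        "replicas": replicas,
        "data_components": replicas * stripe_width,
        "witnesses_at_least": 1 if ftt > 0 else 0,   # tie-breaker for quorum
        "min_hosts": 2 * ftt + 1,                    # replicas + witness on distinct hosts
        "disks_needed_per_replica": stripe_width,    # each stripe on a different disk
    }

print(object_layout(ftt=0, stripe_width=2))  # RUN2 above: two stripes on one host
print(object_layout(ftt=1, stripe_width=2))  # the test my single two-disk-group host couldn't do
```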

Changing VM Storage Policies

To wrap up this post I want to mention that during my tests I always used the exact same VMDK, so I had to change the policy multiple times. Of course it took some time until VSAN had moved the data around so that it was compliant with the new policy. But it worked like a charm, and I thought it is also worth mentioning!

But what about the network?

Multiple VMKernel Interfaces 

In case you are planning to run VSAN over a 1GbE network (which is absolutely supported), multiple VMKernel interfaces could be a good way to mitigate a potential network bottleneck. This would enable VSAN to leverage multiple 1GbE channels to transport data between the nodes.

To be able to use multiple VMKernel ports you need to keep in mind to use a different subnet for each VMK and to always set a single vmnic as active and all others to standby.

[Screenshots: active/standby uplink configuration and the two VSAN VMKernel interfaces]
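Here is a tiny sketch of the two rules above expressed as a sanity check: every VSAN VMKernel interface lives in its own subnet, and each one has exactly one active uplink with the rest on standby. The interface names, subnets and vmnics are of course made-up examples:

```python
# Sanity check for a multi-VMK VSAN setup over 1 GbE:
#  - each VSAN VMKernel interface sits in a different subnet
#  - each interface has exactly one active vmnic, the rest standby
# The interface names, subnets and uplinks below are made-up examples.

import ipaddress

vsan_vmks = [
    {"name": "vmk3", "cidr": "172.16.10.0/24", "active": ["vmnic2"], "standby": ["vmnic3"]},
    {"name": "vmk4", "cidr": "172.16.20.0/24", "active": ["vmnic3"], "standby": ["vmnic2"]},
]

subnets = [ipaddress.ip_network(v["cidr"]) for v in vsan_vmks]
assert len(set(subnets)) == len(subnets), "each VSAN VMK needs its own subnet"

for v in vsan_vmks:
    assert len(v["active"]) == 1, f"{v['name']}: exactly one active uplink expected"

print("multi-VMK VSAN network layout looks sane")
```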

To see how this would scale I moved a virtual machine to a host that had none of that particular VM’s data stored locally, so that all the reads (and writes) had to traverse the network.

[Screenshot: two VMs with local witness placement]

I had already set up the VSAN networking a couple of days before, so I started with the desired multi-VMK setup and was quite happy with the results.

[Screenshots: esxtop and IOMeter results – FTT=1, stripe width 1, two VSAN VMKernel interfaces]

Then I disabled VSAN on the second VMKernel interface and also moved the vmnic down to be standby only. The result was as expected: VSAN was just using a single vmnic.

[Screenshots: esxtop and IOMeter results – FTT=1, stripe width 1, single VSAN VMKernel interface]

To verify these results I wanted to switch back to the multi-VMKernel setup, but for some reason I wasn’t able to get it to work again. I moved the vmnic up to be active again (as depicted above) and re-enabled VSAN traffic on the second VMKernel interface (VMK4). But since then I have been unable to see VSAN traffic across both NICs. When I disable VSAN traffic on the first VMKernel interface (VMK3), it switches to the second interface (VMK4), which tells me that the interfaces are generally working. At this point I’m a bit clueless and asking you guys: have you already tried this setup? What are your results? Am I missing something, or did I misunderstand something? Are there any specific scenarios where the multi-VMK setup kicks in? I would love to get some feedback!


Working with VSAN – Part III

In Part I and II I already tested some scenarios which may impact your VSAN cluster, like simulating the outage of two hosts by simply isolating them. This time I’m going to torture my VSAN lab even more; read on to see how this turned out.

What if I put one host into maintenance mode and another node fails?

[Diagram: one host in maintenance mode while another host fails]

Will the remaining node be able to run the VMs and even restart those of the failed host?

Maintenance Mode with “Full data migration”

[Screenshot: error when entering maintenance mode with full data migration]

VSAN didn’t allow me to put a host into maintenance mode using full data migration. I had a couple of VMs running with a VM Storage Policy of FTT (Failures to Tolerate) = 1, so with just three hosts evacuating a host would violate that policy.
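The math behind that error is simple: a RAID-1 mirrored object with FTT = n needs 2n+1 hosts to place all replicas plus the witness, so evacuating one of three hosts leaves too few for an FTT = 1 object. A minimal sketch of that check (illustrative only, not what VSAN actually runs):

```python
# Why "Full data migration" fails on a three-node cluster with FTT=1 objects:
# a mirrored object with FTT=n needs 2n+1 hosts (replicas plus witness on
# distinct hosts). Illustrative check only.

def full_data_migration_possible(hosts_in_cluster: int, ftt: int,
                                 hosts_entering_maintenance: int = 1) -> bool:
    remaining = hosts_in_cluster - hosts_entering_maintenance
    return remaining >= 2 * ftt + 1

print(full_data_migration_possible(hosts_in_cluster=3, ftt=1))  # False -> the error above
print(full_data_migration_possible(hosts_in_cluster=4, ftt=1))  # True  -> evacuation can work
```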

 

Maintenance Mode with “Ensure accessibility”

This mode worked, since it only ensures that at least one copy of the VM data plus the witness (or the second copy of the VM data) remains available on the remaining nodes. It didn’t move any data around, since there were no eligible nodes available.*

Then I simulated several outage scenarios to see what would happen:

  • Host Reboot of ESX3

The remaining host ESX2 was fully functional and restarted the VM running on ESX3.

  • Disk Failure on ESX3

[Screenshot: unhealthy disk group after the simulated disk failure]

The remaining host ESX2 was fully functional AND the VMs running on ESX3 stayed functional, since they were able to access their disks over the network on ESX2. So it was also no problem to vMotion the VMs from ESX3 over to ESX2 and reboot the host to fix the disk failure.

Btw. I simply re-plugged the SSD I had pulled out to simulate the failure. By re-importing the foreign configuration in the PERC controller, the volume came back online, VSAN recognized it, and no data was lost.

[Screenshot: importing the foreign configuration in the PERC controller]

  • Network Partition – ESX3

The last test was to isolate ESX3 and as expected the remaining host ESX2 was fully functional and restarted the VM running on ESX3.

Honestly? This is way better than I expected, since we are still talking about a THREE node cluster. OK, I admit there are also scenarios where things can go wrong. Take the scenario above: if there had been no VM data on ESX2, those VMs on ESX3 would have crashed.

But again, a three-node cluster is just the minimum deployment, so if you want to make sure you can withstand multiple host failures, you have to add more nodes; it’s as simple as that.

* This is in contrast to the scenario where you put two hosts into maintenance mode – then VSAN will start to move data around!

RAID0 Impact

OK, now the disk is gone and I want to replace it. Usually, when using a RAID level other than RAID0, this would be no problem since the volume would still be online. In my case I was forced to use a RAID0 on every single device because the PERC 6/i doesn’t support pass-through mode. So even after I replaced the drive, the RAID0 was still offline, which meant I had to reboot the host to manually force the RAID0 online again. With a pass-through capable controller this would be a non-issue, since it would simply pass the new disk through. The per-disk RAID0 also rules out using a hot-spare disk, since from a logical standpoint it wouldn’t make any sense to replace a disk within a RAID with an empty disk.

 

Stay tuned for more VSAN experience!


Working with VSAN – Part II

VMware HA

It’s important to know that HA must be enabled AFTER VSAN, and if you are changing any VSAN networking settings you should re-configure HA so that it is aware of those changes. When VSAN is enabled, HA traffic (heartbeats) will be sent via the VSAN VMkernel ports. More details can be found on Duncan’s blog.

How did I test if HA is working properly? I simply rebooted the host which was running my vCenter Server VM. Guess what? Right, it worked. Once the lock on the VM expired, another host picked up the VM and booted it as expected. This also shows that there is no dependency on the vCenter Server when it comes to restarting VMs on VSAN.

 

Host Isolation & VMware HA

This is basically the reason why it’s not possible to run VSAN with fewer than three hosts if redundancy is required (like n+1). Before VSAN creates a copy of a virtual machine, it makes sure a third host is available to hold the witness.

[Diagram: witness component placed on a third host]

In case of a split-brain, the witness helps to decide which of the hosts holding a copy of the data (in this case esx1 or esx2) should take over control. The host that is able to communicate with the node holding the witness will do so, whereas the isolated host should power off its VMs accordingly (this requires the proper setting for the Host Isolation Response). If the VMs continued to run on an isolated host, there would be no way to properly protect the data because the writes wouldn’t be mirrored to another node.
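Conceptually the tie-breaking boils down to a simple majority: an object stays accessible only on the partition that still sees more than 50% of its components, and the witness is just one more vote. A hedged sketch of that idea, not VSAN’s actual implementation:

```python
# Split-brain tie-breaking, boiled down: every component (replica or witness)
# is a vote, and an object is only usable on the partition that still sees a
# strict majority of those votes. This is the concept, not VSAN's real code.

def object_accessible(components_visible: int, components_total: int) -> bool:
    return components_visible > components_total / 2

# FTT=1 object: replica on esx1, replica on esx2, witness on esx3 -> 3 votes.
# Now esx1 gets isolated:
print(object_accessible(components_visible=2, components_total=3))  # esx2 + esx3 side: True
print(object_accessible(components_visible=1, components_total=3))  # isolated esx1: False
```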

To see how it would behave I simply isolated one of my hosts.

[Screenshots: the isolated host and the VM’s disk placement]

1st RUN

Host Isolation response: Power Off

11:41 PM Connection to the host has been lost

11:42 PM vSphere HA initiated a virtual machine failover action in cluster Cluster2 in datacenter Home

11:42 PM vSphere HA restarted a virtual machine vMotionME on host esx3.home.local in cluster Cluster2

The VM was back online quite fast. When I reconnected the host, everything looked fine because the isolated VM had already been powered off.

 

2nd RUN

Host Isolation response: Leave powered On

11:57 PM Connection to the host has been lost

11:57 PM vSphere HA initiated a virtual machine failover action in cluster Cluster2 in datacenter Home

11:57 PM vSphere HA restarted a virtual machine vMotionME on host esx3.home.local in cluster Cluster2

So far so good. Via the IPMI adapter I was able to take a look at the running virtual machines:

[Screenshot: VM process running on the isolated host]

But wait! esx3, which has just restarted the VM, is also running the VM:

[Screenshot: the same VM process running on esx3 as well]

This could cause some issues!

If the VM network is not affected by the isolation, the VM will still be accessible via external connections, which sounds good at first. But VMware HA will restart a new copy of the VM very soon, so you would get duplicate IP addresses on your network, not to mention that the applications/clients connecting to those VMs could freak out.

I also had the problem that, when I reconnected the isolated host with the still-running VM, that host started “to fight” with the host which had restarted the VM. The vSphere Client showed the VM flapping between two hosts, and I had to manually kill the VM process to end that fight. So make sure to use a proper host isolation response!!!

 

What if I lose 2 of my 3 hosts?

With central storage (SAN/NAS) this would be no problem as long as the remaining host provides sufficient compute resources, or at least enough to power on the majority of your VMs.

With VSAN things look a bit different, depending on the number of hosts in the cluster. I moved all VMs to a single host and isolated the two other hosts from the cluster, so that I ended up with a single host running all virtual machines. The remaining host was not so happy. All VMs slowed down rapidly, applications stopped responding, and the host wasn’t able to open a VM console session (MKS missing); it felt a bit like an All Paths Down (APD) scenario. However, the RDP connection was persistent and didn’t disconnect. I would assume this is similar to an APD: it may work for a while, but not indefinitely!?

[Screenshot: VMs shown as unknown/inaccessible on the VSAN datastore]

When I re-enabled the switch port of just ONE of the isolated hosts, the remaining host instantly recovered and all VMs continued without any problem.

I admit it was not fair to force the three-node cluster into three network partitions. Let me re-emphasize that VSAN works as designed! A three-node cluster allows for a single host outage, and if more is required you simply need to add additional hosts!

 

Broken Host

What if your ESXi installation is broken and you have to re-install ESXi? I tried exactly that by re-installing ESXi without any preparation. So the host I re-installed was still a VSAN cluster member when I rebooted it for the setup.

During that time the other two hosts were running fine, no issues with the VSAN cluster at all.

//Update: Somehow I missed the # right in front of the devices, so the installer is aware that those devices are already claimed by VSAN:

[Screenshot: ESXi installer listing the devices claimed by VSAN, marked with #]

Once the host was back online I had to perform some manual steps:

  1. Re-connect the host
  2. Remove and re-add it to the distributed switch*
  3. Re-create all VMKernel interfaces
  4. Disable VMware HA on the cluster. As soon as I disabled HA, the VSAN cluster went green and the VSAN datastore appeared!
  5. Re-enable VMware HA

[Screenshot: VSAN cluster status back to green]

* In case you run into the following error when trying to remove the ESXi host from the vDS:

“vDS DSwitch0 port 40 is still on host esx1.home.local connected to ESXi5.0 nic=4002 type=vmVnic”. Solution: reboot the host, and while it is offline you can easily remove it from the vDS.