

FreeNAS 9.3 NFS share – ESXi datastore – Unable to connect to NFS server – Fixed!

Update 2: I was able to “fix” the problem by downgrading to release 9.2.1.9, which works like a charm!

Update: Today I ran into the same problem again and I’m still working on it. This time, adding or modifying the comment didn’t help. Once I find out what is causing the problem, I’ll update the post.

Since yesterday evening I had been trying to mount a new FreeNAS (9.3) NFS share as an ESXi 5.5 datastore, and no matter what I tried, the attempt always failed:

UnableToConnectToNFSServer

~ # esxcfg-nas -a FreeNAS:Volume1 -o 192.168.180.150 -s /mnt/Volume1

Connecting to NAS volume: FreeNAS:Volume1

Unable to connect to NAS volume FreeNAS:Volume1: Sysinfo error on operation returned status : Unable to connect to NFS server. Please see the VMkernel log for detailed error information
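The error message points to the VMkernel log, so I watched it on the ESXi host while retrying the mount:

~ # tail -f /var/log/vmkernel.log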

 

ESXi: /var/log/vmkernel.log

2014-12-15T19:33:30.901Z cpu2:55811)NFS: 157: Command: (mount) Server: (192.168.180.150) IP: (192.168.180.150) Path: (/mnt/Volume1) Label: (FreeNAS:Volume1) Options: (None)

2014-12-15T19:33:30.901Z cpu2:55811)StorageApdHandler: 698: APD Handle a99dc9de-d6c49dd6 Created with lock[StorageApd0x41118f]

2014-12-15T19:33:40.023Z cpu1:34783)World: 14296: VC opID hostd-00ac maps to vmkernel opID f3171169

2014-12-15T19:34:00.023Z cpu1:34783)World: 14296: VC opID hostd-d3b4 maps to vmkernel opID 707e611f

2014-12-15T19:34:01.293Z cpu3:55811)StorageApdHandler: 745: Freeing APD Handle [a99dc9de-d6c49dd6]

2014-12-15T19:34:01.293Z cpu3:55811)StorageApdHandler: 808: APD Handle freed!

2014-12-15T19:34:01.293Z cpu3:55811)NFS: 168: NFS mount 192.168.180.150:/mnt/Volume1 failed: Unable to connect to NFS server.

 

FreeNAS: /var/log/messages

Dec 15 20:33:42 FreeNAS mountd[3769]: mount request succeeded from 192.168.180.80 for /mnt/Volume1

Dec 15 20:33:57 FreeNAS mountd[3769]: mount request succeeded from 192.168.180.80 for /mnt/Volume1

Dec 15 20:34:02 FreeNAS mountd[3769]: mount request succeeded from 192.168.180.80 for /mnt/Volume1
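The mountd log shows the mount requests succeeding on the FreeNAS side. If you want to double-check the export list and the NFS daemons yourself, the standard FreeBSD commands should do, run on the FreeNAS console (a quick sketch, not part of the original troubleshooting):

showmount -e
sockstat -4 -l | grep -E 'nfsd|mountd|rpcbind'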

This screenshot looks like many others floating around in multiple community threads, and the same config seemed to work for a couple of users: FreeNASShareProperties

But it took me a while to realize that I should try adding a “Comment”: FreeNASShareComment

~ # esxcfg-nas -a FreeNAS:Volume1 -o 192.168.180.150 -s /mnt/Volume1

Connecting to NAS volume: FreeNAS:Volume1

FreeNAS:Volume1 created and connected.

FreeNASConnectedShare
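For reference, the mount can also be created and checked entirely from the ESXi shell; a short sketch using the standard NFS commands with the same server IP and export path as above:

~ # esxcli storage nfs add --host=192.168.180.150 --share=/mnt/Volume1 --volume-name=FreeNAS:Volume1
~ # esxcli storage nfs list
~ # esxcfg-nas -l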

I hope this helps to save you some time!


Pure Storage – Hands-on Experience Part I

In the last couple of days I got the chance to get my hands on a real All Flash Array from Pure Storage. Why a “real” AFA, you may ask? In my opinion an all-flash solution is more than just putting a bunch of flash drives into a chassis, as some vendors tend to do, because things that work well with spinning disks do not necessarily work out with flash drives. And why Pure? Simply because they convinced us with their product, which not only offers some hardware but also intelligent software; combined, it makes a really cool solution. The following post should give you a brief summary of the key features as well as some hands-on experience.

 

Hardware

The hardware is rather simple: depending on the controller model you choose, you get two 1U or 2U controller nodes plus SAS-attached chassis holding the flash drives. The installation is quite simple and well documented; it took just a few minutes. As far as I can tell the hardware is based on a major OEM which provides high-quality and reliable hardware, so I’m absolutely fine with that. Both controllers are connected via redundant 56 Gb/s InfiniBand to exchange information and, more importantly, I/Os, but more about that in a second. The following picture, by the way, shows an FA-420 array.

FA420

Both controllers are always active, and so are all front-end ports across both controllers. No matter where your application hosts drop their I/Os, the corresponding node will take them, and one of the nodes will eventually write them down to flash. This allows attached application hosts to simply send I/Os round robin across all available ports.

This is how it looks from a vSphere/ESXi perspective

ActivePaths
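To actually spread I/O across all available paths, the Round Robin path selection policy needs to be set on the Pure devices. A hedged sketch of how this can be done per device from the ESXi shell; the naa identifier is a placeholder, and for production you should follow the vendor’s current best-practice guide (e.g. an SATP claim rule) instead of setting it per LUN:

~ # esxcli storage nmp device list
~ # esxcli storage nmp device set --device naa.<device-id> --psp VMW_PSP_RR   # placeholder device ID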

Currently the hardware follows a classic scale-up approach, which allows you to non-disruptively upgrade the controllers as well as the storage capacity.

Basic setup & Web Interface

The setup is done via CLI and also takes just a few minutes. All you have to do is run a single command per controller and then follow the setup dialog. From this point on the array can be completely managed via a fast and simple-to-use web interface. In my opinion they did a good job, since it’s really intuitive and you don’t need any special instructions. For example, this is how the hardware monitoring looks, which makes it easy to find faulty components.

PureSystemHealth

 

IO Path, Data Optimization and Protection

Before data finally gets written to flash, it is cached in two independent NV-RAM cache devices that reside along with the flash drives in the external chassis. I/Os have to successfully hit both devices before they are acknowledged back to the application hosts. If the system is deployed with two external chassis, the total number of NV-RAM cards is four.

PureChassis

This design not only enables stateless controllers; more importantly, it has the advantage that I/Os that are already acknowledged still reside in very fast cache and can be optimized before they finally get written to the flash drives.

Even though Pure leverages standard off-the-shelf MLC-based SSDs, they don’t treat them like that. Pure adds all drives into a storage pool that acts as a layer of virtualization. This layer is created by the PURITY operating system, which is the brain of the system. Once the data is written to NV-RAM, PURITY processes it inline (de-duplication, pattern removal and compression), all done at a 512-byte block size! This layer of virtualization of course also allows thin provisioning, but I guess that should be standard on a modern storage system.

A really cool thing is Pure’s RAID-3D technology, which eliminates the need to decide how a volume should be protected.

PureCreateVolume

There are no RAID mechanisms to choose from; all volumes are protected at all times. The PURITY OS and its virtualization layer divide all flash drives into chunks called segments, which are multiple MB in size. Those segments are pooled together, and new volumes are provisioned out of this pool. So over time, as a volume gets filled with data, it is spread across all flash drives.

As mentioned above, the data at first lands in NV-RAM for a short period of time. At some point it needs to be written down to flash. PURITY aims to always write full stripes to maintain performance and to go easy on the flash cells. Before the stripe write occurs, PURITY can determine the optimal RAID geometry just in time, depending on the current behavior of the individual drives. So if some drives are busy for whatever reason, the next stripe write will happen across fewer disks. The minimum, as I understood it, is always a geometry with double parity, for example a RAID6 or multiple RAID5 sets with a global parity, to always be able to withstand a double drive failure.

One interesting thing is the fact that flash devices are great at a single workload type, like reads or writes, but when you challenge a flash drive with both workloads simultaneously, performance can drop. PURITY can determine if an SSD is busy, and because waiting for the device to become ready would increase latency, the OS simply gathers the requested blocks by rebuilding them instead of waiting for the drive.

SSDs usually store a CRC checksum along with each individual page to protect against bit errors. To ensure that the page not only contains consistent data but is also the correct one, PURITY additionally includes the virtual block address in the checksum, to make sure that the right virtual block is being read. Virtual blocks are never overwritten; as re-writes occur they are written to free segments, and a global garbage collection process takes care of the old and unused segments.

Drive failures are handled by instantly rebuilding the segments that were stored on the failed drive across the remaining drives, which completes within minutes. In the end a failed drive is nothing more than a loss of capacity, with no impact on your data or on system performance.

More details can be found here.

 

vSphere Integration

The vSphere Web Client integration allows the array to be managed completely from within the Web Client:

vSpherePlugin

And here is an example of how thin provisioning and VAAI can speed up storage provisioning compared to regular block storage backed by some SSDs, in this case creating a 100 GB VMDK:

TimeNewDisk
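If you want to verify that the VAAI primitives are actually enabled on a given device, ESXi can show the hardware acceleration status from the shell; a small sketch (the device identifier is again a placeholder):

~ # esxcli storage core device vaai status get -d naa.<device-id>   # placeholder device ID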

That’s it for part I. In the next part I’ll dive deeper into some features like the use case as DataCore SANsymphony-V backend array. I hope I’ll be able to provide you with some real life values regarding data reduction and some configurations tips.


Final thoughts about my VSAN experiences

To wrap up my VSAN series I want to share my final thoughts with the community. Please feel free to comment and share yours!

All the experience I’ve gained with VSAN is based solely on my lab with the minimum deployment of just three hosts and without any real-world workload, but I would still say I’m able to rate the overall impression I got.

In my case the setup itself was quite simple because I already had a vCenter Server running, but in the case of a greenfield deployment the provided bootstrap process is maybe a bit cumbersome; no big deal, though.

The policy-based management in general is really pleasant and offers the flexibility to assign different policies to different workloads or even to different VMDKs of a single VM.

The way VSAN handles problems/outages is good, but it also has the potential to cause some trouble if you don’t follow the recommendation to set a proper host isolation response. Please see my “Working with VSAN – Part II” post for details.

The lack of support for stretched cluster deployments and large 62 TB VMDKs is a bit disappointing, but I hope it won’t take too long until these features make it into the product.

From a performance perspective I won’t rate it without any real experience from a productive environment, but I can rate the way it scales, which is quite nice. I would always recommend selecting a chassis that allows future SSD/HDD expansion. Personally I favor the Dell PowerEdge R720xd, which offers support for up to 24 HDDs, redundant SD cards to install the hypervisor, sufficient compute resources and enough slots to add HBAs, RAID controllers or flash cards. Really important, I think, is the ability to add hosts which do not contribute storage to the VSAN cluster. In my lab I was not able to feel the difference between a VM running on a host with or without a local copy of the VM data.

But please be realistic: if your “working set” doesn’t fit into the SSD cache and I/Os need to be served from disk(s), this can impact application performance. Many people I’ve talked to were wondering why VMware doesn’t give customers the choice to use RAID sets instead of single disk drives to speed up disk operations. I don’t know whether there is a technical reason behind this requirement or just the vision of using storage in a more efficient way.

When it comes to networking, 1 GbE will probably be sufficient for smaller deployments, and you also have the ability to mitigate potential bottlenecks by using multiple network adapters to handle the VSAN workload.

I’ve also talked to some VMware folks who don’t see VSAN as a 1:1 replacement for classic SANs yet. In the end we agreed that it heavily depends on the planned use case and the expected workload. A huge I/O-hungry monster database with hundreds of GB or even TB of data is probably not the best use case for VSAN; just keep that in mind.

However, I do see it as a way for customers running smaller environments with reasonable workloads to “replace” entry-level SAN solutions. The huge benefit is the simplified management, which enables admins to work in environments they already know well, like Ethernet networking and vSphere.

But all that glitters is not gold. What really annoyed me was a problem with the VM Storage Policies, or actually with the VSAN storage providers; there is a known issue with vSphere 5.5 Update 1. In my opinion this is not supposed to happen when releasing an update and making such a hyped solution GA. To cut some corners and speed up fixing the issue, I moved all my hosts to a new VMware vCenter Server Appliance, which was no problem for VSAN itself.

So overall I really enjoyed working with VSAN, and I now feel comfortable recommending it to customers if it fits into the environment and matches the expected workloads. This is important to me personally, because I think you should always stand behind a solution you sell to a customer.


Working with VSAN – Part IV

To continue my “Working with VSAN” series, this time I want to challenge its scalability (at least as far as that was possible within my lab). But see for yourself.

Performance scaling by adding disk groups

To see how VSAN scales when adding disks I did the following tests:

IOMeter @ 32 QD, 75/25 RND/SEQ, 70/30 R/W, 8 KB blocks, in combination with different disk group and VM storage policy settings. But it’s actually not about the absolute values or settings; it’s about showing the scalability. Not to mention that the SSDs used are pretty old (1st-gen OCZ Vertex) and differ in performance!

RUN1

Failures to tolerate (FTT): 0

VMDK Stripe: 1

FTT0_ST1

FTT0_ST1_IOM

RUN2

Failures to tolerate (FTT): 0 – so still on one host…

VMDK Stripe: 2 – … but on two disk groups!

FTT0_ST2 FTT0_ST2_IOM

To be able to combine multiple stripes as shown above with FTT > 0, you will need multiple disk groups in each host to get the performance. In my case I had just a single host with two disk groups, so I was not able to perform the same test with an FTT = 1 policy.
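For reference, disk groups can also be inspected and created from the ESXi shell instead of the Web Client; a hedged sketch for 5.5 (the naa identifiers are placeholders for your own SSD and HDD devices):

~ # esxcli vsan storage list
~ # esxcli vsan storage add -s naa.<ssd-device> -d naa.<hdd-device>   # placeholder device IDs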

Changing VM Storage Policies

To wrap up this post, I want to mention that during my tests I always used the exact same VMDK, so I had to change the policy multiple times. Of course it took some time until VSAN had moved the data around so that it was compliant with the new policy. But it worked like a charm, and I thought that was also worth mentioning!

But what about the network?

Multiple VMKernel Interfaces 

In case you are planning to run VSAN over a 1GbE network (which is absolutely supported), multiple VMKernel interfaces could be a good way to mitigate a potential network bottleneck. This would enable VSAN to leverage multiple 1GbE channels to transport data between the nodes.

To be able to use multiple VMKernel ports, keep in mind that you need to use a different subnet for each VMK and to always set a single vmnic as active and all others to standby.

ActStbyNic 2VSANVMK
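Tagging the VMkernel interfaces for VSAN traffic can also be done from the ESXi shell; a short sketch for the two-interface setup used here (vmk3 and vmk4, matching VMK3/VMK4 mentioned below):

~ # esxcli vsan network ipv4 add -i vmk3
~ # esxcli vsan network ipv4 add -i vmk4
~ # esxcli vsan network list
~ # esxcli vsan network ipv4 remove -i vmk4   # to take an interface out again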

To see how this would scale, I moved a virtual machine to a host that had none of that particular VM’s data stored locally, so that all reads (and writes) had to traverse the network.

2VMsWithLocalwitness

I had also set up the VSAN networking a couple of days before, so I started with the desired multi-VMK setup and was quite happy with the results.

FTT1_ST1_esxtop_2VMK FTT1_ST1_IOM_2VMK

Then I disabled VSAN on the second VMKernel interface and also moved the vmnics down to standby only. The result was as expected: VSAN was using just a single vmnic.

FTT1_ST1_esxtop_1VMK FTT1_ST1_IOM_1VMK

To verify these results I wanted to switch back to the multi-VMKernel setup, but for some reason I wasn’t able to get it to work again. I moved the vmnic up to be active again (as depicted above) and re-enabled VSAN traffic on the second VMKernel interface (VMK4). But since then I have been unable to see VSAN traffic across both NICs. When I disable VSAN traffic on the first VMKernel interface (VMK3), it switches to the second interface (VMK4), which tells me that the interfaces are generally working. At this point I’m a bit clueless and I’m asking you guys: have you already tried this setup? What are your results? Am I missing something or did I misunderstand something? Are there any specific scenarios where the multi-VMK setup kicks in? I would love to get some feedback!