In the last couple of days I got the chance to get my hands on a real All Flash Array from Pure Storage. Why a real AFA, you may ask? In my opinion an all-flash solution is more than just putting a bunch of flash drives into a chassis, as some vendors tend to do, because things that work well with spinning disks do not necessarily work out with flash drives. And why Pure? Simply because they convinced us with their product, which offers not only the hardware but also intelligent software; combined, it makes a really cool solution. The following post should provide you with a brief summary of the key features as well as some hands-on experience.
The hardware is rather simple: depending on the controller model you choose you get two 1U or 2U controller nodes plus SAS-attached chassis holding the flash drives. The installation is quite simple and well documented; it took just a few minutes. As far as I can tell the hardware is based on a major OEM which provides high-quality and reliable components, so I’m absolutely fine with that. Both controllers are connected via redundant 56 Gb/s InfiniBand to exchange information and, more importantly, I/Os, but more about that in a second. The following picture, by the way, shows an FA-420 array.
Both controllers are always active and so are all front-end ports across both controllers. No matter which port your application hosts drop their I/Os on, the corresponding controller will take them, and one of the controllers will at some point write them down to flash. This allows attached application hosts to simply send I/Os round robin across all available ports.
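Just to illustrate the idea, here is a minimal Python sketch of the host-side view. The port names and the send_io() helper are purely made up; the point is only that I/Os get spread round robin across all active front-end ports and whichever controller owns the port simply accepts them.

```python
from itertools import cycle

# Hypothetical active front-end ports across both controllers (CT0 / CT1).
PORTS = ["CT0.FC0", "CT0.FC1", "CT1.FC0", "CT1.FC1"]
next_port = cycle(PORTS)

def send_io(io_id):
    """Send the next I/O down the next available path in round-robin order."""
    port = next(next_port)
    print(f"I/O {io_id} -> {port}")

for i in range(8):
    send_io(i)
```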
This is how it looks from a vSphere/ESXi perspective
Currently the hardware is a classic scale-up approach which allows you to non-disruptively upgrade controllers as well as storage capacity.
Basic setup & Web Interface
The setup is done via CLI and also takes just a few minutes. All you have to do is run a single command per controller and then follow the setup dialog. From this point on the array can be completely managed via a fast and simple-to-use web interface. In my opinion they did a good job, since it’s really intuitive and you don’t need any special instruction. For example, this is how the hardware monitoring looks, which makes it easy to find faulty components.
IO Path, Data Optimization and Protection
Before data finally gets written to flash it is cached in two independent NV-RAM cache devices that reside along with the flash drives in the external chassis. Only after an I/O has successfully hit both devices is it acknowledged back to the application host. If the system gets deployed with two external chassis, the total number of NV-RAM cards will be four.
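A very rough sketch of that write path, with made-up device names and a placeholder persist() call: the host only gets its acknowledgement once both independent NV-RAM devices have confirmed the write.

```python
class NVRAMDevice:
    """Stand-in for one of the independent NV-RAM cache cards in the shelf."""
    def __init__(self, name):
        self.name = name
        self.buffer = {}

    def persist(self, lba, data):
        self.buffer[lba] = data
        return True  # the card confirms the write has hit NV-RAM

nvram_a, nvram_b = NVRAMDevice("NVRAM-0"), NVRAMDevice("NVRAM-1")

def host_write(lba, data):
    # The incoming write is mirrored to both NV-RAM devices ...
    ok = nvram_a.persist(lba, data) and nvram_b.persist(lba, data)
    # ... and only then acknowledged back to the application host.
    return "ACK" if ok else "RETRY"

print(host_write(0x1000, b"some block"))
```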
This design not only enables stateless controllers; more importantly, it has the advantage that already acknowledged I/Os still reside in very fast cache and can be optimized before they finally get written to the flash drives.
Even though Pure leverages standard off-the-shelf MLC-based SSDs, they don’t treat them like that. Pure adds all drives into a storage pool that acts as a layer of virtualization. This layer is created by the PURITY operating system, which is the brain of the system. Once the data is written to NV-RAM, PURITY processes it inline (de-duplication, pattern removal and compression), all done at a 512 byte block size! This layer of virtualization of course also allows thin provisioning, but I guess this should be standard on a modern storage system.
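Conceptually the inline reduction works on small fixed-size chunks. The following simplified Python sketch is only meant to show the principle of 512 byte granularity, pattern removal, de-duplication and compression; the real implementation is obviously far more sophisticated.

```python
import hashlib
import zlib

BLOCK = 512  # PURITY reduces data at 512 byte granularity

dedupe_store = {}   # fingerprint -> compressed unique block
volume_map = {}     # (volume, offset) -> fingerprint

def reduce_and_store(volume, offset, data):
    for off in range(0, len(data), BLOCK):
        chunk = data[off:off + BLOCK]
        if chunk == b"\x00" * len(chunk):
            continue                                  # pattern removal: drop all-zero blocks
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in dedupe_store:                    # de-duplication: keep unique blocks once
            dedupe_store[fp] = zlib.compress(chunk)   # compression
        volume_map[(volume, offset + off)] = fp

reduce_and_store("vol1", 0, b"A" * 1024 + b"\x00" * 512)
print(len(dedupe_store), "unique compressed block(s) stored")
```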
A really cool thing is Pure’s RAID3D technology which eliminates the need to make a decision on how a volume should be protected.
There are no RAID mechanisms to choose from; all volumes are protected at all times. The PURITY OS and its virtualization layer divide all flash drives into chunks called segments, which are multiple MB in size. Those segments are pooled together and new volumes are provisioned out of this pool. So over time, as a volume gets filled with data, it will be spread across all flash drives.
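A toy model of that segment pooling could look like the following; the 8 MB segment size and the allocation heuristic are just assumptions for illustration, but they show why a growing volume naturally ends up spread across all drives.

```python
SEGMENT_MB = 8  # illustrative; the real segment size is simply "multiple MB"

class Drive:
    def __init__(self, name, size_mb):
        self.name = name
        self.free_segments = size_mb // SEGMENT_MB

# Every flash drive is carved into segments that all end up in one global pool.
drives = [Drive(f"ssd{i}", 256) for i in range(10)]

def allocate_segment():
    """Hand out the next segment from the drive with the most free segments,
    so a volume that fills up naturally spreads across all drives."""
    drive = max(drives, key=lambda d: d.free_segments)
    drive.free_segments -= 1
    return drive.name

# A new volume simply consumes segments out of the pool as it grows.
volume = [allocate_segment() for _ in range(20)]
print(volume)
```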
As mentioned above, the data at first lands in NV-RAM for a short period of time. At some point the data needs to be written down to flash. PURITY aims to always write full stripes to maintain performance and to go easy on the flash cells. Before the stripe write occurs, PURITY can determine the optimal RAID geometry just in time, depending on the current behavior of the individual drives. So if some drives are busy for whatever reason, the next stripe write will simply happen across fewer disks. The minimum, as I understood it, is always a geometry with double parity, so for example a RAID6 or multiple RAID5 sets with a global parity, to always be able to withstand a double drive failure.
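Expressed in a few lines of Python, the just-in-time geometry decision could be imagined like this. The drive count is made up, and the only point is that the stripe width shrinks around busy drives while always keeping two parity segments.

```python
DRIVES = [f"ssd{i}" for i in range(12)]
PARITY = 2  # always double parity, so every stripe survives a double drive failure

def next_stripe_geometry(busy_drives):
    """Choose the stripe geometry just in time: skip drives that are
    currently busy, but always reserve two segments for parity."""
    available = [d for d in DRIVES if d not in busy_drives]
    return {"data": available[:-PARITY], "parity": available[-PARITY:]}

# Nobody busy: a full-width stripe across all twelve drives (10 data + 2 parity).
print(next_stripe_geometry(set()))
# Two busy drives: the next stripe is simply written across fewer disks.
print(next_stripe_geometry({"ssd3", "ssd7"}))
```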
One interesting thing is the fact that flash devices are great at single workloads like reads or writes, but when challenging a flash drive with both workloads simultaneously the performance can drop. PURITY can determine if an SSD is busy, and because waiting for the device to become ready would increase latency, the OS simply gathers the requested blocks by rebuilding them from the remaining drives and parity instead of waiting for the busy drive.
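As a mental model, here is what such a rebuild-read looks like with simple XOR parity (single parity just to keep the sketch short; the real geometry uses double parity as described above).

```python
def xor_blocks(blocks):
    """XOR a list of equally sized blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# One stripe: three data blocks plus their XOR parity.
stripe = {"ssd0": b"AAAA", "ssd1": b"BBBB", "ssd2": b"CCCC"}
parity = xor_blocks(list(stripe.values()))

def read_block(wanted_drive, busy_drive):
    """If the drive holding the wanted block is busy, rebuild the block
    from the remaining blocks plus parity instead of waiting for it."""
    if wanted_drive != busy_drive:
        return stripe[wanted_drive]
    others = [blk for name, blk in stripe.items() if name != wanted_drive]
    return xor_blocks(others + [parity])

print(read_block("ssd1", busy_drive="ssd1"))  # b'BBBB', rebuilt without touching ssd1
```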
SSDs usually store a CRC checksum along with each individual page to protect against bit errors. To ensure that a page not only contains consistent data but also the correct data, PURITY additionally includes the virtual block address in the checksum to make sure that the right virtual block is being read. Virtual blocks are never overwritten; as re-writes occur they are written to free segments, and a global garbage collection process takes care of the old and unused segments.
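In pseudo-Python, embedding the virtual block address into the checksum might look like this; crc32 just stands in for whatever checksum PURITY actually uses.

```python
import zlib

def write_block(store, virtual_addr, data):
    # The checksum covers the data *and* the virtual block address,
    # so a block returned for the wrong address can be detected.
    crc = zlib.crc32(virtual_addr.to_bytes(8, "little") + data)
    store[virtual_addr] = (data, crc)

def read_block(store, virtual_addr):
    data, stored_crc = store[virtual_addr]
    if zlib.crc32(virtual_addr.to_bytes(8, "little") + data) != stored_crc:
        raise IOError(f"checksum mismatch at virtual block {virtual_addr:#x}")
    return data

store = {}
write_block(store, 0x2000, b"payload")
print(read_block(store, 0x2000))
```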
Drive failures are handled by instantly rebuilding the segments that were stored on the failed drive across the remaining drives, which takes just minutes. In the end a failed drive is nothing more than a loss of capacity, with no impact on your data or system performance.
More details can be found here.
The vSphere Web Client integration allows complete management of the array:
And here is an example of how Thin Provisioning and VAAI can speed up the provisioning of a 100 GB VMDK compared to regular block storage powered by some SSDs:
That’s it for part I. In the next part I’ll dive deeper into some features, like the use case as a backend array for DataCore SANsymphony-V. I hope I’ll be able to provide you with some real-life numbers regarding data reduction and some configuration tips.