My take on All Flash Arrays vs. Server Side Caching

This post is about my personal view, as a Systems Engineer, on the question “All Flash Arrays (AFA) vs. Server Side Caching (SSC)”. A while ago somebody asked me exactly this question and, to be honest, I had no proper answer back then, simply because I had never thought about it! So I’ve spent some time thinking about it, and this is the result.

One of the first factors I had in mind was cost. No matter how cool, innovative or trendy a technology might be, it always comes down to cost. Even though I think AFAs in general are cool stuff, they still have their price. You basically pay for hardware (SSDs + controllers), for the software (intelligence) that powers the array and, of course, for a maintenance contract. The question is how much capacity and performance you get for that price. You usually get rather low physical capacity and have to rely on data reduction technologies (deduplication and compression) and a good reduction ratio to get as much logical space as possible out of the box. Don’t get me wrong, that’s the way it is, because otherwise All Flash Arrays would be even more expensive.
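To make the reduction-ratio point concrete, here is a small Python sketch of how the cost per logical TB falls as the data reduction ratio rises. The function names and the figures (10 TB of usable physical flash for a total price of 100,000 covering hardware, software and maintenance) are hypothetical placeholders, not vendor numbers.

```python
def effective_capacity_tb(physical_tb: float, reduction_ratio: float) -> float:
    """Logical capacity after data reduction; physical_tb is usable capacity after RAID."""
    return physical_tb * reduction_ratio


def cost_per_logical_tb(price: float, physical_tb: float, reduction_ratio: float) -> float:
    """Total price divided by the logical capacity the reduction ratio delivers."""
    return price / effective_capacity_tb(physical_tb, reduction_ratio)


# Hypothetical array: 10 TB usable flash for a total price of 100,000
for ratio in (1.0, 2.0, 4.0):
    print(f"{ratio:.0f}:1 -> {effective_capacity_tb(10, ratio):.0f} TB logical, "
          f"{cost_per_logical_tb(100_000, 10, ratio):,.0f} per TB")
```

With these placeholder figures, going from no reduction to 4:1 cuts the cost per logical TB from 10,000 to 2,500, which is why the ratio you can actually achieve with your own data matters so much when comparing offers.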

In my opinion the SSC approach sets a lower barrier to entry for the technology. Some reasons might be:

  • No need for dedicated controllers (although local RAID controllers may be needed in the hypervisor hosts if not already in place), which reduces initial investment costs
  • SSDs of your choice, and I think this one can make a big difference. Unfortunately, I’ve seen vendors charging five times the actual street price for a single SSD!
  • A single SSD (not even in every host) is sufficient to start with
  • Less rack space, power consumption and cooling

On the other hand, you could argue against an SSC approach because it doesn’t provide any persistent storage. But because a caching layer can reduce the total number of IOPS my primary storage has to process, I can scale down the I/O requirements for that storage, which again saves money. And it definitely scales down better to SMB customers, who have rather small environments and tight budgets.
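The IOPS argument can be sketched with a simple model. It assumes a write-through cache, so every write still reaches the primary storage and only read hits are absorbed; write-back caching and write coalescing are deliberately left out. The workload and hit ratio are hypothetical.

```python
def backend_iops(total_iops: float, read_fraction: float, read_hit_ratio: float) -> float:
    """IOPS the primary storage still has to serve behind a write-through read cache."""
    reads = total_iops * read_fraction
    writes = total_iops - reads
    read_misses = reads * (1.0 - read_hit_ratio)  # only misses go to the array
    return read_misses + writes  # write-through: every write is passed on


# Hypothetical workload: 10,000 IOPS, 70% reads, 80% of reads served from flash
print(f"{backend_iops(10_000, 0.7, 0.8):.0f} IOPS reach the primary storage")
```

In this example the array only has to deliver about 4,400 of the original 10,000 IOPS, so it can be sized for a much smaller I/O load.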

Licensing is an interesting point: many All Flash vendors offer an all-inclusive bundle, compared to the annual, capacity-based or host-based licensing offered by SSC providers. So with a growing number of hosts, AFA pricing becomes more attractive.
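The effect of the licensing models can be illustrated with a break-even calculation: per-host SSC spend grows linearly with the number of hosts, while an all-inclusive AFA bundle is a fixed amount for a given configuration. Both prices below are hypothetical, and the model ignores the primary storage that SSC still needs behind it.

```python
import math


def ssc_total_cost(hosts: int, cost_per_host: float) -> float:
    """Per-host licence plus local SSD, multiplied by the number of hosts."""
    return hosts * cost_per_host


def break_even_hosts(afa_bundle_price: float, cost_per_host: float) -> int:
    """Smallest host count at which SSC spend reaches the AFA bundle price."""
    return math.ceil(afa_bundle_price / cost_per_host)


# Hypothetical prices: 120,000 for the AFA bundle, 8,000 per host for SSC
hosts = break_even_hosts(120_000, 8_000)
print(hosts, ssc_total_cost(hosts, 8_000))
```

With these placeholder prices the two options cost the same at 15 hosts; beyond that, the fixed AFA bundle becomes the cheaper option.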

Since both solutions can accelerate reads AND writes, writes in particular need special attention to avoid data loss. Server Side Caching “simply” replicates IOs to other Flash devices within the cluster to withstand host and device failures. So this could be seen as network RAID10, assuming two copies of a particular block. AFAs, on the other hand, usually use some form of dual parity RAID to distribute data across SSDs and protect it. This is pretty efficient if you ask me, as long as we are talking about non-stretched clusters.
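A quick way to compare the two protection schemes is the usable fraction of raw flash. The sketch below assumes SSC keeps whole replicas (two copies, as in the network RAID10 comparison) and the AFA uses dual parity across a single RAID group; hot spares and metadata overhead in real arrays are not modeled.

```python
def replica_efficiency(copies: int) -> float:
    """Usable fraction when every block is stored `copies` times (network RAID10 for 2)."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1 / copies


def dual_parity_efficiency(drives: int) -> float:
    """Usable fraction of a dual parity group: two drives' worth of capacity hold parity."""
    if drives < 3:
        raise ValueError("dual parity needs at least three drives")
    return (drives - 2) / drives


print(f"SSC with 2 copies: {replica_efficiency(2):.0%}")
print(f"AFA, 10-drive dual parity: {dual_parity_efficiency(10):.0%}")
print(f"AFA, 24-drive dual parity: {dual_parity_efficiency(24):.0%}")
```

Two copies leave half of the flash usable, while a dual parity group gets more efficient the more drives it spans.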

For some businesses 10GbE could be a problem, but from what I’ve seen there are also solutions available that optimize replication traffic (when using write-back caching) so they can run even on 1GbE. Going with an AFA will also require a proper 10GbE or 8 Gb/s (or faster) Fibre Channel SAN to avoid bottlenecks. A plus for AFAs is that 8 Gb/s FC SANs are more common than 10GbE networks, at least in the German SMB market.
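Whether 1GbE is enough for write-back replication mostly depends on write throughput. A rough sketch, assuming replication traffic equals write IOPS times IO size for each additional copy, measured against the raw line rate (1 Gb/s = 125 MB/s) with no protocol overhead and none of the traffic optimizations mentioned above; the workloads are hypothetical.

```python
def replication_link_utilization(write_iops: float, io_size_kib: float,
                                 copies: int, link_gbps: float) -> float:
    """Fraction of the raw link rate consumed by replicating writes to other hosts."""
    bytes_per_sec = write_iops * io_size_kib * 1024 * (copies - 1)
    link_bytes_per_sec = link_gbps * 1e9 / 8
    return bytes_per_sec / link_bytes_per_sec


# Hypothetical workloads, two copies of every written block
print(f"{replication_link_utilization(2_000, 8, 2, 1):.0%} of 1GbE")
print(f"{replication_link_utilization(20_000, 32, 2, 1):.0%} of 1GbE")
print(f"{replication_link_utilization(20_000, 32, 2, 10):.0%} of 10GbE")
```

The first workload fits comfortably on 1GbE, while the second would need more than five times the raw capacity of a 1GbE link and only fits on 10GbE.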

When it comes to data placement, SSC can help to accelerate a broad range of workloads; basically all virtual workloads could be sped up. Depending on the number of VMs per host, this would of course require multiple SSDs or PCIe-based Flash devices to provide sufficient caching space and performance.

An All Flash Array, on the other hand, requires deciding which workloads should be moved to the new array, because I assume not many people will be able to replace their current SAN completely with an All Flash solution anytime soon. Another option would be to place the AFA underneath a storage virtualization layer like DataCore’s SANsymphony-V (SSV) with its Automated Storage Tiering feature. Even though SSV has a pretty good storage tiering implementation, which can move blocks within minutes rather than just once a day, a block accessed on a lower tier (Tier 2 or below) will be considerably slower than one served from flash. And to be moved up to a faster tier, a block needs a certain number of read/write hits to heat up before it is finally moved. That probably sounds a bit too negative, but it’s just the way a storage tiering approach works.
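The “heat up before moving” behaviour can be illustrated with a toy model: hits are counted per block during an evaluation window, and when the window closes, blocks at or above a threshold are promoted to flash. This is a hypothetical simplification for illustration only, not how SANsymphony-V implements Automated Storage Tiering; demotion and tier capacity limits are left out.

```python
from collections import Counter


class HeatTieringSketch:
    """Toy heat-based tiering model with promotion only (hypothetical)."""

    def __init__(self, promote_threshold: int) -> None:
        self.promote_threshold = promote_threshold
        self.hits: Counter = Counter()  # accesses per block in the current window
        self.flash: set = set()  # blocks already promoted to the flash tier

    def access(self, block: int) -> str:
        """Record an access and report which tier served it."""
        self.hits[block] += 1
        return "flash" if block in self.flash else "lower tier"

    def end_of_window(self) -> set:
        """Promote blocks that reached the threshold, then reset the counters."""
        promoted = {b for b, n in self.hits.items()
                    if n >= self.promote_threshold and b not in self.flash}
        self.flash |= promoted
        self.hits.clear()
        return promoted


tiering = HeatTieringSketch(promote_threshold=3)
print([tiering.access(42) for _ in range(3)])  # all served from the lower tier
tiering.access(7)  # touched only once, stays cold
print(tiering.end_of_window())
print(tiering.access(42), tiering.access(7))
```

Until block 42 has collected enough hits and the window closes, every access to it is served from the slower tier, whereas a server-side read cache would typically place the block in flash right after the first miss.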

When expanding your infrastructure, scalability is an important factor. There are All Flash solutions with shared-nothing architectures that scale out very well, and other approaches that allow transparent scaling up without any impact. That’s why I said in my previous post: AFA != AFA, details matter!

The SSC approach causes less overhead (such as zoning and LUN masking) when scaling out, requires no additional rack space and needs less power and cooling. It could require downtime of the host, depending on whether hardware changes such as a new RAID controller are required, but in times of vMotion this isn’t really a problem. Scaling the SSC layer also offers more granularity, since I can scale gradually by purchasing a number of off-the-shelf SSDs and licenses, whereas scaling out an AFA means buying a completely new array/block.

And what about the impact of a failure? Both approaches can handle drive and host/controller failures. An outage of a dual controller array is rather unlikely but can happen; I’ve seen things like that before. Kernel drivers and server-side software can also go haywire, but this will most likely only impact the host they are running on, not all connected hosts. So compared to a dual controller AFA, SSC is less likely to take down large portions of the production workload. Compared to a scale-out (shared-nothing) design, the risk is quite similar.

Host overhead exists but is rather minimal when comparing an AFA, which of course causes no host overhead, to an SSC solution based on a kernel module. People here in Germany especially tend to be conservative when it comes to overprovisioning, so many environments I work with usually run at < 20% CPU utilization. RAM is usually the number one resource that runs out very quickly. So if you ask me, it’s important to have a solution that lets you choose between RAM or Flash-based devices as the caching medium.

As always I’m sure there are folks that won’t agree with some of my arguments, so feel free to comment and share your thoughts!
