To follow up on my last post about different storage technologies and approaches, I want to dive deeper into one of them. This time it's all about server-side caching, which is rather new in the way it can now be implemented in virtual environments, and how it can accelerate a broad range of applications.
Server-side caching technologies leverage local resources right inside the hypervisor, like SSDs, PCIe Flash cards or even RAM, to drive down application (VM) latency, which in the end is the key performance metric that really matters. The idea is to transparently plug into the virtual machines' I/O path, intercept their I/Os, write them to Flash or RAM and acknowledge the I/O right back to the VM. So virtual machines see real Flash/RAM performance and latency.
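To make that a bit more concrete, here is a minimal sketch of the concept. All names are hypothetical and plain dicts stand in for the devices; a real product hooks into the hypervisor's I/O path instead:

```python
# Conceptual sketch of a write-back caching layer in a VM's I/O path.
# Hypothetical names; plain dicts stand in for the actual devices.

class WriteBackCache:
    def __init__(self):
        self.flash = {}     # stand-in for the local SSD / PCIe Flash / RAM
        self.backend = {}   # stand-in for the SAN/NAS array
        self.dirty = set()  # block addresses not yet de-staged

    def write(self, block_addr, data):
        # Persist the write on the local Flash device and acknowledge
        # immediately: the VM sees Flash latency, not array latency.
        self.flash[block_addr] = data
        self.dirty.add(block_addr)
        return "ACK"

    def read(self, block_addr):
        # Serve reads from Flash when possible; otherwise fall through
        # to the array and populate the cache on the way back.
        if block_addr in self.flash:
            return self.flash[block_addr]
        data = self.backend.get(block_addr)
        self.flash[block_addr] = data
        return data
```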
Since this technology hasn't gained as much awareness as it actually deserves, I think it's worth pointing out some of the key benefits:
Decouples virtual machine performance (IOPS/Latency) from primary storage
I have no clue how much time I've spent just playing with spindle counts, RAID variants and vendor discussions, just to hit the required IOPS for new storage systems. Even with a storage tiering approach and Flash as Tier 1 it can still be challenging, because all of a sudden the IOPS/GB ratio between the different storage tiers comes into play. How much Tier 1 do I need? Should I go with just two tiers, or will my pool be big enough to justify a third tier of slow SATA disks, and so forth.
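For illustration, this is the kind of back-of-the-envelope math those sizing exercises boil down to; the workload numbers are made up and the per-disk figures are the usual rules of thumb:

```python
# Rough spindle sizing for a target workload (illustrative numbers only).
required_iops = 10_000            # front-end IOPS the applications need
read_pct, write_pct = 0.7, 0.3    # 70/30 read/write mix
raid_write_penalty = 4            # RAID 5: ~4 back-end I/Os per write
iops_per_disk = 180               # rule of thumb for a 15k rpm SAS spindle

backend_iops = required_iops * (read_pct + write_pct * raid_write_penalty)
spindles = backend_iops / iops_per_disk
print(f"{backend_iops:.0f} back-end IOPS -> ~{spindles:.0f} spindles")
# 19000 back-end IOPS -> ~106 spindles
```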
This is especially important for SMB customers, who often have to use basic block storage systems with rather low spindle counts. I've seen those systems become a bottleneck very fast, because people tend to underestimate the I/O demand of applications like SQL, Exchange, etc., even in smaller environments.
Allows you to freely choose which Flash devices to use
I've seen storage vendors using off-the-shelf SSDs but charging five times the actual street price. No doubt people still hesitate to pay that! This approach, on the other hand, lets you choose what to plug into your hosts. You can pick the SSD vendor and the capacity with the best $/GB or IOPS/GB ratio, which can significantly drive down the cost of implementing Flash. If you go with devices from the OEM of your existing server equipment the price will certainly be higher, but you don't need to worry about compatibility, RMA processes, etc.; basically, you get the services/SLAs you are used to.
Efficient & Fault Tolerant
Since the local devices don't need to be RAIDed, you don't lose as much capacity as in external storage systems. Only if the write-back cache is enabled does the cache software have to replicate every I/O to a second (or even third) host within the cluster, to ensure fault tolerance against host and device failures. If you use the layer as a read cache only, there is basically no loss and the full capacity can be used to cache I/Os.
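Conceptually, the replicated write path looks something like this (again just a sketch with hypothetical names, not any vendor's actual implementation):

```python
# Sketch: a dirty write is acknowledged only after a peer host holds a
# copy, so a single host or device failure loses no cached data.
def replicated_write(local_cache, peer_caches, block_addr, data, copies=1):
    local_cache[block_addr] = data
    for peer in peer_caches[:copies]:  # stands in for a network round-trip
        peer[block_addr] = data
    return "ACK"                       # only now does the VM see the I/O complete
```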
It can scale out
If you add additional hosts to a virtual environment, you CAN add them with or without local Flash devices. If your current hosts are sufficient to run your most critical enterprise applications and those are already accelerated, you don't necessarily have to add Flash to the new hosts, but you can of course! Just keep failover scenarios in mind. And in case you want every VM to be accelerated, you can easily scale out by adding new hosts with local Flash devices (or additional RAM).
Management & Ease of Use
All solutions I've seen so far did a very good job in terms of usability and integration into VMware's vSphere & Web Client. So setup and management don't cause a big overhead and don't require special training to install and use the product.
However, you may ask yourself: "What about my primary storage system and the cache software, will they get along? And what about all the dirty data inside the cache?"
Write-back caching of course means that the local caching devices are full of "dirty data", and the caching software is responsible for taking care of it, both in terms of fault tolerance (I/O replication between nodes) and by committing (de-staging) those I/Os to a persistent storage system over time.
This can have a hugely positive impact on your primary storage, simply because it can significantly reduce the total number of IOPS your SAN or NAS has to process. Even if the caching layer acts just as a read cache, the sheer size of Flash devices means the data (individual blocks or content) can reside there far longer than in the few GB of NVRAM inside a storage array.
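A quick illustration of that effect, with made-up numbers:

```python
# How a read cache shrinks the array's read workload (illustrative numbers).
vm_read_iops = 8_000   # reads issued by the VMs
hit_ratio = 0.9        # 90 % of reads served from local Flash
array_reads = vm_read_iops * (1 - hit_ratio)
print(f"The array sees only {array_reads:.0f} of {vm_read_iops} read IOPS")
# The array sees only 800 of 8000 read IOPS
```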
If the write-back cache functionality is used as well, the caching software can eliminate redundant writes of the same block and commit only the latest version of a block to the array (a small sketch of this follows the list below). Both scenarios can significantly reduce the total number of IOPS, which allows some conclusions:
- Bring primary storage arrays back to a healthy state if they are already at their limits
- Get more out of existing assets
- And even if some folks don't like this one: consider smaller arrays, due to the reduced number of IOPS they have to process
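Here is the promised sketch of the write-coalescing idea: because the dirty map is keyed by block address, a block that is overwritten a thousand times in the cache still causes only a single write during de-staging (hypothetical names again):

```python
# Sketch: coalescing de-stager; only the latest version of each dirty
# block is committed to the array, however often it was overwritten.
dirty = {}                        # block address -> latest data

def cache_write(block_addr, data):
    dirty[block_addr] = data      # older versions are simply replaced

def destage(array):
    for block_addr, data in dirty.items():
        array[block_addr] = data  # one back-end write per dirty block
    dirty.clear()

for i in range(1000):
    cache_write(42, f"version {i}")
backend = {}
destage(backend)                  # backend holds only "version 999"
```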
But nothing comes for free. The caching software needs to be purchased, the same goes for the Flash devices, and if your hosts are equipped with just SD cards (diskless configuration), a proper RAID controller will probably be required as well. RAM can really help here, so don't rule it out too quickly.
In my opinion, server-side caching is a smart way to speed things up and to create room when it comes to choosing new primary storage, since IOPS & latency are no longer your primary concern. I guess the storage folks don't like to hear this, but if SSC gets enough awareness and a growing install base, it can really steal a big piece of the storage market cake.
That's it for now. I'm planning to follow this one up with two additional posts: a hands-on deep dive into one of the current solutions available on the market, as well as a post about the question "server-side caching vs. All-Flash Arrays". So stay tuned.