To kick off this series of short blog posts about individual Architect features, I’ve picked the “Drill-Down” charts, as they count among my personal favorites.
Most of you will be familiar with the saying “All roads lead to Rome”, and with Architect we can turn this into “All roads lead to a virtual machine”. But why is this important?
Most folks working in virtualization & storage are familiar with complaints like “Hey – my SQL server is very slow today”. That’s usually the trigger for admins to start figuring out what’s wrong with the machine or the underlying infrastructure.
Is the problem even storage related? If CPU & memory utilization don’t indicate a problem, it is probably worth investigating the storage performance next. There could be many reasons why a certain virtual machine suffers from bad storage performance. Maybe it’s just a de-duplication or RAID rebuild task on the array? Is a rogue VM (a noisy neighbor) killing the storage performance? Or is it a more complex problem, like a change in the I/O pattern of a certain application?
Looking at the storage array or a particular LUN probably won’t provide the required level of detail, due to the lack of VM-awareness. And often there is a discrepancy between the latency the array sees and what the virtual machines actually experience.
With the PernixData Architect Software you can easily narrow down the problem on a VM level.
You could start by looking at the individual VM that has been reported, or with a high-level view of the cluster, to check for any spikes in latency, IOPS, or throughput.
If you discover a spike, you will surely want to investigate and “drill down” into the problem. All you have to do is click the peak directly on the graph.
One important thing to point out here is that the peak at the cluster level, which represents aggregated data for all VMs including read & write latency, was only around 20.6 ms. But at the VM level, the VM that was suffering at that point in time actually peaked above 50 ms for reads. A simple mouse-over on the latency bar will also tell you the number of IOPS associated with that high latency. This is why it is important to get a detailed view at the virtual machine level to understand what is really going on.
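The gap between the cluster-level and VM-level numbers is easy to see with a bit of arithmetic: an aggregated latency chart is effectively an I/O-weighted average, so one suffering VM with modest IOPS gets diluted by its busier, healthier neighbors. Here is a minimal sketch with invented per-VM figures (the VM names and numbers are hypothetical, not Architect output):

```python
# Hypothetical per-VM samples at one point in time.
# "sql01" is the suffering VM; the other figures are invented for illustration.
vms = {
    "sql01": {"iops": 500,  "latency_ms": 52.0},
    "web01": {"iops": 4000, "latency_ms": 18.0},
    "app01": {"iops": 3500, "latency_ms": 17.5},
}

# An IOPS-weighted average, roughly what an aggregated cluster chart shows.
total_iops = sum(v["iops"] for v in vms.values())
cluster_latency = sum(v["iops"] * v["latency_ms"] for v in vms.values()) / total_iops

print(f"cluster-level latency: {cluster_latency:.1f} ms")  # ~19.9 ms
print(f"sql01 latency:         {vms['sql01']['latency_ms']:.1f} ms")  # 52.0 ms
```

Even though sql01 is seeing over 50 ms, the cluster-wide figure stays near 20 ms, which is why the drill-down to the individual VM matters.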
To verify that no rogue VM is overloading the array, simply do the cross-check and go to the IOPS & Throughput charts. With these charts it has never been easier to identify noisy neighbors – I wish I had had them back in my admin & consulting days.
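The cross-check boils down to ranking VMs by how much of the total load they generate. A toy sketch of that idea, again with invented VM names and throughput numbers:

```python
# Hypothetical throughput per VM in MB/s; numbers invented for illustration.
throughput_mbs = {"backup01": 480.0, "sql01": 25.0, "web01": 90.0, "app01": 65.0}

# Rank VMs by throughput and show each one's share of the total,
# which makes a dominating noisy neighbor stand out immediately.
total = sum(throughput_mbs.values())
for vm, mbs in sorted(throughput_mbs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{vm:>9}: {mbs:6.1f} MB/s ({mbs / total:5.1%})")
```

In this made-up example, backup01 accounts for well over half the datastore traffic, which is exactly the pattern the IOPS & Throughput charts let you spot at a glance.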
A first look at the Throughput chart can give an initial indication of what caused the increased latency, but more on that in the next post.