To kick off this series of short blog posts about individual Architect features, I’ve picked the “Drill-Down” charts, as they are among my personal favorite features.
Most of you will be familiar with the saying “All roads lead to Rome”, and with Architect we can turn this into “All roads lead to a virtual machine”. But why is this important?
Most folks working in virtualization & storage are familiar with complaints like “Hey – my SQL server is very slow today”. That’s usually the trigger for admins to start figuring out what’s wrong with the virtual machine or the underlying infrastructure.
Is the problem even storage related? If CPU & memory utilization don’t indicate a problem, your best bet is probably to investigate storage performance next. There could be many reasons why a certain virtual machine is experiencing bad storage performance. Maybe it’s just a de-duplication or RAID rebuild task on the array? Or is a rogue VM (a noisy neighbor) killing the storage performance? Or is it an even more complex problem, like a changed I/O pattern of a certain application?
Looking at the storage array or at a particular LUN probably won’t provide you with the required level of detail, simply because the array lacks VM-awareness. And often there is a discrepancy between the latency the array reports and what the virtual machines are really experiencing.
With the PernixData Architect Software you can now easily narrow down the problem.
You could start by looking at the individual VM that has been reported as slow, or start at a high level to get a better overall view across all VMs.
If you discover a spike, you will certainly want to investigate it and “drill down” into the problem. All you have to do is click the peak directly on the graph.
This will bring up a popup that instantly shows the top 10 VMs experiencing the highest latencies, issuing the most IOPS, or causing the most throughput, depending on the type of chart you are looking at.
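To make the idea behind such a “top 10” view more concrete, here is a minimal sketch of the ranking step. It is not how Architect implements it: the `VMSample` type, the `top_vms` helper, and all VM names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VMSample:
    """Hypothetical per-VM measurement for one point in time."""
    name: str
    latency_ms: float
    iops: float
    throughput_mbps: float


def top_vms(samples, metric, n=10):
    """Return the n samples with the highest value of the given metric."""
    return sorted(samples, key=lambda s: getattr(s, metric), reverse=True)[:n]


# Invented sample data, not taken from a real environment.
samples = [
    VMSample("sql01", 52.3, 1800, 95.0),
    VMSample("web01", 4.1, 600, 12.0),
    VMSample("backup01", 9.7, 7200, 410.0),
]

# The ranking changes with the chart type, i.e. the metric you sort by.
by_latency = [s.name for s in top_vms(samples, "latency_ms", n=2)]
by_iops = [s.name for s in top_vms(samples, "iops", n=2)]
print(by_latency)
print(by_iops)
```

Note how the VM with the worst latency is not necessarily the one issuing the most IOPS, which is why it pays off to look at more than one chart type.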
One important detail to point out here is that the peak observed on a cluster level represents aggregated data for all VMs in the cluster.
While the spike was just around 20.6ms on a cluster level, this particular virtual machine was actually observing a latency above 50ms for read IOs. You can see that this view allows you to select a Secondary Breakdown, to get not only the total VM observed latency, but also detailed insights into the individual read & write latency. In addition to that, a simple mouse-over on the latency bar will also tell you the number of IOs that were related to high latency.
This is why it is important to get a detailed view at the virtual machine level: it shows what an individual VM is actually observing, rather than taking for granted that the VM sees the exact latency the LUN on the array is reporting.
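A quick back-of-the-envelope calculation shows how an aggregate can hide a badly affected VM. The sketch below assumes the cluster value is an IO-weighted average of the per-VM latencies; I’m not claiming this is exactly how Architect aggregates. The VM names and IO counts are invented and merely chosen to land on similar numbers as above.

```python
def cluster_latency_ms(vms):
    """IO-weighted average latency across VMs (an assumed aggregation method).

    vms maps a VM name to a (read_latency_ms, read_io_count) tuple.
    """
    total_ios = sum(ios for _, ios in vms.values())
    if total_ios == 0:
        return 0.0
    return sum(lat * ios for lat, ios in vms.values()) / total_ios


# Invented numbers: one VM suffers, two others are doing fine.
read_stats = {
    "sql01": (52.0, 3000),
    "web01": (5.0, 2000),
    "app01": (8.0, 5000),
}

cluster = cluster_latency_ms(read_stats)
worst = max(read_stats, key=lambda name: read_stats[name][0])
print(f"cluster: {cluster:.1f} ms, worst VM: {worst} at {read_stats[worst][0]:.1f} ms")
# prints "cluster: 20.6 ms, worst VM: sql01 at 52.0 ms"
```

The two healthy VMs pull the aggregate down, so the cluster-level number looks far less dramatic than what the affected VM really experiences.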
To verify that there is no rogue VM overloading the array, simply do the crosscheck: go to the IOPS & Throughput charts and check whether a VM is issuing an unusual amount of IO.
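What counts as “unusual” depends on your environment, but one simple heuristic is to look at each VM’s share of the total IOPS. The following sketch uses a hypothetical `noisy_neighbors` helper with an arbitrary 50% threshold and invented numbers; treat it as an illustration of the crosscheck, not as a rule Architect applies.

```python
def noisy_neighbors(iops_by_vm, share_threshold=0.5):
    """Return the VMs whose share of the total IOPS exceeds the threshold.

    The 50% default is an arbitrary illustration value, not a recommendation.
    """
    total = sum(iops_by_vm.values())
    if total == 0:
        return []
    return sorted(
        name for name, iops in iops_by_vm.items() if iops / total > share_threshold
    )


# Invented sample: one VM issues 7200 of 9600 IOPS, i.e. 75% of the total.
iops_by_vm = {"sql01": 1800, "web01": 600, "backup01": 7200}
suspects = noisy_neighbors(iops_by_vm)
print(suspects)
```

A fixed share is of course a blunt instrument. Comparing against each VM’s own history would be the more robust approach, but the principle of the crosscheck stays the same.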
Using these charts it has never been easier to identify problems like noisy neighbor VMs. I wish I had these in my admin & consulting days.
A look at the Throughput chart could give a first indication of what caused the increased latency, but more on that in the next post.
Last but not least, I would like to point out that these Drill-Down charts are also available when reviewing individual VMs in PernixData Architect. So if you are reviewing a virtual machine’s latency, for instance, you can also drill down into the details to get a better understanding of how the individual block sizes issued by the VM influenced the latency.