PernixData Architect – VM Performance Plot

It’s been a while since the last Architect post, but nevertheless here we go with the next one!

The cool thing about many of the Architect features is that it doesn’t take hours to explain them, and the so-called “VM Performance Plot” is a good example. Of course I’ll give a brief explanation, but once you’ve seen the screenshots, it’s actually crystal clear.

Most features are associated with a particular problem, and in this case the problem Architect solves is: how do you easily identify, on a single graph, VMs that are not performing as intended or that are issuing more workload than expected, especially across a large number of VMs and clusters?

Once you are logged into the PernixData UI, select “Architect” in the drop-down menu in the top-left corner. Before jumping into a particular cluster, stop and click the “VM Performance Plot” tab.

ArchitectStart

Why is this graph not inside the cluster view? Because if you have multiple vSphere clusters, you will be able to get a nice overview across all your clusters and VMs on a single page.

The cool thing is that you can define the x- and y-axis yourself as well as the time range you are interested in. Below the drop-down menu on the right-hand side, you will be able to select the vSphere clusters you want to take a look at.

VMPerformancePlot

This will provide you with a holistic overview of all your VMs, represented as colored dots on the graph. Hovering over a dot reveals more detail about the performance metrics of that VM during the selected time period.

This helps, for example, to identify VMs suffering from performance issues like high latency, or rogue VMs that are killing SAN performance by issuing more IOPS or throughput than they are supposed to.
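The idea behind the plot can be sketched in a few lines of Python. This is only an illustration and not how Architect works internally: the VM names, metric values, and thresholds are all made up, and `find_outliers` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class VMSample:
    name: str
    cluster: str
    iops: float        # x-axis metric (hypothetical value)
    latency_ms: float  # y-axis metric (hypothetical value)


def find_outliers(samples, max_latency_ms, max_iops):
    """Return the names of slow VMs and of VMs issuing more IO than expected."""
    slow = [s.name for s in samples if s.latency_ms > max_latency_ms]
    noisy = [s.name for s in samples if s.iops > max_iops]
    return slow, noisy


# Hypothetical samples spanning several clusters, one "dot" per VM.
samples = [
    VMSample("sql01", "prod", iops=1200, latency_ms=48.0),
    VMSample("web01", "prod", iops=300, latency_ms=2.1),
    VMSample("backup01", "mgmt", iops=9500, latency_ms=6.4),
    VMSample("test01", "dev", iops=150, latency_ms=1.3),
]

slow, noisy = find_outliers(samples, max_latency_ms=20.0, max_iops=5000)
print("High latency:", slow)
print("High IOPS:", noisy)
```

On the real plot you would spot the same two kinds of dots by eye: the ones far up the latency axis and the ones far out on the IOPS axis.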

I hope this will help current and future Architect users find this “hidden” gem and further improve their virtual infrastructures.

PernixData Architect – Block Size Breakdown

In my last post about the PernixData Architect software I covered the so-called Drill-Down charts. At the end of that post I didn’t go into detail about what could have caused the latency spike for that particular virtual machine. That is what this post is intended to do.

So we saw a latency spike and, at the same time, increased throughput. Using the Block Size Breakdown you can now easily understand how the IO profile has changed and how the block size is impacting overall virtual machine performance.

ArchitectLatencyBlockSize

As you can see on the chart, the virtual machine latency was dominated by the time it took to process blocks of 256K or larger.

PernixData Architect can not only show how latency is impacted by individual block sizes, but can also break down how many IOs and how much throughput have been issued with a particular block size.

ArchitectIOPSBlockSize ArchitectMBsBlockSize
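To make the breakdown idea more tangible, here is a minimal Python sketch that groups IOs into block size buckets and reports IOPS, throughput, and average latency per bucket. The bucket boundaries, the sample data, and the function names are my own assumptions for illustration; they are not taken from Architect.

```python
from collections import defaultdict


def bucket_for(size_bytes):
    """Map an IO size to a (hypothetical) block size bucket label."""
    if size_bytes >= 256 * 1024:
        return ">=256K"
    if size_bytes >= 64 * 1024:
        return "64K-<256K"
    if size_bytes >= 8 * 1024:
        return "8K-<64K"
    return "<8K"


def breakdown(ios, interval_s):
    """Summarize IOPS, MB/s and average latency per block size bucket.

    `ios` is a list of (size_bytes, latency_ms) tuples seen in the interval.
    """
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "latency_ms": 0.0})
    for size_bytes, latency_ms in ios:
        t = totals[bucket_for(size_bytes)]
        t["count"] += 1
        t["bytes"] += size_bytes
        t["latency_ms"] += latency_ms
    return {
        label: {
            "iops": t["count"] / interval_s,
            "mb_per_s": t["bytes"] / interval_s / 1e6,
            "avg_latency_ms": t["latency_ms"] / t["count"],
        }
        for label, t in totals.items()
    }


# Hypothetical one-second sample: many small fast IOs, a few large slow ones.
ios = [(4096, 1.0)] * 90 + [(262144, 40.0)] * 10
result = breakdown(ios, interval_s=1.0)
for label, stats in result.items():
    print(label, stats)
```

In this made-up sample the large blocks are only a tenth of the IOs, yet they account for most of the throughput and for the high latency, which is the kind of pattern the breakdown charts make visible.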

As I’ve mentioned earlier, all these details are available at the virtual machine level. These detailed insights into the actual block sizes a VM is issuing can help you on several fronts.

Does my current storage infrastructure incl. all its components provide enough throughput to process those IO sizes in an adequate time?

With FVP in the picture, this helps to select the optimal acceleration media, the one that will offer the best performance for those workloads. For example, a regular SSD will be limited in terms of throughput due to the SATA interface, whereas a PCIe flash card would offer far more bandwidth. Not to mention the capabilities memory would offer to fix such a problem.
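A quick back-of-the-envelope calculation shows why the interface matters for large blocks. Throughput is simply IOPS multiplied by block size. The roughly 600 MB/s used below is the usual usable ceiling of a SATA 6 Gb/s link; the IOPS value is a made-up example, and real devices will typically land below the interface limit.

```python
SATA3_LIMIT_MB_S = 600.0  # approximate usable bandwidth of a SATA 6 Gb/s link


def throughput_mb_s(iops, block_size_bytes):
    """Throughput in MB/s (1 MB = 10**6 bytes) for a given IOPS and block size."""
    return iops * block_size_bytes / 1e6


def max_iops_at_limit(limit_mb_s, block_size_bytes):
    """Highest IOPS a link can carry at a given block size."""
    return limit_mb_s * 1e6 / block_size_bytes


block = 256 * 1024  # 256K blocks

# Bandwidth a hypothetical workload of 3,000 IOPS would need at this block size.
demand = throughput_mb_s(3000, block)
# IOPS at which the SATA link itself becomes the bottleneck.
ceiling = max_iops_at_limit(SATA3_LIMIT_MB_S, block)

print(round(demand, 1), "MB/s needed")
print(int(ceiling), "IOPS saturate the link")
```

With these numbers, 3,000 IOPS at 256K would need about 786 MB/s, while roughly 2,288 IOPS at that block size already saturate the link.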

This can also help tremendously in having a more data-driven discussion with the corresponding application owners, to find a way to optimize the IO profile at the application level.

That’s it for now, more features to cover in future posts!

PernixData Architect – Drill-Down Charts

To kick off this series of short blog posts about individual Architect features, I’ve picked the “Drill-Down” charts, as they are among my personal favorites.

Most of you will be familiar with the saying “All roads lead to Rome”, and with Architect we can turn this into “All roads lead to a virtual machine”. But why is this important?

Most folks working in virtualization & storage are familiar with complaints like “Hey – my SQL server is very slow today”. That’s usually the trigger for admins to start figuring out what’s wrong with the virtual machine or the underlying infrastructure.

Is the problem even storage related? If CPU and memory utilization don’t indicate a problem, your best bet is probably to investigate storage performance next. There could be many reasons why a certain virtual machine is experiencing bad storage performance. Maybe it’s just a de-duplication or RAID rebuild task on the array? Or is a rogue VM (a noisy neighbor) killing the storage performance? Or is it an even more complex problem, like a changed I/O pattern of a certain application?

Looking at the storage array or at a particular LUN probably won’t provide you with the required level of detail, simply due to the lack of VM-awareness. And often there is a discrepancy between the latency the array reports and what the virtual machines are really experiencing.

With the PernixData Architect Software you can now easily narrow down the problem.

You could start by looking at the individual VM that has been reported as slow, or start at a high level to get a better overall view across all VMs.

If you discover a spike, you will certainly want to investigate it and “drill down” into the problem. All you have to do is click the peak directly on the graph.

ArchitectVMObservedLatency

This will bring up a popup that instantly shows the top 10 VMs experiencing the highest latencies, issuing the most IOPS, or causing the most throughput, depending on the type of chart you are looking at.

DrillDownLatencyRW

One important detail to point out here is that the peak observed at the cluster level represents aggregated data for all VMs in the cluster.

While the spike was just around 20.6ms at the cluster level, this particular virtual machine was actually observing a latency above 50ms for its read IOs. You can see that this view lets you select a Secondary Breakdown, to get not only the total VM observed latency, but also detailed insights into the individual read and write latency. In addition, a simple mouse-over on the latency bar will also tell you the number of IOs that were related to high latency.

This is why it is important to get a detailed view at the virtual machine level: to understand what an individual VM is observing, rather than taking for granted that the VM really sees the exact latency the LUN on the array is reporting.
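To see how an aggregate can hide a struggling VM, here is a small Python sketch. It assumes the cluster value is an IO-weighted average of the per-VM latencies, which is my assumption for illustration and not a statement about how Architect computes it. The VM names and numbers are invented and do not reproduce the screenshots.

```python
def cluster_latency_ms(vms):
    """IO-weighted average latency across all VMs (assumed aggregation)."""
    total_ios = sum(ios for _, ios, _ in vms)
    return sum(ios * lat for _, ios, lat in vms) / total_ios


def top_n_by_latency(vms, n=10):
    """VMs sorted by latency, highest first, similar to a top-N drill-down."""
    return sorted(vms, key=lambda vm: vm[2], reverse=True)[:n]


# (name, number of IOs in the interval, average latency in ms), all hypothetical
vms = [
    ("sql01", 2000, 52.0),
    ("web01", 3000, 4.0),
    ("web02", 3000, 5.0),
    ("file01", 2000, 8.0),
]

print("Cluster:", round(cluster_latency_ms(vms), 1), "ms")
for name, ios, lat in top_n_by_latency(vms, n=3):
    print(name, lat, "ms")
```

Here the cluster-level value comes out at 14.7ms even though one VM sits at 52ms, because the three healthy VMs pull the average down. The drill-down view is what surfaces the outlier.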

To verify that there is no rogue VM overloading the array, simply cross-check the IOPS & Throughput charts to see whether a VM is issuing an unusual amount of IO.

Using these charts, it has never been easier to identify problems like noisy neighbor VMs. I wish I had had these in my admin & consulting days.

ArchitectThroughput1.1

A first look at the Throughput chart could give an initial indication of what caused the increased latency, but more on that in the next post.