My personal adventures in the quest for virtual perfectness.
Using Super Metrics in VCOPS is quite common and well documented. There are also some nice blog posts (http://www.batchworks.de/using-super-metrics-to-monitor-cpu-ready-part1/ and http://www.virtualclouds.co.za/?p=254). When creating a report or a dashboard, I like to show a metric as a percentage. Percentages are nice because the human mind instantly reads the value as 'wow, that's high' or 'hmm, that's pretty low', since the range is between 0 and 100 most of the time.
VCOPS already includes a lot of percentage metrics. However, some metrics, for example CPU Co-stop, System and Idle, are only available in milliseconds (ms). There is a small catch you need to take into account when converting them, which I'll show you in this blog post.
Here is a first example: I want to get the System (Kernel time) from this Windows machine:
As you can see, the system time is quite high, almost as high as CPU usage, which is around 35-40%.
How does this plot inside vCenter? In vCenter, System time is shown in milliseconds, like this:
And here is how it plots in VCOPS:
The real-time counters are updated every 20 seconds (20,000 milliseconds), so 7,800 ms out of 20,000 ms is (7,800 / 20,000 * 100 =) 39%, which is the same percentage we see inside the Windows VM. Great.
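The conversion above can be sketched in a few lines of Python. This assumes the 20-second real-time interval described above; `ms_to_percent` and `REALTIME_INTERVAL_MS` are hypothetical names for illustration and are not part of VCOPS.

```python
# Assumption: real-time counters cover a 20-second sample (20,000 ms).
REALTIME_INTERVAL_MS = 20_000


def ms_to_percent(ms: float, interval_ms: float = REALTIME_INTERVAL_MS) -> float:
    """Express a millisecond counter as a percentage of the sample interval."""
    # Multiply first so whole-number inputs like 7,800 divide exactly.
    return ms * 100 / interval_ms


print(ms_to_percent(7800))  # 39.0, matching the value inside the Windows VM
```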
Now let's create a Super Metric. In the vcops-custom UI, go to 'Environment, Advanced, Super Metrics, Super Metric Editor...' and create a new one. Under Resource Kinds, find Virtual Machine. Click on it, open CPU Usage in the pane below and look up System (ms). Click the 'This' button at the top and double-click System (ms). Then show the formula description. Your screen should look something like this:
NOTE: the A818 may be different in your VCOPS instance! That's why it is important to follow the steps I described above.
However, trying to save this gives an error:
"Formula expression is not valid. Cannot convert number array to number."
Now that makes sense, right? Why? Well, as I showed you in one of the previous screenshots, this metric has multiple instances, so it returns an array of numbers instead of a single number. Let's see how many we've got: use count() to get the number of instances of this metric on a VM.
Here is an example from another VM with 4 vCPUs:
So the total number of instances is the number of vCPUs plus 1 aggregate instance.
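As a rough model of what count() sees, here is a Python sketch that builds the instance array for one VM: one value per vCPU plus an aggregate instance holding their total (the aggregate-equals-sum relationship comes from the example later in this post). `build_instances` and the millisecond values are hypothetical, for illustration only.

```python
def build_instances(per_vcpu_ms: list[float]) -> list[float]:
    """Hypothetical model of the System (ms) array for one VM:
    one instance per vCPU, followed by one aggregate instance."""
    return per_vcpu_ms + [sum(per_vcpu_ms)]


# A 4-vCPU VM with made-up values.
instances = build_instances([1200, 800, 0, 400])
print(len(instances))  # 5 = 4 vCPUs + 1 aggregate
print(instances[-1])   # 2400, the aggregate instance
```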
Now here is the catch: I've seen a lot of people using the sum() function. Calculating the percentage would then look like this:
You can clearly see this percentage isn't correct: the aggregate instance already contains the total of all vCPU instances, so sum() counts every millisecond twice. That's why I see a lot of blog posts and examples using the average (avg()), which works fine if you've got 2 instances (1 vCPU plus the aggregate):
Much better. But what if you've got 5 instances (so 4 vCPUs)?
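To make the difference concrete, here is a Python sketch of a hypothetical 1-vCPU VM (2 instances). It reuses the 7,800 ms figure purely for illustration and only mimics the arithmetic of sum() and avg(); it is not the VCOPS formula engine.

```python
from statistics import mean

INTERVAL_MS = 20_000  # 20-second real-time interval

# Hypothetical 1-vCPU VM: the vCPU instance and the aggregate hold the same value.
vcpu0_ms = 7800
instances = [vcpu0_ms, vcpu0_ms]  # [vCPU 0, aggregate]

sum_pct = sum(instances) * 100 / INTERVAL_MS
avg_pct = mean(instances) * 100 / INTERVAL_MS

print(sum_pct)  # 78.0, double the real value
print(avg_pct)  # 39.0, correct for this VM
```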
Here is an example: vCPU0: 5,000 ms, vCPU1: 0, vCPU2: 0, vCPU3: 0. The aggregate will be 5,000 (5,000 + 0 + 0 + 0), so the total over all instances is 10,000. The average will be 10,000 / 5 (instances) = 2,000. However, it should be 5,000!
So the correct formula is sum()/2. To simplify: dividing by 2, dividing by 20,000 and multiplying by 100 (for the percentage) is the same as dividing by 400, so the percentage is sum()/400. This works for any number of vCPUs.
Have fun turning your Super Metrics into percentages!
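The 4-vCPU example can be reproduced with a short Python sketch, again assuming (as in the example) that the aggregate is the sum of the per-vCPU values. The numbers are the ones from the paragraph above.

```python
from statistics import mean

# Only vCPU 0 is busy on this 4-vCPU VM.
per_vcpu_ms = [5000, 0, 0, 0]
instances = per_vcpu_ms + [sum(per_vcpu_ms)]  # 4 vCPUs + aggregate = 5 instances

print(sum(instances))   # 10000, every millisecond counted twice
print(mean(instances))  # 2000, but the real System time is 5000
```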
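A small Python sketch can check the sum()/400 shortcut for different vCPU counts. It assumes the aggregate instance equals the sum of the per-vCPU values and a 20-second interval; `system_percent` is a hypothetical helper that only reproduces the arithmetic and is not VCOPS formula syntax.

```python
INTERVAL_MS = 20_000  # 20-second real-time interval


def system_percent(instances: list[float]) -> float:
    """sum() / 2 removes the double count from the aggregate instance,
    and / 20,000 * 100 turns ms into a percentage: together, sum() / 400."""
    return sum(instances) / 400


for per_vcpu_ms in ([7800], [3900, 3900], [5000, 0, 0, 0]):
    instances = per_vcpu_ms + [sum(per_vcpu_ms)]  # vCPUs + aggregate
    aggregate = instances[-1]
    # sum()/400 should equal the aggregate expressed as a % of 20 seconds.
    assert system_percent(instances) == aggregate * 100 / INTERVAL_MS
    print(len(per_vcpu_ms), system_percent(instances))
```

For the three made-up VMs this prints 39.0, 39.0 and 25.0, regardless of how many vCPUs each one has.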