Measuring Metrics: Log Analytics vs Azure Metrics – Part 3: Azure Metrics

In the first part of this series, we looked at some of the data we can collect through Azure Monitor Logs (aka Log Analytics), in particular, performance metrics.

Now, we’re going to explore Azure Metrics to compare.

 

Overview

To give you a quick high-level overview, Azure Metrics supports near real-time monitoring scenarios and is ideal for alerting quickly on issues. Some other useful things you can accomplish with Azure Metrics include:

  • Compare metrics from different resources
  • Use the charting feature to create custom dashboards
  • Combine multiple sets of data in an interactive report by creating a workbook
  • Create automated actions (e.g. scaling) based on metric thresholds
  • Access metrics via PowerShell, CLI, or REST APIs (see the sketch after this list)
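
For example, here’s a minimal sketch of pulling a platform metric for a VM with the Az PowerShell modules. The resource group and VM names are placeholders, not values from this article:

# Requires the Az PowerShell modules (Az.Accounts, Az.Compute, Az.Monitor)
Connect-AzAccount

# Placeholder names - substitute your own resource group and VM
$vm = Get-AzVM -ResourceGroupName 'my-rg' -Name 'my-vm'

# Pull the last hour of the 'Percentage CPU' platform metric at a 1-minute grain
Get-AzMetric -ResourceId $vm.Id `
    -MetricName 'Percentage CPU' `
    -TimeGrain ([TimeSpan]::FromMinutes(1)) `
    -StartTime (Get-Date).AddHours(-1) `
    -EndTime (Get-Date) `
    -AggregationType Average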

 

Where do metrics come from?

It’s important to understand where metrics are collected from because that helps us to understand what we can and cannot expect to collect (in comparison to Azure Monitor Logs).

Based on Microsoft’s official documentation, there are three fundamental sources of metrics collected by Azure Monitor.

  • Platform
  • Guest OS
  • Application

 

Platform metrics

Briefly, platform metrics provide us with health and performance information from, well, the platform. These are the Azure resources themselves. It’s important to know that each type of Azure resource produces its own set of metrics, and that this is all accomplished without any additional configuration required! That means that if you stood up a bunch of resources but are not using Azure Monitor Logs, you would still be able to obtain these metrics. For a reference list of which metrics are provided for which resources out-of-the-box, check out the supported metrics with Azure Monitor documentation.

Also note, for comparison, that platform metrics are collected from the various Azure resources at one-minute intervals.
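
If you’d like to see that out-of-the-box list from your console rather than the docs, one way is to ask the resource for its metric definitions. This is just a sketch: the resource ID is a placeholder, and the exact output property names can vary a bit between Az.Monitor versions.

# List the metric definitions (names, units, aggregations) a resource exposes out-of-the-box.
# The resource ID below is a placeholder - use the full ARM ID of any of your resources.
$resourceId = '/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/my-vm'

Get-AzMetricDefinition -ResourceId $resourceId |
    Select-Object @{ Name = 'Metric'; Expression = { $_.Name.Value } }, Unit, PrimaryAggregationType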

 

Guest OS metrics

With Guest OS metrics, it’s sort of obvious from the name: these are metrics collected from the Operating System of a Virtual Machine. To collect them, though, you need to enable the Azure Diagnostics Extension (which is slightly different depending on whether you’re running Windows or Linux VMs).

When using this VM Extension, you are able to collect more than just performance information (which is what we get natively with the platform metrics).

For example, if you’re using Windows, you can collect the following types of data:

  • Performance counter metrics
  • Application logs
  • Windows Event logs
  • .NET EventSource logs
  • IIS Logs
  • Manifest-based ETW logs
  • Crash dumps (logs)
  • Custom error logs
  • Azure Diagnostic infrastructure logs
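
To get any of this flowing in the first place, enabling the extension from PowerShell looks roughly like the sketch below. The resource names and the path to the diagnostics configuration file are placeholders; that configuration file is what actually defines which of the data types above get collected.

# Enable the Azure Diagnostics Extension (WAD) on a Windows VM.
# The diagnostics configuration file (XML or JSON) defines which counters and logs
# to collect and where to send them - all names below are placeholders.
Set-AzVMDiagnosticsExtension -ResourceGroupName 'my-rg' `
    -VMName 'my-vm' `
    -DiagnosticsConfigurationPath '.\diagnostics-config.json' `
    -StorageAccountName 'mydiagstorage'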

An interesting point about the Guest OS metrics and the Azure Diagnostics Extension is that the extension can send this data to an Azure Storage account, Application Insights, or an Event Hub.

You can also send this collected data to Azure Monitor, but with the following caveats:

  • It is only applicable to performance counters (not the other data types you can collect)
  • The performance counters sent will be recorded as custom metrics
  • The feature is currently in preview

So, while it’s a nice option to send Guest OS metrics to Azure Monitor to facilitate graphing/charting, alerting, etc., you’ll also have to use another method to do the same for the other data types (until Microsoft makes those integrations available).
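
For reference, routing those performance counters to Azure Monitor is done by adding an ‘Azure Monitor’ sink to the extension’s public configuration. Below is a rough, simplified fragment of what that sink looks like, held in a PowerShell here-string; the counter definitions and the rest of the WadCfg schema are omitted, and since the feature is in preview the exact schema in Microsoft’s documentation may change.

# Simplified sketch of the Azure Monitor sink fragment in the WAD public configuration.
# The full configuration also needs the performance counter definitions (which reference
# the sink by name) and the rest of the WadCfg schema - this is only the routing piece.
$azMonSinkFragment = @'
{
  "SinksConfig": {
    "Sink": [
      { "name": "AzMonSink", "AzureMonitor": {} }
    ]
  }
}
'@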

 

Application metrics

Finally, we have application metrics. These metrics are retrieved through Application Insights and provide you with performance data to help detect issues and trends. Application Insights can collect information on the following data points:

  • Request rates, response times, and failure rates
  • Dependency rates, response times, and failure rates
  • Exceptions
  • Page views and load performance
  • AJAX calls from web pages
  • User and session counts
  • Performance counters
  • Host diagnostics from Docker or Azure
  • Diagnostic trace logs from your app
  • Custom events and metrics

This is a developer-focused monitoring tool, so we won’t go into depth on it here. But, if you’re looking for more information, check out the following article: What is Application Insights?

 

Metric retention

A quick note on metric retention, as this is one of the key differences in this comparison.

Per the official Microsoft documentation,

  • Most Azure resource metrics are stored for up to 93 days
  • Classic guest OS metrics are only retained for up to 14 days
  • Application metrics are stored for 90 days

Azure Metrics

Since we’re trying to compare apples-to-apples, and the metrics made available in Azure Monitor Logs are for Virtual Machine performance, we’re going to use the same object reference here.

In Azure Metrics, when you select a Virtual Machine as the target resource, notice that the default metric namespace selected is ‘Virtual Machine Host’. This could be misleading if you’re trying to compare Azure Metrics with Azure Monitor Logs since the list of Metrics available (shown below) doesn’t seem to match.

Azure Metrics – VM Host – Metrics

But, if you change the Metric Namespace to ‘Guest (Classic)’, you will see a list of Metrics that is similar to what’s available in Azure Monitor Logs.

Azure Metrics – VM Guest – Metrics

A quick note on the differences here. The ‘Virtual Machine Host’ metrics are selected by default because those metrics are made available from Azure out-of-the-box (as in, they are automatically collected, and then surfaced for Azure consumers).

The ‘Guest (Classic)’ metrics are only collected if you enable Diagnostics on the VM.
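
If you’d rather explore those namespaces from PowerShell instead of the portal, recent Az.Monitor versions let you filter metric definitions by namespace. This is a sketch with the resource ID and namespace value as placeholders; use whatever your VM’s Metric Namespace dropdown shows (for the host metrics that’s the resource type itself, ‘Microsoft.Compute/virtualMachines’).

# List the metrics exposed under a specific metric namespace.
# Both values below are placeholders - '<metric-namespace>' should be the value shown
# in the portal's Metric Namespace dropdown for your VM.
$resourceId = '/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/my-vm'

Get-AzMetricDefinition -ResourceId $resourceId -MetricNamespace '<metric-namespace>'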

 

The Counters

So, understanding all of that about Azure Metrics, what are the actual counters that are exposed/made available?

Here is what’s currently available, grouped by category:

Azure Metrics (Guest OS)

CPU
  • \Process(_Total)\Handle Count
  • \Process(_Total)\Thread Count
  • \Process(_Total)\Working Set
  • \Process(_Total)\Working Set - Private
  • \Processor Information(_Total)\% Privileged Time
  • \Processor Information(_Total)\% Processor Time
  • \Processor Information(_Total)\% User Time
  • \Processor Information(_Total)\Processor Frequency
  • \System\Context Switches/sec
  • \System\Processes
  • \System\Processor Queue Length
  • \System\System Up Time

Memory
  • \Memory\% Committed Bytes In Use
  • \Memory\Available Bytes
  • \Memory\Cache Bytes
  • \Memory\Committed Bytes
  • \Memory\Page Faults/sec
  • \Memory\Pages/sec
  • \Memory\Pool Nonpaged Bytes
  • \Memory\Pool Paged Bytes

Disk
  • \LogicalDisk(_Total)\% Disk Read Time
  • \LogicalDisk(_Total)\% Disk Time
  • \LogicalDisk(_Total)\% Disk Write Time
  • \LogicalDisk(_Total)\% Free Space
  • \LogicalDisk(_Total)\% Idle Time
  • \LogicalDisk(_Total)\Avg. Disk Queue Length
  • \LogicalDisk(_Total)\Avg. Disk Read Queue Length
  • \LogicalDisk(_Total)\Avg. Disk sec/Read
  • \LogicalDisk(_Total)\Avg. Disk sec/Transfer
  • \LogicalDisk(_Total)\Avg. Disk sec/Write
  • \LogicalDisk(_Total)\Avg. Write Queue Length
  • \LogicalDisk(_Total)\Disk Bytes/sec
  • \LogicalDisk(_Total)\Disk Read Bytes/sec
  • \LogicalDisk(_Total)\Disk Reads/sec
  • \LogicalDisk(_Total)\Disk Transfers/sec
  • \LogicalDisk(_Total)\Disk Write Bytes/sec
  • \LogicalDisk(_Total)\Free Megabytes

Network
  • None

 

Let’s wrap everything up with a few simple comparisons.
