Azure 'Common Alert Schema'

Azure Monitor enables creation of alerts based on a variety of metrics and log search results. This year Azure added the ability to enable a ‘common alert schema’ for notifications when configuring Action Groups. This post aims to explain a bit about what the common alert schema does, including some examples for clarity.

Azure contains a fairly extensive list of metric and log triggers for alerting administrators to almost any situation. Notifications from these alerts are sent to Action Groups, which define how and where the alerts are sent. Traditional notification methods including SMS, e-mail, and phone are supported if you like immediate messages. Other methods include Webhooks, ITSM for integration with supported systems, and direct integration with other Azure services such as Functions, Runbooks, and Logic Apps. The latter allow for automated alerts and integration with third-party systems, or even automated responses and mitigations.

Prior to the common alert schema, each alert type could send messages formatted in a different way. Understandably, the information conveyed by an alert on CPU utilization will be different from that of one triggered by a log message, such as a VM deallocation. Humans don’t have a problem dealing with these differences when reading an SMS or email message, as we can quickly adapt and pull out the important information. But if you’re programmatically parsing alerts for ingestion by other systems, dealing with multiple formats can become a problem. Coding a parser to handle different formats isn’t an insurmountable task, but it presents scale and management problems, especially when dealing with new alert types. Even screening alerts for particular information, such as the type of alert (is this a security alert or an operational event?), can be difficult if the alert formats differ.

Now when you create an Action Group, Azure asks whether you would like to enable the ‘common alert schema’. This breaks the notification into two sections: an ‘essentials’ section that is the same for every type of alert and contains a set of standardized fields, and an ‘alert context’ section that contains additional type-specific details for each alert. For email and SMS alerts this manifests as a common template for all alert types: they will be visibly similar and include the same essentials fields followed by a type-specific context section. Programmatic targets such as Webhooks, Functions, Runbooks, and Logic Apps will receive a JSON message with a common structure, meaning that the object will have the same properties across alert types.
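To make the two-section structure concrete, here is a minimal Python sketch that splits a common alert schema message into its essentials and alert context parts. The field names (`schemaId`, `data`, `essentials`, `alertContext`, `alertRule`, `severity`, and so on) follow Microsoft's published schema as best I recall it, so check them against the docs; every value in the sample, and the helper names `split_alert` and `summarize`, are made up for illustration.

```python
import json

# Sample message: field names follow the documented common alert schema,
# but all values here are invented for illustration.
SAMPLE = """
{
  "schemaId": "azureMonitorCommonAlertSchema",
  "data": {
    "essentials": {
      "alertRule": "high-cpu-example",
      "severity": "Sev3",
      "signalType": "Metric",
      "monitorCondition": "Fired",
      "monitoringService": "Platform",
      "alertTargetIDs": [
        "/subscriptions/example-sub/resourcegroups/example-rg/providers/microsoft.compute/virtualmachines/example-vm"
      ],
      "firedDateTime": "2019-07-04T12:00:00Z"
    },
    "alertContext": {
      "note": "type-specific details live here; the shape varies per alert type"
    }
  }
}
"""


def split_alert(raw: str):
    """Return (essentials, alert_context) from a common alert schema message."""
    data = json.loads(raw)["data"]
    return data["essentials"], data.get("alertContext", {})


def summarize(essentials: dict) -> str:
    """Build a one-line summary using only the standardized fields."""
    return "{severity} {signalType} alert '{alertRule}' is {monitorCondition}".format(
        **essentials
    )


essentials, context = split_alert(SAMPLE)
print(summarize(essentials))
```

Because `summarize` only touches the essentials section, the same function should work for a metric alert or a log alert without any type-specific branching.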

An example helps illustrate the difference between old-style notifications and those using the common alert schema. For this example I set up two alerts: a log alert for deallocation of a virtual machine, and a metric alert for average CPU utilization. For each alert I triggered email notifications using both the old style and the new common alert schema. The first set of alerts below shows the emails using the old style of alert. Looking closer at the alerts, we notice a few problems:

The overall output is visually different - wide vs columnar
There are common fields (Resource ID), but they are not located in the same position
Some fields are similar (the alert that fired), but the fields are named differently and the information presented is different (rule name versus full resource path of the rule)

I haven’t shown a JSON message, but readers who have done even light coding will understand the difficulty of preparing for messages with different structures. Sometimes complex rules are required simply to work out what type of message was received, especially if there is no common field that identifies the message type and reveals its structure.

[Figure: Classic schema]

Now let’s look at the same alerts using the common alert schema. Observe how these messages improve readability, especially for machines, through their similarity:

The overall appearance of the messages is the same
The entire top section contains the same fields and format for both messages
These fields contain information that can be used to filter and sort the messages based on a variety of parameters: source, severity, resource type
Type-specific information follows the commonly-formatted essentials, and contains the same alert content as the old-style alerts
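As a rough illustration of the filtering and sorting described above, the following Python sketch works purely on the essentials section. It assumes severities are strings of the form `Sev0` through `Sev4` and that `alertTargetIDs` holds standard Azure resource IDs; the function names `severity_rank`, `resource_type`, and `triage` are hypothetical helpers, not part of any Azure SDK.

```python
from typing import Iterable, List, Optional


def severity_rank(essentials: dict) -> int:
    """Map 'Sev0'..'Sev4' to 0..4; missing or unexpected values sort last."""
    sev = essentials.get("severity", "")
    if sev.startswith("Sev") and sev[3:].isdigit():
        return int(sev[3:])
    return 99


def resource_type(target_id: str) -> str:
    """Extract 'provider/type' from an Azure resource ID, lower-cased."""
    parts = [p.lower() for p in target_id.strip("/").split("/")]
    if "providers" in parts:
        i = parts.index("providers")
        if i + 2 < len(parts):
            return f"{parts[i + 1]}/{parts[i + 2]}"
    return "unknown"


def triage(alerts: Iterable[dict], signal_type: Optional[str] = None) -> List[dict]:
    """Optionally filter by signalType, then sort most severe first."""
    selected = [
        a for a in alerts
        if signal_type is None or a.get("signalType") == signal_type
    ]
    return sorted(selected, key=severity_rank)
```

Given a list of essentials dictionaries, `triage(alerts, "Metric")` would return only the metric alerts, most severe first, and `resource_type(alert["alertTargetIDs"][0])` gives a value to group them by. None of this needs to know what the alert context looks like.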

[Figures: Schema comparison, Schema view2]

You may ask why this feature is “opt-in”. If it is such a great improvement, why is it not automatically enabled for all alerts? The answer is existing functionality. Many users likely already have software, automation, and integrations built around the old style of notifications. If the format of these notifications suddenly changed, things would break. Such backwards-incompatible changes cost money and time to troubleshoot and fix, drawing the ire of users. Presenting the new style of notifications as opt-in preserves the functionality of existing systems until they can be refactored to accommodate the new format.
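One way a receiver might cope during such a transition is to accept both formats and route on the `schemaId` field, which the common alert schema sets to `azureMonitorCommonAlertSchema` per Microsoft's documentation. The sketch below is only an assumption about how that could look: `handle_common`, `handle_legacy`, and `dispatch` are hypothetical names, and the legacy handler is a stub standing in for whatever parsing already exists.

```python
COMMON_SCHEMA_ID = "azureMonitorCommonAlertSchema"


def handle_common(payload: dict) -> str:
    """New code path: read only the standardized essentials section."""
    essentials = payload["data"]["essentials"]
    return f"common:{essentials.get('alertRule', 'unknown')}"


def handle_legacy(payload: dict) -> str:
    """Stub: existing type-specific parsing would stay here unchanged."""
    return "legacy"


def dispatch(payload: dict) -> str:
    """Route a decoded JSON payload to the matching handler."""
    if payload.get("schemaId") == COMMON_SCHEMA_ID:
        return handle_common(payload)
    return handle_legacy(payload)
```

With something like this in place, Action Groups can be switched to the common schema one at a time, and the legacy path can be deleted once nothing reaches it any more.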

Hopefully this example provides a clearer understanding of how enabling the common alert schema affects Azure notifications.

References:

Microsoft Docs: Common Alert Schema

2019/07/04