Alert policies define which events will trigger an alert, the conditions under which an alert will be sent, and how the alert will be sent.
Currently, alerts can be triggered by job run failure/success, schedule/sensor tick failure, and agent downtime.
Job run-based alert policies include a set of configured tags. If an alert policy has no configured tags, all jobs are eligible for that alert. Otherwise, only jobs that contain every tag configured on the policy are eligible for that alert.
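The matching rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not Dagster Cloud's implementation; the function name job_matches_policy and the key/value dictionary shape (borrowed from the YAML format used later on this page) are assumptions.

```python
from typing import Dict, List


def job_matches_policy(job_tags: Dict[str, str], policy_tags: List[Dict[str, str]]) -> bool:
    """Return True if the job carries every tag configured on the policy.

    Hypothetical helper: a policy with no tags matches every job,
    because all() over an empty sequence is True.
    """
    return all(job_tags.get(tag["key"]) == tag["value"] for tag in policy_tags)


# A policy that requires team=sales
policy_tags = [{"key": "team", "value": "sales"}]

eligible = job_matches_policy({"team": "sales", "level": "critical"}, policy_tags)
not_eligible = job_matches_policy({"team": "marketing"}, policy_tags)
no_tag_policy = job_matches_policy({"team": "marketing"}, [])
```

Here eligible and no_tag_policy are True, while not_eligible is False because the job's team tag does not match the policy.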
Alert policies created for schedule/sensor tick failure will apply to all schedules/sensors. However, you will only receive alerts when the schedule/sensor changes from a state of succeeding to failing, so subsequent failures will not trigger new alerts.
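The succeeding-to-failing behavior can be pictured as edge detection over a sequence of tick results. The sketch below is illustrative only: the function name and the "SUCCESS"/"FAILURE" strings are hypothetical, and it assumes a schedule or sensor counts as succeeding before its first tick, which the text above does not specify.

```python
from typing import List


def ticks_that_alert(tick_statuses: List[str]) -> List[int]:
    """Return the indexes of ticks that would raise a new alert.

    Hypothetical model: an alert fires only when a failing tick
    follows a succeeding one, so repeated failures stay silent.
    """
    alerts = []
    previous_failed = False  # assumption: treated as succeeding before the first tick
    for index, status in enumerate(tick_statuses):
        failed = status == "FAILURE"
        if failed and not previous_failed:
            alerts.append(index)
        previous_failed = failed
    return alerts


history = ["SUCCESS", "FAILURE", "FAILURE", "SUCCESS", "FAILURE"]
alerting_ticks = ticks_that_alert(history)  # [1, 4]
```

The second consecutive failure (index 2) produces no alert; a new alert fires only after the tick has recovered and failed again.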
Agent downtime alert policies will trigger when a Hybrid agent stops heartbeating.
Alert policies are configured on a per-deployment basis. For example, alerts configured in a prod deployment apply only to jobs in that deployment.
Currently, Slack and email notifications are supported.
From the Alert Policy type drop-down, select Run alert, Schedule/Sensor alert, or Agent downtime alert.
In the Create alert policy window, fill in the following:
Alert policy name - Enter a name for the alert policy. For example, slack_urgent_failure.
Description - Enter a description for the alert policy.
For run-based alerts, fill out these additional options:
Tags - Add tag(s) for the alert policy. Jobs with these tags will trigger the alert. For example: level:critical or team:sales
Events - Select whether the alert should trigger on job success, failure, or both.
Notification service - Select the service for the alert policy:
Slack - If you haven't connected Slack, click Connect to add the Dagster Cloud Slack app to your workspace. After the installation completes, invite the @Dagster Cloud bot user to the desired channel.
You can then configure the alert policy to message this channel. Note: Only one channel per alert policy is currently supported.
Email - Email alerts can be sent to one or more recipients.
In this example, we'll configure a Slack notification to trigger whenever a run of a job succeeds or fails. This job, named sales_job, has a team tag of sales.
In the alert policies YAML file, we'll define a policy that listens for jobs with a team tag of sales to succeed or fail. When this occurs, a notification will be sent to the sales-notifications channel in the hooli workspace:
alert_policies:
  - name: "slack-alert-policy"
    description: "An alert policy to send a Slack notification to sales on job failure or success."
    tags:
      - key: "team"
        value: "sales"
    event_types:
      - "JOB_SUCCESS"
      - "JOB_FAILURE"
    notification_service:
      slack:
        slack_workspace_name: "hooli"
        slack_channel_name: "sales-notifications"
In the alert policies YAML file, we'll define a policy that listens for jobs with a level tag of critical to fail. When this occurs, an email notification will be sent to richard.hendricks@hooli.com and nelson.bighetti@hooli.com:
alert_policies:
  - name: "email-alert-policy"
    description: "An alert policy to email company executives during job failure."
    tags:
      - key: "level"
        value: "critical"
    event_types:
      - "JOB_FAILURE"
    notification_service:
      email:
        email_addresses:
          - "richard.hendricks@hooli.com"
          - "nelson.bighetti@hooli.com"
Alert emails can also be configured for an individual job by setting the dagster-cloud/alert_emails tag on that job. When a run of the job fails, a notification is sent to the listed addresses.
In this example, we've defined two alert emails for the important_job job: richard.hendricks@hooli.com and nelson.bighetti@hooli.com. On run failure, a notification will be sent to both addresses:
from dagster import job, op
from dagster_cloud import ALERT_EMAILS_TAG


@op
def important_computation():
    ...


@job(
    tags={
        ALERT_EMAILS_TAG: [
            "richard.hendricks@hooli.com",
            "nelson.bighetti@hooli.com",
        ]
    }
)
def important_job():
    important_computation()
When creating an alert policy using the CLI, only certain event_types can be specified together. You can specify multiple job run-based event types together (JOB_SUCCESS, JOB_FAILURE), or a tick-based event type (TICK_FAILURE), but attempting to mix these will result in an error.
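As a rough illustration of this restriction, the check below rejects a policy whose event_types mix the two families. It is a sketch under stated assumptions rather than the CLI's actual validation: the function name is hypothetical, only the three event types named above are modeled, and the error message is invented for the example.

```python
from typing import Iterable

# Only the event types named in the text above are modeled here.
RUN_EVENT_TYPES = {"JOB_SUCCESS", "JOB_FAILURE"}
TICK_EVENT_TYPES = {"TICK_FAILURE"}


def check_event_types(event_types: Iterable[str]) -> None:
    """Raise ValueError if run-based and tick-based event types are mixed.

    Hypothetical pre-flight check; the real CLI may word its error differently.
    """
    selected = set(event_types)
    if selected & RUN_EVENT_TYPES and selected & TICK_EVENT_TYPES:
        raise ValueError(
            "Run-based and tick-based event types cannot be combined in one policy"
        )


check_event_types(["JOB_SUCCESS", "JOB_FAILURE"])  # accepted: both are run-based
check_event_types(["TICK_FAILURE"])  # accepted: tick-based only

try:
    check_event_types(["JOB_FAILURE", "TICK_FAILURE"])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

Running a check like this before syncing a YAML file can surface an invalid combination early; splitting the policy into one run-based policy and one tick-based policy resolves it.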