Defining Alert Thresholds

Defining alert thresholds means setting the specific criteria at which the monitoring system triggers an alert, signalling a potential issue or anomaly in the application or infrastructure.

Goal

The primary goal is to enable proactive identification and resolution of issues, minimising the impact on users and ensuring the system operates within its desired parameters.

Context

Things will go wrong in production. We need to be able to quickly identify and address these issues to minimise their impact on users and the business.

Threshold Types

| Type | Description |
| --- | --- |
| User Behaviour Thresholds | Criteria based on user interactions and behaviour, such as session length and conversion rates. |
| Performance Thresholds | Criteria based on application performance metrics, such as response times and throughput. |
| Resource Utilisation Thresholds | Limits set on the usage of system resources, like CPU, memory, and disk space. |
| Error Rate Thresholds | Defined levels for acceptable error rates within the application's operations. |
| Cost Thresholds | Limits on cloud resource costs to manage and optimise spending. |
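
As a rough illustration of the threshold types above, the sketch below models each threshold as a small data record and checks one metric sample against all of them. The metric names (`conversion_rate`, `p95_response_ms`, and so on) and every limit value are hypothetical placeholders, not recommendations; real limits would come from your own baselines and SLOs.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Threshold:
    """One alert criterion: compare a named metric against a limit."""
    name: str
    metric: str                              # hypothetical metric key
    compare: Callable[[float, float], bool]  # e.g. operator.gt or operator.lt
    limit: float                             # illustrative value only

    def is_breached(self, value: float) -> bool:
        return self.compare(value, self.limit)


# One example per threshold type; all numbers are assumptions.
THRESHOLDS = [
    Threshold("Low conversion rate", "conversion_rate", operator.lt, 0.02),
    Threshold("Slow responses", "p95_response_ms", operator.gt, 500.0),
    Threshold("High CPU", "cpu_utilisation", operator.gt, 0.85),
    Threshold("High error rate", "error_rate", operator.gt, 0.01),
    Threshold("Daily cost overrun", "daily_cost_usd", operator.gt, 200.0),
]


def breached(sample: Dict[str, float],
             thresholds: List[Threshold] = THRESHOLDS) -> List[str]:
    """Return the names of thresholds breached by this sample.

    Metrics missing from the sample are skipped rather than treated as breaches.
    """
    return [
        t.name
        for t in thresholds
        if t.metric in sample and t.is_breached(sample[t.metric])
    ]
```

Keeping thresholds as data rather than scattered `if` statements makes them easier to review and adjust as the system's normal behaviour becomes better understood.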

Inputs

| Artifact | Description |
| --- | --- |
| Real-time Application Performance Data | Data collected from the application's monitoring tools, providing insights into performance and usage. |
| Service Level Objectives (SLOs) | Agreed-upon performance and reliability targets for the service. |
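
One possible way to turn an SLO into an error rate threshold is to work from the error budget, that is, the fraction of requests the SLO allows to fail. The sketch below assumes an availability-style SLO expressed as a fraction (for example 0.999) and a hypothetical `burn_rate` multiplier that controls how far above the budgeted rate errors must rise before alerting. Both function names are illustrative, and this is only one of several reasonable approaches.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO.

    slo_target is a fraction strictly between 0 and 1, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def error_rate_threshold(slo_target: float, burn_rate: float = 1.0) -> float:
    """Error rate above which an alert would fire.

    burn_rate is an assumed tuning knob: 1.0 alerts as soon as errors exceed
    the budgeted rate, while higher values tolerate short bursts.
    """
    if burn_rate <= 0:
        raise ValueError("burn_rate must be positive")
    return error_budget(slo_target) * burn_rate
```

For a 99.9% SLO, the budgeted error rate is roughly 0.1%, so `error_rate_threshold(0.999, burn_rate=10.0)` would alert only when about 1% of requests are failing.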

Outputs

| Artifact | Description | Benefits |
| --- | --- | --- |
| Automated Alerts | Configured alerts based on defined thresholds, triggering notifications to stakeholders when breached. | Enables timely detection and notification of issues, facilitating quick response. |
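
A minimal sketch of how an automated alert might notify stakeholders is shown below. It assumes a hypothetical `notify` callback (in practice this could wrap email, chat, or a paging tool) and sends a message only when the alert changes state, so a threshold that stays breached does not produce a notification on every check.

```python
from typing import Callable


class AlertNotifier:
    """Sends a notification when an alert starts firing or resolves."""

    def __init__(self, name: str, notify: Callable[[str], None]) -> None:
        self.name = name
        self._notify = notify   # hypothetical delivery callback
        self._firing = False    # alerts start in the non-firing state

    def update(self, breached: bool) -> None:
        """Record the latest threshold check and notify on state changes only."""
        if breached and not self._firing:
            self._notify(f"FIRING: {self.name}")
        elif not breached and self._firing:
            self._notify(f"RESOLVED: {self.name}")
        self._firing = breached
```

For example, feeding the checks `False, True, True, False` into `update` produces one "FIRING" message and one "RESOLVED" message rather than a message per check.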

Anti-patterns

  • Over-Alerting: Setting thresholds too sensitively, leading to frequent, often unnecessary alerts that cause alert fatigue. There will be natural fluctuations in the system, and not all of them are indicative of a problem.
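
One common way to reduce over-alerting is to require a breach to be sustained before alerting, so that a single spike from natural fluctuation is ignored. The sketch below is one such approach under stated assumptions: the class name, the "higher is worse" comparison, and the example numbers are all illustrative.

```python
from collections import deque


class SustainedBreachDetector:
    """Signals an alert only when enough recent samples exceed the limit."""

    def __init__(self, limit: float, required_breaches: int, window: int) -> None:
        if not 1 <= required_breaches <= window:
            raise ValueError("required_breaches must be between 1 and window")
        self.limit = limit
        self.required_breaches = required_breaches
        # Keeps only the most recent `window` breach flags.
        self._recent = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record a sample; return True if the alert condition is met."""
        self._recent.append(value > self.limit)
        return sum(self._recent) >= self.required_breaches
```

With `limit=500`, `required_breaches=3`, and `window=5`, an isolated reading of 620 followed by normal values never alerts, whereas three readings above 500 within the last five samples do. The trade-off is slower detection, so the window should be sized against how quickly the issue would affect users.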

© ZeroBlockers, 2024. All rights reserved.