When managing a multi-cluster or multi-region Prometheus setup, external labels play a key role in telling alerts from different clusters apart. They give every alert context such as the cluster, region, or environment it originated from, and they let Alertmanager keep alerts from different sources distinct instead of treating them as the same alert. In this post, we’ll explore how to configure and use external labels in Prometheus alerts.

What Are External Labels?

External labels in Prometheus are a set of key-value pairs that an instance attaches to its time series and alerts whenever it communicates with external systems: federation, remote storage, and Alertmanager. They are not written into the instance's local storage, so they do not show up in queries run against that instance directly. They are especially useful in multi-cluster or federated Prometheus setups, where you need to identify the source of alerts or data once it has left the instance that produced it.

Why Use External Labels for Alerts?

External labels are useful for:

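  • Preventing Duplicate Alerts: Alertmanager treats alerts with identical label sets as the same alert. When multiple Prometheus instances send alerts to the same Alertmanager, external labels keep alerts from different clusters separate, while replicas of the same high-availability pair must present identical labels for their alerts to be deduplicated.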
  • Providing Context: External labels add important context to alerts, such as which cluster, region, or environment triggered the alert.
  • Federation: In federated Prometheus setups, external labels ensure that metrics from different clusters or regions remain distinct.
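
For the high-availability case, a common pattern is to give each replica its own external label and drop it from alerts before they are sent, so that both replicas present the same label set to Alertmanager. The fragment below is a minimal sketch of that pattern; the replica label name and the Alertmanager address are assumptions made for illustration.

```yaml
# prometheus.yml on replica "a" of a hypothetical HA pair
global:
  external_labels:
    cluster: 'prod-cluster'
    region: 'us-east'
    replica: 'a'   # hypothetical label; the second replica would use 'b'

alerting:
  # External labels are attached before alert relabeling runs, so the
  # replica label can be dropped here and both replicas send identical alerts.
  alert_relabel_configs:
    - action: labeldrop
      regex: replica
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed address
```

The cluster and region labels are left in place, so alerts from different clusters stay distinct while the two replicas of one cluster collapse into a single alert.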

Step 1: Define External Labels in Prometheus Configuration

To start using external labels in Prometheus, you need to define them in your prometheus.yml configuration file.

Example Configuration

global:
  scrape_interval: 15s
  external_labels:
    cluster: 'prod-cluster'
    region: 'us-east'

In this example, we’re adding two external labels: cluster and region. These labels will be attached to every alert this instance sends to Alertmanager and to every series it exposes through federation or remote write.

After updating the configuration, restart Prometheus to apply the changes:

sudo systemctl restart prometheus

Step 2: Use External Labels in Alerts

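Once external labels are defined in Prometheus, they are automatically added to every alert the instance sends to Alertmanager. Because they are attached at send time, they are not part of $labels inside a rule and cannot be used in the alert expression. To use them in alert messages, reference the $externalLabels variable in the annotation or label templates of your alerting rules.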

Example Alert Rule

groups:
  - name: example-alerts
    rules:
      - alert: HighMemoryUsage
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on instance {{ $labels.instance }} in {{ $externalLabels.cluster }}"
          description: "Memory available is below 10% on instance {{ $labels.instance }} in region {{ $externalLabels.region }}."

In this alert, the cluster and region values are read from $externalLabels in the summary and description fields, providing context for where the alert originated. The instance label belongs to the series itself, so it is still read from $labels.

Step 3: Configure Alertmanager to Handle External Labels

Alertmanager can route alerts based on external labels, making it easier to handle alerts from multiple clusters or regions. To configure Alertmanager to route alerts based on external labels, edit the alertmanager.yml file.

Example Alertmanager Route Configuration

route:
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack-notifications'
  routes:
    - matchers:
        - cluster = "prod-cluster"
      receiver: 'prod-alerts'
    - matchers:
        - cluster = "dev-cluster"
      receiver: 'dev-alerts'

In this configuration, alerts from the prod-cluster are routed to the prod-alerts receiver, while alerts from the dev-cluster are routed to dev-alerts. This setup helps manage alerts efficiently across multiple environments.
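
The route tree above names three receivers, and Alertmanager will not load a configuration whose routes refer to a receiver that is not defined. As a minimal sketch, the receivers section could look like the following; the webhook URLs are placeholders rather than real endpoints, and in practice each entry would use whichever integration you rely on (for example slack_configs with your own webhook).

```yaml
# alertmanager.yml (continued): every receiver named in the route tree
# must be defined here. All URLs below are placeholders.
receivers:
  - name: 'slack-notifications'
    webhook_configs:
      - url: 'http://alert-gateway.example.internal/default'
  - name: 'prod-alerts'
    webhook_configs:
      - url: 'http://alert-gateway.example.internal/prod'
  - name: 'dev-alerts'
    webhook_configs:
      - url: 'http://alert-gateway.example.internal/dev'
```

Alerts whose cluster label matches neither child route fall back to the receiver set on the top-level route.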

Step 4: Verify Alerts with External Labels

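To ensure that external labels are correctly applied, check the configuration loaded by Prometheus and the alerts received by Alertmanager.

  1. Check Prometheus:

    In the Prometheus web interface (http://localhost:9090), open Status > Configuration and confirm that the external_labels block appears in the loaded configuration. External labels are not stored with local data, so the query below returns nothing on the instance itself. It only matches where the labels have already been attached, for example on a central server that federates this instance:

    up{cluster="prod-cluster", region="us-east"}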
    
  2. Check Alerts in Alertmanager:

    In the Alertmanager web UI, inspect the alerts to ensure that the external labels are included in the alert details.

    Alert: HighMemoryUsage
    Instance: node1
    Cluster: prod-cluster
    Region: us-east
    

This verifies that the external labels are applied correctly and are visible in alerts.

Step 5: Use External Labels in a Federated Setup

In a federated Prometheus setup, where a central Prometheus server scrapes selected series from multiple Prometheus instances, external labels help differentiate metrics from different sources.

Example Federated Setup

  1. Local Prometheus Instance:

    On each local Prometheus instance, configure external labels:

    global:
      external_labels:
        cluster: 'prod-cluster'
        region: 'us-west'
    
  2. Central Prometheus Server:

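    The central Prometheus server scrapes the /federate endpoint of each local instance, which exposes series with that instance's external labels attached. Setting honor_labels keeps the original job and instance labels instead of renaming them, and at least one match[] selector is required to choose which series to pull. The selector here is deliberately broad for illustration and should be narrowed in practice:

    scrape_configs:
      - job_name: 'prod-cluster'
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]': ['{job=~".+"}']
        static_configs:
          - targets: ['prod-prometheus:9090']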
    

In this scenario, the central Prometheus instance can aggregate data while keeping metrics and alerts organized by their source.
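
Once federated series carry the cluster and region labels, rules on the central server can aggregate by them. The rule file below is a minimal sketch of that idea; the group name and the recorded metric name are hypothetical, and it assumes the up series is among those pulled from the local instances.

```yaml
# Rule file on the central server (hypothetical group and rule names)
groups:
  - name: global-aggregation
    rules:
      # Count healthy targets per source cluster and region.
      # The cluster!="" matcher skips the central server's own up series
      # for its federation jobs, which carry no cluster label.
      - record: cluster_region:up:count
        expr: count by (cluster, region) (up{cluster!=""} == 1)
```

Alerting rules on the central server can use the same labels directly in their expressions, because there they are ordinary series labels rather than external labels.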

Final Thoughts

Using external labels with Prometheus alerts enhances alerting by adding important context, reducing alert duplication, and ensuring that alerts from multiple clusters or environments are easily distinguished. Whether you’re managing a multi-cluster environment or implementing a federated Prometheus setup, external labels are a valuable tool for improving monitoring and alerting efficiency.