EC2 CPU utilization on {{name.name}} | {{host.name}}
Query | avg(last_30m):avg:aws.ec2.cpuutilization{host:*} by {name,host} > 90 |
Message | {{#is_alert}} CRITICAL {{/is_alert}} {{#is_warning}} WARNING {{/is_warning}} {{#is_recovery}} RECOVERED {{/is_recovery}} |
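As a rough sketch, the CPU monitor above could be expressed as a request body for Datadog's monitor creation endpoint (`POST /api/v1/monitor`). The helper name `build_metric_monitor`, the choice of the `metric alert` type, and the `options.thresholds` block are assumptions made for illustration; only the name, query, and message come from the listing. The code builds the payload and does not send it.

```python
import json


def build_metric_monitor(name, metric, scope, group_by, window,
                         comparator, threshold, message):
    """Assemble a monitor payload (hypothetical helper, not a Datadog API)."""
    # Doubled braces emit literal { and } around the scope and the group-by list.
    query = (
        f"avg({window}):avg:{metric}{{{scope}}} "
        f"by {{{','.join(group_by)}}} {comparator} {threshold}"
    )
    return {
        "name": name,
        "type": "metric alert",  # assumption: type used for threshold monitors
        "query": query,
        "message": message,
        # assumption: the critical threshold mirrors the value in the query
        "options": {"thresholds": {"critical": threshold}},
    }


cpu_monitor = build_metric_monitor(
    name="EC2 CPU utilization on {{name.name}} | {{host.name}}",
    metric="aws.ec2.cpuutilization",
    scope="host:*",
    group_by=["name", "host"],
    window="last_30m",
    comparator=">",
    threshold=90,
    message=(
        "{{#is_alert}} CRITICAL {{/is_alert}} "
        "{{#is_warning}} WARNING {{/is_warning}} "
        "{{#is_recovery}} RECOVERED {{/is_recovery}}"
    ),
)

print(json.dumps(cpu_monitor, indent=2))
```

The same helper should cover the other metric monitors in this listing by swapping the metric, scope, grouping, window, comparator, and threshold.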
EC2 memory usage on {{name.name}} | {{host.name}}
Query | avg(last_30m):avg:system.mem.pct_usable{datadog:true} by {host,name} < 0.1 |
EC2 disk space on {{device.name}} in {{name.name}} | {{host.name}}
Query | avg(last_15m):avg:system.disk.in_use{host:*} by {host,device,name} > 0.9 |
EC2 Status Check on {{name.name}} in {{region.name}}
Query | avg(last_5m):avg:aws.ec2.status_check_failed{host:*} by {host,name} >= 1 |
Datadog agent down on {{host.name}}
Query | "datadog.agent.up".over("datadog:true").by("host").last(2).count_by_status() |
w32time service status on {{host.name}} | {{region.name}}
Query | "windows_service.state".over("windows_service:w32time").by("host","name","region","windows_service").last(3).count_by_status() |
w3svc service status on {{host.name}} | {{region.name}}
Query | "windows_service.state".over("windows_service:w3svc").by("host","name","region","windows_service").last(3).count_by_status() |
schedule service status on {{host.name}} | {{region.name}}
Query | "windows_service.state".over("windows_service:schedule").by("host","name","region","windows_service").last(3).count_by_status() |
Route53 health check {{name.name}} failed
Query | avg(last_5m):avg:aws.route53.health_check_status{datadog:true} by {name} < 1 |
IIS Application Pool up on {{host.name}} | {{region.name}}
Query | "iis.app_pool_up".over("*").by("host","name","region","app_pool").last(3).count_by_status() |
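The service check queries in this listing all share one shape: a check name, an `.over(...)` scope, a `.by(...)` grouping, and a `.last(n)` window. A minimal sketch of generating them follows; the helper name `build_service_check_query` is hypothetical, and the output uses straight double quotes, which is what the query syntax expects.

```python
def build_service_check_query(check, scope, group_by, last):
    """Compose a service check monitor query string (hypothetical helper)."""
    # Each tag is wrapped in straight double quotes and joined with commas.
    over = ",".join(f'"{tag}"' for tag in scope)
    by = ",".join(f'"{tag}"' for tag in group_by)
    return f'"{check}".over({over}).by({by}).last({last}).count_by_status()'


# Agent heartbeat monitor, grouped by host.
agent_query = build_service_check_query(
    check="datadog.agent.up",
    scope=["datadog:true"],
    group_by=["host"],
    last=2,
)

# One query per Windows service named in the listing.
windows_queries = {
    service: build_service_check_query(
        check="windows_service.state",
        scope=[f"windows_service:{service}"],
        group_by=["host", "name", "region", "windows_service"],
        last=3,
    )
    for service in ("w32time", "w3svc", "schedule")
}

print(agent_query)
for service, query in windows_queries.items():
    print(service, query)
```

Generating the three Windows service queries from one template keeps the grouping tags and the `last(3)` window consistent across them.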