Monitor What Matters, Not What Is Easy
A practical pattern for alerting that does not cry wolf.
Most monitoring fails not because it misses things but because it shouts about everything. An alert that fires fifty times a day trains you to ignore it. The goal is the opposite: few alerts, each one worth standing up for.
The pattern
Alert on the thing the customer would feel, not the metric that is easy to read. Nobody cares that CPU touched ninety percent for four seconds. They care that the booking system stopped responding.
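A simple reachability check, run on a schedule, gets you most of the way there. It will not prove the application is healthy, but it will tell you when a host has dropped off the network. Here is the shape of it in PowerShell.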
# Hosts the customer depends on.
$targets = @("practice-gw.example", "booking.example")

foreach ($t in $targets) {
    # -Quiet returns a single boolean: $true if any probe got a reply.
    $ok = Test-Connection -ComputerName $t -Count 2 -Quiet
    if (-not $ok) {
        Write-Output "DOWN: $t at $(Get-Date -Format o)"
        # hand off to your notification channel here
    }
}
Two probes, not one, so a single dropped packet does not page you: with -Quiet, the check reports failure only when every probe goes unanswered. A machine-readable ISO 8601 timestamp so the log sorts cleanly. A clear word, DOWN, that you can grep for later.
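A ping only shows that the host answers. To get closer to what the customer would feel, the same shape can be pointed at the web front end instead. What follows is a minimal sketch, not a definitive check: the URL is a hypothetical placeholder, and the ten-second timeout and the "anything but 200 is down" rule are assumptions to tune for your own service.

```powershell
# Hypothetical URL; substitute the page or health endpoint your customers actually hit.
$url = "https://booking.example/"

try {
    # -TimeoutSec bounds the wait; -UseBasicParsing keeps this working on Windows PowerShell 5.1.
    $resp = Invoke-WebRequest -Uri $url -TimeoutSec 10 -UseBasicParsing
    $ok = ($resp.StatusCode -eq 200)
}
catch {
    # Timeouts, DNS failures, and 4xx/5xx responses all land here.
    $ok = $false
}

if (-not $ok) {
    Write-Output "DOWN: $url at $(Get-Date -Format o)"
    # hand off to your notification channel here
}
```

The output line keeps the same DOWN prefix and timestamp format as the ping check, so both can feed one log and one grep.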
The rule behind the rule
Every alert should answer one question before you write it: if this fires at two in the morning, would I get out of bed? If the honest answer is no, it is a dashboard line, not an alert. Keep the loud channel sacred and people will actually trust it.
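One way to make that question hard to skip is to force every alert to declare its answer in code. The sketch below is illustrative only: Send-Alert and its -WakeWorthy switch are hypothetical names, and both branches simply write to output where a real setup would call a pager or append to a dashboard log.

```powershell
# Hypothetical helper: Send-Alert and -WakeWorthy are illustrative names, not an existing cmdlet.
function Send-Alert {
    param(
        [Parameter(Mandatory)] [string] $Message,
        [switch] $WakeWorthy
    )

    $line = "$(Get-Date -Format o) $Message"

    if ($WakeWorthy) {
        # Loud channel: replace with your real paging hook.
        Write-Output "PAGE: $line"
    }
    else {
        # Quiet channel: a line for the dashboard, nobody gets woken.
        Write-Output "INFO: $line"
    }
}

# The booking system being down is worth getting out of bed for.
Send-Alert -Message "DOWN: booking.example" -WakeWorthy

# A brief CPU spike is not, so it stays on the quiet channel by default.
Send-Alert -Message "CPU above 90% on practice-gw.example"
```

Making the quiet channel the default means an alert only gets loud when someone has deliberately said it should.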