
We Processed 1 Million Alerts. Here’s What We Learned About Noise.


Key Takeaways
– Between 60% and 80% of alerts in most environments require no human action: they’re duplicates, downstream symptoms of one root cause, or self-resolving within minutes. (ITOC360 internal analysis, 2026)
– Alert fatigue is the number one obstacle to faster incident response, outpacing the next challenge by nearly 2:1. (Grafana Labs Observability Survey, 2025)
– Hybrid correlation (rules plus ML, with human review before changes apply) consistently outperforms pure rule-based or pure ML approaches in production.

The number that keeps coming up, across environments of very different sizes and complexity, is somewhere between 60 and 80 percent. That’s the share of alerts that fire in a given month and require no meaningful human action: they’re duplicates, they’re symptoms of a single upstream cause, or they resolve on their own within minutes. One way or another, they didn’t need a person.

Worth sitting with that for a moment. If your on-call engineer receives ten pages in a shift, six to eight of them are probably noise by that definition. The other two to four are real. The problem is that all ten look identical when they arrive at 2 a.m.

This isn’t a configuration failure at any specific company. It’s the default state of most alerting architectures, and understanding why requires looking at three patterns that show up almost everywhere.


Alert Storms: When One Problem Looks Like Forty

A single infrastructure event, whether it’s a network partition, a bad deployment, or a degraded database replica, can trigger alerts across every monitoring tool watching that environment. CPU spikes, latency degradations, error rate increases, queue depth warnings, health check failures. All within a 3-minute window. All technically accurate. All describing the same underlying problem from a different angle.

Without a correlation layer, that’s five separate incidents in your queue. Five acknowledgments. Potentially five engineers, each assuming they’re working on something different.
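At its simplest, a correlation layer does something like the sketch below: group alerts that arrive within a short window into one candidate incident instead of five. This is a minimal illustration, not ITOC360’s implementation; the field names, the three-minute window, and the “first alert anchors the group” heuristic are all assumptions, and a production system would also weigh topology and dependency data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    name: str
    source: str        # service or host that emitted the alert (illustrative field)
    fired_at: datetime


@dataclass
class IncidentGroup:
    alerts: list[Alert] = field(default_factory=list)


def group_by_window(alerts: list[Alert],
                    window: timedelta = timedelta(minutes=3)) -> list[IncidentGroup]:
    """Group alerts whose firing times fall within `window` of the first
    alert in the current group. A stand-in for a real correlation layer,
    which would also consider service topology, not just timing."""
    groups: list[IncidentGroup] = []
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        if groups and alert.fired_at - groups[-1].alerts[0].fired_at <= window:
            groups[-1].alerts.append(alert)
        else:
            groups.append(IncidentGroup(alerts=[alert]))
    return groups
```

Even this naive version turns a burst of CPU, latency, error-rate, queue-depth, and health-check alerts into one group for an engineer to triage, rather than five acknowledgments.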

We’ve seen the same root cause generate more than 40 individual alerts before anyone figured out what was actually happening. The Google SRE Workbook recommends a ceiling of two actionable pages per on-call shift per engineer. A single alert storm from one upstream failure can blow through that budget in under five minutes.

What makes storms particularly damaging isn’t the volume. It’s the context-switching cost. Each alert that looks independent pulls an engineer toward a different hypothesis, and the real investigation stalls while the queue grows.

According to a peer-reviewed survey of alert fatigue published in ACM Computing Surveys (2025), most organizations receive over 10,000 alerts daily, with more than 50% being false positives. Alert storms are one of the main mechanisms behind that ratio.

Stale Thresholds: The Noise Nobody Owns

Thresholds get set at launch, calibrated to expected load at the time. Systems grow. Traffic patterns shift. What was a tight, useful threshold 18 months ago now fires every Tuesday afternoon during normal peak. Nobody updates it because threshold review isn’t anybody’s job, and the risk of making a threshold too loose feels higher than the ongoing cost of the noise.

So the noise stays. Engineers learn to filter it mentally. Then one Tuesday, the threshold fires for a real reason, and the on-call engineer’s mental filter treats it the same way it treated the dozen false positives before it.

This is one of the more insidious failure modes in alerting because it doesn’t appear on any dashboard. The SANS 2025 Detection and Response Survey found that 73% of organizations cite false positives as their top challenge in threat detection, with “very frequent” false positives rising from 13% to 20% year over year. Stale thresholds are a primary contributor.

The fix isn’t always raising the threshold. Sometimes it’s adding a time-of-day qualifier. Sometimes it’s decommissioning the rule entirely. You can’t do either until you know which rules are drifting, and that requires tracking noise score at the rule level (more on that below).
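To make the time-of-day qualifier concrete, here is a hypothetical sketch of a threshold check that tolerates a known weekly peak instead of paging on it. The metric, thresholds, and the “Tuesday afternoon” window are invented for illustration; the point is the shape of the rule, not the numbers.

```python
from datetime import datetime, time


def should_fire(metric_value: float, now: datetime) -> bool:
    """Threshold check with a time-of-day qualifier: allow a higher value
    during the known weekly peak (Tuesday afternoon in this example)
    rather than paging on normal, expected load."""
    peak_threshold = 0.95    # tolerated during the known peak window
    normal_threshold = 0.80  # applies the rest of the week
    is_tuesday_peak = (now.weekday() == 1
                       and time(13, 0) <= now.time() <= time(17, 0))
    threshold = peak_threshold if is_tuesday_peak else normal_threshold
    return metric_value > threshold
```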

Self-Resolving Alerts: The Quietest Trust Killer

These are the quietest and, in some ways, the most corrosive pattern. An alert fires. Nobody acknowledges it within 15 minutes. Then it auto-resolves. Brief network hiccup. Transient memory spike. It never needed a person. But it paged one anyway.

Do that regularly, and you end up with engineers who wake to their phone, check the alert, see it already gone, and go back to sleep with their trust in the system slightly eroded. Not broken. Just slightly less reliable, every time.

The cumulative effect of enough self-resolving alerts is an on-call team that doesn’t fully trust its own pages. And that’s a dangerous place to operate from.
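Measuring this pattern is straightforward if your alert history records when each alert fired, resolved, and was acknowledged. The sketch below flags firings that resolved on their own within a grace period and were never acknowledged; the record fields and the 15-minute window are assumptions standing in for whatever your alerting platform actually stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertRecord:
    rule: str
    fired_at: datetime
    resolved_at: Optional[datetime]
    acknowledged_at: Optional[datetime]


def self_resolving_rate(history: list[AlertRecord],
                        grace: timedelta = timedelta(minutes=15)) -> dict[str, float]:
    """Share of firings per rule that resolved within `grace`
    without any human acknowledgment."""
    totals: dict[str, int] = {}
    self_resolved: dict[str, int] = {}
    for rec in history:
        totals[rec.rule] = totals.get(rec.rule, 0) + 1
        if (rec.acknowledged_at is None
                and rec.resolved_at is not None
                and rec.resolved_at - rec.fired_at <= grace):
            self_resolved[rec.rule] = self_resolved.get(rec.rule, 0) + 1
    return {rule: self_resolved.get(rule, 0) / n for rule, n in totals.items()}
```

Rules that score high here are the ones quietly training your on-call team to roll over and go back to sleep.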

The Vectra AI 2024 State of Threat Detection report (n=2,000 security practitioners) found that 71% of practitioners worry they’re missing real attacks buried in alert floods, and that teams realistically handle only 38% of the alerts they receive. The mechanism is the same whether it’s a security team or an SRE team: volume conditions people to tune out.

When the real incident arrives, it looks like everything else.

Noise Scoring: The Metric That Changes the Conversation

Most teams track alert volume. The more useful shift is tracking noise score per rule: of all the times this rule fired, what percentage resulted in a human taking a meaningful action? When you rank rules by that metric, a clear structure emerges.

Across the environments we’ve analyzed, 15-20% of rules are typically responsible for 70-80% of actionable incidents. Another 30-40% of rules have noise scores above 80%, meaning they almost never lead to real action. Those are your highest-priority candidates for tuning or retirement.

The middle bucket is harder. Rules that are occasionally useful, occasionally noisy. This is where judgment matters. Machine learning earns its place here, not by making autonomous decisions, but by surfacing patterns in when those rules are actionable versus when they aren’t. That’s information a human can act on with confidence.

Noise score by rule is a forcing function. It converts “our alerting is too noisy” into a ranked list of specific rules to investigate. The conversation shifts from a vague complaint to “rule X has a 94% noise score this quarter. Let’s look at it.”
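The calculation itself is simple, which is part of why it works as a forcing function. Here is a minimal sketch of noise score per rule, assuming you can label each firing as actionable or not (acknowledged, remediated, or escalated); the data shape is illustrative rather than any particular platform’s schema.

```python
from dataclasses import dataclass


@dataclass
class Firing:
    rule: str
    actionable: bool  # did a human ack, remediate, or escalate?


def noise_scores(firings: list[Firing]) -> list[tuple[str, float]]:
    """Noise score per rule: non-actionable firings / total firings,
    ranked worst-first so the top of the list is the tuning queue."""
    totals: dict[str, int] = {}
    noisy: dict[str, int] = {}
    for f in firings:
        totals[f.rule] = totals.get(f.rule, 0) + 1
        if not f.actionable:
            noisy[f.rule] = noisy.get(f.rule, 0) + 1
    scores = [(rule, noisy.get(rule, 0) / n) for rule, n in totals.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A rule with 94 non-actionable firings out of 100 lands at the top of that list with a 0.94 score, which is exactly the conversation starter described above.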

The Grafana Labs 2025 Observability Survey (n=1,255) identified alert fatigue as the number one obstacle to faster incident response, outpacing the next challenge by nearly 2:1. Thirty-one percent of respondents specifically want AI-driven, training-based alerts as a solution. Noise score by rule is how you build the training baseline to get there.

Why Pure Rules Have a Ceiling (and Pure ML Does Too)

Rule-based correlation handles what you anticipated. It doesn’t handle novel failure modes, cascading dependencies you didn’t map, or subtle behavioral shifts that don’t match any pattern you wrote a rule for. That’s the ceiling.

Pure ML has a different problem: your on-call engineer needs to trust the correlation under pressure, without time to verify the reasoning. If the model groups two unrelated alerts together during a real incident, you’ve made the response harder, not easier. And if engineers start second-guessing the groupings, you’ve lost the efficiency gain entirely.

The approach that holds up in production is hybrid: rules for the known cases (full transparency, predictable behavior), ML for the patterns rules miss, with suggestions surfaced to engineers for review before anything is applied automatically. Neither layer runs without the other. The system learns from what engineers accept or reject. Over time, the suggestions get better.
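A schematic way to picture that flow is below: rules decide the known groupings outright, the ML path only emits a suggestion, and nothing ML-driven is applied until an engineer approves it. The scorer, confidence threshold, and field names are placeholders, not a description of how any specific product implements this.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    alert_ids: tuple[str, ...]
    source: str                      # "rule" or "ml"
    confidence: float
    approved: Optional[bool] = None  # set by a human reviewer, never by the system


def correlate(alert_ids: tuple[str, ...],
              rule_match: Callable[[tuple[str, ...]], bool],
              ml_score: Callable[[tuple[str, ...]], float]) -> Suggestion:
    """Rules handle the known cases with full transparency; the ML path
    only proposes a grouping and waits for explicit human approval."""
    if rule_match(alert_ids):
        return Suggestion(alert_ids, source="rule", confidence=1.0, approved=True)
    return Suggestion(alert_ids, source="ml", confidence=ml_score(alert_ids))
```

The approve/reject decisions are also the feedback signal: what engineers accept or reject is what the system learns from.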

Data from the New Relic 2026 AI Impact Report (n=6.6 million users, full-year 2025 platform data) shows what this looks like at scale. AI-enabled observability accounts maintained a 46% noisy-alert rate; non-AI environments frequently exceeded 70%. During peak load in May 2025, AI-enabled teams averaged 26.75 minutes per incident. Non-AI teams required 50.23 minutes.

The gains are real. The architecture to get there requires keeping humans in the loop throughout.

Frequently Asked Questions

What is alert noise in SRE and DevOps?
Alert noise refers to alerts that fire without requiring meaningful human action. This includes duplicates, downstream symptoms of a single upstream failure, and self-resolving transients. In most production environments, between 60% and 80% of monthly alert volume falls into this category, based on ITOC360’s analysis across enterprise deployments.

How do you calculate a noise score for an alert rule?
Pull the rule’s firing history for a defined window (30 or 90 days). Flag which firings resulted in a human taking a meaningful action: acknowledging an incident, triggering remediation, or escalating. Divide non-actionable firings by total firings. A rule with 94 non-actionable firings out of 100 total has a 94% noise score. Rank all rules by this metric to find your highest-priority tuning candidates.

What’s the difference between rule-based and ML-based alert correlation?
Rule-based correlation uses explicitly defined patterns: “if alerts A, B, and C fire within 5 minutes, group them into one incident.” It’s transparent and predictable but limited to scenarios you anticipated. ML-based correlation finds patterns statistically, including novel ones you didn’t write rules for. A hybrid approach uses both, with rules handling known failure modes and ML surfacing suggestions for everything else, reviewed by engineers before being applied.

Is some level of alert noise unavoidable?
Yes. The goal isn’t zero noise. The Google SRE Workbook recommends a ceiling of two actionable pages per on-call shift per engineer. Beyond that, fatigue compounds and response quality degrades. The practical target is a noise level low enough that engineers trust the pages they do receive, which means your real incidents don’t get filtered out by habit.

How does alert noise affect MTTA?
High noise directly raises Mean Time to Acknowledge by eroding trust. Engineers who’ve learned that most pages self-resolve tend to acknowledge more slowly, particularly overnight. Teams using AI-assisted correlation acknowledged and resolved incidents roughly twice as fast during peak load periods compared to non-AI teams, according to the New Relic 2026 AI Impact Report.

ITOC360 On-Call Incident Management tracks noise scores per alert rule and uses hybrid AI to surface correlation patterns and suggest rule improvements. Your team reviews and approves every change before it takes effect. Learn more at itoc360.com