Key Takeaways
– MTTA measures how long it takes an engineer to acknowledge an alert after it fires. Most teams track it. Most teams are tracking it wrong.
– Optimizing MTTA before reducing noise produces a number that looks better while the real problem gets worse.
– The correct fix sequence is: noise reduction first, escalation clarity second, rotation fairness third. Optimizing MTTA directly, without addressing these, is Goodhart’s Law in action.
The number every on-call team reports in their quarterly review. The number that appears in incident post-mortems, vendor pitches, and SLA conversations. MTTA (Mean Time to Acknowledge) is treated as a primary health metric for on-call operations, and it almost never tells you what you think it’s telling you.
That isn’t because the metric is wrong. It’s because most teams measure it incorrectly, and almost all teams try to optimize it in the wrong order.
What MTTA measures and what it doesn’t
MTTA is the time between an alert firing and an engineer marking it as acknowledged. That’s it. It measures the gap between “the system noticed a problem” and “a human registered that they’re aware of it.”
What it does not measure: whether the engineer actually looked at the alert, whether they understood it, whether the alert was real, or whether they did anything useful in response. An engineer who wakes up at 3 a.m., opens the notification, dismisses it without reading it, and goes back to sleep has produced an MTTA of 45 seconds. That number goes into your quarterly average. It makes the rotation look responsive. The incident, if there was one, was not handled.
This distinction matters because most reduction efforts target the number rather than the behavior it’s supposed to represent.
How to calculate MTTA correctly
The basic formula is straightforward.
MTTA = Σ (Acknowledgment timestamp − Alert fired timestamp) / Number of alerts
For a 30-day window, pull every alert that fired during the period, compute the gap for each, and average them. Most on-call tools export this directly.
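As a minimal sketch of that calculation in Python, assuming a CSV export with `fired_at` and `acknowledged_at` ISO-timestamp columns (column names vary by platform):

```python
import csv
from datetime import datetime

def mean_mtta_seconds(path):
    """Mean acknowledgment gap, in seconds, across every alert in an export."""
    gaps = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            fired = datetime.fromisoformat(row["fired_at"])
            acked = datetime.fromisoformat(row["acknowledged_at"])
            gaps.append((acked - fired).total_seconds())
    return sum(gaps) / len(gaps) if gaps else None
```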
The problems start with what gets included in that average.
Use median, not mean
Alert acknowledgment times are not normally distributed. A typical shift might have twenty alerts acknowledged within two minutes and one alert that sat unacknowledged for 40 minutes because the engineer was already managing a P1 and didn’t see the page. That single 40-minute outlier pulls your monthly mean upward more than the twenty clean responses pull it down.
The median is the more honest number. It tells you what the typical acknowledgment experience looks like, without being distorted by the tail.
Track both. The mean tells you the total burden. The median tells you the typical experience. If mean and median diverge significantly (say, median is 90 seconds but mean is 12 minutes), you have an outlier problem worth investigating, not an average-MTTA problem.
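You can see the distortion with a quick check using Python’s statistics module; the numbers below are illustrative:

```python
from statistics import mean, median

# Twenty clean ~90-second acknowledgments plus one 40-minute outlier.
ack_seconds = [90] * 20 + [2400]

print(f"mean:   {mean(ack_seconds):.0f}s")    # 200s, dragged up by one alert
print(f"median: {median(ack_seconds):.0f}s")  # 90s, the typical experience
```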
Exclude self-resolving alerts
This is the most common source of inflation, and the most commonly ignored one.
A self-resolving alert fires, sits unacknowledged, and closes automatically when the transient condition clears. Its “acknowledgment time” is undefined, but many tools record it as the time between firing and auto-resolve, which can be anywhere from 2 minutes to 45 minutes. Include those in your MTTA and the number reflects how long your system waits before giving up, not how fast your engineers respond.
Filter to alerts that received a human acknowledgment. Your calculation should only include rows where an engineer actually touched the page.
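In practice that means filtering the export before computing anything. A sketch, assuming auto-resolved alerts show up with an empty `acknowledged_by` field (check how your tool marks these):

```python
def human_acked(rows):
    """Drop auto-resolved alerts: keep only rows an engineer touched."""
    return [r for r in rows if r.get("acknowledged_by", "").strip()]
```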
Segment by time window
An acknowledgment at 2 p.m. on a Tuesday and an acknowledgment at 3 a.m. on a Saturday are not the same event. Averaging them together produces a number that hides where your real response gaps are.
Calculate in at least three segments: business hours (08:00–18:00 weekdays), evening (18:00–22:00 weekdays), and overnight/weekend. You will almost always find that overnight and weekend response times are 3x to 8x higher than business hours. That gap is the actual problem to address, not the blended average.
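A sketch of that bucketing, assuming timestamps are already in the rotation’s local timezone:

```python
from datetime import datetime

def segment(ts: datetime) -> str:
    """Bucket an alert timestamp into one of three reporting segments."""
    if ts.weekday() >= 5:          # Saturday or Sunday
        return "overnight_weekend"
    if 8 <= ts.hour < 18:
        return "business_hours"
    if 18 <= ts.hour < 22:
        return "evening"
    return "overnight_weekend"     # 22:00-08:00 on weekdays
```

Compute the median within each bucket separately; comparing those three medians is what exposes the overnight gap.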
The four factors that distort MTTA without improving response
1. Alert volume normalization
When engineers receive a high volume of low-value alerts over a sustained period, they develop acknowledgment habits that look fast but aren’t engaged. They swipe the notification, tap acknowledge, and check whether it self-resolves before investing any attention. MTTA drops. Response quality doesn’t improve; it degrades.
This is the most dangerous MTTA distortion because it produces a metric that moves in the right direction while the underlying operation gets worse. Teams that reduce noise see MTTA improve as a secondary effect, not because they targeted MTTA directly, but because engineers start taking pages seriously again.
2. Escalation path ambiguity
When an engineer isn’t sure whether they’re the right person to handle an alert, acknowledgment slows. They acknowledge the page to stop it from escalating, then spend the next several minutes figuring out whether the alert is in their domain. The acknowledgment timestamp is captured; the 8 minutes of confusion afterward isn’t.
Clear escalation paths (defined ownership per alert type, explicit secondary contacts, documented runbooks) reduce this friction. The acknowledgment happens faster because the engineer knows what they’re acknowledging.
3. Rotation imbalance
An engineer carrying disproportionate on-call burden over several weeks responds more slowly, not because they’re less capable, but because their cognitive reserve is lower. A 3 a.m. page is different for someone who slept the previous three nights than for someone who handled two P1s in the past 72 hours.
The Fair Alert Score framework surfaces this pattern by tracking weighted burden across time window, day type, and severity, not just frequency. An engineer with a rising FAS trend and a climbing overnight MTTA doesn’t have an MTTA problem. They have a rotation design problem that happens to show up in the MTTA data.
4. False positive conditioning
This is the on-call equivalent of the boy who cried wolf, and it compounds with time. An alert that fires weekly during normal traffic peaks and never leads to a real incident trains engineers to treat that alert as background noise. When it eventually fires for a real reason, the acknowledgment is slow, not because the engineer missed it, but because their mental model says it isn’t urgent.
False positive conditioning is a noise reduction problem, not an MTTA problem. Fixing it requires retiring or tuning the rule, not adding pressure on the engineer to acknowledge faster.
Why optimizing MTTA directly usually makes things worse
When teams set explicit MTTA targets (“we need to get under 5 minutes”), engineers find the path of least resistance to hit that number. The path is almost always to acknowledge faster without necessarily responding better.
Acknowledgment becomes a defensive action: tap the page, stop the escalation timer, figure out what’s actually happening afterward. The alert queue clears faster on paper. Incidents still take the same amount of time to resolve because the real bottleneck (understanding the problem and knowing what to do) hasn’t changed.
This is a textbook instance of Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
The teams with the best real-world MTTA numbers arrived there indirectly. They reduced noise, so engineers trusted their pages. They fixed escalation paths, so engineers knew what to do when they acknowledged. They balanced their rotations, so engineers weren’t carrying accumulated fatigue into overnight shifts. MTTA improved as a consequence.
The correct fix sequence
If your MTTA is higher than you want it to be, this is the order in which to address it.
Step 1: Reduce noise first.
Calculate your noise score per alert rule: of all the times this rule fired in the last 90 days, what percentage required no meaningful human action? That percentage is the rule’s noise score. Any rule above 80% noise is a candidate for tuning or retirement. Any rule above 90% should be retired or heavily modified before the next on-call rotation.
A team operating at 60–70% noise rate cannot have a trustworthy MTTA. Engineers are correct not to respond urgently to pages they’ve learned to distrust. Fix the noise, and MTTA improves without any direct intervention.
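A sketch of that scoring pass, assuming you’ve labeled each firing with a boolean `actionable` flag derived from your incident data:

```python
from collections import defaultdict

def noise_scores(rows):
    """Per-rule noise score: the fraction of firings that required
    no meaningful human action."""
    fired = defaultdict(int)
    acted = defaultdict(int)
    for r in rows:
        fired[r["rule"]] += 1
        if r["actionable"]:
            acted[r["rule"]] += 1
    # Noise = 1 - actionable fraction; >0.8 means tune, >0.9 means retire.
    return {rule: 1 - acted[rule] / fired[rule] for rule in fired}
```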
For a detailed breakdown of how noise accumulates and how to score it by rule type, see: We Processed 1 Million Alerts. Here’s What We Learned About Noise.
Step 2: Clarify escalation ownership.
For every alert type in your rotation, there should be one named owner and one named backup. The owner should have access to a runbook, even a minimal one, that tells them the first three things to check. If you don’t have this for your highest-volume alert types, build it before optimizing anything else.
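This doesn’t require tooling; a checked-in mapping is enough to remove the ambiguity. A minimal sketch, with placeholder alert types, names, and runbook paths:

```python
# Hypothetical ownership table: one owner, one backup, one runbook per type.
ALERT_OWNERSHIP = {
    "db-replication-lag": {
        "owner": "alice",
        "backup": "bob",
        "runbook": "runbooks/db-replication-lag.md",
    },
    "api-5xx-rate": {
        "owner": "carol",
        "backup": "dan",
        "runbook": "runbooks/api-5xx-rate.md",
    },
}
```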
Escalation ambiguity adds 5–15 minutes of invisible latency to incident response that never shows up in MTTA because it happens after acknowledgment.
Step 3: Audit rotation fairness.
Pull the last 90 days of alert data and run a Fair Alert Score calculation for each engineer. If your top-to-median ratio is above 1.8 (meaning one or more engineers are carrying nearly twice the weighted burden of the team median), you have a rotation design problem that will express itself as elevated MTTA for the overloaded engineers.
The fix is rebalancing the rotation and, where needed, giving the overloaded engineer a protected stretch of 30–45 days without primary coverage.
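As a rough sketch of the ratio check, with hypothetical weights standing in for the full FAS weighting (the framework linked below covers the real methodology):

```python
from statistics import median

# Hypothetical weights; the real framework also weights day type.
WINDOW_WEIGHT = {"business_hours": 1.0, "evening": 1.5, "overnight_weekend": 3.0}
SEVERITY_WEIGHT = {"P1": 3.0, "P2": 2.0, "P3": 1.0}

def top_to_median_ratio(alerts):
    """Weighted burden per engineer, then the max-to-median ratio.
    A ratio above 1.8 flags a rotation design problem."""
    burden = {}
    for a in alerts:
        w = WINDOW_WEIGHT[a["segment"]] * SEVERITY_WEIGHT[a["severity"]]
        burden[a["engineer"]] = burden.get(a["engineer"], 0.0) + w
    return max(burden.values()) / median(burden.values())
```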
Full methodology: On-Call Rotation Fairness Calculator: A Framework for Engineering Managers.
Step 4: Only then look at MTTA directly.
Once noise is low and engineers trust their pages, escalation is clear, and rotation load is balanced, review the remaining MTTA by segment. Overnight MTTA above 8 minutes for P1/P2 alerts is a genuine problem. Business hours MTTA above 3 minutes for the same severity is worth investigating. At this stage, the remaining gaps are usually notification delivery issues (wrong channel for the engineer’s timezone or sleep schedule), runbook gaps, or specific alert types without clear ownership.
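A simple pass over the segmented medians using those thresholds, expressed in seconds (a sketch; substitute your own targets):

```python
# Post-cleanup review thresholds for P1/P2 median MTTA, in seconds.
MTTA_TARGETS = {"business_hours": 180, "overnight_weekend": 480}

def over_target(median_mtta_by_segment):
    """Return the segments whose median MTTA exceeds its target."""
    return {
        seg: secs
        for seg, secs in median_mtta_by_segment.items()
        if secs > MTTA_TARGETS.get(seg, float("inf"))
    }
```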
Frequently asked questions
What is a good MTTA benchmark?
The Google SRE Workbook recommends a ceiling of two actionable pages per on-call shift. That framing is more useful than a specific MTTA number, because it focuses on the engineer’s actual experience rather than a stopwatch. For teams that do track a number, reasonable targets are a median P1 MTTA under 3 minutes during business hours and under 8 minutes overnight, but only after noise reduction is complete. Hitting these numbers in a high-noise environment means engineers are acknowledging without engaging.
What’s the difference between MTTA, MTTR, and MTTD?
MTTD (Mean Time to Detect) measures how long it takes the monitoring system to notice a problem. MTTA measures how long it takes a human to register awareness of it. MTTR (Mean Time to Resolve) measures how long the incident takes to close from detection. In a well-functioning on-call operation, MTTD is owned by your alerting configuration, MTTA is owned by your on-call process and rotation design, and MTTR is owned by your runbooks and escalation structure. They interact (high noise raises both MTTA and MTTR) but they have different root causes and different fixes.
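In timestamp terms, the three metrics are different deltas along the same incident timeline. A sketch for a single incident, assuming ISO timestamps (function and field names are illustrative):

```python
from datetime import datetime

def incident_deltas(started, detected, acknowledged, resolved):
    """Per-incident deltas behind MTTD, MTTA, and MTTR, in seconds."""
    s, d, a, r = (datetime.fromisoformat(t)
                  for t in (started, detected, acknowledged, resolved))
    return {
        "detect_seconds": (d - s).total_seconds(),   # feeds MTTD
        "ack_seconds": (a - d).total_seconds(),      # feeds MTTA
        "resolve_seconds": (r - d).total_seconds(),  # feeds MTTR (from detection)
    }
```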
Should MTTA be included in engineer performance reviews?
No. Using MTTA as an individual performance metric will produce the same defensive acknowledgment behavior described above, but worse, because now there are career consequences attached. Engineers will acknowledge instantly and investigate later, which is precisely the wrong order. MTTA is a system-level metric. It belongs in rotation design reviews and incident retrospectives, not performance conversations.
How does alert noise affect MTTA?
Directly and in two directions. High noise raises MTTA because engineers deprioritize pages they’ve learned to distrust. It also artificially lowers reported MTTA in environments where engineers have developed the habit of acknowledging everything immediately to clear the notification, which makes the number look healthy while the real response quality degrades. Neither direction gives you useful information about actual incident response capability. Noise reduction is a prerequisite for MTTA to mean anything.
What tools track MTTA automatically?
Most modern on-call platforms export per-alert acknowledgment timestamps. The calculation itself is straightforward in any spreadsheet. The harder problem is segmenting correctly, excluding self-resolving alerts, and tracking noise score alongside MTTA. That typically requires either custom reporting on top of your platform’s raw export, or a platform that tracks these metrics natively as part of its noise reduction layer.
ITOC360 On-Call tracks MTTA segmented by time window, severity, and engineer, alongside noise score per rule and Fair Alert Score per rotation. The combination gives you an MTTA number you can actually trust, alongside the data to understand what’s driving it.
Book a walkthrough and we’ll run your current rotation through the full analysis together.
Sources and further reading
- SRE Workbook: On-Call, Google
- State of Incident Management 2026, Runframe
- Alert Fatigue in SRE: A Peer-Reviewed Survey, ACM Computing Surveys, 2025
- Observability Survey 2025, Grafana Labs
- On-Call Burnout: What Incident Data Doesn’t Show, DEV Community, 2025