Introduction

Metrics: everyone's favorite topic within Security Operations.  At least mine. I have always been fascinated by the way data gets thrown together to show patterns or insights.  Lately, I have also become somewhat of a curmudgeon about sanitized data across almost every vertical.  When data is presented, it is rarely in raw form; it arrives in a structured, logical package for dissemination to a broad audience.  And with that presentation comes human bias, because we all have a reason for presenting the data in the first place.

Back on topic – Security Operations metrics: anyone who has lived in a SOC should be familiar with these.  My top five (I know there are more, but for brevity's sake, let's settle on these):

  • Mean Time to Detect (MTTD)
  • Mean Time to Respond (MTTR)
  • Mean Time to Close (MTTC)
  • False Positive / True Positive Ratio
  • MITRE ATT&CK Coverage

Pearson's Law

Now most metrics, including those in Security Operations, come from a simple management principle: Pearson's Law, which states:

"What is measured improves. What is measured and reported improves exponentially"

Let's start with a thought: where should detection engineering efforts be focused? My answer, in order: failed detections from purple and red team emulation events first. Threat intelligence second, and I mean real threat intelligence. Not IOCs, IP addresses, domains, and hashes. Novel attacker behavior. New execution paths. Emerging persistence mechanisms. The how, not the what. MITRE ATT&CK Coverage is third: a guide for programs without better inputs, not a roadmap for mature ones.

Am I saying that ATT&CK Coverage isn't useful? Absolutely not, but prioritization matters. If you have known missed detections from a recent engagement, address those first. ATT&CK Coverage, in my mind, is a guide to where your incident response program should prioritize detection engineering efforts, and a way to highlight gaps in your detection capabilities.

The reason we monitor these metrics is simple – they give us a snapshot of how effectively a security program detects and responds to cyber events.  This should always be the mantra of a SOC: to detect and to respond to cyber events.  Dashboarding skills, your grep-fu, the ability to use SQL, SPL, ECS, KQL and a dozen others – I only care about these insofar as they help security teams detect and respond faster. Metrics are the same way.  Metrics, however, are often taken as gospel, which can lead to unforeseen problems.

MITRE Coverage Trap

I recently had a conversation with a customer who proudly told me: 

"We only had 70% ATT&CK Coverage last year. This year, we are pushing above 99%."

To say the least, I was shook.  If I hear someone say they are pushing nearly 100 percent ATT&CK coverage, I become suspicious. It tells me they are chasing a number rather than chasing true detections; detections that serve the SOC mantra of "Detect and Respond". Now, why do I hate the 100 percent goal? Simple: not all techniques should be treated the same.  Techniques like T1593.001-.003 are essentially open-source research against websites that anyone is free to browse.

Is this technically possible? I will say maybe…

But at what cost? These techniques occur so early in the kill chain that detecting them in any meaningful way produces a ton of noise for your IR personnel.  Let's be serious: are we really investigating every person who views one of our employees' LinkedIn profiles? My guess is probably not.  ***Now if anyone reads this and would like to educate me on how this is being performed effectively, I will happily eat my shoe, make the edit to rectify, and cite my newly learned education as appropriate.***

My point here is simple: not all techniques should be weighted equally within an IR program, just as not all false positives carry the same operational cost.
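
One way to make that weighting concrete is a small sketch like the one below. The technique IDs are real ATT&CK identifiers, but the weights and the scoring function are entirely illustrative assumptions on my part, not a recommendation for any specific values:

```python
# Hypothetical sketch: weight ATT&CK techniques by operational value
# instead of counting every technique equally.  The weights below are
# made up for illustration -- each program should set its own.
TECHNIQUE_WEIGHTS = {
    "T1059": 1.0,   # Command and Scripting Interpreter: high value
    "T1053": 0.9,   # Scheduled Task/Job: high-value persistence
    "T1566": 0.8,   # Phishing: common initial access
    "T1593": 0.1,   # Search Open Websites/Domains: noisy, early kill chain
}

def weighted_coverage(covered: set[str]) -> float:
    """Coverage score where noisy early-kill-chain techniques count less."""
    total = sum(TECHNIQUE_WEIGHTS.values())
    got = sum(w for t, w in TECHNIQUE_WEIGHTS.items() if t in covered)
    return got / total

# Covering only the low-value technique barely moves the needle...
print(round(weighted_coverage({"T1593"}), 2))                    # 0.04
# ...while covering the high-value ones gets you most of the way.
print(round(weighted_coverage({"T1059", "T1053", "T1566"}), 2))  # 0.96
```

Under flat counting, both of those cases would look like "some coverage"; weighting makes the difference between checkbox coverage and meaningful coverage visible.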

Goodhart's Law

This now leads us to another principle known as Goodhart's Law.

"When a measure becomes a target, it ceases to be a good measure"

MITRE ATT&CK mapping is meant to help organizations identify detection gaps across essentially any would-be tactic.  But when organizations start targeting 100 percent coverage, the metric loses its usefulness; it stops helping to find meaningful gaps and simply becomes a checklist.  Should we strive for continuous improvement? Absolutely.  The key is improvement in actual security outcomes, not just better or prettier dashboards.

Airline metrics rant

Metric gaming, or metric distortion, is not unique to security.  Let us consider airlines for a short tangent.  I fly a fair amount, and there is a running joke that I am cursed by whatever travel gods may be out there.  The trend is already continuing into 2026.  I digress. Metrics.

A common metric within the airline industry is On-Time Departure (OTD).  This is measured as the moment the aircraft door closes and the plane backs away from the gate.  You sit on the tarmac waiting 45 minutes to take off… Does that matter? Nope. Departure time was good to go.

Another metric, On-Time Arrival (OTA), is quite commonly touted by the airline industry.  However, airlines have found ways to optimize this as well. The idea is fairly simple: add buffer time to the flight schedules.  Research shows that scheduled flight durations have increased over the past two decades, even though airborne flight time has not changed dramatically.  The flights themselves aren't getting slower.  They are getting padded.  One study found that scheduled flight duration increased by 8 percent over 21 years, largely due to padding designed to improve on-time metrics.

Source: https://www.researchgate.net/publication/327212752_Where_did_the_time_go_On_the_increase_in_airline_schedule_padding_over_21_years

Did the metrics improve? Absolutely. The underlying customer experience did not.

Not a new problem

Funny enough, while working on this blog I started reading Seeing Like a State by James Scott.  In it, Scott discusses how, in 18th- and 19th-century France, property taxes were calculated based on the number of doors and windows in a house.

The underlying logic: more openings meant a larger home, which translated to greater wealth.  The metric was incredibly efficient and scalable; tax assessors could estimate property value without ever needing to enter a property.  So how does Goodhart's Law come into play?

People adapted to the metric.  To avoid higher taxes, property owners began boarding up windows or building properties with fewer openings.  The metric worked, yet housing for hundreds of thousands suffered. Unintended consequences.  In many parts of Europe, you can still see the outlines of bricked-over windows from this policy.  I keep hitting this point, but I am hoping it sticks: when a metric becomes the target, the system will adapt around it.

Back to SecOps

All right, let's bring this back on topic.  For simplicity, I often think of the "Big Five" metrics in Security Operations as:

  • Mean Time to Detect (MTTD)
  • Mean Time to Respond (MTTR)
  • Mean Time to Close (MTTC)
  • False Positive / True Positive Ratio
  • MITRE ATT&CK Coverage

All of these metrics are useful, and all of them are nuanced.  All are apt to drift from the intent they were originally designed to measure.  Let's walk through an example.

Example 1

  • A user downloads a malicious Excel file and executes a macro.
  • Macro executes at 10:00 am
  • EDR detects behavior at 10:02 am
  • MTTD = 2 minutes. I will buy that, assuming the investigation confirms there were no earlier artifacts to indicate dwell time.

Example 2

  • The phishing email containing the macro was delivered at 10:00 pm the night before.
  • User opens email, and macro executes at 10:00 am
  • EDR detects behavior at 10:02 am

What is the MTTD? 12 hours and 2 minutes, or 2 minutes?  This is where intent matters.  The purpose of MTTD is to quantify dwell time, the period during which an attack is within the environment before the enterprise becomes aware.  Exposure is not the same as compromise.

Dwell time begins at the first attacker-controlled execution or persistence, not simply when a payload is delivered.  If the email sits dormant overnight, there has been no actual attacker activity yet.

My answer stays the same: MTTD = 2 minutes.
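
The two scenarios above really boil down to where the clock starts. Here is a minimal sketch using the timestamps from the examples; the `mttd` helper is my own illustration, not any particular tool's API:

```python
from datetime import datetime, timedelta

def mttd(first_execution: datetime, detection: datetime) -> timedelta:
    """Dwell-time-based MTTD: the clock starts at the first
    attacker-controlled execution, not at payload delivery."""
    return detection - first_execution

delivered = datetime(2025, 1, 6, 22, 0)   # phishing email lands, 10:00 pm
executed  = datetime(2025, 1, 7, 10, 0)   # macro runs the next morning
detected  = datetime(2025, 1, 7, 10, 2)   # EDR fires

print(mttd(executed, detected))   # 0:02:00  -- the number I buy
print(detected - delivered)       # 12:02:00 -- exposure, not dwell time
```

Anchoring on delivery would report 12 hours and 2 minutes and quietly redefine the metric as exposure time; anchoring on execution keeps it measuring dwell.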

Last example

  • As in the previous scenarios, the user received the email at 10:00 pm.
  • User opens email and macro executes at 10:00 am the next morning
  • No immediate detection
  • Two days later the IR team notices a suspicious executable called by a scheduled job.
  • Detection fired.

So, what is the MTTD? On the surface, two days, but only if the IR team successfully traces the detection back to the original macro execution. That connection is not guaranteed. Without it, the metric anchors to the wrong starting point, and the number becomes meaningless. MTTD is only as accurate as the investigation behind it.

Mean time to detect

Aye, there's the rub: true MTTD is often reactive.  It cannot always be accurately captured at the moment of detection.  Only after the investigation is complete can we determine when the incident began.  This effectively makes MTTD a post-incident analytical metric, not a real-time dashboard number.  Otherwise, we are just measuring our alert pipeline latency – how long it takes our detections to get into the case management system for the IR team.  That is a useful metric, but it is not a detection capability.
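
That "post-incident" framing can be modeled directly. In this hypothetical record (the class and field names are mine, for illustration only), MTTD simply does not exist until the investigation fills in the true start of attacker activity:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Incident:
    """Hypothetical incident record: MTTD is only finalized after the
    investigation establishes the true start of attacker activity."""
    detected_at: datetime
    # Filled in by the investigation, not by the alert pipeline:
    attacker_start: Optional[datetime] = None

    @property
    def mttd(self) -> Optional[timedelta]:
        if self.attacker_start is None:
            return None  # not yet knowable -- don't dashboard it
        return self.detected_at - self.attacker_start

inc = Incident(detected_at=datetime(2025, 1, 9, 10, 2))
print(inc.mttd)  # None: detection fired, dwell time still unknown
# The investigation later traces the scheduled job back to the
# original macro execution two days earlier:
inc.attacker_start = datetime(2025, 1, 7, 10, 0)
print(inc.mttd)  # 2 days, 0:02:00
```

A dashboard that auto-populates MTTD at alert time is really reporting pipeline latency; the field stays empty here until the analysis earns the number.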

Metric framework (mental model?)

None of the above means metrics are useless. I still love anything that quantifies the work IR folks are putting in; it is incredibly valuable. I just want to draw attention to the principle that metrics are nuanced, and people will often "game" a metric if they are incentivized or penalized by it.

I hate being a problem talker and not offering a solution.  My solution is in its infancy, and I reserve the right to update it as I research and refine it.  So far, I have come to think about metrics using five simple principles.

  1. Remember the Metric's Original Intent: What problem was this metric originally meant to measure? MTTD was designed to approximate dwell time. MITRE coverage was designed to identify detection gaps.  False positive ratios are designed to measure signal quality. When the number becomes more important than the intent, expect drift.
  2. Counter Metrics: A single metric on its own will rarely tell the full story. A speed metric should be counterbalanced with a quality metric; volume metrics should be counterbalanced with impact metrics. MTTD should be paired with missed-detection analysis, often around a purple team or red team engagement. If your MTTD is near zero but you are missing 90 percent of attacker tactics, I can guarantee your MTTD is not reflecting reality. This is exactly why airlines do not look at On-Time Departure in a silo. To account for sitting on a tarmac for 45 minutes, On-Time Arrival shows a more complete picture. Hitting your OTD but not your OTA is a red flag that the departure metric is masking operational issues.
  3. Measure Outcomes: Many of our security metrics measure movement, not necessarily effectiveness. Examples include the number of alerts closed, tickets created or closed, and detections created.  I would push for more meaningful metrics: incidents detected internally vs. externally, MTTC for high-severity incidents, attack paths detected during purple/red team exercises.  Activity can be enhanced or optimized without actually improving our detection and response capabilities. Outcomes are harder to game.
  4. Be on the Lookout for Behavioral Distortion (How Can this Metric Be Gamed?): Metrics change behavior. That is their purpose.  Remember: "What is measured improves. What is measured and reported improves exponentially".  Metrics also create unintended incentives.  If detection engineers are measured only on the number of detections created, we can expect the number of detections to increase, but to what end?  Will they be valuable? High fidelity? If analysts are incentivized to "respond" within 15 minutes, I would expect detections to be assigned quickly, but are they being investigated that quickly?  This principle revolves around one question: if you had to game this metric, how would you do it? Take that answer and refine the metric or set up counter metrics.
  5. Treat Metrics as Signals, Not Necessarily Targets: Metrics should guide improvement, not become goals in themselves.  They are spot checks, indicators, signals - not a scorecard.  One last time: when a measure becomes a target, it ceases to be a good measure.  Security Operations is about detecting and responding, not optimizing dashboards or creating detections for the sake of creating detections.  Metrics should help us get better at our core mission, not distract us from it.
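
The counter-metric idea from principle 2 can be sketched as a simple gate. Everything here is an assumption for illustration: the function name, the 15-minute and 25-percent thresholds, and the inputs are all made up, and real programs would tune all three:

```python
# Illustrative counter-metric check: a fast MTTD only "counts" if the
# missed-detection rate from the last purple team engagement backs it up.
# Thresholds are invented for this sketch -- tune them to your program.
def mttd_is_trustworthy(mttd_minutes: float, missed_rate: float) -> bool:
    fast = mttd_minutes <= 15            # speed metric
    seeing_enough = missed_rate <= 0.25  # quality counter metric
    return fast and seeing_enough

# Near-zero MTTD while missing 90% of emulated attacker tactics: red flag.
print(mttd_is_trustworthy(mttd_minutes=1, missed_rate=0.90))   # False
# Slower, but the engagement confirms we actually see the activity.
print(mttd_is_trustworthy(mttd_minutes=12, missed_rate=0.10))  # True
```

The same pairing shape works for the airline example: OTD is the speed metric, OTA is the counter metric that catches the tarmac wait.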

Closing thoughts

Whether we are talking about bricked-up windows in 18th-century France, padded airline schedules, or 100 percent MITRE coverage, the lesson is the same: people will always adapt to a metric.  Our job is to make sure the metric stays aligned with our mission within the SOC: to detect and to respond.

If this got you thinking about your own program and you want to have a sounding board, the GS&A Security Operations team is here to help.