Further Down The Rabbit Hole With MITRE’s ATT&CK Eval Data
Sometimes it doesn’t feel like everyone’s on the same page about what “good” looks like. It’s been about two weeks since MITRE published the results of round 2 of its enterprise ATT&CK evaluation, which means nearly every participant has had time to publish a blog with its own interpretation of how to count everyone’s detections, along with an inevitable declaration of victory. The fundamental problem we’re ignoring is alert fatigue. I say “we’re” because I’ve been guilty of this, too. Have you ever met a blue-teamer who says, “If we only had more alerts . . .”? Of course not; counting alerts is ridiculously tone-deaf, but there is a natural psychology to wanting to get all the answers right.
To this end, I’ve spent the last two weeks trying to figure out the best way to characterize the value of generated alerts to allow us as an industry to reframe what 100% looks like.
Actionability Is The Product Of Alert Efficiency And Alert Quality
If we look at alerts as the antecedent for work generation within a security operations center (SOC), then we need a way to understand what is going to determine the actionability of these alerts, or the likelihood that you’re going to go down the rabbit hole with triage and investigation. I’ve encapsulated this challenge in the following equation:
alert actionability = alert efficiency * alert quality
In plain English, this equation recognizes that the efficiency of alerts (not too many) and the quality of the alerts (how well they help you understand the story) are both related and critical to understanding how “actionable” a particular alert is going to be.
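To make the relationship concrete, here is a minimal sketch in Python. The function and argument names are my own for illustration; the two component metrics are defined in the sections that follow.

```python
def alert_actionability(alert_efficiency: float, alert_quality: float) -> float:
    """Actionability is the product of the two component metrics, so a
    solution has to score well on both to score well overall."""
    return alert_efficiency * alert_quality
```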
In studying this, the first metric we need to pay attention to is “alert efficiency.”
What’s Alert Efficiency?
From a detection perspective, our job is to stop adversaries before they are able to act on objectives. If a product is generating a ton of alerts, there’s a good chance it’s alerting on individual behaviors instead of indicting patterns of behavior. To demonstrate this problem, I’ll highlight technique T1107: File Deletion, which is measured in seven different places in this evaluation. Files get deleted all day long in your environment, so the consequence is that you either multiply all these counted alerts by the size of your infrastructure or you tune them down to the point where the evaluated product is no longer representative of what’s deployed in your environment. I’ve identified the following formula for measuring the efficiency with which alerts are being generated:
alert efficiency = 1 – (total alerts/total substeps)
The more alerts you’re generating, the less efficient you are at helping a SOC surface true adversarial behavior.
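As a concrete sketch (the numbers here are purely illustrative, not drawn from any vendor’s results), the efficiency calculation looks like this:

```python
def alert_efficiency(total_alerts: int, total_substeps: int) -> float:
    """One minus the ratio of alerts generated to substeps evaluated:
    the more alerts a product raises per substep, the lower its efficiency."""
    return 1 - (total_alerts / total_substeps)

# Illustrative only: a product that raises 40 alerts across 100 evaluated
# substeps scores 1 - (40 / 100) = 0.60.
print(alert_efficiency(40, 100))  # 0.6
```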
They’re Good Alerts, Brent!
Undoubtedly! The challenge then becomes figuring out how good these alerts are, which brings us to the question of measuring “alert quality.” In my mind, the highest-quality alert is a correlated technique detection, meaning we have clarity into what is being alerted on and context around related events on the system. As each alert can meet one or both of these conditions, I’ve defined “alert quality” to average the ratio of alerts that are technique detections with the ratio of alerts that carry the correlation modifier, with one caveat . . .
The First One’s Free . . . For Correlation
From a detection perspective, an ideal solution would alert once and correlate all other detections to that initial alert, without requiring (but also not penalizing) those other detections to generate alerts themselves. If we understand an incident as a chain of events, the first alert in the sequence may not be correlated, so I needed to build in a mechanism for correcting this. MITRE provided one by breaking the evaluation into 20 different steps, or adversary tactics. Understanding that the way these products correlate events will vary and will likely not align perfectly with the steps MITRE has identified, I’ve compensated by adding a correlation equivalency for the first uncorrelated alert in each of the 20 steps (if necessary), which brings us to this definition of “alert quality”:
alert quality = ((technique alerts) + (correlated alerts + steps with uncorrelated alerts))/(2 * total alerts)
Note the use of parentheses to emphasize that we are treating the count of “steps with uncorrelated alerts” as additional correlations. An important takeaway from this equation is that generating lots of alerts isn’t bad, so long as they are of proportionally high fidelity.
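Putting it all together, here is a sketch of the quality calculation and the resulting actionability score. It assumes you have already tallied the counts described above, including the number of steps whose first alert is uncorrelated (the correlation equivalencies); all names and numbers are illustrative.

```python
def alert_quality(technique_alerts: int, correlated_alerts: int,
                  steps_with_uncorrelated_alerts: int, total_alerts: int) -> float:
    """Average of two ratios: alerts that are technique detections, and alerts
    counted as correlated (actual correlations plus one correlation equivalency
    for each step whose first alert is uncorrelated)."""
    return (technique_alerts
            + (correlated_alerts + steps_with_uncorrelated_alerts)) / (2 * total_alerts)

# Illustrative only: 40 total alerts, 25 of them technique detections,
# 18 carrying the correlation modifier, and 5 steps whose first alert
# was uncorrelated.
quality = alert_quality(25, 18, 5, 40)   # (25 + 23) / 80 = 0.60
efficiency = 1 - (40 / 100)              # alert efficiency from the earlier sketch
print(efficiency * quality)              # actionability = 0.36
```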
Final Thoughts On Actionability
I kept looking at the results of this evaluation and feeling like something was missing in the metrics I was generating. I think visibility and correlation are critical to evaluating what a product can do for you from a hunting and investigation perspective, but I felt there was a real gap between how vendors measured success and what their clients need. In attempting to solve this, I’ve spent the last two weeks trying to figure out how to create a double-edged sword that penalizes noisy solutions as well as solutions that generate alerts without context. I feel actionability is an appropriate foil for both issues and hope it drives us as an industry to measure success by metrics that are impactful for our clients. That said, it is not a stand-alone metric and should be considered alongside visibility and correlation when determining how well a solution satisfies the needs of your organization.
I’ve updated my project on GitHub to reflect these new metrics.