The Numbers Game: Why Alert Volume and False Positives Matter in MITRE ATT&CK® Enterprise Evaluations 2024

Over the last few years, the MITRE ATT&CK® Evaluations have become an industry standard for assessing the detection capabilities of security products and services by simulating real-world attacks.

A significant number of post-breach investigations this year revealed that attackers often went undetected despite numerous signs of their activity. This typically happens because security teams lack visibility into relevant systems or data sources, fail to recognize the significance of the information they observe, or are overwhelmed with alerts and unable to fully evaluate them.

For this year’s evaluation, MITRE has introduced two important and much-needed metrics, "total alerts generated" and "false positives," to better reflect the actionability of solutions from participating vendors. This marks the first time MITRE has included this aspect in its product evaluations.

We welcome this change, as it offers a new dimension for potential buyers to assess security solutions and demonstrates MITRE's commitment to conducting evaluations that closely resemble real-world scenarios.

MITRE ATT&CK® Evaluations for Enterprise – Round 6

For Round 6, MITRE prepared three detection scenarios (and one protection scenario) focusing on ransomware (specifically Cl0p and LockBit), as well as attacks targeting macOS systems by the Democratic People's Republic of Korea (DPRK). This marked a significant expansion of the evaluation scope, and not all vendors participated in the macOS testing. While the two ransomware scenarios focused on Windows, one of them also included detections related to Ubuntu Linux, further demonstrating the broadening scope of operating systems evaluated.

With the latest revision to the evaluation formula, three key metrics now provide insights into the approaches taken by different security solutions and vendors:

Alert Richness: This refers to the level of detail and context an alert provides. Richly detailed alerts equip security teams with a comprehensive understanding of the potential threat, enabling them to take appropriate action.
False Positives: These are alerts raised on benign events that were intentionally designed into the evaluation scenarios and should not trigger alerts.
Total Alerts Generated: This metric represents the overall volume of alerts produced by a product. Excessive alert volumes can overwhelm security teams, making it difficult to prioritize and address genuine threats.

For our analysis, we considered whether to report results for each scenario individually or to provide a cumulative summary. We ultimately opted for a summarized analysis, with complete results for all key metrics included at the end of this blog post for reference.

Alert Richness Versus Alert Volume

As in previous rounds, MITRE rated each detection using categories that reflect varying levels of coverage. It's crucial to understand that this rating is based on predefined detection criteria, not solely on mapping to the MITRE ATT&CK® Framework.

For example, even if a vendor identifies the technique as "System Location Discovery" and the specific sub-technique as "System Language Discovery," the detection may be marked as "None" if it fails to provide required details, such as the use of the 'NtQueryInstallUILanguage' API call. In other words, correctly mapping activity to the framework does not automatically guarantee a high detection rating. Our analysis includes only 'analytical' coverage: detections that go beyond basic identification and provide additional context and information.
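
As a rough illustration of how a coverage percentage of this kind can be derived from per-substep ratings, the Python sketch below counts the substeps whose rating falls into an analytic category. The category names in `ANALYTIC`, the "Not Applicable" label, and the sample ratings are assumptions made for this example; this is not MITRE's scoring code.

```python
# Assumed set of rating categories that count as analytic coverage.
ANALYTIC = {"General", "Tactic", "Technique"}

def analytic_coverage(ratings):
    """Return the fraction of applicable substeps with an analytic detection."""
    # Substeps a vendor did not take part in are excluded from the denominator.
    applicable = [r for r in ratings if r != "Not Applicable"]
    if not applicable:
        return 0.0
    hits = sum(1 for r in applicable if r in ANALYTIC)
    return hits / len(applicable)

# Hypothetical ratings for five substeps; illustrative values only.
sample = ["Technique", "None", "Tactic", "Not Applicable", "General"]
print(f"{analytic_coverage(sample):.0%}")  # prints 75%
```

Here a substep rated "None" lowers the coverage figure even if the vendor mapped it to the right technique, which is the situation described above.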

MITRE's evaluation now includes data on the total number of alerts generated, further broken down by severity levels. This provides valuable insights into the overall alert volume and the distribution of alerts across different threat levels.

The most effective solutions will strike the right balance between providing sufficient context within each alert to understand the threat and minimizing alert noise. An excessive volume of alerts, even with rich detail, can quickly overwhelm security analysts, leading to alert fatigue and potentially hindering their ability to effectively respond to critical threats.

FIGURE 1 - Alert Richness vs. Alert Volume (High & Critical Alerts) - This visualization includes only alerts with severity High or Critical. Participants with very high alert volumes, some exceeding thousands, have been adjusted for better readability. Please refer to the full table at the end of the blog post for the complete and unadjusted data.

False Positives

MITRE has also introduced the concept of False Positives (FPs) in its evaluations. The scenarios contain "booby traps": intentionally designed benign events that should not trigger alerts. If a security solution raises an alert on one of these events, that alert is counted as a false positive. It's important to note that the measured FPs represent only a subset of the false positives a solution could generate, especially for solutions with high alert volumes.
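
One way to picture this measurement is as a set intersection: any planted benign event that at least one alert points at counts as a measured false positive. The Python sketch below assumes hypothetical event identifiers and a simplified alert structure; the actual evaluation data uses its own format.

```python
# Hypothetical identifiers for the benign events planted in a scenario.
BENIGN_EVENT_IDS = {"noise-01", "noise-02", "noise-03"}

def measured_false_positives(alerts):
    """Return the planted benign events that at least one alert fired on."""
    alerted_events = {a["event_id"] for a in alerts}
    return alerted_events & BENIGN_EVENT_IDS

# Illustrative alerts only; not taken from any vendor's results.
alerts = [
    {"event_id": "step-4a", "severity": "High"},
    {"event_id": "noise-02", "severity": "Medium"},
    {"event_id": "noise-02", "severity": "Low"},
    {"event_id": "other-activity", "severity": "Low"},
]
print(sorted(measured_false_positives(alerts)))  # ['noise-02']
```

The alert on `other-activity` is not counted even if it is also spurious, because it does not match a planted event. That is why the measured figure is only a lower bound.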

FIGURE 2 - This graph illustrates the relationship between Alert Richness and triggered False Positives, that is, alerts raised on intentionally designed events within the evaluation scenarios that should not trigger alerts.

Bitdefender: Prioritizing Actionable Insights

The previous MITRE ATT&CK® Evaluation for Managed Services demonstrated the exceptional performance of our MDR team, a testament to the underlying strengths of our platform. With the introduction of new metrics in this latest evaluation, the factors that make our platform uniquely actionable are now clearly visible.

While 25% of vendors did not participate in the macOS scenario, we achieved 100% analytical coverage for both Linux and macOS, with zero False Positives (FPs) in both cases. Overall analytical coverage is 91% with 6 FPs.

Notably, the average number of incidents reported to the SOC with the GravityZone platform was only 3 across all scenarios, compared to an average of 35,000 (median 209) for other solutions. By effectively correlating the activity in each scenario into a small number of incidents, each with sufficient detail, the platform lets security teams quickly detect and respond to ongoing threats. Our focus on actionability is further evidenced by unique GravityZone features like Incident Advisor, a single-page summary of extended security incidents.
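
The large gap between the average and the median suggests that a few very high-volume results pull the average up. A small Python sketch with made-up numbers (not the evaluation data) shows the effect:

```python
from statistics import mean, median

# Made-up alert counts for five hypothetical products; not evaluation data.
volumes = [10, 20, 30, 40, 9900]

print(mean(volumes))    # 2000
print(median(volumes))  # 30
```

When a distribution is this skewed, the median is usually the better indicator of what a typical product produces, which is why both figures are worth reading together.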

Conclusion

The latest MITRE evaluation showcases Bitdefender's strengths: exceptional threat detection, actionable insights, and a commitment to minimizing alert fatigue. This translates to a powerful solution that empowers security teams to focus on what matters most – effectively responding to genuine threats and keeping organizations secure.

Want to learn more? Dive deeper with the experts themselves! Join our on-demand webinar featuring Bitdefender's SOC analysts and security researchers where we’ll be unpacking the MITRE evaluations, discussing our results in detail, and answering any questions you have about our approach to security. This is a technical deep-dive (not a marketing event) and a chance to learn directly from the front lines of threat detection and response.