MITRE ATT&CK™ Evaluations 2020

Carbanak+FIN7: Visibility Makes the Difference

Michael Greaves, Product Manager, F‑Secure

In this year’s test, MITRE Engenuity used the MITRE ATT&CK® knowledge base to emulate the tactics and techniques of Carbanak and FIN7, two groups that have compromised financial services and hospitality organizations through the use of sophisticated malware and techniques. The activities of these adversaries have resulted in the theft of more than $1 billion across hundreds of businesses over the past five years.

The ATT&CK Evaluations team chose to emulate Carbanak and FIN7 because they target a wide range of industries for financial gain, whereas prior emulated groups were more focused on espionage.

This is the third evaluation that F-Secure has participated in. In this article, we discuss how we did against some key detection metrics, touch on some points that relate to our approach to detection, and reference some useful articles and tools that MITRE Engenuity has produced to accompany the evaluations.

MITRE Engenuity ATT&CK Evaluations are funded by vendors and are intended to provide a dispassionate overview of product capabilities as they compare to MITRE’s publicly accessible ATT&CK® framework. MITRE developed and maintains the ATT&CK knowledge base, which is based on real-world reporting of adversary tactics and techniques. ATT&CK is both freely available and widely used by defenders in industry and government to find gaps in visibility, defensive tools, and processes as they evaluate and select options to improve their network defense. MITRE Engenuity makes the methodology and resulting data publicly available so other organizations can benefit and conduct their own analysis and interpretation. These evaluations do not provide rankings or endorsements. F-Secure is an enthusiastic and long-term supporter of MITRE Engenuity’s work.

Visibility is (still) key

As we mentioned in last year’s post on the APT29 evaluation, we think visibility is one of the most important features of a good detection and response technology.

What do we mean by visibility - and why is it important?

We mean that the tool provides a threat hunter, security specialist, or incident responder with easy access to the data sets they need for all phases of the job:

  • High-fidelity detections: Having the ability to develop high-fidelity detections (i.e. detections with a low false positive rate) is something that all blue teams work towards; no one wants to spend their day dealing with false positives, after all. This is only possible when you have a rich data set that provides the detail necessary to find the needle in the haystack. An example might be the ability to see code execution AND in-memory code execution.
  • Investigations: Investigations are essentially a jigsaw puzzle – and visibility is the equivalent of seeing the picture on the lid of the puzzle’s box. Without that photo of the finished article, it’ll take a lot longer to solve. The equivalent of that box-lid picture for Detection and Response is access to a broad set of data (e.g. process and memory execution, persistence mechanisms, user behavior) with consistent pivot points that allow a hunter to jump from one telemetry source to another as they build that picture.
  • Response: Data that allows for comprehensive analysis of an incident is essential to ensure a sound containment plan can be implemented. Without visibility, there’s a good chance that artifacts such as persistence mechanisms could be missed and relevant threat artifacts left in the infected system. If the attacker knows that you are on to them, they may come back re-tooled and upskilled.
  • Threat Hunting: Threat Hunting is about identifying and closing gaps in detection coverage – the more visibility and data a hunter has, the quicker the gap can be plugged with a new detection rule.

How do ATT&CK Evaluations help us understand visibility?

A product’s visibility in the MITRE ATT&CK evaluations reflects the number of total test sub-steps which have a ‘detection’ assigned to them. MITRE Engenuity defines a detection as telemetry or a general, tactic or technique analytic. Visibility is a metric that MITRE Engenuity now calculates for each vendor across all of the evaluations to date. For test sub-steps with multiple detections (e.g. telemetry and technique), only the higher detection (e.g. technique) would contribute towards the count.
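As a rough illustration of the counting rule described above, the following minimal Python sketch picks the highest detection per sub-step and computes visibility as the share of sub-steps with any detection. The `Detection` enum, its ordering (telemetry lowest, technique highest), the function names and the sample sub-step IDs are assumptions made for this sketch; they are not taken from MITRE Engenuity's tooling or data.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class Detection(IntEnum):
    """Detection categories named in the text; the ordering is an assumption."""
    TELEMETRY = 1
    GENERAL = 2
    TACTIC = 3
    TECHNIQUE = 4


def highest_detection(detections: List[Detection]) -> Optional[Detection]:
    """Return the one detection that counts for a sub-step, or None if there is none."""
    return max(detections) if detections else None


def visibility(results: Dict[str, List[Detection]]) -> float:
    """Fraction of sub-steps that have at least one detection of any category."""
    if not results:
        return 0.0
    covered = sum(1 for dets in results.values() if dets)
    return covered / len(results)


# Hypothetical results for three made-up sub-steps (not evaluation data).
sample = {
    "s1": [Detection.TELEMETRY],
    "s2": [],
    "s3": [Detection.TELEMETRY, Detection.TECHNIQUE],
}

print(highest_detection(sample["s3"]).name)  # TECHNIQUE
print(round(visibility(sample), 2))          # 0.67
```

Note that a sub-step with both telemetry and a technique analytic still counts once towards visibility; only the category it is reported under changes.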

Fig. 1: MITRE ATT&CK Evaluation, Carbanak+FIN7 – Visibility across all participants

During the Carbanak+FIN7 evaluation, F-Secure had a detection for 152 of the 174 test sub-steps, or 87% (rounded down). This is in line with the visibility provided by our offerings during the APT3 (89%) and APT29 (88%) evaluations, evidence of the work we put in to ensure threat hunters and incident responders can detect, investigate and respond to techniques that attackers are using today. In short: we’ve performed consistently across all three evaluations.
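The 87% figure can be checked with a few lines of Python. This is only a worked version of the arithmetic quoted above (152 of 174); the variable names are chosen for this sketch.

```python
detected = 152  # count of detections quoted in the text
total = 174     # total count quoted in the text

exact = detected / total * 100          # percentage before rounding
rounded_down = detected * 100 // total  # integer floor division rounds down

print(f"{exact:.2f}%")     # 87.36%
print(f"{rounded_down}%")  # 87%
```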

Avoid false positives: Mo' detections = mo' problems

It might seem strange for a vendor to say 87% visibility is sufficient – after all, that surely means a 13% ‘gap’ an attacker could exploit to evade detection. Not so.

Attacks will always involve multiple activities at each attack stage. Whilst defenders only need to detect one of these activities to understand an attack is underway, the more stages you have detection capability for, the better your chances of detecting and stopping an attack. However, it’s important to balance this with the possibility that what might look like an attack is in fact innocuous – a false positive. Sometimes it is better to forgo a detection rule or analytic that you know will be very noisy and prone to false positives, and to focus on the high-fidelity use cases instead.

We can use the attacker activities from sub-steps 1.A.1 and 1.A.3, together with statistics collected from F-Secure’s environment, to demonstrate the problem with a detection that reports an attack every time a user lifts a finger. A specific detection rule for explorer.exe opening winword.exe would have generated 8,644 alerts over a single week, because it is a legitimate parent-child relationship whenever Word is opened by a user. Assigning threat hunters to look at these alerts would be a waste of time, so we do not have a specific detection rule for this activity, but we can still see it within our telemetry. By contrast, a detection rule for winword.exe executing wscript.exe, which generated the ‘technique’ detection we had for sub-step 1.A.3, caused 0 alerts over the same one-week period, because it looks for activity which has very few legitimate use cases and usually indicates the use of a malicious Word document. Our approach led to detection of the malicious activity in sub-step 1.A.3 without generating unnecessary false positives.
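To make the comparison concrete, here is a minimal Python sketch that counts how often two parent-child rules would fire over a stream of process-creation events. The `ProcessEvent` type, the `count_alerts` function and the sample events are hypothetical and exist only for illustration; this is not F-Secure's detection logic or data format.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set, Tuple


class ProcessEvent(NamedTuple):
    parent: str  # image name of the parent process
    child: str   # image name of the newly created process


Rule = Tuple[str, str]  # (parent image, child image)


def count_alerts(events: Iterable[ProcessEvent], rules: Set[Rule]) -> Counter:
    """Count events matching each (parent, child) rule, ignoring case."""
    normalized = {(p.lower(), c.lower()) for p, c in rules}
    hits: Counter = Counter()
    for event in events:
        key = (event.parent.lower(), event.child.lower())
        if key in normalized:
            hits[key] += 1
    return hits


NOISY_RULE: Rule = ("explorer.exe", "winword.exe")          # normal user behavior
HIGH_FIDELITY_RULE: Rule = ("winword.exe", "wscript.exe")   # rarely legitimate

# Hypothetical event stream (not real telemetry).
events = [
    ProcessEvent("explorer.exe", "WINWORD.EXE"),
    ProcessEvent("explorer.exe", "winword.exe"),
    ProcessEvent("explorer.exe", "chrome.exe"),
    ProcessEvent("winword.exe", "wscript.exe"),
]

hits = count_alerts(events, {NOISY_RULE, HIGH_FIDELITY_RULE})
print(hits[NOISY_RULE])          # 2
print(hits[HIGH_FIDELITY_RULE])  # 1
```

Running a candidate rule against historical telemetry in this way, before promoting it to an alert, is one way to estimate how noisy it would be.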

You needn't see a mouse to know you have pests

As well as focusing on use cases which do not drown your threat hunters in false positives, the other approach to detection is to spend effort on developing coverage across multiple stages of the attack rather than trying to detect everything in a single stage. By stages, we mean code execution upon initial access, establishing persistence, elevating privileges, discovery of lateral movement targets or lateral movement itself. Having high-fidelity detections across multiple stages will increase your chances of detection over a strategy which puts all your eggs in one basket.
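A simple probability sketch shows why spreading detections across stages can help. It assumes, purely for illustration, that each stage detects independently with a fixed probability; the function name and the probability values are made up for this example and are not measurements from the evaluation.

```python
from math import prod
from typing import Sequence


def p_detect_at_least_once(stage_probs: Sequence[float]) -> float:
    """Chance that at least one stage detects, assuming independent stages.

    Computed as 1 minus the product of the per-stage miss probabilities.
    """
    return 1.0 - prod(1.0 - p for p in stage_probs)


# Hypothetical numbers: one strong detection in a single stage
# versus moderate detections in five different stages.
single_stage = p_detect_at_least_once([0.9])
five_stages = p_detect_at_least_once([0.6] * 5)

print(round(single_stage, 4))  # 0.9
print(round(five_stages, 4))   # 0.9898
```

Under these assumptions, five moderate detections beat one strong detection, because the attacker would have to evade every stage rather than just one.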

We can use the steps (aka attack stages) MITRE Engenuity has defined for the Day One (stages 1-10) and Day Two (stages 11-20) attacks to help visualize this:

Fig. 2: F-Secure analytics at each step of the MITRE ATT&CK Evaluation, Carbanak+FIN7

(For clarity, ‘analytic’ refers to a category of detection identified by MITRE Engenuity which was not telemetry i.e. a detection which would alert the user of suspicious activity without requiring them to search through the raw data.)

As can be seen in Fig. 2, we had analytics in multiple attack stages across both days of the evaluation.

In plain English: lots of the right sort of alarms went off, each of which was an opportunity for detection and detailed understanding of the attack as it progressed.

Having multiple analytics reflects our approach of covering multiple stages of an attack. Day Two spells out the value of this approach: although we were not assigned an analytic during Step 12, we had generated a high-severity alert for the suspicious activity during the initial breach in Step 11 and continued to track the attacker activity from Step 13 through to the end of the attack. A missed analytic does not mean a detection product would miss an attack altogether.

Fig. 3: Step 11 shows F-Secure’s ability to detect despite not alerting at Step 12

Time to own our mistake

Readers will see that we have a configuration change for ‘Data Sources’ across a number of sub-steps on Day Two. As MITRE Engenuity notes in a footnote against the relevant sub-steps, this was due to an oversight in which we failed to deploy our EDR sensor to one of the test hosts. We rectified this before the test steps were repeated, which led to the detections associated with those sub-steps. The setup and configuration of our EDR was not otherwise changed before these test steps were repeated. As you can imagine, we have put steps in place to make sure this does not happen again!

Looking forward to the next Evaluation

In this article, we have analyzed the data in a few ways, but further analysis is always encouraged to determine whether a potential solution will meet your requirements. We’d also encourage readers to look at our analyses of the previous two evaluations.

MITRE Engenuity looks after us all here, with features like the Technique and Participant Comparison Tools that let you drill into how a solution performed against a technique that you have identified as being of higher importance when it comes to detection.

Two sources of further reading we’d recommend: the original MITRE ATT&CK space on Medium contains a number of blogs which serve as a ‘how-to’ when conducting your own analysis; one of our favorites is a piece by MITRE’s Jamie Williams which touches on similar ideas to those we’ve addressed in this article. The second source is the new MITRE Engenuity space on Medium, which picks up where the original ATT&CK space leaves off.

As we’ve said in previous posts concerning MITRE ATT&CK Evaluations, we believe the evaluations provide invaluable insight for buyers – and help us improve our offering. With the Evaluations – and other tools from MITRE Engenuity – users get a better sense of how effective a detection product is at alerting to the techniques that attackers are using today. Understanding how each solution goes to work on the challenge is particularly important, given the size and fragmented nature of the detection market.

F-Secure will be participating in the fourth Evaluation round later this year.