3 Ways to Automatically Score DFIR Artifacts

Start with clues. Not thousands of artifacts.

Artifact scoring is the most effective way to make your digital investigations faster because most of the data you collect is not relevant to the investigation. It’s normal system activity. 

Scoring categorizes an artifact as one of: 

  • Bad: It’s nearly identical to an artifact used in a past incident or matches a known tactic, technique, or procedure (TTP). 
  • Suspicious: It’s similar to a past incident, but it also could be normal activity. 
  • Unknown: Not sure if it’s good or bad.
  • Good: It’s from a trusted source and not malicious.

Once your forensic artifacts are scored, you can focus on the small percentage flagged as bad and suspicious. These can then lead you to identify other evidence that is unique and in the “unknown” category. 
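As a rough illustration of that workflow, here is a minimal Python sketch that keeps only the bad and suspicious items and lists the worst first. The `Score` enum, the `review_queue` function, and the sample artifacts are all hypothetical names and data made up for this example; they are not taken from any particular tool.

```python
from enum import IntEnum


class Score(IntEnum):
    """The four categories from the list above, ordered by review priority."""
    GOOD = 0
    UNKNOWN = 1
    SUSPICIOUS = 2
    BAD = 3


def review_queue(scored_artifacts):
    """Return (name, score) pairs flagged bad or suspicious, worst first."""
    flagged = [a for a in scored_artifacts if a[1] >= Score.SUSPICIOUS]
    return sorted(flagged, key=lambda a: a[1], reverse=True)


# Made-up sample data for illustration only.
scored = [
    ("svchost.exe", Score.GOOD),
    ("updater.ps1", Score.SUSPICIOUS),
    ("invoice.docm", Score.BAD),
    ("tool.exe", Score.UNKNOWN),
]

for name, score in review_queue(scored):
    print(f"{score.name:<10} {name}")
```

Everything scored good or unknown stays out of the first pass, which is the point: you start from a handful of clues instead of the full collection.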

Let’s look into 3 effective ways of adding automated scoring into your investigations:

  1. Malware scanning/sandbox analysis
  2. Indicators of compromise (IOCs)
  3. Machine learning/AI

Running each artifact through modules that implement these methods will allow you to make faster decisions.

1: Malware Scanning/Sandbox Analysis

One effective way to find bad and suspicious items is to leverage malware scanning and sandbox services. They will identify files that are identical or similar to previously seen malware. 

Ideally, every file type that can cause damage should be scanned, including executables, libraries, documents, pictures, and scripts. 

2 ways to add malware analysis into your investigations at scale:

  • Multi-engine scanning services like VirusTotal, ReversingLabs, and Polyswarm that give results from 40+ engines. To analyze at scale, you will need API access and a subscription (or a tool that includes access). 
  • Sandbox services like Recorded Future, Joe Sandbox, and Any.Run that launch the uploaded file and observe its behavior. They will often give you a score based on how similar the file’s actions are to those of known malware. Because these take a few minutes to run, you should not send all files to them, only those you’re suspicious of. 

These solutions will allow you to identify bad and suspicious files.

When we built Cyber Triage, we also chose to use these results to score a file as good if no engines flag it and it has been in circulation for a long time. 
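To make that kind of logic concrete, here is a minimal Python sketch that turns a multi-engine scan result into a score. It is not Cyber Triage’s actual implementation and it does not call any real scanning API. The function name, the 25% detection threshold, and the one-year age cutoff are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; tune them to your own tolerance for false positives.
BAD_RATIO = 0.25                 # share of engines that must flag a file
MIN_AGE = timedelta(days=365)    # how long "in circulation for a long time" is


def score_scan_result(flagged, total, first_seen, now=None):
    """Map engine counts and a first-seen date to a score string."""
    now = now or datetime.now(timezone.utc)
    if total == 0:
        return "unknown"         # no engine produced a verdict
    if flagged == 0:
        # Clean AND old enough to be trusted; otherwise we cannot say.
        if first_seen is not None and now - first_seen >= MIN_AGE:
            return "good"
        return "unknown"
    if flagged / total >= BAD_RATIO:
        return "bad"
    return "suspicious"          # a few detections: worth a closer look


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
old = datetime(2015, 1, 1, tzinfo=timezone.utc)
print(score_scan_result(0, 60, old, now))    # clean and long-lived
print(score_scan_result(2, 60, None, now))   # only a couple of detections
```

Note that a clean result on a brand-new file stays unknown in this sketch. Zero detections on a file nobody has seen before tells you much less than zero detections on a file that has been around for years.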

2: Identifying Previously Seen Indicators (IOCs)

One effective way of identifying bad items is to look for items from previous incidents. These are often called Indicators of Compromise (IOCs).

IOCs:

  • Cryptographic hash values (SHA256, etc.) that are an exact match with previous files.
  • Fuzzy and similarity hash values (ssdeep, ImpHash, etc.) that identify files similar to previous files.
  • IP addresses that are an exact match to a host that at one point was used by attackers.
  • File paths previously used by attackers.
  • YARA rules written from previously analyzed files.

Only the first, a cryptographic hash match, indicates that a file is the same one seen in a previous incident. The others are simply clues that need further analysis. 
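The difference between an exact match and a clue can be sketched in a few lines of Python. In this hypothetical example, a SHA-256 match scores the file as bad, while a path or IP address match only scores it as suspicious. The function name, the layout of the `iocs` dictionary, and the sample values are assumptions made up for illustration (the IP addresses come from ranges reserved for documentation).

```python
import hashlib


def score_against_iocs(file_bytes, file_path, remote_ips, iocs):
    """Return (score, reasons) for one file and the IPs associated with it."""
    sha256 = hashlib.sha256(file_bytes).hexdigest()
    if sha256 in iocs["sha256"]:
        # Exact cryptographic match: same file as in a previous incident.
        return "bad", ["sha256 exact match"]

    clues = []
    if file_path.lower() in iocs["paths"]:
        clues.append("path previously used by attackers")
    clues.extend(
        f"contacted known attacker IP {ip}"
        for ip in remote_ips
        if ip in iocs["ips"]
    )
    # Path and IP hits are only clues, so they never score higher than suspicious.
    return ("suspicious", clues) if clues else ("unknown", [])


# Made-up IOC set for illustration only.
iocs = {
    "sha256": {hashlib.sha256(b"malicious payload").hexdigest()},
    "paths": {"c:\\users\\public\\svc.exe"},
    "ips": {"203.0.113.7"},
}

print(score_against_iocs(b"some other file", "C:\\Users\\Public\\svc.exe", [], iocs))
```

Fuzzy hashes and YARA rules are left out to keep the sketch dependency-free. They would slot in as additional clues in the same way.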

You can subscribe to free and commercial threat intelligence companies for a feed of IOCs.

You should also use a system to manage your own internal IOCs. For example, you could use DFIR IRIS. Cyber Triage will also record all items you score as bad and make sure they are scored in the future. 

3: Machine Learning/AI

Machine learning and artificial intelligence (AI) methods can take in large amounts of data and produce rules or scores. These methods can be resource-intensive, and they are not necessary when the attacker is using tactics very similar to a previous attack. 

IOCs are good enough for that. 

2 ways of using machine learning and AI:

  • Identifying outlier artifacts and events not consistent with normal system activity. These could be benign or malicious and are often flagged as suspicious. For example, only one host in the network has a given startup item. It could be benign and from a unique program on that host. Or, it could be from an attacker. 
  • Making rules to detect items similar to existing known artifacts. Machine learning techniques can determine which rules are effective at separating the bad items from the good ones. 
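The startup item example from the first bullet can be approximated without a trained model at all. The sketch below uses plain frequency counting across hosts, which is a much simpler stand-in for real outlier detection, to flag items that appear on only one host. The function name, the `max_hosts` parameter, and the host data are hypothetical.

```python
from collections import defaultdict


def rare_startup_items(host_items, max_hosts=1):
    """Return {item: [hosts]} for items seen on at most `max_hosts` hosts."""
    hosts_by_item = defaultdict(set)
    for host, items in host_items.items():
        for item in items:
            hosts_by_item[item].add(host)
    return {
        item: sorted(hosts)
        for item, hosts in hosts_by_item.items()
        if len(hosts) <= max_hosts
    }


# Made-up startup items collected from three hosts.
host_items = {
    "ws01": {"OneDrive", "SecurityHealth"},
    "ws02": {"OneDrive", "SecurityHealth"},
    "ws03": {"OneDrive", "SecurityHealth", "UpdaterSvc"},
}

for item, hosts in rare_startup_items(host_items).items():
    print(f"suspicious: {item} only on {', '.join(hosts)}")
```

As the text says, an item flagged this way is only suspicious. It could just as easily be a legitimate program that happens to be installed on one machine.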

Conclusion

Scoring is a critical way to make your SOC and DFIR investigations faster, and these are 3 of the most effective ways to determine if an item is good or bad. 

If your workflow does not include artifact and file scoring, consider other approaches that do. Cyber Triage uses all of the above and other ways of scoring artifacts to help you get clues and quickly conduct investigations. You can learn more here.