DFIR+AI Primer: When Not To Use GenAI

When thinking about where to use GenAI in your investigations, you must consider three things:

  • How much faster will you get an answer from GenAI?
  • How error-prone is the prompt on your GenAI model?
  • How much verification will be needed given the stakes of the investigation?

If your verification time is greater than the time you save by getting the initial answer from GenAI, don’t use GenAI.
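The rule above can be written as a quick back-of-the-envelope check. The sketch below is only an illustration: the function names, the use of minutes as the unit, and the example numbers are assumptions, not measurements from a real investigation.

```python
def genai_net_savings(manual_minutes: float,
                      genai_minutes: float,
                      verification_minutes: float) -> float:
    """Return the net minutes saved by using GenAI (negative means a net loss)."""
    # Time saved by getting the initial answer from GenAI instead of doing it by hand.
    time_saved = manual_minutes - genai_minutes
    # Verification eats into that saving.
    return time_saved - verification_minutes


def should_use_genai(manual_minutes: float,
                     genai_minutes: float,
                     verification_minutes: float) -> bool:
    """Apply the rule: skip GenAI when verification exceeds the time saved."""
    return genai_net_savings(manual_minutes, genai_minutes, verification_minutes) >= 0


# Hypothetical numbers: a 60-minute manual task, a 5-minute GenAI answer,
# and 20 minutes of verification leaves 35 minutes of net savings.
print(genai_net_savings(60, 5, 20))
# A 10-minute manual task with 30 minutes of verification is a net loss.
print(should_use_genai(10, 5, 30))
```

The hard part in practice is estimating the verification time, which depends on the prompt, the model, and the stakes discussed in the sections that follow.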

Different Prompts Can Produce Different Errors

GenAI makes errors when the prompt requires it to interpret data, reason about data, or draw conclusions from incomplete data.

Errors commonly happen because the model:

  • Makes bad assumptions.
  • Uses flawed logic.
  • Lacks relevant knowledge or context.

More complicated queries increase the chances of an error.

Different Models Produce Different Errors

GenAI is only as good as its model:

  • A large, general model (Claude, ChatGPT, etc.) has broad reasoning ability, but may produce errors on specialized or obscure DFIR topics where training data is limited.
  • A small, general local model (Qwen, Llama) was trained on less data and has less reasoning ability. These models have the highest risk of errors.
  • A small, but fine-tuned model with DFIR specialization could outperform both.

The likelihood of an error is based on both the prompt and the model.

Verify Results Proportional to the Stakes of the Investigation

Not all errors will have the same impact. GenAI outputs should be verified based on the stakes of the investigation (see previous blog post) and the likelihood of error:

  • Low stakes: A low-severity EDR alert investigation may require only minimal human verification or a judgement from an independent LLM.
  • Medium stakes: A medium-severity EDR alert may require a human to review the timeline and double-check that nothing else was missed.
  • Very high stakes: In a criminal case where someone could go to jail, a human should verify every item in the final result.

Verification can happen through:

  • Manual review
  • Judging by other LLMs
  • Deterministic confirmation
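As one illustration of deterministic confirmation, a value that appears in GenAI output (here, a file's SHA-256 hash) can be recomputed from the evidence itself instead of being trusted. This is a minimal sketch using Python's standard hashlib module; the function names and the scenario of a model reporting a hash are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def confirm_claimed_hash(path: Path, claimed_sha256: str) -> bool:
    """Return True only if the hash claimed in the GenAI output matches the file."""
    # Normalize case and whitespace so formatting differences do not cause a mismatch.
    return sha256_of_file(path) == claimed_sha256.strip().lower()
```

A check like this either passes or fails, so it costs almost no analyst time compared with manual review.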

When Not to Use GenAI

Don’t use GenAI if verification will take longer than the time GenAI saves.

Don’t Parse Known Structures Using GenAI

Do not use GenAI to create your own parser for a well-known structure, such as a FAT file system parser, from scratch. Running your new tool will take about as long as running an existing one (i.e., no time savings during an investigation), but you will need to spend a lot more time verifying the results from your untested parser.

You should, though, consider using GenAI to parse new or unfamiliar variants of structured files, such as JSON, XML, or SQLite. GenAI can help you understand these formats more quickly and convert them to a more useful format.
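For an unfamiliar SQLite database, one way to keep the model's job small is to give it the schema rather than the raw file. The sketch below uses Python's standard sqlite3 module to collect each table's CREATE statement, which could then be pasted into a prompt. The function name is hypothetical, and the prompting step itself is left out.

```python
import sqlite3
from pathlib import Path


def dump_schema(db_path: str) -> dict[str, str]:
    """Map each table name to the CREATE TABLE statement stored in the database."""
    # Request read-only access through a file: URI so this code does not write to the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return {name: sql for name, sql in rows}
```

Because the schema comes straight from the database, any column meanings the model proposes can be checked against real rows before they are relied on.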

Don’t Replace Curated Threat Intelligence with GenAI

If you have a subscription to a curated threat intelligence source, be careful about replacing it with a general-purpose GenAI model. The lookup speeds will be similar (i.e., no time savings during an investigation), but you’ll need to verify the sources behind the GenAI output to make sure they are legitimate. The curated threat intelligence has already been vetted.

But you should also consider using:

  • GenAI models that are specialized for threat intelligence and therefore draw on trustworthy sources.
  • GenAI if the curated threat intelligence doesn’t know about the item.
  • GenAI to summarize the outputs from multiple curated threat intelligence sources.

Don’t Replace High-Quality Detections with GenAI

If you have a detection or investigation platform whose rules and heuristics have a low false-positive rate, be careful about replacing them with GenAI. The detection speeds will be similar, but you may get more errors from the GenAI and therefore spend more time verifying.

Instead, use GenAI in addition to the rules and heuristics so that you can reach the best conclusions. GenAI can easily go beyond simple matching for detections.

Cyber Triage Is Using Hybrid Approaches

Cyber Triage is taking a hybrid approach with its integration of GenAI. It will continue to use internal rules and heuristics for scoring, traditional parsers, and curated threat intelligence.

But it will leverage GenAI to expand its capabilities:

  • Items scored as bad and suspicious will be enriched with GenAI to provide more context.
  • GenAI will be used in addition to its existing rules and heuristics to score items as suspicious.
  • Custom reports can be generated based on a user’s specific needs.
  • GenAI will be used to safely decode data.

Cyber Triage’s MCP server is a clear example of supplementing the existing approach with AI, since the user needs to prompt the AI to bring in more data.

Version 4 of Cyber Triage, in July, will automatically prompt for artifact enrichment and for reasoning about scoring. All results will be clearly identified to enable human verification.

Try GenAI in Cyber Triage

If you are looking to incorporate AI into your investigations, try out Cyber Triage with its MCP server. It’s BYOAI and you can use your Claude account, AWS Bedrock, or local models. Setup instructions are in the user manual.