To improve overall speed in digital forensics and incident response (DFIR), the time it takes to execute each step in the process shown below has to be reduced:
- Start the incident investigation after an alert is generated
- Collect data from the endpoint and network to start analysis (here’s how)
- Analyze the data to answer investigative questions using endpoint forensic techniques (here’s part 1 & part 2)
- Scope the incident and identify other endpoints that are involved using hunting techniques
- Remediate the incident.
This post will focus on the best strategies to reduce the time between detecting the incident and starting the investigation—between alert and data collection.
Before we dive into specific approaches, let’s review why efficiency during this phase is critical for effective incident response.
Why Starting Incident Response Quickly is Important
Investigating endpoints is all about trying to answer key questions like:
- “Is there a malicious process on the computer?”
- “Who has been remotely logging into this computer?”
The answers to these and other critical questions await evidence. Incident responders can’t investigate anything unless they have data to analyze.
Along with delaying the investigation, poor process efficiency during this stage opens up the entire organization to the general dangers of slow DFIR: Intruders are given more time to steal and destroy proprietary data, deploy persistence mechanisms, or hide. (For more on these risks, read “Incident Response KPIs: TIME Is Critical. Here Are Five Reasons Why.”)
But there are also specific concerns associated with delays during this phase.
- If given enough time, intruders will often cover their tracks, so slow collection could mean there is no data left to collect
- If the specific endpoint is in a different time zone, the device could go offline before collection begins, pushing the time for analysis even further off.
Incident responders should prioritize speed during this phase. The quality and quantity of the evidence they retrieve depend on it.
How To Start Incident Response Faster
There are a number of strategies for minimizing the time spent in this phase. We’ll cover the three most important ones: 24/7 human coverage, continuous collection/monitoring, and integration with SIEM/orchestration. Note that organizations often combine several of these strategies to reduce their overall response time as much as possible.
24/7 Human Coverage
One labor-intensive and relatively low-tech solution is to ensure that there is always a human waiting for an alert, so that data can be collected ASAP.
For this strategy to work, an organization needs to implement a sustainable system of continuous human alert review and collection launch. This may work via several offices in different time zones working in tandem, or a single-office operation supported by round-the-clock shifts.
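As a rough illustration of the multi-office arrangement, the sketch below routes an alert to whichever office is on duty at the time it fires. The office names and coverage hours are hypothetical, and the function names are ours; the point is only that the schedule can be checked for gaps before anyone relies on it.

```python
from datetime import datetime, timezone

# Hypothetical offices and the UTC hours each one covers
# (start inclusive, end exclusive). Together they should span all 24 hours.
COVERAGE = [
    ("Singapore", 0, 8),
    ("London", 8, 16),
    ("New York", 16, 24),
]


def on_duty_office(alert_time: datetime) -> str:
    """Return the office responsible for an alert raised at alert_time.

    alert_time must be timezone-aware so it can be converted to UTC.
    """
    hour = alert_time.astimezone(timezone.utc).hour
    for office, start, end in COVERAGE:
        if start <= hour < end:
            return office
    raise ValueError(f"no office covers UTC hour {hour}")


def coverage_gaps() -> list:
    """List the UTC hours that no office covers, to sanity-check the schedule."""
    return [
        hour
        for hour in range(24)
        if not any(start <= hour < end for _, start, end in COVERAGE)
    ]
```

Running coverage_gaps() after every schedule change is a cheap way to catch an hour in which an alert would sit unread.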
For companies whose DFIR teams depend on nonautomated ad hoc tools, this approach is the only way to get quick access to data at 3 a.m. It ensures that a human responder is always on call to execute the right commands for a given incident.
This strategy, however, is not without drawbacks:
- First and foremost, staffing a desk 24/7 is expensive, and not all organizations can afford the investment
- Second, incident responders have limited bandwidth, and if they’re busy, alerts can easily pile up.
Continuous Collection/Monitoring
A more infrastructure-intensive solution is to continuously collect data from endpoints and servers and save it. Windows or Linux servers, for example, could be configured to continuously send log data to a central server.
For this strategy to work, each endpoint or server typically needs some form of agent that monitors the system, such as endpoint detection and response (EDR) agents. One or more servers are then needed to store the data from all of the endpoints.
This approach circumvents an explicit collection step and ensures that incident data is always quickly available to responders. Because the data is continuously sent to and saved in another location, attackers will have a harder time deleting evidence to cover their tracks.
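To make the idea concrete, here is a minimal sketch of an endpoint shipping events to a central collector as they happen, using the syslog handler from Python's standard library over UDP. The event format, the function names, and the collector address are assumptions for illustration; a production agent such as an EDR product does far more than this.

```python
import logging
import logging.handlers


def build_forwarder(host: str, port: int) -> logging.Logger:
    """Create a logger that sends every record to a central syslog collector over UDP."""
    logger = logging.getLogger(f"endpoint-forwarder-{host}-{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # send only to the collector, not to local handlers
    if not logger.handlers:   # avoid attaching a second handler on repeat calls
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def record_process_start(logger: logging.Logger, hostname: str, pid: int, image_path: str) -> None:
    """Forward one hypothetical 'process started' event as key=value text."""
    logger.info("host=%s event=process_start pid=%d image=%s", hostname, pid, image_path)
```

Because each event leaves the endpoint as soon as it is recorded, deleting local logs later does not remove the copy already held by the collector.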
Now for the downsides:
- This strategy requires significant infrastructure to provide ample room for data storage. In addition, significant levels of monitoring are needed to record all the data necessary for a thorough investigation
- Laptops and mobile devices may not have continuous access to an on-premises server when they are not in the office. In that case, the data may not be available.
Integration with SIEM/Orchestration
An automated solution uses a security information and event management (SIEM) or orchestration platform to start collecting data as soon as an alert is generated.
For this strategy to work, a company needs two things: a SIEM that supports automation (or a separate orchestration tool), and endpoint forensics tools that either run on the endpoint as agents or can be deployed as needed (i.e., agentless).
For example, an organization could put this strategy into place using Splunk as the SIEM and Cyber Triage as the endpoint forensics tool. Splunk’s platform would generate the alerts and trigger Cyber Triage to collect data from the endpoint in question.
A more advanced configuration is to have the SIEM start a playbook in an orchestration tool, such as Demisto, which takes several steps to enrich the alert and collect more data. One step in that process could be to collect endpoint data so that it is ready when an analyst becomes available.
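The alert-to-collection handoff can be sketched as a small webhook receiver. This is an illustration under stated assumptions, not the actual Splunk, Demisto, or Cyber Triage integration: the JSON field "host" is a hypothetical payload shape, and start_collection is a placeholder for whatever CLI or API the forensics tool really provides.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_collection(hostname: str) -> str:
    """Hypothetical stand-in for launching an endpoint collection.

    A real deployment would call the forensics tool's own CLI or API here.
    """
    return f"collection queued for {hostname}"


def handle_alert(raw_body: bytes, collect=start_collection):
    """Parse an alert and start collection for the host it names.

    Returns an (HTTP status, message) pair.
    """
    try:
        alert = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, "body is not valid JSON"
    host = alert.get("host") if isinstance(alert, dict) else None
    if not host:
        return 422, "alert does not name a host"
    return 202, collect(host)


class AlertWebhook(BaseHTTPRequestHandler):
    """Accepts POSTed alerts, e.g. from a SIEM configured to call a webhook."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, message = handle_alert(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(message.encode("utf-8"))


def serve(port: int = 8080) -> None:
    """Run the receiver on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), AlertWebhook).serve_forever()
```

Keeping the decision logic in handle_alert, separate from the HTTP plumbing, makes it easy to test without a running server and to swap in a real collection call later.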
This approach is popular for a couple of reasons.
- Many organizations use SIEM platforms that support this kind of integration, making it easy to deploy this strategy
- It solves the problem by making the existing analysts more efficient, instead of requiring more analysts. Properly planned playbooks allow the tool to make basic decisions and have the data available for an analyst to make more complex decisions.
It does have drawbacks, however:
- Configuring the necessary automation rules can be a bit tricky, as it isn’t always clear which sort of alerts should trigger an endpoint forensics tool to begin collection
- It also may not reveal as much data as continuous monitoring, if a lot of malicious activity occurred before the alert was generated.
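One way to reason about which alerts should trigger collection is to write the rule down explicitly. The sketch below is a hypothetical policy, not a recommendation: the 1-10 severity scale, the category names, and the one-hour cooldown are all assumed values that would need tuning against real alert sources.

```python
from datetime import datetime, timedelta

# Hypothetical policy values; tune them to the alert sources in use.
MIN_SEVERITY = 7                                   # on an assumed 1-10 scale
ALWAYS_COLLECT = {"ransomware", "credential_dumping"}
COOLDOWN = timedelta(hours=1)                      # skip hosts collected from recently


class CollectionPolicy:
    """Decides whether an alert should launch an endpoint collection."""

    def __init__(self):
        self._last_collected = {}  # hostname -> time of the last collection

    def should_collect(self, host: str, category: str, severity: int, now: datetime) -> bool:
        # Avoid re-collecting from a host that was collected from within the cooldown.
        last = self._last_collected.get(host)
        if last is not None and now - last < COOLDOWN:
            return False
        # Collect for high-risk categories regardless of score, or for any severe alert.
        if category in ALWAYS_COLLECT or severity >= MIN_SEVERITY:
            self._last_collected[host] = now
            return True
        return False
```

The cooldown keeps a burst of related alerts from launching the same collection repeatedly, at the cost of possibly missing activity that starts within that window.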
If you’d like to try this approach, we recommend testing Cyber Triage. It’s DFIR automation software that speeds up the entire IR process, from detection to remediation, and integrates with many platforms.
This post reviewed the best strategies for reducing the time spent from alert to collection during incident response.
Our next post focuses on improving the speed of the next phase: collection.