What is data exfiltration and how can it be traced?

Attackers target computers for any number of purposes including disrupting workflow, shutting down essential systems, blocking communication and extracting data. After discovering the attack, the victims can use careful digital forensic techniques to restore order. This can include fixing compromised systems, restarting communications links and enabling work to begin again.

Another important role for computer security professionals, one that isn’t as time-critical, is determining the size and scope of the attack by establishing how data was stolen and where it went. Some businesses have a responsibility to inform customers and employees about data leaks, and an understanding of the size of the attack can guide this.

Good teams devoted to digital forensics and incident response (DFIR) can produce a useful estimate of the damage. They can review internal records like network logs that reveal what happened. The details can be assembled into a timeline that traces what the attackers tried to do and what they actually accomplished.

There are limits. The details assembled from forensic analysis are often far from a complete record of everything that happened. DFIR teams must make educated inferences about the entries in the log files. The results will not be as comprehensive as a perfect record, but they can still guide the enterprise leadership to make the best decisions.

What types of log files are useful?

The DFIR teams will want to examine the log files from the affected systems. These can reveal clues that establish where data went.

Some of the most useful types are:

  • Network traffic logs: These record the amount of traffic, the time and the IP address of the destination. This is often the best source for understanding where the data went, at least on its first stop.
  • Access logs: When a user logs on to a system, the account name and time are recorded. Well-run systems do not record the password itself. These entries can establish which account was used.
  • Event logs: These general files track many different types of events like opening a new file, saving a new revision, or accessing data.
  • Firewall logs: The firewalls that protect networks often keep logs of failed attempts. A pattern can reveal how the attacker was probing the network looking for weaknesses.
  • File system metadata: These bits track the last time a file was changed. This won’t be helpful if the file was merely copied, but sometimes files are altered when being accessed. This is most common if the permission bits are adjusted to make the file readable. The metadata can also be useful if the intruder created new files or installed software that would facilitate the move.
  • Registry entries: The Windows operating system records traces of activity in the registry, such as recently opened files and newly installed software. This can help establish a time of the intrusion and which files may have been touched.
  • Memory dumps: If the system is still running when the intrusion is detected, the forensic team can take a complete memory dump. This may include some useful details left over in memory by the attackers. This is best done as soon as possible, before the RAM is recycled and assigned to another process.
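As one illustration of the file system metadata item above, the following is a minimal Python sketch that lists files whose last-modified time falls inside a suspected intrusion window. The function name `files_modified_between` and the idea of a known time window are assumptions made for this example. As noted above, a file that was merely copied will not show up here.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def files_modified_between(root, start, end):
    """Return (path, modified_time) pairs for files under root changed in [start, end]."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # st_mtime is the last-modified time in seconds since the epoch
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            if start <= modified <= end:
                hits.append((str(path), modified))
    # Oldest change first, so the result can feed a timeline
    return sorted(hits, key=lambda item: item[1])
```

Running a scan like this against a live disk can itself alter access times, so it is better suited to a forensic copy than to the original system.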

What are the most valuable parts of logs?

Log files traditionally contain one row of data for each event made up of a number of fields or parts. Not every log file contains each of these parts, but most of them do.

  • Timestamp: The date and time that the event happened can be plotted on a timeline, one of the most valuable tools for understanding the scope of the attack.
  • User Information: Some events are connected to particular users. Identifying which accounts were compromised will also help limit the scope of the attack and provide a tighter estimate of what the attackers may have accessed.
  • IP Addresses: These show where the attack commands originated and where any data may have been sent. They may only reveal the first destination when attackers are using proxies.
  • Actions: This is the system command. It could be the start of a file copying process or it could be something seemingly innocuous like browsing. When the attacker’s actions are isolated, they indicate what the attacker is trying to do. Even the innocuous ones can help explain what the attacker wants.
  • Response Codes: Not all commands are successful. The event logs can reveal how long an action took to complete and whether it finished successfully.

The full timeline for the log events can explain just what the attacker was looking to find. If the data was exfiltrated, it will show the IP addresses where it went. The sequence of events and the time it took to complete them will also show just how much succeeded and how much data was exposed.
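To make the idea of a timeline concrete, here is a minimal Python sketch that parses log rows containing the fields listed above and sorts them by time. The space-separated line format, the `LogEvent` class, the account names and the sample rows are all invented for illustration; real logs vary widely and usually need a parser written for each source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    user: str
    ip: str
    action: str
    status: str


def parse_line(line):
    """Parse one 'timestamp user ip action status' row (hypothetical format)."""
    stamp, user, ip, action, status = line.split()
    return LogEvent(datetime.fromisoformat(stamp), user, ip, action, status)


def build_timeline(lines, user=None):
    """Return events sorted by time, optionally limited to one account."""
    events = [parse_line(line) for line in lines if line.strip()]
    if user is not None:
        events = [e for e in events if e.user == user]
    return sorted(events, key=lambda e: e.timestamp)


# Invented sample rows using documentation-only IP ranges
sample = [
    "2024-03-01T02:17:40 svc_backup 203.0.113.9 file_copy OK",
    "2024-03-01T02:14:05 svc_backup 203.0.113.9 login OK",
    "2024-03-01T02:15:30 alice 198.51.100.4 browse OK",
    "2024-03-01T02:16:10 svc_backup 203.0.113.9 read_dir DENIED",
]

for event in build_timeline(sample, user="svc_backup"):
    print(event.timestamp.isoformat(), event.action, event.status)
```

For the sample rows, the filtered timeline reads login, then a denied directory read, then a successful file copy, which is the kind of sequence that shows both what was attempted and what succeeded.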

Are there useful tools that can be installed?

Proactive security teams may also install specialized tools that can help establish the nature and extent of any data loss. Not every situation requires the extra data that they collect, but when it’s available it can dramatically increase the precision of any estimate of data loss.

Network intrusion detection tools silently follow all network traffic, looking for anomalies or unusual patterns. They create logs that record this behavior and they may generate special alerts (emails, pages, etc.) that prompt the team to act immediately.

These are usually installed as standalone devices that live independently from the regular processes. They are usually hardened to resist attack so their data is much less likely to be compromised by the attacker. This allows them to offer a neutral and more trustworthy record of what transpired.
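Commercial and open source detection tools use far more sophisticated methods, but the basic idea of spotting an unusual pattern can be sketched in a few lines of Python. This example flags hosts whose outbound volume is far above the median of their peers. The function name, the multiplier of 10 and the sample figures are assumptions chosen for illustration, not values taken from any real product.

```python
from statistics import median


def flag_unusual_hosts(bytes_out_by_host, multiplier=10.0):
    """Return hosts whose outbound volume exceeds multiplier times the median."""
    if not bytes_out_by_host:
        return []
    baseline = median(bytes_out_by_host.values())
    threshold = baseline * multiplier
    return sorted(
        host for host, sent in bytes_out_by_host.items() if sent > threshold
    )


# Invented hourly totals of bytes sent by each internal host
hourly_bytes_out = {
    "10.0.0.11": 120_000,
    "10.0.0.12": 95_000,
    "10.0.0.13": 110_000,
    "10.0.0.14": 48_000_000,  # far above its peers
}

print(flag_unusual_hosts(hourly_bytes_out))
```

The median is used rather than the mean so that a single very large transfer does not drag the baseline up and hide itself.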

Some teams also include honeypots, which are systems designed to be the easiest to find and exploit in the network. They often include bait in the form of data with little or no practical value, perhaps old documents from a canceled product. Some may even create fake documents with particularly attractive details. The honeypots also include extra instrumentation to capture full logs with extensive records of what the attacker might be doing. If no legitimate user ever logs into the machine, these logs provide a clean view of what the attackers may be looking for.

Can the malware reveal a destination?

When the incident response team is able to determine which tools carried out the data exfiltration, they may often be able to uncover the ultimate destination. Many packages have standard attack patterns and the destination is often the same.

This approach of analyzing the attacking malware works best with the kind of broad, unfocused attacks that are common. Many packages do not target any particular company or person. They try to exploit several known vulnerabilities and then they probe machines constantly looking for these weaknesses. When they find a possible target, they strike.

These packages operate with a known playbook that usually can be found by reverse engineering the malware. Some companies already specialize in compiling known lists of viruses, bots, and other dangerous types of malware. If the attacking software can be identified and it’s already been studied, it can be possible to identify the destination for the stolen data.
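One common way to check whether a captured file has already been studied is to compare its cryptographic hash against a catalog of known samples. The Python sketch below shows the lookup step only. The `KNOWN_SAMPLES` table, the family name "ExampleStealer" and the destination address are placeholders invented for this example; a real catalog would come from a malware research source.

```python
import hashlib

# Hypothetical catalog keyed by SHA-256 digest. The entry below is a
# placeholder built from demo bytes, not a real malware signature.
KNOWN_SAMPLES = {
    hashlib.sha256(b"demo malware bytes").hexdigest(): {
        "family": "ExampleStealer",
        "destinations": ["203.0.113.50"],
    },
}


def lookup_sample(data, catalog=KNOWN_SAMPLES):
    """Return the catalog entry for these bytes, or None if the sample is unstudied."""
    digest = hashlib.sha256(data).hexdigest()
    return catalog.get(digest)


print(lookup_sample(b"demo malware bytes"))
print(lookup_sample(b"something never seen before"))
```

An exact hash match fails as soon as a single byte of the file changes, so a miss does not mean the software is new; it only means this precise file is not in the catalog.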

There are limitations to this approach. Not all malware is studied. Some dedicated attackers reprogram their malware or use a shifting destination that’s adaptive. Others write completely new software for some dedicated attacks and this may require analysis and study de novo.

What are the main steps to take in tracing exfiltrated data?

Data Collection: When the attack is first discovered, the incident response team should create copies of all possible data like log files. Memory dumps of the RAM in affected machines can be the most valuable, but only if they’re captured as soon as possible, before other work erases the data.
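A copy is only useful as evidence if the team can later show it matches the original. The following Python sketch copies log files into a collection directory and records a SHA-256 hash for each one. The function names and the flat destination layout are assumptions for this example; dedicated forensic imaging tools do considerably more, such as write blocking and chain-of-custody records.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large logs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_evidence(sources, dest_dir):
    """Copy each file into dest_dir and return {filename: sha256} for verified copies.

    Files sharing a name would overwrite each other in this simple layout.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for source in map(Path, sources):
        before = sha256_file(source)
        copy = dest / source.name
        shutil.copy2(source, copy)  # copy2 also tries to preserve timestamps
        if sha256_file(copy) != before:
            raise IOError(f"copy of {source} does not match the original")
        manifest[source.name] = before
    return manifest
```

Storing the returned manifest separately from the copies lets anyone re-hash the files later and confirm nothing changed after collection.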

Timeline Analysis: The time and date information for all events can be assembled into a timeline that shows the steps the attackers took and the order in which they executed them.

Network Analysis: The network logs can be the most valuable for understanding how much data was removed and where it went.
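As a rough sketch of this step, the Python below totals the bytes sent to each destination address so the largest recipients stand out. The input is assumed to be a list of (destination IP, bytes sent) pairs already extracted from the network logs; the function name and the sample flows are invented for illustration.

```python
from collections import Counter


def bytes_by_destination(records):
    """Sum bytes sent per destination IP, largest total first."""
    totals = Counter()
    for dest_ip, sent in records:
        totals[dest_ip] += sent
    return totals.most_common()


# Invented outbound flows using documentation-only IP ranges
flows = [
    ("203.0.113.9", 4_000_000),
    ("198.51.100.4", 12_000),
    ("203.0.113.9", 6_500_000),
    ("192.0.2.20", 300_000),
]

for dest_ip, total in bytes_by_destination(flows):
    print(dest_ip, total)
```

The top entry gives an upper bound on how much data could have reached that address, though it only identifies the first stop if the attacker relayed the traffic onward.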

Data Recovery: In some attacks, the data is destroyed after it’s copied. If good backups are available, the incident response team can restore the data. This is also useful for understanding what information was in the data packets that were exfiltrated.

System Analysis: Some attackers will alter the system behavior by changing parameters or installing their own software, which can speed up future attacks. The incident response team should return the system to a secure state by restoring the correct settings and removing any software installed to enable future attacks.

Malware Analysis: The types of tools used during the attack will also reveal what type of data might have been extracted. Some tools, for instance, are only designed to encrypt data in place in order to secure a ransom payment. Others copy data to hard-coded IP addresses, which can reveal the criminal gang.

Attribution and Legal Action: A solid record of what happened and when will make it possible for any legal team to take over the case and begin legal action.

What are the key takeaways for leadership after an event?

Follow the prescribed plan for data collection and analysis.

Understand that careful analysis can reveal details about what data was taken and where it went. Some details, though, will remain a mystery.

Good preparations can make a big difference. Comprehensive log files can help assemble a good timeline of the events in the attack. Tools that monitor and track misbehavior can provide essential details.

Establish good logging routines before attackers arrive to help mitigate problems after.