Data Carving

What is data carving?

When a digital forensics investigator begins to analyze a new system, the investigator tries to break the raw data into files. In many cases, the boundaries between the files are clearly defined because the file system's records of where each file lives survived and were copied correctly and completely. The investigator can trust that those files are intact and uncorrupted.

Not all data, though, comes with well-defined file boundaries. Investigators often need to work with scraps of data and try to reassemble as much of an original file as possible. Sometimes this is because the data was corrupted, and sometimes it's because the file was previously deleted and the investigator is working with disk sectors that have not yet been overwritten.

Data carving is the term applied to a number of techniques for piecing together the parts of a file. Sometimes the boundaries must be found, and sometimes the order of the scraps must be inferred from the data that is present. Sometimes the final result has gaps, often large ones, and sometimes the DFIR investigator is able to reassemble a complete file.

Data carving techniques can also be used to find subsections of files that are nominally complete. Some software programs store large amounts of data in single files and use knowledge of the storage format or schema to locate the parts they need. Data carving can mimic this process to extract essential information from inside large data files.
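SQLite databases are a useful illustration of how format knowledge helps. The SQLite file format stores a page size in its 100-byte header, and each b-tree page begins with a one-byte type flag. The sketch below uses that to split a database, or a carved fragment that still begins with the header, into pages and to label each one. It assumes a well-formed header. The function names are hypothetical, and any page whose first byte is not a b-tree flag, such as an overflow or freelist page, is simply labeled unknown.

```python
import struct
from typing import Iterator, Tuple

SQLITE_MAGIC = b"SQLite format 3\x00"

# B-tree page type flags defined by the SQLite file format.
PAGE_TYPES = {
    0x02: "interior index",
    0x05: "interior table",
    0x0A: "leaf index",
    0x0D: "leaf table",
}

def sqlite_page_size(data: bytes) -> int:
    """Read the page size from the SQLite database header."""
    if not data.startswith(SQLITE_MAGIC):
        raise ValueError("not an SQLite database header")
    (size,) = struct.unpack(">H", data[16:18])  # big-endian 2-byte field at offset 16
    return 65536 if size == 1 else size  # the value 1 encodes a 65536-byte page

def classify_pages(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, page_type) for every complete page in the buffer."""
    page_size = sqlite_page_size(data)
    for index in range(len(data) // page_size):
        start = index * page_size
        # Page 1 carries the 100-byte file header before its own b-tree header.
        flag_offset = start + 100 if index == 0 else start
        yield index + 1, PAGE_TYPES.get(data[flag_offset], "other/unknown")
```

A carver can then pull out only the leaf table pages, which hold the rows themselves, without parsing the rest of the database.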

What are the different types of data carving techniques?

Data carving relies heavily on the understanding of the natural structure of the files. The simplest to understand may be data carving for text files. If one scrap ends with the text “San Francisco’s most famous bridge is” and another scrap begins with “the Golden Gate Bridge”, there’s a good chance that the two scraps belong together. Data carving uses the structure to match the two like puzzle pieces.
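Here is a toy version of that puzzle matching. It assumes a tiny reference text can stand in for a real statistical language model. Each candidate seam is scored by how often the last word of one scrap is followed by the first word of the next, and the highest-scoring ordering wins. The function names are hypothetical, and trying every permutation only works for a handful of scraps.

```python
from collections import Counter
from itertools import permutations
from typing import Sequence, Tuple

def bigram_counts(reference: str) -> Counter:
    """Count adjacent word pairs in a reference text (a toy language model)."""
    words = reference.lower().split()
    return Counter(zip(words, words[1:]))

def junction_score(left: str, right: str, bigrams: Counter) -> int:
    """Score how plausibly `right` continues `left`, using the word pair at the seam."""
    left_words, right_words = left.lower().split(), right.lower().split()
    if not left_words or not right_words:
        return 0
    return bigrams[(left_words[-1], right_words[0])]

def best_order(scraps: Sequence[str], bigrams: Counter) -> Tuple[str, ...]:
    """Try every ordering of the scraps and keep the one with the best total seam score."""
    return max(
        permutations(scraps),
        key=lambda order: sum(junction_score(a, b, bigrams) for a, b in zip(order, order[1:])),
    )

reference = "the view from the bridge is the best in the city"
scraps = ["the Golden Gate Bridge", "San Francisco's most famous bridge is"]
print(best_order(scraps, bigram_counts(reference)))
```

The pair "is the" appears in the reference text and "bridge san" does not. That is enough to put the scrap ending in "is" first.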

Some tools can automate part of the process of aligning the scraps and finding the best ordering. They rely on either statistical models of the underlying natural language or structural knowledge of the algorithms that created the files. For instance, many file formats begin with a “magic number”, a value chosen to identify the type of file. While the same bytes could occur randomly in another file, the odds are long. PNG files, for example, begin with the eight bytes 89 50 4E 47 0D 0A 1A 0A. The first byte is a non-ASCII value, the next three spell “PNG” in ASCII, and the last four are a carriage return, a line feed, a DOS end-of-file character, and another line feed.
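The following is a minimal sketch of magic-number matching. It assumes each candidate block is checked only at its first byte. The table holds the PNG signature from the text plus a few other widely documented signatures. The table and the `identify` helper are illustrative and not taken from any particular tool.

```python
from typing import Optional

# Well-known file signatures ("magic numbers") expected at offset 0.
MAGIC_NUMBERS = {
    bytes.fromhex("89504E470D0A1A0A"): "png",  # 0x89, then ASCII "PNG", CR, LF, 0x1A, LF
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
    b"PK\x03\x04": "zip",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def identify(block: bytes) -> Optional[str]:
    """Return the file type whose signature starts the block, or None if nothing matches."""
    for magic, file_type in MAGIC_NUMBERS.items():
        if block.startswith(magic):
            return file_type
    return None

print(identify(bytes.fromhex("89504E470D0A1A0A") + b"rest of the file"))
```

A match only suggests a type. Several document formats, for instance, are ZIP containers and share the ZIP signature.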

Some file formats, like JSON or XML, come with a strongly defined structure that makes it easy to see whether file scraps belong next to each other. If the original file was correctly formatted (and it usually was), the parsing rules can usually tell whether the scraps align.
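As a rough illustration of letting the parser judge alignment, the sketch below tries orderings of JSON scraps until the joined text parses. It assumes all the scraps come from one complete JSON document. The helper names are hypothetical.

```python
import json
from itertools import permutations
from typing import Optional, Sequence, Tuple

def parses_as_json(text: str) -> bool:
    """True if the text is a complete, well-formed JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

def find_valid_order(scraps: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Return the first ordering of the scraps that parses as JSON, or None."""
    for order in permutations(scraps):
        if parses_as_json("".join(order)):
            return order
    return None

scraps = ['"name": "report.docx", ', '{"id": 42, ', '"size": 1024}']
print(find_valid_order(scraps))
```

A successful parse shows that an ordering is consistent with the grammar, not that it is the original. Two scraps that each hold a complete key/value pair in the middle of an object would parse in either order.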

When is data carving typically used in DFIR?

Data carving is typically used when the simplest file recovery methods are not successful. This can be the case when files have been deleted, overwritten, or corrupted. Data carving can also be used to recover files from unallocated space or from devices that have been formatted.

Digital forensic investigators will first try to create copies of the original files through traditional imaging techniques. These provide the most reliable view of the files as they are currently used on the computer.

When the investigators need to see what was done in the past, they can turn to data carving. Some attackers, for example, may try to delete the tools they use in the attack after they’re done. Data carving can recover them and help the investigators understand what the attacker did.

Data carving is also useful for recovering files that were purposely deleted by the user. Unless the user deploys secure deletion software that deliberately overwrites the blocks of data with random data, data carving can often reassemble recently deleted files.

What are the benefits of using data carving in DFIR?

Data carving is an essential part of any investigation that aims to do more than examine the most obvious files. Sometimes files are deleted through regular use, and sometimes out of a desire to hide their existence. Data carving techniques can often recover files in both of these classes.

The technique is especially useful in investigations where the user is hiding their tracks. Some viruses or malware tools will erase parts or all of their files in the hopes of disappearing. Even when data carving can’t recover the entire file, it can deliver enough information to help the DFIR team find the backdoor or weak spot and plug it to prevent future attacks.

In many cases, the files that were meant to be secret are also the most valuable ones recovered in an investigation. While data carving can be laborious, the potential value is great enough that many DFIR teams work hard to make it part of their core competency.

What are the challenges of using data carving in DFIR?

Data carving can be a time-consuming and labor-intensive process. It’s not much different from putting together a digital puzzle. While some of the parts may line up easily, some may not have enough information inside of them to come to a solid conclusion. In some cases, the investigators must use educated guesses that demand much time and analysis.

One of the biggest challenges is analyzing files that are highly regular or only loosely structured. While some files come with well-understood file signatures, magic numbers, or complex structures, others offer only repetitive and common fragments. In those cases, the fragments can often be arranged in many plausible ways.
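A quick calculation shows how fast the possibilities grow. If n fragments of one file carry no ordering clues, there are n! possible orderings. The sketch below is a simplified model that assumes every fragment belongs to the same file and none are missing. It also shows that a single structural clue, such as a header fragment that must come first, divides the count by n. The function name is hypothetical.

```python
from math import factorial

def possible_orderings(n_fragments: int, known_first: bool = False) -> int:
    """Count orderings of fragments when nothing else constrains them.

    If a magic number pins one fragment to the start, only the rest can move.
    """
    free = max(n_fragments - 1, 0) if known_first else n_fragments
    return factorial(free)

for n in (5, 10, 20):
    print(n, possible_orderings(n), possible_orderings(n, known_first=True))
```

Ten unconstrained fragments already allow 3,628,800 orderings. That is why analysts lean so heavily on whatever structure a format offers.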

If the data has been overwritten or corrupted, it may not be possible to recover it all. File systems regularly reuse blocks, and after some time the computer may simply write new data over the old, making it impossible to recover some or all of a deleted file.

What are some common tools used for data carving?

The first job for a DFIR team is to collect as much relevant information from the storage system as possible. Tools like Autopsy, Cyber Triage, KAPE, FTK Imager, and several others are good starting points. They can capture copies of the data for later analysis.

Analysis begins after the data is collected. Popular tools include Foremost, PhotoRec, and Scalpel. They have preset definitions of popular file types and signatures, and they search through data blocks trying to identify them. PhotoRec, as its name suggests, began by recovering photographs from camera memory, but it now recognizes many other common file types as well.
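Foremost and Scalpel are well-known examples of header/footer carving. The sketch below applies that idea to JPEG images, whose data starts with FF D8 FF and ends with the end-of-image marker FF D9. It assumes each file sits in one contiguous, unfragmented run of bytes. The `carve_jpegs` function and its `max_size` cap are illustrative, not either tool's actual interface.

```python
from typing import List

JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"

def carve_jpegs(image: bytes, max_size: int = 10 * 1024 * 1024) -> List[bytes]:
    """Carve contiguous JPEG candidates from a raw image by header/footer matching."""
    carved = []
    start = image.find(JPEG_HEADER)
    while start != -1:
        # Look for the first footer after the header, but no further than max_size bytes.
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No footer within the limit: skip this header and keep scanning.
            start = image.find(JPEG_HEADER, start + 1)
            continue
        end += len(JPEG_FOOTER)
        carved.append(image[start:end])
        start = image.find(JPEG_HEADER, end)
    return carved
```

Stopping at the first footer is a simplification. It can truncate photos that contain an embedded JPEG thumbnail, which is one reason real carvers add extra validation.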

In some cases, investigators will want the ability to specify new file signatures. If they're investigating a new type of breach, or they're particularly interested in types of data that may only be found locally, they can use this feature to search for those particular types.
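The sketch below shows that flexibility. It uses a hypothetical plain-text format of one `name hex-bytes` definition per line, which is not the configuration syntax of Foremost, Scalpel, or any other tool. The investigator adds a signature, here a made-up `acme_log` type, and the scanner reports every offset where each signature appears.

```python
from typing import Dict, List, Tuple

def parse_signatures(lines: List[str]) -> Dict[str, bytes]:
    """Parse user-supplied 'name hex-bytes' lines into a signature table."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, hex_bytes = line.split(maxsplit=1)
        table[name] = bytes.fromhex(hex_bytes)  # spaces between byte pairs are allowed
    return table

def scan(image: bytes, signatures: Dict[str, bytes]) -> List[Tuple[int, str]]:
    """Return (offset, name) for every occurrence of every signature, sorted by offset."""
    hits = []
    for name, magic in signatures.items():
        offset = image.find(magic)
        while offset != -1:
            hits.append((offset, name))
            offset = image.find(magic, offset + 1)
    return sorted(hits)

if __name__ == "__main__":
    # "acme_log" is a made-up internal log format used only for illustration.
    custom = parse_signatures(["acme_log 41 43 4D 45 4C 4F 47"])  # ASCII "ACMELOG"
    print(scan(b"..ACMELOG..", custom))
```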

How can data carving be used to investigate a specific incident?

Data carving can be used to investigate a variety of different incidents, such as data breaches, malware attacks, and insider threats. For example, data carving can be used to recover deleted files that contain sensitive information, such as customer data or intellectual property. Data carving can also be used to recover malware files that have been hidden on a device.
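Hidden malware on Windows systems is often a Portable Executable (PE) file. The sketch below shows how structure cuts down false positives when searching for such files, assuming the raw image is available as bytes. The two-byte "MZ" magic at the start of a PE file turns up by chance all the time. A real executable, however, also stores at offset 0x3C a pointer to a "PE\0\0" signature. The helper names are hypothetical, and a real carver would go on to parse the section headers to find where the file ends.

```python
import struct
from typing import List

def looks_like_pe(data: bytes, offset: int) -> bool:
    """Check for the DOS "MZ" magic and the PE signature that e_lfanew points to."""
    if data[offset:offset + 2] != b"MZ" or len(data) < offset + 0x40:
        return False
    # e_lfanew: a 4-byte little-endian offset to the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, offset + 0x3C)
    start = offset + pe_offset
    return data[start:start + 4] == b"PE\x00\x00"

def find_executables(image: bytes) -> List[int]:
    """Return offsets of "MZ" bytes that are backed by a valid PE signature."""
    hits = []
    offset = image.find(b"MZ")
    while offset != -1:
        if looks_like_pe(image, offset):
            hits.append(offset)
        offset = image.find(b"MZ", offset + 1)
    return hits
```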

What are some best practices for using data carving in DFIR?

There are several key principles that any DFIR team must keep in mind. The first is to preserve the accuracy of the data. While not every investigation will rise to the level of supporting a legal proceeding, DFIR teams must still aim to maintain a good chain of custody for any evidence and ensure that the data doesn't become corrupted during the analysis.
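One common way to show that analysis has not altered the evidence is to record a cryptographic hash of the image at collection time and recompute it later. The sketch below assumes the evidence is a single image file and that SHA-256 is the agreed algorithm. The function names are illustrative.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a (possibly very large) evidence image without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_unchanged(path: str, recorded_hash: str) -> bool:
    """Compare the current hash with the one recorded when the evidence was collected."""
    return sha256_of_file(path) == recorded_hash.lower()
```

Running `verify_unchanged` before and after each carving session, and logging the results, gives the chain of custody a simple, checkable record.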

The second is to be ready to adjust the search targets to the particular nature of an incident. A case involving illicit files on a workplace machine will require different file signatures and processing than a case involving a malware backdoor. Different types of investigations require different types of targets.

What are some key takeaways for C-Suite and Team Leadership?

  • Data carving can reassemble lost or deliberately deleted files from the unallocated blocks of the storage system.
  • Sometimes it can only construct partial versions from the scraps it discovers and sometimes it can’t find anything conclusive.
  • Data carving relies upon the internal structure of the file to identify blocks that should be part of the same file. Some files have a rich structure that makes it easy to figure out how they can be pieced together. Others have so little structure that it can be impossible to determine how they belong together.
  • DFIR teams should be mindful of keeping a good chain of custody to ensure that the conclusions can be trusted.