Using KAPE in Forensic Analysis

Before an investigator can begin forensic analysis with tools like Cyber Triage or Autopsy, the data must be collected. One of the most powerful and efficient tools for collecting and processing the data stored on a computer is the Kroll Artifact Parser and Extractor, or “KAPE”. Built by Eric Zimmerman, a senior director at Kroll, the tool concentrates on the most revealing corners of the hard disk, often delivering answers much faster than tools that rely on full-disk analysis.

KAPE runs on Windows and is built primarily around the artifacts that Windows systems leave behind. It can run from a thumb drive, and it can also collect from mounted disk images, including the virtual disks of virtual machines.

The architecture is designed to be simple and easily extensible. The core version comes with over 60 predefined targets and 90 modules. Targets define which files and folders to collect, and modules run programs that process the collected data; each one focuses on a particular kind of artifact. New targets and modules appear regularly, and some companies and industries build customized versions.

KAPE also works well with other forensic tools because it stores the artifacts it collects. Tools like Cyber Triage can open those artifacts after collection and perform a deeper analysis.

The ability to focus directly on the key targets or likely attacks is one of KAPE’s most valued features. DFIR teams that know what they might be facing can dig deeply for evidence of a particular type of attack without spending the extra time needed to create and analyze a complete disk image. The process saves both storage and time.

What types of data capture tools are used in incident response?

There are four main strategies for data capture:

  • Full disk image – This creates the most complete record for later analysis, at the cost of both time and storage. Images of large disks can be very unwieldy and require special handling to preserve their value as evidence.
  • Target file collection – This focuses on a fixed set of locations in storage, usually the areas where evidence of malware is most common. The advantages are speed and smaller collections, at the cost of missing evidence that is not in the expected locations.
  • Hash scanning – This computes a hash value for each file and then looks in a centralized database for a match. This allows identifying potentially dangerous files without examining them directly. If the file is a new example of malware or illicit material, though, its hash value won’t be found in the database and it will be ignored.
  • Smart collection – The algorithm focuses on areas of storage where filter functions indicate the files may be suspicious, and it captures those areas in much greater detail. This saves time and storage space, but only if the filter functions are accurate.

A forensics response team can choose from a number of tools that vary in thoroughness and completeness. One of the simplest but most complete solutions is to create an image of all storage and memory in all affected hardware. This creates a complete record that investigators can return to later if needed, but constructing such an image takes a great deal of storage and time.

The opposite approach is a focused tool like KAPE. Such tools work directly on the system, collecting only specific types of evidence, often called “targets”. For example, they may copy the registry hives behind keys like HKEY_LOCAL_MACHINE or HKEY_USERS, which can hold the footprints of particular viruses or hidden background processes. The data that matches these narrow targets is copied and saved for later analysis in a compact form.
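
To make target file collection concrete, here is a minimal Python sketch of the idea, not KAPE’s implementation or its target file syntax. It assumes the evidence volume is mounted as an ordinary directory, and the category names and glob patterns in EXAMPLE_TARGETS, along with the collect_targets function, are hypothetical examples. Only files matching a pattern are copied, and their original directory layout is recreated under the destination.

```python
import shutil
from pathlib import Path

# Hypothetical target list: glob patterns relative to the root of a
# mounted Windows volume, grouped by category. Not KAPE's own syntax.
EXAMPLE_TARGETS = {
    "Registry": [
        "Windows/System32/config/SYSTEM",
        "Windows/System32/config/SOFTWARE",
    ],
    "EventLogs": ["Windows/System32/winevt/Logs/*.evtx"],
}


def collect_targets(source_root, dest_root, targets):
    """Copy only the files that match each target pattern.

    The original directory layout is recreated under dest_root so an
    analyst can still see where each file came from.
    """
    source_root, dest_root = Path(source_root), Path(dest_root)
    copied = []
    for category, patterns in targets.items():
        for pattern in patterns:
            for src in sorted(source_root.glob(pattern)):
                if not src.is_file():
                    continue
                rel = src.relative_to(source_root)
                dst = dest_root / rel
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also carries over modification times
                copied.append((category, rel.as_posix()))
    return copied
```

Everything outside the patterns is never read, which is where the savings in time and storage come from, and also why evidence in unexpected locations can be missed.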

A third option is to break the data into blocks or files and compute a hash value, also called a message digest, for each of them. Hash functions reduce a large block of bytes to a single, relatively short value that is easy to compare. The forensic tools don’t store copies of the data; they store only the hash values and compare them against centralized lists of known malware.
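
A minimal Python sketch of hash scanning follows, assuming SHA-256 as the hash function and a local set of hex digests (known_bad) standing in for the centralized database; the function names are hypothetical. Files are hashed in chunks so large files never have to fit in memory, and only the digests are compared.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_for_known_hashes(root, known_bad):
    """Return files under root whose SHA-256 appears in known_bad.

    Only the hash values are compared; no file contents are kept.
    """
    return [
        path
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and sha256_of(path) in known_bad
    ]
```

Changing even one byte of a file produces a completely different digest, which is why a new or lightly modified piece of malware slips past this kind of scan.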

There are also hybrid approaches that mix the three solutions. They may create a complete copy only of the directories most likely to be useful, such as those holding the registry, while skipping sections that are unlikely to matter, such as partially completed saved-game files. This avoids spending time and money recording data that has a low probability of being useful.

The way a tool decides what to save is an area of active research and is often what distinguishes the best tools. Some use basic filtering functions built on well-understood knowledge of where malware tends to concentrate. Others apply “smarter” filtering before adding an artifact to the archive. For instance, a smarter tool may archive more aggressively after it detects that something is wrong, corrupted, or listed in a database of known malware. When such a filter is triggered, the tool can pay closer attention to that section of the file system by archiving many more files, or even everything it sees. This approach saves time and disk space by skipping sections that show no sign of problems while concentrating on the areas that trigger analysis.
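
The escalation idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular product works: the only trigger is a SHA-256 match against a hypothetical known_bad set, and smart_collect is a made-up name. A directory that looks clean is recorded as hashes only; one that trips the filter is copied in full.

```python
import hashlib
import shutil
from pathlib import Path


def smart_collect(source_dir, dest_root, known_bad):
    """Record only hashes by default; escalate to a full copy when triggered.

    The only trigger in this sketch is a hash match against known_bad.
    """
    source_dir, dest_root = Path(source_dir), Path(dest_root)
    hashes = {}
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(source_dir).as_posix()
            hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()

    triggered = any(h in known_bad for h in hashes.values())
    if triggered:
        # Something looked wrong, so archive everything in this directory.
        shutil.copytree(source_dir, dest_root / source_dir.name, dirs_exist_ok=True)
    return {"triggered": triggered, "hashes": hashes}
```

A real tool would add more triggers (corrupted structures, unusual locations, threat intelligence), but the shape is the same: cheap checks everywhere, expensive capture only where the checks fire.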

How can KAPE work with Cyber Triage?

Cyber Triage can work with data collected by a variety of tools that serve as the front line of an investigation. DFIR teams can use KAPE to quickly capture the most important data and then store it for later analysis with Cyber Triage.

KAPE saves the collected artifacts to a destination folder that preserves their original directory structure, optionally packaged in a container such as a VHDX or ZIP file, along with log files that record what was collected.

Cyber Triage can analyze a wide variety of artifacts in the KAPE output, including:

  • Windows event logs
  • Registry hives
  • File system artifacts
  • Network artifacts
  • Process artifacts
  • User artifacts
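
One way to picture this first sorting step is a small Python sketch that assigns collected files to some of the categories above by file name. The rules in CATEGORY_RULES and the categorize and summarize functions are hypothetical illustrations, not Cyber Triage’s logic; network, process, and user artifacts are left out because they cannot be recognized from a file name alone.

```python
import fnmatch
from collections import Counter
from pathlib import PureWindowsPath

# Hypothetical rules mapping lowercase file-name patterns to some of the
# categories above. Network, process, and user artifacts are left out.
CATEGORY_RULES = [
    ("Windows event logs", ["*.evtx"]),
    ("Registry hives", ["sam", "security", "software", "system",
                        "ntuser.dat", "usrclass.dat"]),
    ("File system artifacts", ["$mft", "*.lnk", "*.automaticdestinations-ms"]),
]


def categorize(relative_path):
    """Return the first category whose pattern matches the file name."""
    # PureWindowsPath accepts both backslashes and forward slashes.
    name = PureWindowsPath(relative_path).name.lower()
    for category, patterns in CATEGORY_RULES:
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            return category
    return "Uncategorized"


def summarize(paths):
    """Count how many collected files fall into each category."""
    return Counter(categorize(p) for p in paths)
```

The rules are checked in order, so a file name that could match more than one pattern lands in the first category listed.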

After analysis, Cyber Triage can compute a numerical score for each artifact to identify the items most likely to be evidence of a compromise, so investigators can focus their attention on the most important ones.
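
The general idea of score-based prioritization can be sketched as a weighted sum. To be clear, this is not Cyber Triage’s scoring model: the indicator names and weights in INDICATOR_WEIGHTS, the Artifact class, and the score and prioritize functions are all hypothetical. Each artifact’s score is the sum of the weights of the indicators observed for it, and artifacts are then sorted so the highest scores come first.

```python
from dataclasses import dataclass, field

# Hypothetical indicator weights, chosen only for illustration.
# This is not Cyber Triage's actual scoring model.
INDICATOR_WEIGHTS = {
    "known_bad_hash": 100,
    "runs_at_startup": 20,
    "unsigned_executable": 15,
    "in_temp_directory": 10,
}


@dataclass
class Artifact:
    name: str
    indicators: list = field(default_factory=list)


def score(artifact):
    """Sum the weights of the indicators observed for one artifact."""
    return sum(INDICATOR_WEIGHTS.get(i, 0) for i in artifact.indicators)


def prioritize(artifacts):
    """Highest-scoring artifacts first, so investigators see them first."""
    return sorted(artifacts, key=score, reverse=True)
```

Unknown indicators count as zero here, so adding a new kind of observation never breaks scoring; it simply has no effect until someone assigns it a weight.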

Why Is KAPE Best Used with Other Tools?

KAPE is an efficient and effective tool for gathering data from computers, and it is a good way for first responders to start their analysis. More detailed analysis, though, is best done with other tools that can parse and deconstruct the various artifacts.

Cyber Triage, for example, is a sophisticated response tool that can organize an investigation and give a team a common foundation for its work. Collaboration options let team members share their insights and split up the workload, and a scoring mechanism prioritizes the most damaging malware so the team can focus on the most important details.

Some DFIR teams may also want specialized tools that can disassemble viruses or other malware to reveal their structure and capabilities. This is often productive because much malware is similar to older examples, with the attackers adding only small changes to adapt it.

After KAPE assembles a collection of artifacts, good DFIR teams will turn to these specialized tools for deeper analysis.

Why would you choose a Cyber Triage collection over KAPE?

Using Cyber Triage to collect the data is another option. While some DFIR teams use Cyber Triage only to analyze artifacts after KAPE collects them, others use it to capture the artifacts directly because doing so offers some advantages. In general, Cyber Triage can perform an even more focused, and therefore faster, capture, which saves time and disk space.

Cyber Triage is actively updated with threat intelligence from leading DFIR researchers, allowing local teams to benefit from the research and experience of others. Updates from this centralized repository adjust the tool’s methodology to reflect the latest reports of malware.

Cyber Triage’s intelligent scanning can also reduce the need to re-collect data, an important detail when preparing evidence for legal proceedings. Because the algorithm adjusts to the latest intelligence, it is more likely to capture the relevant data the first time. It uses this understanding to switch strategically from collecting file hashes to capturing full images when necessary, balancing the need for complete information against the need for speed and storage management.

What are some of the most useful targets of KAPE?

Many of the most valuable targets for KAPE’s data collection are the main components of the operating system. KAPE’s low-level analysis captures the key details from the areas that malware often infects. These include:

  • Registry hives: The registry hives in Windows machines track the key low-level details about the software configuration of the system.
  • File system artifacts: Some directories or folders in the file storage contain key artifacts, including recent files, jump lists, and event logs.
  • Network artifacts: The details from where and how the computer contacted others on the Internet can reveal the ultimate destination for any exfiltrated data. The key parts can include DNS logs, web browser history, and email delivery logs.
  • Process artifacts: When software runs, the system can collect process artifacts that can reveal the work of malware, including process lists, memory dumps, and network connections.
  • User artifacts: The behavior of the various users is often recorded in personalized logs like user profiles, browser history, and email.

These are just a few examples of the many targets available for KAPE. The targets used in a forensic investigation will vary with the needs of that investigation.

Some of the targets focus more on individual applications and the records that they keep. Some of the most important are:

  • Web browsers: Artifacts from web browsers, such as cookies, history, and cache, reveal how the user interacted with different web sites, both locally and across the greater Internet.
  • Email: Artifacts from email clients, such as messages, attachments, and contacts can reveal communications patterns and even obvious data exfiltration.
  • Password managers: Artifacts from password managers can reveal useful personal information such as passwords, usernames, and notes.
  • P2P clients: This target collects artifacts from peer-to-peer (P2P) clients, such as file transfers and chat logs.

The targets that are used by KAPE can be customized to meet the specific needs of the investigation. For example, you could create a target that collects all of the files that have been modified in the past 30 days, or a target that collects all of the files that have been accessed by a specific user.
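
The selection logic behind the first example can be sketched in plain Python. This illustrates the criterion only and is not KAPE target syntax; recently_modified is a hypothetical name, and the sketch relies on file-system modification times, which an attacker may have altered.

```python
import time
from pathlib import Path


def recently_modified(root, days=30, now=None):
    """Return files under root modified within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime >= cutoff
    )
```

Passing `now` explicitly makes the window reproducible, which helps when the same selection has to be rerun or documented later.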

What are some of the most useful modules for KAPE?

The modules for KAPE handle some of the more general back-end chores, like processing the collected artifacts or doing some preliminary analysis. They can take the raw files and then produce reports or store the parsed data in files.

Modules can take many forms. Many are just thin layers of glue code that connect third-party applications with KAPE. Others are standalone and contain all of the code necessary to accomplish their job. Some focus on particular operating systems, and others work with general files. Because a module can invoke essentially any program, the specification is necessarily very general.
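
The glue-code style of module mostly amounts to filling a command-line template with the right folders and launching a program. The Python sketch below shows that step under clear assumptions: the placeholder names are modeled on variables such as %sourceDirectory% and %destinationDirectory% that KAPE module definitions use, while ExampleParser.exe, its -d and --csv flags, and the function names are hypothetical.

```python
def fill_placeholders(token, variables):
    """Replace every %name% placeholder in one argument with its value."""
    for name, value in variables.items():
        token = token.replace(f"%{name}%", value)
    return token


def build_module_command(executable, arg_template, source_dir, dest_dir):
    """Turn a module-style argument template into an argv list.

    Building a list (rather than one shell string) avoids quoting problems
    with paths that contain spaces.
    """
    variables = {"sourceDirectory": source_dir, "destinationDirectory": dest_dir}
    return [executable] + [fill_placeholders(t, variables) for t in arg_template]
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)`, which raises an error if the parser exits with a failure code.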

Some of the most useful modules include:

  • Autoruns: This analyzes the installed software that’s been configured to run immediately at startup. This can be a common place where attackers install eavesdropping software or other malware.
  • EventLogs: The event logs are one of the most general and powerful sources for constructing a timeline of what happened during an incident. This module parses them so their entries can be sorted and searched.
  • Registry: The configuration settings for the operating system and installed programs are stored in the registry. This module parses the hive file format and searches for keys and values associated with known malware.
  • Processes: The list of running processes can include viruses or other installed malware. This module examines that list.
  • Network: Malicious software often communicates with the outside network, and this module examines network data for dangerous behavior that may indicate an attack.
  • VolumeShadowCopy: This works with Volume Shadow Copies, the point-in-time snapshots Windows keeps of some or all of a storage volume, which can preserve earlier versions of files.
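
Building a timeline usually means merging timestamped records from several parsed sources into a single ordered list. The Python sketch below assumes the records have already been parsed into (ISO 8601 timestamp, description) pairs that all use the same time zone; build_timeline and the sample data are hypothetical.

```python
from datetime import datetime


def build_timeline(*sources):
    """Merge timestamped records from several parsed sources into one timeline.

    Each source is a (source_name, records) pair, and each record is an
    (ISO 8601 timestamp, description) pair.
    """
    merged = []
    for source_name, records in sources:
        for timestamp, description in records:
            merged.append((datetime.fromisoformat(timestamp), source_name, description))
    # Sort on the timestamp only; ties keep their input order.
    merged.sort(key=lambda row: row[0])
    return merged
```

For example, merging an event log that records a logon at 10:05 with a registry record showing a Run key written at 10:00 puts the registry change first, which is exactly the kind of ordering an investigator wants to see.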

What are the Key Takeaways for CIOs, CSOs, and Other Team Leaders?

  • KAPE can be one of the most effective tools for a focused response. It balances the need to save money and time with the importance of targeting the investigation.
  • The tool comes with many standard targets and modules that handle common tasks. Others in the community are building customized modules for particular operating systems or configurations, and some enterprises are building custom versions for their own software stacks.
  • After the data is collected, specialized tools like Cyber Triage or Autopsy can deliver deeper insights.