Data Archeology

Fast innovation cycles make securing a system against all vulnerabilities virtually impossible. If an attack succeeds, taking certain steps can at least uncover the actions of the criminals to preserve evidence or to harden the system against repeat attacks.

To investigate how a postmortem analysis proceeds (see the “IT Forensics” box), we’ll look at the following sample scenario: On his lunch break, an office clerk uses his colleague’s computer, without the consent of his neighbor, to order several books under this neighbor’s Amazon account and at his neighbor’s expense. To conceal his actions, the attacker then shuts down the computer. How could you prove this crime took place?

On the basis of this scenario, researchers jointly define general and scenario-specific requirements for a collection of forensic tools; these requirements allow them to compare the relevant toolkits.

The general requirements for a tool collection include search and filter functions for restricting the relevant data. Additionally, the forensics tools need to provide a way to combine data, thereby allowing processes to be reproduced and relationships between different data sources identified. Other requirements relate to input and output logging and the ability to integrate different image file types.

The scenario-specific requirements include support for finding and processing browser artifacts or undesirable changes to the underlying services or their configurations. The browser and its external components could be infected by malware; for example, the attacker could have changed the network configuration so that the user was unintentionally working with a manipulated name server.

Forensic Investigation

In a professional forensic investigation, investigators base their approach on models. In this article, we use the Secure-Analysis-Present model (see the “SAP Model” box).

We examined the victim system to reconstruct the browser session with the forensic tool collections under comparison here. In the Secure Phase, we first backed up the hard disk using the dcfldd tool, a version of the GNU dd tool extended to include forensic features. Integrity was ensured by a hash value generated by md5deep. (However, assuring integrity was not the focus of the investigation. In production use, an investigator would typically give preference to SHA.)
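
The integrity check can be illustrated with a short sketch. This is not how dcfldd or md5deep work internally; it only shows the principle, using Python's standard hashlib module. The function names hash_image and verify_image are hypothetical, and the choice of MD5 plus SHA-256 is an assumption made for illustration.

```python
import hashlib


def hash_image(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for the file at path, read in chunks."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as image:
        # Read fixed-size chunks so a large disk image never has to fit in memory.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def verify_image(path, expected, algorithm="sha256"):
    """Compare a freshly computed digest with the one recorded at acquisition."""
    return hash_image(path, (algorithm,))[algorithm] == expected.lower()
```

Recording the digest at acquisition time and calling verify_image again before each analysis session would show whether the duplicate has changed in the meantime.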

The analysis was performed on a forensic workstation in a virtualized environment (VMware vSphere ESXi). We compared only active open source projects that are suitable for postmortem analysis of a Windows 7 system. Furthermore, tool selection was based on the popularity of the tool collections (Table 1). The investigators also considered the extent to which the default installations of tool collections were suitable for the given scenario. For better comparison, no additional packages were installed retroactively. Instead of looking at all the tools in Table 1, we focused on OSForensics, Autopsy, and CAINE, which all stood out positively.


OSForensics is a Windows tool by PassMark Software for live forensics and postmortem analysis. For this comparison, we looked at version 1.2 (build 1003); version 2.0 has been released in the meantime (build 1003). Forensic duplication is implemented as an additional virtual disk in read-only mode.

Data can be filtered by keyword, time stamp, last activities, registry keys, and access credentials stored in the browser. Internally, the tool offers file, hex, string, and text views. However, files can also be viewed using external programs and linked.

After the investigation, a case report can be generated with case/investigator names, exported files, attachments, notes, email, and links with or without JavaScript as a .html file. Also, a template is available for logging the chain of forensic evidence.

OSForensics returned various scenario-specific results: The keyword search for the victim’s email address (Figure 1) led to F:\Users\Sandy\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\Content.IE5\BYO5TPM7\display[1].html.

Figure 1: Results of the search for the victim’s email address (OSForensics).

Opening this link in IE takes the investigator to the Amazon purchase confirmation. The keyword search for the title of the purchased book led to victimsystem:\Users\Sandy\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\Content.IE5\S6OU8I07\view-upsell[1].html, which is the Amazon shopping cart.
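
To make the idea of a keyword search over a mounted duplicate more concrete, here is a minimal sketch. It is not how OSForensics implements its search (the tool works with an index); the function name find_keyword is hypothetical, and the approach of scanning each file for the keyword in both UTF-8 and UTF-16LE is an assumption based on the fact that Windows artifacts often store text in UTF-16LE. The search is case sensitive and reads each file completely, so it is only suitable for small directory trees.

```python
import os


def find_keyword(root, keyword):
    """Yield (path, offset, encoding) for each file under root containing keyword."""
    # Search for the keyword in both encodings commonly found on Windows systems.
    needles = {
        "utf-8": keyword.encode("utf-8"),
        "utf-16-le": keyword.encode("utf-16-le"),
    }
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # Unreadable file: skip it rather than abort the search.
            for encoding, needle in needles.items():
                offset = data.find(needle)
                if offset != -1:
                    yield path, offset, encoding
```

Run against the read-only mount point of the duplicate, a call such as list(find_keyword(mount_point, address)) would return candidate files that the investigator then has to review manually.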

The Images category returned results from the temporary Internet files shown in Figure 2, which match the purchased product.

Figure 2: Pictures of the purchased item in the cache (OSForensics).

The registry entry under Software\Microsoft\Internet Explorer\TypedURLs contains the last URL entries from the IE address bar (Figure 3).

Figure 3: Registry entries (OSForensics).

Using the Website Passwords module, we were able to view login actions, but not the access credentials, because the user had not saved the credentials in the browser.

The ability to identify registry files automatically is an asset to the forensic investigation. Nevertheless, expertise is needed, and a manual search for data by the forensic investigator is essential.


Autopsy is a graphical extension of The Sleuth Kit (TSK), which was developed by Brian Carrier for Windows and Linux systems. We investigated Windows version 3.0.3; version 3.0.6 is now available.

Forensic duplication again was performed in read-only mode as an additional (virtual) disk. For case management, you can enter a name, the storage directory, a case number, and the investigator’s name. Data can be managed by keyword search, matching with hash databases, viewing file residues and unallocated files, automatic filtering by file type (images, video, audio, documents), categories (bookmarks, cookies, browsing history, downloads, etc.), most recently used data, and email messages. Files can be opened in external programs or viewed internally in hex, string, result, text, and media views. Another useful function is the ability to link relevant files and add a comment for the link.

Investigators can create a final report as a .html file, an Excel spreadsheet, or a body file with a case summary, image information, and links including file names, file paths, and comments.

Autopsy returned the following scenario-specific results: The IE history was retrieved from F:\Users\Sandy\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\Content.IE5\index.dat and listed (Figure 4).

Figure 4: Excerpt from the history evaluation (Autopsy).

Additionally, three cookies were found for the Amazon domain; we were able to view their contents in text form. After manual inspection of a recovered file, the keyword search for the victim’s email address led to access data entered in a failed login attempt in plain text.

Opening the history file, F:\Users\Sandy\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\Content.IE5\BYO5TPM7\display[1].html, with IE showed a section from the Amazon shopping cart. The file F:\Users\Sandy\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\Content.IE5\BYO5TPM7\continue.html included the delivery address and the last two digits of the victim’s account number; we were unable to open it with IE.

Autopsy also requires the forensic investigator to know where the data sources are located. Moreover, in this case, a manual review of potentially relevant files is essential.

Computer Aided Investigative Environment (CAINE)

The Linux distribution Computer Aided Investigative Environment (CAINE), currently maintained by Nanni Bassetti, provides a collection of software tools for postmortem analysis and live forensics. Many of the tools have a graphical user interface. Version 3.0, which we looked at, has now been superseded by the current 4.0 version. Forensic duplication was implemented here as a virtual read-only disk, and we used the CAINE tools Forensic Registry Editor (FRED), Galleta, Pasco, NBTempo, Autopsy Forensic Browser, and TSK.

FRED is used to open and then search a registry. When we opened the NTUSER.DAT registry file, FRED showed the last three URLs entered in the IE address field in the Software\Microsoft\Internet Explorer\TypedURLs key, which led to eBay and Amazon, as shown in Figure 5.

Figure 5: Registry entries (FRED).

The Galleta console tool by McAfee is used for processing IE’s cookies. When you run Galleta against a cookie file, the tool creates a spreadsheet, as shown in Figure 6.

Figure 6: Spreadsheets with cookie content (Galleta).

The data is converted into a table, thus improving clarity and usability for the investigator. However, you first need to identify the relevant cookies, which you can often do based on the file name, which contains the domain name of the visited website.
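
The kind of conversion a cookie parser performs can be sketched as follows. This is not Galleta's code. It assumes that an IE cookie text file stores each cookie as nine lines (name, value, domain and path, flags, expiry time low and high, creation time low and high, and a closing asterisk) and that the time stamps are Windows FILETIME values split into two 32-bit halves. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(low, high):
    """Combine the two 32-bit halves of a Windows FILETIME into a UTC datetime."""
    ticks = (high << 32) | low  # 100-nanosecond intervals since 1601-01-01
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_ie_cookie_text(text):
    """Parse cookie records, assuming nine lines per record ending in '*'."""
    lines = text.splitlines()
    records = []
    for start in range(0, len(lines) - 8, 9):
        name, value, site, flags, exp_lo, exp_hi, cre_lo, cre_hi, end = (
            lines[start:start + 9]
        )
        if end != "*":
            break  # Layout differs from the assumption: stop instead of guessing.
        records.append({
            "name": name,
            "value": value,
            "site": site,
            "flags": int(flags),
            "expires": filetime_to_datetime(int(exp_lo), int(exp_hi)),
            "created": filetime_to_datetime(int(cre_lo), int(cre_hi)),
        })
    return records
```

Each returned dictionary corresponds to one row of the kind of table shown in Figure 6; the column selection here is an illustration, not Galleta's exact output.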

McAfee is also the company behind the Pasco tool; its focus is on processing IE’s Internet activity based on the index.dat file. To do this, Pasco produces a spreadsheet with the contents of the file. Figure 7 shows the URL history from the scenario.

Figure 7: Generating spreadsheets from the contents of index.dat (Pasco).

The Type column distinguishes between URLs and redirects that are marked with REDR. Pasco makes it easier for investigators to reconstruct the browser or cookie history. Furthermore, it lets you sort, filter, and process content as needed.
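
Downstream filtering of such a table might look like the following sketch. It assumes that the Pasco output has been saved as tab-delimited text with a header row whose first column is named TYPE, possibly preceded by a few preamble lines; the exact column names are an assumption, and the function name split_by_type is hypothetical.

```python
import csv
import io


def split_by_type(pasco_output):
    """Group the rows of tab-delimited text by the value in their TYPE column."""
    lines = pasco_output.splitlines()
    # Skip any preamble until the header row that starts with the TYPE column.
    header_index = next(
        i for i, line in enumerate(lines) if line.startswith("TYPE\t")
    )
    reader = csv.DictReader(
        io.StringIO("\n".join(lines[header_index:])),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # URLs may contain quote characters.
    )
    groups = {}
    for row in reader:
        groups.setdefault(row["TYPE"], []).append(row)
    return groups
```

With the rows grouped this way, split_by_type(text).get("REDR", []) would return only the redirects, and the URL entries could be sorted by their access time column.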

NBTempo is a Bash script with a GUI by Nanni Bassetti that generates timelines. The investigator selects a forensic duplicate as an image file or disk. Then, after specifying the target directory, the time zone, the time delay of the data on the victim’s system, and the time period under investigation, NBTempo provides rapid results.

Three files are created. The file named data.txt provides an overview of the image directory, the selected time period, the time zone, and the delay. The times.txt file saves the results in a raw format that is readable for many downstream processing tools, and the report.csv spreadsheet represents the timeline in a table with column names that reflect the investigator’s needs (Figure 8).

Figure 8: Excerpt from the report table (NBTempo).

The timeline can then be sorted, filtered, and processed by the investigator. NBTempo helps the forensic investigator reconstruct the computer’s history. This helps determine which files were created or executed parallel to the browser session, which may provide clues to other data sources.

The Autopsy Forensic Browser is a graphical add-on for TSK (as is its successor, Autopsy), but it uses a different graphical interface. Analysis is performed in a browser, and investigators can save the forensic duplicates on a server. This means that analysis can be performed by several investigators using separate computers. When the case is created, a host is defined along with the details of the case name, name of the system under investigation, time zone, and time vector.

The duplicate we examined was created as a linked partition and was protected with an MD5 hash value. The Autopsy Forensic Browser primarily works with filesystem structures. For example, it can access the details of the NTFS Master File Table (MFT), including its clusters.

Data storage device analysis is carried out in five consecutive steps:

  • In the first step, File Analysis accesses the folder structure and searches for file names and deleted files. The search results can be reviewed in ASCII, hex, and ASCII string views. The tool also offers an export function and an annotation function for relevant files. If you add a note, the Autopsy Forensic Browser creates a report with generic information (file name, hash value, creation time stamp, investigator) with metadata (position in the MFT, attributes) and the appropriate content.
  • The next step is the Keyword Search. In this step, the investigator can search for individual and compound keywords, as well as regular expressions with the grep tool. Some restrictions apply, and the search is thus not totally reliable, as stated in Brian Carrier’s overview of “grep Search Limitations.” The search is very time consuming because the image is not indexed. Results are listed by clusters with reference to the source directory. A cluster can be exported as a file and annotated.
  • Next, the investigator is taken to File Type sortings, where the files are output and sorted into the following categories: archives, audio, compression, crypto, data, disk, documents, executable files, images, system, text, unknown, video, and extension mismatches. The results can be saved as a .html file without links.
  • In Meta Data mode, investigators can search for a specific MFT record or display the allocation list. Each MFT entry specifies which file is associated with it. You can also display the content here, export the file as a cluster, and add notes and reports.
  • Finally, you can search for clusters in Data Unit mode. This step provides the same information and functionality as the previous step. After analysis, the results can be processed to create File Activity Time Lines, which involves generating a timeline structured in months. Previously created notes can be viewed here. This timeline is stored in tabular form on the workstation.
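
The extension mismatch category from the File Type step can be illustrated with a small sketch. It does not reproduce the sorter logic used by TSK; it only compares a file's leading bytes with a handful of well-known signatures. The signature table is deliberately tiny, and the function name extension_mismatch is hypothetical.

```python
import os

# A few well-known file signatures and the extensions they normally carry.
SIGNATURES = {
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"GIF87a": {".gif"},
    b"GIF89a": {".gif"},
    b"%PDF-": {".pdf"},
}


def extension_mismatch(path):
    """Return the expected extensions if header and extension disagree, else None."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    extension = os.path.splitext(path)[1].lower()
    for magic, extensions in SIGNATURES.items():
        if header.startswith(magic):
            return None if extension in extensions else extensions
    return None  # Unknown signature: no judgment possible.
```

A JPEG image renamed to end in .txt would be reported with the expected extensions .jpg and .jpeg, which is the kind of hint that points an investigator to deliberately disguised files.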

The Event Sequencer lists all events along with the associated annotations. An event can be individually time stamped and the source added.

Autopsy Forensic Browser is primarily used for data analysis. The bulk of the information is only available in plain text without any links or evaluation. It thus offers insufficient support in this scenario. Only validation through hashes, case management, and categorization of file types facilitate the investigator’s task.


As a CAINE tool, TSK restricts itself to command-line tools for the analysis of filesystems, partitions, images, and disks. The partition tools do not support the analysis of Windows systems. Thus, only the filesystem and image tools were considered.

Using the tsk_gettimes and fls -m tools, we created a timeline of the files in raw format as a body file that is equivalent to NBTempo’s times.txt. We then ran the mactime tool to convert this to a clear-cut table with column names, which in turn matched the report.csv from NBTempo.
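
The step from a body file to a readable timeline can be sketched in a few lines. This is not a replacement for mactime; it assumes the pipe-delimited body format of TSK 3.x and later (MD5, name, inode, mode, UID, GID, size, atime, mtime, ctime, crtime, with times in seconds since the Unix epoch), and the function name timeline_from_body is hypothetical.

```python
from datetime import datetime, timezone

TIME_LABELS = ("accessed", "modified", "changed", "created")


def timeline_from_body(lines):
    """Return sorted (timestamp, label, name) events from body file lines."""
    events = []
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 11:
            continue  # Not a complete body record.
        # The file name may itself contain '|', so take it from the middle.
        name = "|".join(parts[1:-9])
        for label, raw in zip(TIME_LABELS, parts[-4:]):
            seconds = int(raw)
            if seconds > 0:  # Zero means the time stamp is not set.
                when = datetime.fromtimestamp(seconds, tz=timezone.utc)
                events.append((when, label, name))
    return sorted(events)
```

Restricting the returned events to the lunch break in question would show which files were created or accessed parallel to the browser session.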

The fsstat module provided information about the filesystem, in terms of the layout, sizes, and labels. We noticed here that the Windows 7 operating system on the victim’s system was identified as Windows XP. Details of file extensions and image sizes were provided by the img_stat tool. Depending on the image, this tool provides additional information. The sorter tool assigned files to file types as per the File Type sortings in Autopsy Forensic Browser. The default installation of TSK provides little support for analyzing a browser session. Functions such as reporting, keyword search, and registry analysis require a retroactive installation of the TSK Framework.

Although CAINE offers no central case management (meaning that the investigator must enter the case name and investigators after every reboot), you can manually generate a final report via the interface as a .rtf or .html file or a personal report. CAINE differs in this respect from the other extensive tool collections, SIFT and BackTrack, in that the distribution of the individual tools within the interface is structured on forensic process models and therefore requires comparatively less training time.

Other Toolkits

Besides OSForensics, Autopsy, and CAINE, the following toolkits were analyzed in our test:

  • Digital Forensics Framework (DFF) by ArxSys offers a Windows and Linux distribution for the analysis of drives, filesystems, and user and application data. It also provides a search function for metadata, hidden, and deleted data. The analyzed version was the Windows release 1.2.0 with dependencies. The current version is Windows release 1.3.0 with dependencies.
  • The TSK command-line tool collection is developed by Brian Carrier for both Windows and Linux. We investigated Windows version 4.0.1, which has been replaced by the current version, 4.1.0.
  • The Paladin toolkit for Linux by Sumuri is primarily used for creating images. We were unable to create an image with the 3.0 version that we looked at. The current version is 4.0.

We also looked at the two major Linux toolkits, SANS Investigative Forensic Toolkit (SIFT) in the latest version 2.14 and BackTrack in the current version 5.0 R3. BackTrack has now been replaced by Kali Linux and primarily serves to review the overall security of a network. BackTrack and Kali also provide attack, audit, and penetration tools.


In our overall assessment, Autopsy is the tool best suited to reconstructing browser-based offenses. Many of the tools from the toolkits we looked at build on TSK by adding a graphical user interface. In our evaluation of the toolkits shown in Table 2, requirements that were completely fulfilled were marked with a plus (+), partially fulfilled requirements with a circle (o), unfulfilled requirements with a dash (-), and performance not stated with a question mark (?). Some shortcomings are apparent in the testing of configurations and program operations, as well as in the HTTPS/SSL and DNS fields; in these areas, none of the toolkits investigated produced actionable results.

The existing toolkits primarily offered functions for data analysis, hash verification of individual data sources, filtering, and searching. In the future, better support for combining data from different sources is essential. Ideally, a browser session should be traceable in an overall picture, step by step. Along with an extensive analysis and reconstruction of browser-based offenses, additional data sources (e.g., the network components and servers involved) should be analyzed with monitoring tools. See the “Browser-Specific Data Sources” box for more information.

Additionally, performing a live forensics investigation on volatile data (e.g., active network connections) would be useful. For a more detailed insight into the underlying scientific work, a digital version is available online.


