Hard Drive Collections Made Easy

For most civil litigation cases, collecting electronic evidence is straightforward. The main decision the legal team needs to make before starting the collection is whether to do a targeted collection or a full image of the hard drive(s). Below I summarize the differences between these two choices. I also try to simplify and demystify the term forensic collection.

Targeted Collection

A targeted collection, also called an active file collection, is usually the best approach for most litigation. It consists of collecting the specific files (active files) from the custodian(s) that are deemed relevant to the case. A targeted collection usually includes all of the documents and electronic communications created by the custodian: Word, PDF, Excel and PowerPoint files as well as e-mails. This type of collection does not usually include system and program files (the software that runs the computer, such as Windows and Office). These files account for a significant amount of hard drive space (around 10 gigabytes on a new computer), so leaving them out eliminates a lot of data at the front end of the process.
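To make the idea of selecting active files concrete, here is a minimal sketch in Python. It only illustrates the selection step. The extension list and the function name select_target_files are my own assumptions for illustration, and a real collection would use a list agreed on by the legal team and a dedicated collection tool.

```python
from pathlib import Path

# Hypothetical list of file types for a targeted collection; the actual
# list should come from the legal team, not from this sketch.
TARGET_EXTENSIONS = {
    ".doc", ".docx", ".pdf", ".xls", ".xlsx", ".ppt", ".pptx",
    ".pst", ".msg", ".eml",
}


def select_target_files(root, extensions=TARGET_EXTENSIONS):
    """Return the files under root whose extension is in the target set.

    Anything else (for example operating system and program files) is
    simply skipped, which is what keeps a targeted collection small.
    """
    root = Path(root)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Calling select_target_files on a custodian's documents folder would return only the Word, PDF, Excel, PowerPoint and e-mail files found there, regardless of how the extension is capitalized.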

 

Benefits of a Targeted Collection

  • Time savings. A targeted collection of a hard drive usually takes about half the time of a full image: two hours versus four hours for an 80 GB hard drive.
    • The amount of downtime and disruption to the custodian is minimized.
  • Cost savings. A targeted collection costs less than a full image because it takes less time to complete. More importantly, the amount of data processed and eventually reviewed is smaller, which is where the real cost savings come from.
  • A proper collection will preserve all of the meta-data for the targeted files.
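As a rough illustration of what preserving meta-data means in practice, the sketch below copies one file while keeping its timestamps and records a hash so the copy can be verified later. The function names collect_file and sha256_of and the manifest fields are assumptions of mine, not a standard. Note that shutil.copy2 only attempts to preserve what the operating system allows (it does not carry over every attribute on every platform), so treat this as a sketch rather than a substitute for an industry-accepted collection tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source, source_root, dest_root):
    """Copy one file, keep its timestamps, and return a manifest entry."""
    source = Path(source)
    relative = source.relative_to(source_root)
    dest = Path(dest_root) / relative
    dest.parent.mkdir(parents=True, exist_ok=True)

    source_hash = sha256_of(source)
    shutil.copy2(source, dest)  # copies content and, where possible, timestamps

    # The copy is only useful as evidence if it matches the source exactly.
    if sha256_of(dest) != source_hash:
        raise OSError(f"hash mismatch after copying {source}")

    info = source.stat()
    return {
        "path": str(relative),
        "sha256": source_hash,
        "size": info.st_size,
        "modified": info.st_mtime,
    }
```

The returned entries can be written to a log that documents what was collected, which supports the defensibility of the process.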

Limitations of a Targeted Collection

  • It does not capture or preserve deleted files or file fragments.
  • Risk of spoliation of deleted data. If it becomes necessary to review deleted data after the collection, that data might no longer be available.
  • The collection methodology might be easier to challenge in court.

Full Image

A full image of a hard drive is a bit-for-bit copy of the hard drive(s). It includes all active files, deleted files, file fragments and blank space. 
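The idea of a bit-for-bit copy can be sketched in a few lines of Python. The function names image_source and verify_image are hypothetical, and the sketch assumes the source can be opened for reading like an ordinary file. In a real collection the source drive would be attached through a write blocker and imaged with an industry-accepted tool; this only shows the principle of copying every byte and confirming the result with a hash.

```python
import hashlib


def image_source(source_path, image_path, chunk_size=4 * 1024 * 1024):
    """Copy every byte from the source to the image file.

    Returns the SHA-256 hash of the bytes that were read, computed while
    copying so the source only has to be read once.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            image.write(chunk)
    return digest.hexdigest()


def verify_image(image_path, expected_hash, chunk_size=4 * 1024 * 1024):
    """Re-read the image and check that its hash matches the source hash."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hash
```

Because nothing is filtered out, whatever sits in the blank or deleted areas of the source ends up in the image as well, and a matching hash shows that the image is an exact copy.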

 

Benefits of a Full Image

  • Preserves all data on the hard drive including deleted files.
    • Deleted files can be relevant in certain cases, for example those involving allegations of malfeasance.
  • Verified procedure and greater legal defensibility
    • Easier to defend methodology in court
  • Reduces the risk of spoliation
  • Preserves all of the meta-data

Limitations of a Full Image

  • It takes longer to collect the data. A full image of a hard drive takes about twice as long as a targeted collection: four hours versus two hours for an 80 GB hard drive.
    • Greater inconvenience to the custodian.
  • Higher cost to image the drive because it takes longer. More importantly, there will be additional data to process and review, which will increase the cost further.
  • Deleted data that has not been overwritten is fairly readily accessible. It is harder to argue that the data is inaccessible and therefore cost prohibitive to produce.
    • You might have to produce more data than otherwise intended and thus increase the overall cost of review, production, etc.
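For planning purposes, the time figures used in this article (four hours for a full image and two hours for a targeted collection of an 80 GB drive) can be scaled to other drive sizes. The sketch below assumes that time grows in proportion to drive size, which is my simplification. Actual times depend on the hardware and, for a targeted collection, on how much data is selected rather than on the size of the drive.

```python
# Reference figures taken from this article.
REFERENCE_DRIVE_GB = 80
REFERENCE_FULL_IMAGE_HOURS = 4
REFERENCE_TARGETED_HOURS = 2


def estimate_hours(drive_gb, reference_hours, reference_gb=REFERENCE_DRIVE_GB):
    """Scale a reference collection time linearly to another drive size."""
    return drive_gb * reference_hours / reference_gb


# Example with a hypothetical 500 GB drive.
full_image_hours = estimate_hours(500, REFERENCE_FULL_IMAGE_HOURS)  # 25.0
targeted_hours = estimate_hours(500, REFERENCE_TARGETED_HOURS)      # 12.5
```

Under that assumption the gap between the two approaches grows with the drive, which is worth factoring into the custodian's downtime.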

Forensic Collection

In the legal community there is confusion about what it means to do a forensic collection. In the IT world, a forensic copy of a drive refers to a full copy of the drive. In the legal world, the key is not whether it’s a full copy but whether the process is defensible.

 

A good definition of a forensically-sound collection comes from Craig Ball (forensically sound):

“A ‘forensically-sound’ duplicate of a drive is, first and foremost, one created by a method which does not, in any way, alter any data on the drive being duplicated. Second, a forensically-sound duplicate must contain a copy of every bit, byte and sector of the source drive, including unallocated ‘empty’ space and slack space, precisely as such data appears on the source drive relative to the other data on the drive. Finally, a forensically-sound duplicate will not contain any data (except known filler characters) other than which was copied from the source drive.” 

 

The important point is that the standard definition of a forensically-sound collection stresses a bit-by-bit copy of the source hard drive: a full copy of the entire drive, including the deleted space and unused space (often referred to as empty or slack space). I think that all collections should be done in a sound, defensible manner using industry-accepted tools and procedures. I prefer Mike Murr’s definition of forensically sound as a collection process that leads to “an accurate representation of the source evidence” (forensic blog). In other words, every collection needs to be legally defensible and needs to preserve the data in its original form. The legal team can choose between a targeted collection and a full image, but in either case the collection must be done in a forensically sound manner.