Not All TIFFs Are Created Equal

Processing electronic discovery data can produce surprises in the complexity or size of the data. This can make it challenging to accurately estimate a project timeline before loading the data and performing some preliminary analysis.

For example, we recently received 20 spreadsheets that needed to be converted into TIFF images and produced to opposing counsel. The client called and asked us for an estimated time to complete the project. Based on the fact that it was only 20 spreadsheets, we estimated that we would have this project completed within a few hours. Assuming 50 pages per spreadsheet, our estimate was that this was going to be about 1,000 TIFF images.

After we received the data, we loaded it into our system and created TIFF images of the spreadsheets. It turned out that the 20 spreadsheets generated close to 100,000 TIFF images or pages (an average of 5,000 pages per spreadsheet). One spreadsheet converted into approximately 20,000 TIFF images. This meant that the page count was almost 100 times larger than we had estimated. As a result, the project took longer than our original estimate. The good news was that most of the spreadsheets had a lot of blank pages and other “quirky” formatting issues. In the case of the 20,000-page spreadsheet, we were able to fix the formatting (without, of course, changing any of the original data), which reduced the spreadsheet to a few hundred pages. We were also able to reduce the page count of the other spreadsheets by a similar proportion. The additional time that we took to fix the formatting ended up saving counsel many hours of review time and the associated cost.
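The gap between the estimate and the result is easy to work through. The short Python sketch below only restates the arithmetic from this project; the function names are made up for illustration.

```python
# Back-of-the-envelope page estimate. Function names are hypothetical.

def estimated_pages(file_count: int, pages_per_file: int) -> int:
    """Naive estimate: assume every file renders to the same number of pages."""
    return file_count * pages_per_file


def overrun_factor(actual_pages: int, estimated: int) -> float:
    """How many times larger the real job was than the estimate."""
    return actual_pages / estimated


estimate = estimated_pages(file_count=20, pages_per_file=50)  # 1,000 pages
actual = 100_000  # approximate page count after conversion
print(estimate, overrun_factor(actual, estimate))
```

The per-file page assumption is the weak point: a single spreadsheet with stray formatting can swamp the whole estimate.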

Bottom line: when requesting a firm timeline and cost estimate from an electronic discovery vendor, it is always best to give them the actual data and ask them to do a preliminary analysis before finalizing an estimate. This will ensure a much more realistic estimate.
 

Controlling Rising Litigation Costs

In 2009 you will continue to see very large increases in the size of discovery data sets. According to a study by McKinsey & Company, the demand for corporate data is growing by 50% per year. According to the ABA, 60 to 90% of the cost of litigation relates to first-level review. Because review cost scales with the volume of data, litigation costs in 2009 will continue to increase. Of course, in a global economic recession, the “amount in controversy” will not increase by an average of 50% in 2009. As Magistrate Judge Paul W. Grimm writes in Mancia v. Mayflower Textile Services Co., Civ. No. 1:08-CV-00273-CCB (D. Md. October 15, 2008), “The goal is to attempt to quantify a workable ‘discovery budget’ that is proportional to what is at issue in the case.”

If the size of the data involved in litigation is growing by 50% per year, the only way to effectively control litigation costs is to embrace the use of electronic discovery tools. The good news is that understanding and using the basic electronic discovery tools is relatively straightforward.

I have broken the basic tools into five main categories:

1. Data collection. How the data is collected will have a significant impact on the overall cost of the case.

a. A full forensic image (a copy of the full drive; this will pick up deleted files as well as active files); or

b. A targeted file collection (also referred to as an active file collection). This collection method usually avoids collecting system files.

2. Filter the data by custodian and/or date ranges. Eliminate or suppress the data that falls outside the date range of the case, along with the data for custodians who were inadvertently included in the collection but are not part of the case.

3. Cull out the system files. Suppress all system files, which usually account for a significant amount of the data collected. The only exception to this rule is when certain system files relate to the case.

4. De-duplicate the data. This will identify exact duplicates of documents and suppress them.


5. Search the data. Different search techniques are used for different types of cases. However, a keyword search is still the most frequently used technique to identify relevant documents.
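Steps 2 through 5 above can be sketched in a few lines of Python. This is only an illustration of the logic, not how any particular eDiscovery platform works: the Doc record, the SYSTEM_EXTENSIONS list and every function name are assumptions made up for the example. Real tools typically identify system files by comparing file hashes against a list of known files rather than by extension, so the extension check here is a stand-in.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical record for one collected file."""
    custodian: str
    modified: date
    path: str
    content: bytes


# Assumed, illustrative list; a stand-in for a known-file hash list.
SYSTEM_EXTENSIONS = {".dll", ".exe", ".sys"}


def in_scope(doc, custodians, start, end):
    """Step 2: keep only relevant custodians and dates."""
    return doc.custodian in custodians and start <= doc.modified <= end


def is_system_file(doc):
    """Step 3: flag system files (by extension, as a simplification)."""
    return any(doc.path.lower().endswith(ext) for ext in SYSTEM_EXTENSIONS)


def deduplicate(docs):
    """Step 4: suppress exact duplicates by comparing content hashes."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def keyword_hits(docs, keywords):
    """Step 5: keep documents containing any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for doc in docs:
        text = doc.content.decode("utf-8", errors="ignore").lower()
        if any(k in text for k in lowered):
            hits.append(doc)
    return hits


def reduce_data_set(docs, custodians, start, end, keywords):
    """Apply filtering, culling, de-duplication and search in order."""
    scoped = [d for d in docs if in_scope(d, custodians, start, end)]
    non_system = [d for d in scoped if not is_system_file(d)]
    return keyword_hits(deduplicate(non_system), keywords)
```

Each step only ever shrinks the set, which is why the cheaper filters run before the keyword search.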

Here’s a recent example of the power of eDiscovery tools. We recently worked on a case in which we took full forensic images of more than 10 computers and laptops. The total data collected was over 1 terabyte. Using the first four eDiscovery tools listed above (collection, filtering, culling and de-duplication), we reduced the data set to 150 gigabytes. Working with counsel on keyword searching, we then reduced the 150 gigabytes to 5 gigabytes. This represents a reduction of over 99% of the data size prior to attorney review. The attorneys reviewed the data and we produced one gigabyte of responsive data, or under 50,000 documents.
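The reduction figures in this example can be checked with a small calculation. The sketch below assumes 1 terabyte is treated as 1,000 gigabytes; the function name is hypothetical.

```python
def reduction_percent(before_gb: float, after_gb: float) -> float:
    """Percentage of the original data removed."""
    return (before_gb - after_gb) / before_gb * 100


collected_gb = 1000      # roughly 1 terabyte collected (assumed 1 TB = 1,000 GB)
after_culling_gb = 150   # after filtering, culling and de-duplication
after_search_gb = 5      # after keyword searching

print(round(reduction_percent(collected_gb, after_culling_gb), 1))
print(round(reduction_percent(collected_gb, after_search_gb), 1))
```

The first four tools remove about 85% of the data on their own, and keyword searching brings the total reduction to about 99.5%.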

In summary, embrace eDiscovery tools. Treat them as your NBF (“new best friend”). With corporate data growing at a 50% annual rate, they will allow you to better control the rising cost of discovery.

 

Hard Drive Collections Made Easy

For most civil litigation cases, collecting electronic evidence is straightforward. The main decision that the legal team needs to make before starting the collection is whether to do a targeted collection or a full image of the hard drive(s). I have summarized below the differences between these two choices. I also try to simplify and demystify the term forensic collection.

Targeted Collection

A targeted collection can also be referred to as an active file collection. For most litigation, a targeted collection is usually the best approach. A targeted collection consists of certain specific files (active files) from the custodian(s) that are deemed relevant to the case. It usually includes all of the documents or electronic communications created by the custodian in Word, PDF, Excel and PowerPoint, as well as e-mails. This type of collection does not usually include system files (programs used to run the computer, such as Windows and Office). System files account for a significant amount of hard drive space (around 10 gigabytes on a new computer), so excluding them eliminates a lot of data at the front end of the process.
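As a rough illustration of how a targeted collection narrows the data, the Python sketch below walks a folder and lists only files with document or e-mail extensions. The extension list and function name are assumptions for the example, and the script only identifies candidate files; an actual collection should use tools that preserve meta-data.

```python
from pathlib import Path

# Assumed, illustrative list of "active file" types to target.
TARGET_EXTENSIONS = {
    ".doc", ".docx", ".pdf", ".xls", ".xlsx",
    ".ppt", ".pptx", ".pst", ".msg",
}


def find_targeted_files(root: Path) -> list[Path]:
    """Return files under root whose extension is on the target list."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TARGET_EXTENSIONS
    )
```

Calling find_targeted_files(Path("some/folder")) returns the matching paths and skips everything else, including system files.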

 

Benefits of a Targeted Collection

  • Time savings. A targeted collection of a hard drive usually takes about half the time of a full image: two hours versus four hours for an 80 gigabyte hard drive.
    • The amount of downtime and disruption to the custodian is minimized.
  • Cost savings. The cost to do a targeted collection is less than a full image due to the faster time to complete it. More importantly, the amount of data processed and eventually reviewed is smaller which is the real cost savings.
  • A proper collection will preserve all of the targeted meta-data.

Limitations of a Targeted Collection

  • It does not capture and/or preserve the deleted files or file fragments.
  • Spoliation of deleted data. If, after the collection, it becomes necessary to review deleted data, that data might no longer be available.
  • The collection methodology might be easier to challenge in court.

Full Image

A full image of a hard drive is a bit-for-bit copy of the hard drive(s). It includes all active files, deleted files, file fragments and blank space. 

 

Benefits of a Full Image

  • Preserves all data on the hard drive including deleted files.
    • Deleted files could be relevant to certain cases, for example, cases that involve allegations of malfeasance.
  • Verified procedure and greater legal defensibility.
    • Easier to defend methodology in court.
  • Reduces the risk of spoliation.
  • Preserves all of the meta-data.

Limitations of a Full Image

  • It takes longer to collect the data. A full image of a hard drive takes twice as long as a targeted collection: four hours versus two hours for an 80 gigabyte hard drive.
    • Greater inconvenience to the custodian.
  • Higher cost to image the drive due to the longer time factor. More importantly, there will be additional data to process and review which will increase the cost further.
  • Deleted data that has not been overwritten is fairly readily accessible. It is harder to argue that the data is inaccessible and therefore cost prohibitive to produce.
    • You might have to produce more data than otherwise intended and thus increase the overall cost of review, production, etc.

Forensic Collection

In the legal community there is confusion surrounding what it means to do a forensic collection. In the IT world, a forensic copy of a drive refers to a full copy of the drive. In the legal world, the key is not whether it’s a full copy but whether the process is defensible.

 

A good definition of a forensically-sound collection comes from Craig Ball (forensically sound):

“A ‘forensically-sound’ duplicate of a drive is, first and foremost, one created by a method which does not, in any way, alter any data on the drive being duplicated. Second, a forensically-sound duplicate must contain a copy of every bit, byte and sector of the source drive, including unallocated ‘empty’ space and slack space, precisely as such data appears on the source drive relative to the other data on the drive. Finally, a forensically-sound duplicate will not contain any data (except known filler characters) other than which was copied from the source drive.” 

 

The important point is that the standard definition of a forensically-sound collection stresses the importance of a bit-for-bit copy of the source hard drive. It is a full copy of the entire drive, including the deleted space and unused space (often referred to as empty or slack space). I think that all collections should be done in a sound, defensible manner using industry-accepted tools and procedures. I prefer to use Mike Murr’s definition of forensically sound as a collections process that leads to “an accurate representation of the source evidence” (forensic blog). In other words, all collections need to be legally defensible and need to preserve the data in its original form. The legal team can still choose between a targeted collection and a full image. In either case, it must be done in a forensically sound manner.
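One common way to show that a copy is an accurate representation of the source is to compare cryptographic hashes of the two. The sketch below is a minimal illustration in Python, assuming both the source and the copy can be read as ordinary files. The function names are hypothetical, and the script does not replace a write blocker or a proper imaging tool.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path: str, copy_path: str) -> bool:
    """True when the copy has exactly the same bytes as the source."""
    return sha256_of(source_path) == sha256_of(copy_path)
```

The same check applies to a targeted collection (file by file) and to a full image (the whole drive image), which fits the point that either approach can be carried out in a forensically sound manner.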