Collections: Top 5 Questions

By David Rostov and Debora Motyka Jones

There is a fair amount of confusion regarding collections of client data.  To help guide your approach to collections, we have provided an overview of the top five questions and their answers.

 

How should collections be performed?

·        All data collections should be performed in a forensically sound manner. This means that the collection should be done in a sound, defensible manner using industry-accepted tools and procedures. The collection should produce an accurate representation of the source evidence.

 

Should you do a targeted collection or a full forensic collection?

 

·        A targeted collection includes only active files deemed relevant to the case (e.g., emails and Microsoft Office documents).

    • Reduces cost and time because collection is faster and less data is collected.
    • It does not preserve deleted data.
      • Adds some spoliation risk.
      • The methodology may be easier to challenge in court.

·        A forensic collection is a bit-for-bit copy of the entire hard drive including all active files, deleted files, file fragments and blank space.

    • Preserves all data, reducing the risk of spoliation.
    • Has greater legal defensibility.
    • However, it is more expensive.

If files are deleted, what can be recovered?

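·        When the content of the file remains on the drive and any of the following is true:

o       The files are in the Windows Recycle Bin;

o       The files have not been overwritten by a new file;

o       The files have been only partially overwritten (the portions not yet overwritten may be recoverable).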

·        When the content is in a PST and:

o       The damaged/corrupted items, the contents of the “Deleted Items” folder and partially overwritten items are identified and recovered into a new PST file.

 

If files are deleted, what cannot be recovered?

·        Files that were completely overwritten with new files.

·        Drives that were “wiped” using wiping software.

·        Drives that were physically damaged and cannot be repaired (even in a lab environment).

 

What are the leading industry software tools for collection?

·        EnCase and FTK Imager for forensic collections.

o       These are the “gold standard” tools, also used by law enforcement.

·        Paraben’s Device Seizure for cell phone collections.

·        Microsoft ExMerge for Exchange Server collections.

·        Microsoft Robocopy, often used for targeted collections.

·        Microsoft NTBackup for backup files (.bkf).

·        Symantec Norton Ghost for backing up and recovering files.

Top Five Questions to Ask When Choosing an E-Discovery Vendor

By David Rostov and Debora Motyka Jones

We often get questions from our clients about how best to select an electronic discovery vendor. Important considerations include what questions to ask, how best to compare vendors, and which important issues are typically missed in the selection process. In particular, our clients often tell us that they struggle during vendor selection to assess a vendor’s quality and capabilities. Given these challenges, we often hear that law firms default to making their decision almost exclusively on price.

We put together a short list of key questions that can help in the eDiscovery vendor selection process. 

 

 

 

Top Questions To Ask When Choosing an E-Discovery Vendor

  • Scope of Services

        What services does the vendor offer?

        If case parameters change, will the vendor be able to meet your needs and time frames?

        Are there volume benefits/discounts if you use multiple services (e.g., processing, hosting and production versus just hosting)?

        What services are sub-contracted out, and does data ever leave the vendor’s site?

        What size or type of case is beyond the vendor’s capacity?

        What have been the vendor’s toughest cases?

  •        Expertise (not all vendors are created equal, and it is not all about price)

        How well does the vendor understand the technical issues?

        Are the vendor’s employees certified in the tools they use?

        What is the vendor’s level of understanding of the legal process?

        Are there legal professionals on staff?

        How does the vendor’s expertise compare to other vendors?

  •        Quality of Services

        Is this a vendor with which you could see yourself establishing a longer-term relationship?

        How does the vendor ensure consistently high-quality service that is accurate and on time?

        Are errors tracked? What are considered errors? How are errors addressed?

        What do the references say about the vendor?

  •        Customer Service

        What are the vendor’s hours of operation?

        How available are the vendor’s employees during non-business hours?

        How much lead time is needed for processing and production?

        How are cases staffed?

        Who is the primary point of contact? Is it the same throughout the case? 

        What is the nature of the vendor’s project management team and approach?

        How are issues escalated?

  •        Technical Specifications

        Does the vendor use proprietary or non-proprietary software, and what are the benefits and trade-offs?

        If the data is not being processed locally, what are the vendor’s FTP connection speeds, and how do they compare with the law firm’s FTP speeds?

        What is the vendor’s policy on backing up data?

        What is the vendor’s policy regarding storing data?

 

 

Why Are So Many Email Collections Corrupted?

Many email collections are done improperly and produce corrupted files. Unless properly repaired, corrupted email files cannot be processed for litigation. The most common email collection problems arise with Microsoft Exchange Server collections (.PST files). Improperly collected Exchange data adds significant time and cost to the eDiscovery process. It also puts the overall integrity of the evidence at risk.

Microsoft Outlook saves email in the .PST file format. Think of the PST as an expanding container file. For most custodians, all of their email resides in a few PST files.

Email collections are often performed by internal IT personnel, usually with the Microsoft Exchange Mailbox Merge Program (ExMerge.exe). This program enables a network administrator to extract data from mailboxes on an Exchange Server and merge it into the same mailboxes on another computer that is running Exchange Server. It copies the mailbox data from the source server into PST files and then merges the data from those PST files into the mailboxes on the destination server. The most common practice is to copy the data while the custodians are still logged into the system, which allows them to continue working while the collection is occurring. This is the main cause of the file corruption: while the custodian’s email account is active, the system cannot properly synchronize the various sets of files, in particular slight differences in dates and times.

 

The good news is that there is a very simple and effective solution to this problem: make sure that the custodian is logged out of his/her account during the entire collection process and that the account has been properly synchronized with the server. It is always advisable to verify that the data was successfully collected before turning it over to your eDiscovery vendor or counsel. To verify the collected PST, use Outlook’s “Advanced Find” function. If you do not see any messages in the view pane, this indicates that the collection was not successful and the data has been corrupted.

 

Paraben has a tool called E-mail Examiner that does a good job of ensuring that the email collection is forensically sound. The product is more expensive than ExMerge and not as widely used. However, it is designed specifically for litigation and investigations.

 

Repairing Corrupted PSTs

If the collection was not done properly and the data is corrupted, repairing a PST usually takes a number of hours of senior technical time. As a rough estimate, a 10 GB PST will take a few hours to repair. We recommend two tools for this type of repair. Both tools scan the files to locate corrupted data and then attempt to recover the damaged information.

 

1.      EasyRecovery File Repair. This tool is from Kroll Ontrack. 

2.      Inbox Repair Tool (scanpst.exe). This is a Microsoft tool that is usually included with Outlook.

 

Unfortunately, not all corrupt PSTs can be repaired. In that case, you will need to have the data re-collected. Be prepared for an unhappy custodian when you show up to re-collect their data.

Text Messaging and Its Impact on eDiscovery

To date, most litigation electronic discovery requests are limited to custodian email and loose documents. They ignore custodian mobile phone data, in particular stored text messages. The next big eDiscovery collection trend for litigation will likely be the collection of text messages from mobile phones.

Text messaging is still viewed as something that only teenagers really use. However, the usage data on text messaging is quite revealing. Over 70% of Americans ages 25 to 49 use text messaging. The average US user sends over 10 texts per day. In 2008, the number of text messages sent surpassed the number of mobile phone calls made. And text messaging is growing at 100 to 200% per year.

 

To put texting in its proper context, it is estimated that Americans send about 30 emails per day (the data on this is not very precise). Combined with roughly 10 texts per day, that makes about 40 messages, of which 10 are texts. This means that texting accounts for roughly ¼ of the daily electronic correspondence sent in the US.

 

The first step in any forensics investigation is identifying sources of evidence. Mobile phones store evidence in a variety of locations and media formats. Similar to desktop computers, most cell phones have internal memory and removable storage media (SD cards). Depending on the carrier, an internal SIM (Subscriber Identity Module) card stores pertinent information, such as phone numbers, contacts, and unique subscriber registration data.

 

As with computer collections, mobile device collections should be done in a forensically sound manner. This means that the data must be collected without changing the original device content. A forensic hash should be computed on the collected data so that any subsequent changes to the data can be detected. Keep in mind that the data on mobile devices is constantly changing (e.g., clock time, network data), so it is important to make an exact replica as quickly as possible.
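A hash does not stop data from changing. It lets you detect a change by recomputing the digest later and comparing it with the value recorded at collection time. The following is a minimal Python sketch of that idea. It assumes the acquisition tool has already written the phone data to an ordinary file on the acquiring computer. The function names `hash_file` and `verify_file` are hypothetical. SHA-256 is simply one common choice of algorithm.

```python
import hashlib


# Hypothetical helpers: SHA-256 is assumed here as one common forensic hash.
def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks so large images need little memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, recorded_digest, algorithm="sha256"):
    """Return True if the file still matches the digest recorded at collection time."""
    return hash_file(path, algorithm) == recorded_digest.lower()
```

In this sketch, the digest computed right after acquisition would be written into the chain-of-custody record. You would then run `verify_file` again before review or production. A mismatch means the collected data has changed since it was hashed.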

 

The main challenge with mobile collections is that most cellular phones use a proprietary operating system. This is compounded by the fact that new mobile devices are constantly being introduced into the market, making it a challenge to keep collection tools current. Often the hardest part of the collection is simply having the right phone adapter on hand to transfer the data from the phone to the acquiring computer.

 

After making a copy of the phone data, the next step is to analyze it. The forensic tools available for analysis and processing are still in the early stages of development. However, a number of forensic tools are available, such as Paraben’s Device Seizure Toolkit and Guidance Software’s Neutrino. Paraben’s Device Seizure is probably the most common tool used both by law enforcement and for commercial litigation. These tools are very similar to traditional forensics software utilities and offer many of the same capabilities and functionality, such as text viewing and keyword searching. During the analysis phase, text messages, e-mails and contacts can be identified, undeleted (if necessary), searched, and exported for review or further processing. If you are interested in more information on mobile collections, the National Institute of Standards and Technology (NIST) has a good overview.

 

 

Hard Drive Collections Made Easy

For most civil litigation cases, collecting electronic evidence is very straightforward. The main decision that the legal team needs to make before starting the collection is whether to do a targeted collection or a full image of the hard drive(s). Below, I summarize the differences between these two choices. I also try to simplify and demystify the term “forensic collection.”

Targeted Collection

A targeted collection can also be referred to as an active file collection. For most litigation, a targeted collection is usually the best approach. It consists of collecting specific files (active files) from the custodian(s) that are deemed relevant to the case. A targeted collection usually includes all of the documents and electronic communications the custodian created in Word, PDF, Excel and PowerPoint, as well as e-mails. This type of collection does not usually include system files (the programs used to run the computer, such as Windows and Office). System files account for a significant amount of hard drive space (around 10 gigabytes on a new computer). Excluding them alone eliminates a lot of data at the front end of the process.
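The selection logic of a targeted collection can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for Robocopy or a forensic suite. The extension list, the skipped folder names and the function names are hypothetical. The sketch does not use a write blocker, so reading files may update their last-access times on the source. It copies only active files of the targeted types, skips system folders, and hashes each file on both sides to verify the copy.

```python
import hashlib
import os
import shutil

# Hypothetical selection criteria: email and Office/PDF extensions to collect.
TARGET_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf", ".pst", ".msg"}
# Hypothetical system folders to skip (operating system and installed programs).
SKIP_DIRS = {"windows", "program files", "program files (x86)"}


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def targeted_collect(source_root, dest_root):
    """Copy active files with targeted extensions from source_root to dest_root.

    Returns a sorted manifest of (relative path, SHA-256) pairs for the collection log.
    """
    manifest = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        # Prune system folders in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in TARGET_EXTENSIONS:
                continue
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            # copy2 copies file contents plus timestamps where the platform allows.
            shutil.copy2(src, dst)
            digest = sha256_of(src)
            # Verify the copy by comparing the hashes of source and destination.
            if sha256_of(dst) != digest:
                raise OSError(f"hash mismatch after copying {rel}")
            manifest.append((rel, digest))
    return sorted(manifest)
```

`shutil.copy2` was chosen because it preserves timestamps, unlike `shutil.copy`. However, Python's documentation notes that even `copy2` cannot copy all file metadata. On Windows, for example, file owners, ACLs and alternate data streams are not copied. This is one reason a dedicated collection tool is preferable in practice.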

 

Benefits of a Targeted Collection

  • Time savings. A targeted collection of a hard drive usually takes about half as long as a full image: two hours versus four hours for an 80 GB hard drive.
    • The amount of downtime and disruption to the custodian is minimized.
  • Cost savings. A targeted collection costs less than a full image because it is faster to complete. More importantly, less data is processed and eventually reviewed, which is where the real cost savings lie.
  • A proper collection will preserve all of the meta-data of the targeted files.

Limitations of a Targeted Collection

  • It does not capture and/or preserve the deleted files or file fragments.
  • Spoliation of deleted data. If, after the collection, it becomes necessary to review deleted data, it might not be available.
  • The collection methodology might be easier to challenge in court.

Full Image

A full image of a hard drive is a bit-for-bit copy of the hard drive(s). It includes all active files, deleted files, file fragments and blank space. 
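The core of a bit-for-bit image is simple. You read every byte of the source in order, write it to an image file, and confirm by hash that the image matches the source. The Python sketch below shows only that core idea, under stated assumptions. `image_device` is a hypothetical name. The source is assumed to be readable as a file; on Linux, for example, this could be a raw device such as /dev/sdb. Reading such a device normally requires administrator rights, and it should be connected through a write blocker. Unlike dedicated tools such as EnCase or FTK Imager, the sketch makes no attempt to handle unreadable sectors or to write a forensic evidence-file format.

```python
import hashlib


# Hypothetical helper; assumes the source is opened read-only behind a write blocker.
def image_device(source_path, image_path, block_size=1024 * 1024):
    """Copy every byte of source_path into image_path and verify the copy by hash.

    Returns the SHA-256 hex digest shared by the source and the image.
    """
    src_hash = hashlib.sha256()
    # The source is only ever opened for reading; the image is created or truncated.
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            # Hash the exact bytes written, including deleted data and blank space.
            src_hash.update(block)
            dst.write(block)
    # Re-read the finished image and hash it independently of the acquisition pass.
    img_hash = hashlib.sha256()
    with open(image_path, "rb") as img:
        for block in iter(lambda: img.read(block_size), b""):
            img_hash.update(block)
    if img_hash.hexdigest() != src_hash.hexdigest():
        raise ValueError("verification failed: image hash does not match source")
    return src_hash.hexdigest()
```

Because the copy is made block by block without interpreting the file system, the image contains active files, deleted files, file fragments and blank space alike. Matching digests are what allow the image to stand in for the original drive.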

 

Benefits of a Full Image

  • Preserves all data on the hard drive including deleted files.
    • Deleted files could be relevant in certain cases, for example, cases that involve allegations of malfeasance.
  • Verified procedure and greater legal defensibility
    • Easier to defend methodology in court
  • Reduces the risk of spoliation
  • Preserves all of the meta-data

Limitations of a Full Image

  • It takes longer to collect the data. A full image of a hard drive takes about twice as long as a targeted collection: 4 hours versus 2 hours for an 80 GB hard drive.
    • Greater inconvenience to the custodian.
  • Higher cost to image the drive due to the longer time factor. More importantly, there will be additional data to process and review which will increase the cost further.
  • Deleted data that has not been overwritten becomes fairly readily accessible, which makes it harder to argue that the data is inaccessible and therefore cost-prohibitive to produce.
    • You might have to produce more data than otherwise intended and thus increase the overall cost of review, production, etc.

Forensic Collection

In the legal community there is confusion surrounding what it means to do a forensic collection. In the IT world, a forensic copy of a drive refers to a full copy of the drive. In the legal world, the key is not whether it’s a full copy but whether the process is defensible.

 

A good definition of a forensically-sound collection comes from Craig Ball:

“A ‘forensically-sound’ duplicate of a drive is, first and foremost, one created by a method which does not, in any way, alter any data on the drive being duplicated. Second, a forensically-sound duplicate must contain a copy of every bit, byte and sector of the source drive, including unallocated ‘empty’ space and slack space, precisely as such data appears on the source drive relative to the other data on the drive. Finally, a forensically-sound duplicate will not contain any data (except known filler characters) other than which was copied from the source drive.” 

 

The important point is that the standard definition of a forensically-sound collection stresses the importance of a bit-by-bit copy of the source hard drive. It is a full copy of the entire drive, including deleted data, unallocated (“empty”) space and slack space. I think that all collections should be done in a sound, defensible manner using industry-accepted tools and procedures. I prefer Mike Murr’s definition, from his forensics blog, of a forensically-sound collection as a process that leads to “an accurate representation of the source evidence.” In other words, all collections need to be legally defensible and need to preserve the data in its original form. The legal team can still choose between a targeted collection and a full image. In either case, the collection must be done in a forensically sound manner.