Relativity Pivot as a Review Quality Control Measure

By Debora Motyka Jones

Following up on my last few blogs, you can also create various charts in Pivot to show you what your reviewers are doing, including how they are tagging documents, and quickly identify any possible quality issues. As you can see below, you can use Pivot’s bar graph capabilities to identify that one reviewer is disproportionately tagging documents as “Not Responsive.” You can link to the documents he or she has tagged as non-responsive to take a closer look and find out whether it is a quality issue, or merely that they have received several batches of irrelevant documents.

The Power of Relativity Pivot on Case Strategy

By Debora Motyka Jones

I can just go on and on about Relativity’s Pivot feature. I have already bragged about its early case assessment potential (see earlier blog posting) but there is more…it can also help you get an overview of the issues in your case. For example, you can run a pie chart of the “issues” coding by custodian. As you can see in the example below, taken from a sales demo database, most of the documents related to the “Shady Investments” issue involve a custodian named Paul Allen, and to a lesser extent, a custodian named Eric Bass. Using this information, the lawyer on this case can now focus on these two custodians for this particular issue and exclude other individuals, such as Larry May or Thomas Martin, who have no documents related to Shady Investments. By running this analysis for each of the issues in your case, you are able to target further discovery and your depositions in your case.

 

The Power of Relativity Pivot in Early Case Assessment

By Debora Motyka Jones

When kCura released Relativity Version Six earlier this month, I was skeptical of its improvements. Many times, new software releases include minor upgrades or backend tweaks that do not affect my day-to-day life. Boy did kCura prove me wrong on this one. Though a few of the upgrades were the usual “boring” tweaks, the addition of Pivot was enough to blow me away. Pivot is a new analysis tool that allows users to get an overview of their case in a visual format. Not only does it step up Relativity’s visual appeal, but it is also useful for gaining insight into your case. With Pivot, you can now evaluate your case in an entirely new way. For example, you can analyze email domains, see the intersection between keywords by custodian and create timelines. This analysis is presented visually in lists, pie charts, line graphs or bar graphs. These graphs can be used for early case assessment, case strategy and review quality control. This is a huge improvement to Relativity allowing you to use this tool for more than just linear review and clustering.

 

Thanks to the power of Pivot, Relativity can now be used as a culling tool before you start your review. If you are not using a separate ECA tool, you can rely on the Pivot feature to gain some similar ECA tool advantages. You can use the Pivot feature to view timelines of your case and assess the relevant dates as well as to cull out certain categories of documents. For example, in just a few minutes, and a few clicks of your mouse, you can segregate groups of documents and prioritize your review. Pivot allows you to obtain a list of the sender domains in your data. From this list, you can identify potential irrelevant domains and/or potentially privileged documents (see screen shot below). By clicking on the domain, you can see all the documents in the particular domain. For the potentially privileged documents, you can bulk batch them and send to your team for immediate privilege review. For the likely irrelevant domains, you can scroll through a sampling of the documents and when you are satisfied that the documents are not in fact relevant, you can bulk tag them as such and avoid the time and expense of reviewing junk mail.
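
If you want to sanity-check a sender-domain list outside of Relativity, the same kind of tally can be approximated in a few lines of Python. This is only a sketch and is not how Pivot works internally. It assumes the sender addresses have been exported as a plain list of strings; the `count_sender_domains` helper and the sample addresses are hypothetical.

```python
from collections import Counter

def count_sender_domains(senders):
    """Return a Counter of lower-cased domains from a list of sender addresses."""
    domains = Counter()
    for address in senders:
        # Keep only the part after the last "@"; skip values with no domain.
        _, sep, domain = address.strip().rpartition("@")
        if sep and domain:
            domains[domain.lower()] += 1
    return domains

# Hypothetical sample data standing in for an exported "From" column.
sample_senders = [
    "paul.allen@example.com",
    "newsletter@deals.example.net",
    "counsel@lawfirm.example.org",
    "Eric.Bass@Example.com",
    "newsletter@deals.example.net",
    "not-an-address",
]

domain_counts = count_sender_domains(sample_senders)
for domain, count in domain_counts.most_common():
    print(f"{domain}: {count}")
```

Sorting by count puts the high-volume domains at the top, which is usually where newsletters and other junk mail show up.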

 

 

You can also use the line graph feature to give you an idea of what months and/or years to focus discovery on and identify any issues with your discovery to date. By creating a graph such as the one below, you can identify all the sent dates for your custodians. The graph below, created using data in a sales database, shows that the custodian Larry Campbell sent a large number of emails in November 2000 and May 2001. If these same peaks appear in other custodians, you may be able to focus your case around this period. This chart also helps you identify that there are some issues with the sent dates in your data: there should not be emails from 1979 and 2020. You can follow up on these data concerns early on in your case, whereas without a timeline you may not have noticed this anomaly.
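
The same date check can be scripted once the sent dates are exported. The sketch below is not a Relativity feature. The `find_date_anomalies` and `emails_per_month` helpers, the plausible date window, and the sample dates are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date

def find_date_anomalies(sent_dates, earliest, latest):
    """Return the dates that fall outside the plausible [earliest, latest] window."""
    return [d for d in sent_dates if d < earliest or d > latest]

def emails_per_month(sent_dates):
    """Count emails per (year, month) so peaks in activity stand out."""
    return Counter((d.year, d.month) for d in sent_dates)

# Hypothetical sent dates for one custodian.
sent_dates = [
    date(2000, 11, 3), date(2000, 11, 17), date(2000, 11, 28),
    date(2001, 5, 2), date(2001, 5, 9),
    date(1979, 1, 1),   # likely a bad or default timestamp
    date(2020, 6, 15),  # later than the assumed collection date
]

# Assumed window: the relevant period of the matter through the collection date.
anomalies = find_date_anomalies(sent_dates, date(1995, 1, 1), date(2010, 12, 31))
monthly = emails_per_month(sent_dates)
busiest_month, busiest_count = monthly.most_common(1)[0]

print("Anomalous dates:", anomalies)
print("Busiest month:", busiest_month, "with", busiest_count, "emails")
```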

 

 

 

Collections: Top 5 Questions

By David Rostov and Debora Motyka Jones

There is a fair amount of confusion regarding collections of client data.  To help guide your approach to collections, we have provided an overview of the top five questions and their answers.

 

How should collections be performed?

·        All data collections should be performed in a forensically sound manner. This means that the collection should be done in a sound, defensible manner using industry-accepted tools and procedures. The collection should produce an accurate representation of the source evidence.

 

Targeted Collection versus Full Forensic Collection?

 

·        A targeted collection includes only active files deemed relevant to the case (e.g. emails and Microsoft Office documents).

    • Reduces cost and time due to faster collection time and less data. 
    • It does not preserve deleted data.
      • Some additional spoliation risk.
      • The methodology may be easier to challenge in court.

·        A forensic collection is a bit-for-bit copy of the entire hard drive including all active files, deleted files, file fragments and blank space.

    • Preserves all data reducing the risk of spoliation.
    • Has greater legal defensibility.
    • However, it is more expensive.

If files are deleted, what can be recovered?

·        When the content of the file remains on the drive AND

o       Files are in the Windows Recycle Bin;

o       Files have not been overwritten by a new file;

o       Files are partially overwritten.

·        When the content is in a PST AND

o       The damaged/corrupted files, “Deleted Items” files and partially overwritten files are identified and recovered into a new PST file.

 

If files are deleted, what cannot be recovered?

·        Files that were completely overwritten with new files.

·        Drives that were “wiped” using wiping software.

·        Drives that were physically damaged and cannot be repaired (even in a lab environment).

 

What are the leading industry software tools for collection?

·        EnCase and FTK Imager for forensic collections.

o       These are the “gold” standard and are used by law enforcement as well.

·        Paraben’s Device Seizure for cell phone collections.

·        Microsoft ExMerge for Exchange server collection.

·        Microsoft Robocopy, which is often used for targeted collections.

·        Microsoft NTBackup for backup files (.bkf).

·        Symantec Norton Ghost for backup and recovering files.

Requesting Data: More is Better

We were recently asked for our thoughts on how best to strategically sift through large data sets when you are on the receiving end of the data. Here is a summary of what we recommended.

Meet and Confer. The goal is to get the other side to produce as much information as possible tailored to work best with your case strategy and review tools.

 

Optimal Production on the Receiving Side. We recommend requesting processed natives – data that the other side converted into a litigation database. The data would include extracted text, OCR, metadata and all images. The files would be numbered with document IDs so that both sides could keep track of the documents. Benefits include lower cost, ease of search, and the ability to cluster documents and track discussions.

 

TIFF Production with Metadata. If the other side will only provide TIFF images, then we would recommend that you require that they provide as much metadata as possible. At a minimum, you will want to get the extracted text, OCR and all major metadata fields. Here are some ways that you can improve the efficiency of your review. 

 

·         Use file type analysis to eliminate "junk"

·         Recreate email discussion threads using the Message ID field

·         Group documents that are similar (near dupes)

·         Compare tracked changes across similar documents

 

TIFF Production with Limited Metadata. If the data provided are TIFFs with limited metadata (e.g. existing discovery, hard copy documents, etc.), then the focus shifts to improving review workflow. Here are a few ways to improve reviewer efficiency:

 

·        Mass remove “junk” files to a lower level review team that can confirm non-responsive documents (e.g. a football pool email).

·        Use a work flow that does not require reviewers to tag non-responsive documents at first glance. After finishing, mass tag anything left without a tag as Not Responsive.

·        Use an MD5-hash value to find and auto-code duplicate documents.
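
The MD5 duplicate check in the last item above can be sketched as follows. This is a minimal illustration and not the method used by any particular review platform. It assumes the content of each document is available as bytes; the `group_duplicates` helper and the document IDs are hypothetical.

```python
import hashlib
from collections import defaultdict

def md5_of_bytes(data):
    """Return the hexadecimal MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

def group_duplicates(documents):
    """Map each MD5 digest to the IDs of documents sharing that exact content.

    `documents` is a dict of {document_id: content_bytes}.
    Only digests shared by more than one document are returned.
    """
    by_hash = defaultdict(list)
    for doc_id, content in documents.items():
        by_hash[md5_of_bytes(content)].append(doc_id)
    return {h: ids for h, ids in by_hash.items() if len(ids) > 1}

# Hypothetical documents; in practice the bytes would be read from files.
documents = {
    "DOC-0001": b"Quarterly results attached.",
    "DOC-0002": b"Football pool picks are due Friday.",
    "DOC-0003": b"Quarterly results attached.",
    "DOC-0004": b"Quarterly results attached!",  # final character differs
}

duplicate_groups = group_duplicates(documents)
for digest, doc_ids in duplicate_groups.items():
    print(digest, doc_ids)
```

Once one document in a group has been coded, the same coding can be applied to the others in that group. Note that a single changed character, as in DOC-0004, is enough to produce a different hash.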

 

To summarize, the more metadata that the producing party provides, the better. Therefore, ask up front for as much metadata as possible, and be specific about the technical format required.

Top Five Questions to Ask When Choosing an E-Discovery Vendor

By David Rostov and Debora Motyka Jones

We often get questions from our clients about how best to select an electronic discovery vendor.  Important considerations in this process are what questions to ask, how best to compare vendors and what are the important issues that are typically missed in the selection process.  In particular, our clients often tell us that they sometimes struggle in the vendor selection phase to be able to best assess the quality and capabilities of a vendor.  Given the challenges of choosing the right vendor, we often hear that law firms default to making their decision based almost exclusively on price considerations. 

We put together a short list of key questions that can help in the eDiscovery vendor selection process. 

 

 

 

Top Questions To Ask When Choosing an E-Discovery Vendor

  • Scope of Services

        What services does the vendor offer?

        If case parameters change, will the vendor be able to meet your needs and time frames?

        Are there volume benefits/discounts if you use multiple services (e.g. processing, hosting and production versus just hosting)?

        What services are sub-contracted out and does data ever leave the vendor’s site?

        What size or type of case is too big for the vendor?

        What have been vendor’s toughest cases?

  •        Expertise (Not all vendors are created equal; and it is not all about price)

        What is the vendor’s knowledge level of the technical issues?

        Are the vendor’s employees certified in the tools they use?

        What is the vendor’s level of understanding of the legal process?

        Are there legal professionals on staff?

        How does the vendor’s expertise compare to other vendors?

  •        Quality of Services

        Is this a vendor with whom you could see yourself establishing a longer-term relationship?

        How does the vendor ensure consistently high-quality service that is accurate and on time?

        Are errors tracked? What are considered errors? How are errors addressed?

        What do the references say about the vendor?

  •        Customer Service

        What hours does the vendor operate?

        How available are the vendor’s employees during non-business hours?

        How much lead time is needed for processing and production?

        How are cases staffed?

        Who is the primary point of contact? Is it the same throughout the case? 

        What is the nature of the vendor’s project management team and approach?

        How are issues escalated?

  •        Technical Specifications

        Does the vendor use proprietary versus non-proprietary software and what are the benefits/trade-offs?

        If the data is not being processed locally, what are the vendor’s FTP connection speeds and how do they compare with the law firm’s FTP speeds?

        What is the vendor’s policy on backing up data?

        What is the vendor’s policy regarding storing data?

 

 

De-Duplication -- Different Tools, Different Results

If two emails are identical, shouldn’t they be considered duplicates?

Unfortunately, in eDiscovery it is not quite so simple. The industry standard is to calculate an MD5 hash value for all emails in a population and then identify the duplicate emails (this is referred to as de-duping). An MD5 hash value is the output of a complex mathematical algorithm; it provides a way to identify each unique document. Ralph Losey has written some very thoughtful commentary on hash values. He makes a very interesting case for using hash values as the replacement for Bates numbering; the 21st century version of Bates numbering.

The challenge is that each of the major eDiscovery software tools uses its own proprietary definition of the inputs used in calculating the hash values. In the hash value world, even a very small difference in those inputs means that two documents that are truly identical can be considered distinct. As a result, a certain set of documents ends up being reviewed and/or produced more than once. The table below provides a summary of the inputs used to calculate the hash values from three leading tools: Clearwell, LAW and IPRO. As you can see, they are each different.

 

Hash input               Clearwell   LAW   IPRO
From                     Yes         Yes   Yes
To                       Yes         Yes   Yes
Cc                       Yes         Yes   Yes
Bcc                      Yes         Yes   Yes
Subject of the email     Yes         No    Yes
Email date (sent date)   UTC         No    GMT
Body content             Yes         Yes   Yes
Attachment Names         No          Yes   Yes
IntMsgID                 No          Yes   No

Notes:

Yes indicates it is included in the hash computation.

No indicates that it is not included in the hash computation.

IPRO hash methodology can be customized based on the settings outlined above.

 

As a recent real world example, we worked on an eDiscovery project where the custodian sent out an email to eight people within his company. By any reasonable standard, this means that there were eight exact duplicates of this email in the population set. However, the software tool used to process this data categorized this email as being four different emails. This was due to the fact that the company had various internal email servers (a fairly common occurrence in larger corporations) and each time the email was handed off to a different internal server, it placed a slightly different time in one of the metadata fields.
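
To see why the choice of inputs matters, the sketch below hashes two copies of one email using two different field sets. It does not reproduce the actual Clearwell, LAW or IPRO algorithms. The field names, the separator character, the sample values and the `email_hash` helper are all assumptions made for illustration.

```python
import hashlib

def email_hash(email, fields):
    """MD5 over the chosen fields, joined in a fixed order with a separator."""
    joined = "\x1f".join(str(email.get(name, "")) for name in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Two copies of one email; a hypothetical server stamped slightly different times.
copy_a = {
    "from": "custodian@example.com",
    "to": "team@example.com",
    "subject": "Status update",
    "body": "Please see the attached report.",
    "sent": "2009-03-02T14:05:07Z",
}
copy_b = dict(copy_a, sent="2009-03-02T14:05:09Z")

# Hypothetical field sets: one includes the sent timestamp, one does not.
with_date = ["from", "to", "subject", "sent", "body"]
without_date = ["from", "to", "subject", "body"]

same_with_date = email_hash(copy_a, with_date) == email_hash(copy_b, with_date)
same_without_date = email_hash(copy_a, without_date) == email_hash(copy_b, without_date)

print("Duplicates when the sent time is hashed:", same_with_date)
print("Duplicates when the sent time is ignored:", same_without_date)
```

A two-second difference in one metadata field is enough to make the first comparison fail, while the second treats the two copies as duplicates.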

Conclusion

Although each software tool calculates the hash values slightly differently, this does not necessarily mean that one tool is better or worse than another or that one is inherently more accurate. If hash values were to become the Bates stamp of the 21st century, the electronic discovery industry could benefit from a standard method of calculating hash values. Absent a standard, it is important to be aware of this issue in case you run across it.

 

 

Why Are So Many Email Collections Corrupted?

Many email collections are done improperly and produce corrupted files. Unless properly repaired, corrupted email files cannot be processed for litigation. The most common email collection problems arise in Microsoft Exchange Server collections (.PST files). Improperly collected Exchange data adds significant time and cost to the eDiscovery process. It also introduces an element of risk in terms of the overall integrity of the evidence.

Microsoft Outlook saves all email files in a .PST file format. Think of the PST as an expanding container file. For most custodians, all of their email resides in a few PST files. 

Often email collections are performed by internal IT personnel. Usually email collections are done using the Microsoft Exchange Mailbox Merge Program (ExMerge.exe). This program enables a network administrator to extract data from mailboxes on an Exchange Server and merge it into the same mailboxes on another computer that is running Exchange Server. The program copies the PST file from the source mailbox server and merges the data into the same PST file on the destination server. The most common practice is to copy the data while the custodian(s) are still logged into the system. This allows the custodian to continue working while the collection is occurring. This is the main cause of the file corruption. The system cannot properly synchronize the various sets of files, in particular slight differences in dates/times, while the custodian’s email account is active. 

 

The good news is that there is a very simple and effective solution to this problem. The solution is to make sure that the custodian is logged out of his/her account during the entire collection process and that the account has been properly synchronized with the server. It is always advisable to verify that the data was successfully collected prior to turning it over to your eDiscovery vendor or counsel. To verify the collected PST, use the “Advanced Find” function in Outlook. If you do not see any messages in the view pane, this is an indication that the collection was not successful and the data has been corrupted.

 

Paraben has a tool called E-mail Examiner that does a good job of ensuring that the email collection is forensically sound. Their product is more expensive than ExMerge and not as widely used. However, it is designed specifically for purposes of litigation and investigations.

 

Repairing Corrupted PSTs

If the collection was not done properly and the data is corrupted, repairing a PST usually involves a number of hours of senior technical time. A rough estimate is that a 10 GB PST will take a few hours to repair. There are two tools that we would recommend for this type of repair. Both tools search all the files in order to locate the corrupt files and then attempt to recover the damaged information.

 

1.      EasyRecovery File Repair. This tool is from Kroll Ontrack. 

2.      Outlook Recovery Tool Box. This is a Microsoft tool that is usually included with Outlook.

 

Unfortunately, not all corrupt PSTs can be repaired. If a PST cannot be repaired, you will need to have the data re-collected. Be prepared for an unhappy custodian when you show up to re-collect their data.

Big Changes in Early Case Assessment

There are some very exciting trends and developments going on in the Early Case Assessment (“ECA”) phase of litigation. ECA is a critical part of the litigation process since it is a time to perform a preliminary analysis of the merits of a case, its claims and likely defenses, and to estimate the cost of the case. Usually the ECA is conducted in the first 90 days from the time a case is filed. There is general agreement that ECA improves litigation outcomes. For example, a survey by LexisNexis showed that ECA results in favorable outcomes in 76% of cases and reduces litigation expenses in 50% of cases.

In the digital era, the BIG CHALLENGE is getting access to the electronic data early in the ECA process and having the tools to allow the legal team to evaluate the case based on a preliminary review of the evidence. This is both a technology challenge as well as a cost challenge. The good news is that there are now a number of early case assessment tools on the market that can solve this problem. We are big fans of Clearwell for this and our clients are seeing the value.

 

 

The key benefits from this are:

 

  1. Speeding up access to client data. The documents can be fully indexed and available to review within hours rather than weeks.
  2. An easy to use web interface. This means it is available anywhere and anytime. There is no need to rely on internal IT resources and no need to purchase additional software or hardware.
  3. Collaboration between in-house counsel and outside counsel. It is very easy to have the legal team work together to examine key documents.

Effective use of an early case assessment tool makes it possible to prepare an Early Case Assessment in the digital era. A good understanding of the documents allows the legal team to prepare a more complete litigation strategy. It also helps lower the overall cost of the case by reducing the amount of data that needs to be processed for review and correspondingly reducing the amount of legal hours required for review. The other added benefit is that the legal team will be able to create a more accurate budget for the case based on their insight into the data size and its nuances.

 

Text Messaging and Its Impact on eDiscovery

To date, most litigation electronic discovery requests are limited to custodian email and loose documents. The requests ignore custodian mobile phone data, in particular stored text messages. The next big eDiscovery collection trend for litigation will likely be the collection of text messages from mobile phones.

Text messaging is still viewed as something that only teenagers really use. However, the usage data on text messaging is quite revealing. Over 70% of Americans ages 25 to 49 use text messaging. The average number of texts sent per day per user in the US is over 10. In 2008, the number of text messages sent surpassed mobile phone calls. And text messaging is growing at 100 to 200% per year.

 

To put texting in its proper context, it is estimated that Americans send about 30 emails per day (the data on this is not very precise). This means that texting accounts for roughly ¼ of the daily electronic correspondence sent in the US (about 10 texts out of roughly 40 texts and emails combined).

 

The first step in any forensics investigation is identifying sources of evidence. Mobile phones store evidence in a variety of locations and media formats. Similar to desktop computers, most cell phones have internal memory and removable storage media (SD cards). Depending on the carrier, an internal SIM (Subscriber Identity Module) card stores pertinent information, such as phone numbers, contacts, and unique subscriber registration data.

 

As with computer collections, mobile device collections should be done in a forensically sound manner. This means that the data must be collected without changing the original device content. A forensic hash should be performed on the collected data to ensure that no subsequent changes are made to the data. Keep in mind that the data on mobile devices is constantly changing (e.g. clock time, network data, etc.) so it is important to make an exact replica as quickly as possible.
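
The hash check described above can be illustrated with a short sketch. The source does not name a hash algorithm, so SHA-256 is an assumption here, as are the `sha256_of_stream` helper and the in-memory sample data standing in for a real device image. Forensic tools typically record this kind of hash for you.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Hypothetical stand-in for an acquired device image; a real image would be
# opened from disk in binary mode instead.
acquired_image = io.BytesIO(b"contacts|text messages|call log")
hash_at_acquisition = sha256_of_stream(acquired_image)

# Later, re-hash each working copy and compare against the recorded value.
unchanged_copy = io.BytesIO(b"contacts|text messages|call log")
altered_copy = io.BytesIO(b"contacts|text messages|call log (edited)")

copy_is_intact = sha256_of_stream(unchanged_copy) == hash_at_acquisition
altered_is_intact = sha256_of_stream(altered_copy) == hash_at_acquisition

print("Unchanged copy matches recorded hash:", copy_is_intact)
print("Altered copy matches recorded hash:", altered_is_intact)
```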

 

The main challenge with mobile collections is that most cellular phones use a proprietary operating system. This is compounded by the fact that new mobile devices are constantly being introduced into the market, making it a challenge to stay current on the collection tools. Often the hardest part of the collection is just having the right phone adapter on hand to transfer the data from the phone to the acquiring computer.

 

After making a copy of the phone data, the next step is to analyze the data. The forensic tools available for analysis and processing are still in their early stage of development. However, there are a number of forensic tools available such as Paraben’s Device Seizure Toolkit and Guidance Software’s Neutrino. Paraben’s Device Seizure is probably the most common tool used both by law enforcement as well as for commercial litigation. These tools are very similar to traditional forensics software utilities and offer many of the same capabilities and functionality, such as text viewing and keyword searching. During the analysis phase, text messages, e-mails and contacts can be identified, undeleted (if necessary), searched, and exported for review or further processing. If you are interested in more information on mobile collections, the National Institute of Standards and Technology (NIST) has a good overview.