Requesting Data: More is Better

We were recently asked for our thoughts on how best to strategically sift through large data sets when you are on the receiving end of the data. Here is a summary of what we recommended.

Meet and Confer. The goal is to get the other side to produce as much information as possible, in a form tailored to your case strategy and review tools.

 

Optimal Production on the Receiving Side. We recommend requesting processed natives – data that the other side has converted into a litigation database. The data would include extracted text, OCR, metadata and all images. The files would be numbered with document IDs so that both sides can keep track of the documents. Benefits include lower cost, ease of search, and the ability to cluster documents and track discussions.

 

TIFF Production with Metadata. If the other side will only provide TIFF images, then we recommend that you require as much metadata as possible. At a minimum, you will want to get the extracted text, OCR and all major metadata fields. With that metadata in hand, here are some ways that you can improve the efficiency of your review:

 

·         Use file type analysis to eliminate "junk"

·         Recreate email discussion threads using the Message ID field

·         Group documents that are similar (near dupes)

·         Compare tracked changes across similar documents
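As an illustration of the threading step, here is a minimal sketch in Python. It assumes that each produced email record carries a Message ID and, in addition, the ID of the message it replies to. The field names `doc_id`, `message_id` and `in_reply_to` are hypothetical, and the reply-to field is an assumption about what the producing party provides; the actual names depend on the load file you negotiate.

```python
from collections import defaultdict


def build_threads(emails):
    """Group document IDs into threads keyed by the root Message ID."""
    by_id = {e["message_id"]: e for e in emails}

    def root_of(msg_id):
        seen = set()
        while True:
            seen.add(msg_id)
            # A message missing from the production has no known parent.
            parent = by_id[msg_id]["in_reply_to"] if msg_id in by_id else None
            if not parent:
                return msg_id
            if parent in seen:
                # Malformed data (a reply loop): pick a stable key.
                return min(seen)
            msg_id = parent

    threads = defaultdict(list)
    for e in emails:
        threads[root_of(e["message_id"])].append(e["doc_id"])
    return dict(threads)


# Hypothetical sample records for illustration only.
emails = [
    {"doc_id": "DOC0001", "message_id": "<a@example.com>", "in_reply_to": None},
    {"doc_id": "DOC0002", "message_id": "<b@example.com>", "in_reply_to": "<a@example.com>"},
    {"doc_id": "DOC0003", "message_id": "<c@example.com>", "in_reply_to": "<b@example.com>"},
    {"doc_id": "DOC0004", "message_id": "<d@example.com>", "in_reply_to": None},
]
threads = build_threads(emails)
```

Replies whose parent was not produced are still grouped together under the missing parent's ID, so a gap in the production does not split the discussion.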

 

TIFF Production with Limited Metadata. If the data provided are TIFFs with limited metadata (e.g., existing discovery or hard copy documents), then the focus shifts to improving review workflow. Here are a few ways to improve reviewer efficiency:

 

·        Mass move “junk” files to a lower level review team that can confirm that the documents are non-responsive (e.g. a football pool email).

·        Use a workflow that does not require reviewers to tag non-responsive documents at first glance. After the review is finished, mass tag anything left without a tag as Not Responsive.

·        Use an MD5 hash value to find and auto-code duplicate documents.
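The last two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the tag strings and the sample documents are hypothetical, and the sketch hashes whatever content you have for each document (file bytes or extracted text). Only byte-for-byte identical content produces the same MD5 value, so this finds exact duplicates, not near dupes.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the MD5 hash of the content as a hex string."""
    return hashlib.md5(data).hexdigest()


def propagate_coding(docs, coding):
    """Copy a reviewer's tag to unreviewed exact duplicates.

    docs:   list of (doc_id, content_bytes) pairs
    coding: dict of doc_id -> tag for documents already reviewed
    """
    tag_by_hash = {}
    hash_by_doc = {}
    for doc_id, data in docs:
        h = md5_of(data)
        hash_by_doc[doc_id] = h
        if doc_id in coding:
            # The first reviewed copy decides the tag for its duplicates.
            tag_by_hash.setdefault(h, coding[doc_id])

    result = dict(coding)
    for doc_id, h in hash_by_doc.items():
        if doc_id not in result and h in tag_by_hash:
            result[doc_id] = tag_by_hash[h]
    return result


def tag_remaining(doc_ids, coding, default="Not Responsive"):
    """After review is finished, mass tag anything left without a tag."""
    return {doc_id: coding.get(doc_id, default) for doc_id in doc_ids}


# Hypothetical sample data for illustration only.
docs = [
    ("DOC0001", b"Quarterly report"),
    ("DOC0002", b"Quarterly report"),
    ("DOC0003", b"Football pool picks"),
]
coded = propagate_coding(docs, {"DOC0001": "Responsive"})
final = tag_remaining([doc_id for doc_id, _ in docs], coded)
```

Here the reviewer tags only DOC0001. Its duplicate DOC0002 inherits the tag, and DOC0003 is mass tagged at the end.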

 

To summarize, the more metadata that the producing party provides, the better. Therefore, ask up front for as much metadata as possible, and be specific about the technical format required.

Not All TIFFs Are Created Equal

Processing of electronic discovery data can lead to interesting surprises in terms of the complexity and/or size of the data. This can sometimes make it challenging to accurately estimate a timeline for a project prior to loading the data and performing some preliminary analysis.

For example, we recently received 20 spreadsheets that needed to be converted into TIFF images and produced to opposing counsel. The client called and asked us for an estimated time to complete the project. Based on the fact that it was only 20 spreadsheets, we estimated that we would have this project completed within a few hours. Assuming 50 pages per spreadsheet, our estimate was that this was going to be about 1,000 TIFF images.

After we received the data, we loaded it in our system and created TIFF images of the spreadsheets. It turned out that the 20 spreadsheets generated close to 100,000 TIFF images or pages (an average of 5,000 pages per spreadsheet). One spreadsheet converted into approximately 20,000 TIFF images. This meant that the page count was almost 100 times bigger than we had expected. As a result, the project took longer than our original estimate. The good news was that most of the spreadsheets actually had a lot of blank pages and other “quirky” formatting issues. In the case of the 20,000 page spreadsheet, we were able to fix the formatting (without, of course, changing any of the original data), which reduced the spreadsheet to a few hundred pages. We were also able to reduce the page count for the other spreadsheets by a similar proportion. The additional time that we took to fix the formatting ended up saving counsel countless review hours and significant cost.
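For readers who want to check the numbers, the arithmetic behind the original estimate and the overrun can be worked through in a few lines of Python. The figures come from the example above; the actual page count is rounded to 100,000 because the real total was only "close to" that number, and the variable names are ours.

```python
spreadsheets = 20
assumed_pages_each = 50        # our assumption before seeing the data
actual_pages = 100_000         # approximate total after TIFF conversion

estimated_pages = spreadsheets * assumed_pages_each    # 1,000 pages
average_actual = actual_pages / spreadsheets           # 5,000 pages each
overrun_factor = actual_pages / estimated_pages        # 100 times the estimate

print(f"Estimated pages: {estimated_pages:,}")
print(f"Average actual pages per spreadsheet: {average_actual:,.0f}")
print(f"Overrun factor: {overrun_factor:.0f}x")
```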

Bottom line: when requesting a firm timeline and cost estimate from an electronic discovery vendor, it is always best to give them the actual data and ask them to do a preliminary analysis before finalizing the estimate. This will ensure a much more realistic estimate.