Requesting Data: More is Better
We were recently asked for our thoughts on how best to strategically sift through large data sets
when you are on the receiving end of the data. Here is a summary of what we recommended.
Meet and Confer. The goal is to get the other side to produce as much information as possible, in a format tailored to your case strategy and review tools.
Optimal Production on the Receiving Side. We recommend requesting processed natives: data that the other side has already converted into a litigation database. The production would include extracted text, OCR, metadata and all images. The files would be numbered with document IDs so that both sides can keep track of the documents. Benefits include lower cost, easier searching, the ability to cluster documents and the ability to track discussions.
TIFF Production with Metadata. If the other side will only provide TIFF images, then we recommend that you require them to provide as much metadata as possible. At a minimum, you will want to get the extracted text, OCR and all major metadata fields. With that metadata in hand, here are some ways that you can improve the efficiency of your review:
· Use file type analysis to eliminate "junk"
· Recreate email discussion threads using the Message ID field
· Group documents that are similar (near dupes)
· Compare tracked changes across similar documents
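As an illustration of the threading bullet above, here is a minimal Python sketch. A Message ID identifies each individual email, so linking replies together also requires a field that points back to the parent message, such as the standard In-Reply-To header. The sketch assumes the production includes that field; the record keys (doc_id, message_id, in_reply_to) and the sample values are hypothetical, not part of any particular review platform.

```python
from collections import defaultdict

def build_threads(messages):
    """Group messages into threads by following reply links back to a root."""
    # Map each message ID to the ID it replies to (None for a new conversation).
    parent = {m["message_id"]: m.get("in_reply_to") for m in messages}

    def root_of(message_id):
        seen = set()
        # Walk up the reply chain; stop at a missing parent or a cycle.
        while parent.get(message_id) in parent and message_id not in seen:
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m["message_id"])].append(m["doc_id"])
    return dict(threads)

# Hypothetical sample records, as they might be read from a load file.
messages = [
    {"doc_id": "ABC0001", "message_id": "<1@example.com>", "in_reply_to": None},
    {"doc_id": "ABC0002", "message_id": "<2@example.com>", "in_reply_to": "<1@example.com>"},
    {"doc_id": "ABC0003", "message_id": "<3@example.com>", "in_reply_to": "<2@example.com>"},
    {"doc_id": "ABC0004", "message_id": "<9@example.com>", "in_reply_to": None},
]
threads = build_threads(messages)
```

If a parent message was not produced, the sketch treats the earliest message it can find as the root of the thread, so partial threads still group together.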
TIFF Production with Limited Metadata. If the data provided are TIFFs with limited metadata (e.g. existing discovery, hard copy documents, etc.), then the focus shifts to improving review workflow. Here are a few ways to improve reviewer efficiency:
· Mass-route “junk” files to a lower level review team that can confirm the documents are non-responsive (e.g. a football pool email).
· Use a workflow that does not require reviewers to tag non-responsive documents at first glance. After the review is finished, mass tag anything left without a tag as Not Responsive.
· Use an MD5 hash value to find and auto-code duplicate documents.
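The last two bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are represented as a hypothetical mapping of document ID to content bytes, the tag names are placeholders, and the hash is computed over whatever content you have (for example the extracted or OCR text). Where the producing party supplies an MD5 hash field, that value could be used in place of md5_of.

```python
import hashlib
from collections import defaultdict

def md5_of(content: bytes) -> str:
    """Return the MD5 hex digest of a document's content."""
    return hashlib.md5(content).hexdigest()

def propagate_coding(documents, coding):
    """Copy a reviewer's tag to every exact duplicate of a coded document."""
    groups = defaultdict(list)
    for doc_id, content in documents.items():
        groups[md5_of(content)].append(doc_id)

    result = dict(coding)
    for doc_ids in groups.values():
        tags = {coding[d] for d in doc_ids if d in coding}
        if len(tags) == 1:  # only propagate when the coded duplicates agree
            tag = tags.pop()
            for d in doc_ids:
                result.setdefault(d, tag)
    return result

def tag_remaining(doc_ids, coding, default="Not Responsive"):
    """Mass tag every document that still has no tag."""
    return {d: coding.get(d, default) for d in doc_ids}

# Hypothetical sample: two identical documents and one unrelated email.
documents = {
    "ABC0001": b"Quarterly forecast attached.",
    "ABC0002": b"Quarterly forecast attached.",
    "ABC0003": b"Football pool picks due Friday.",
}
coding = {"ABC0001": "Responsive"}  # the only tag a reviewer applied

coded = propagate_coding(documents, coding)
final = tag_remaining(documents, coded)
```

Duplicate groups whose coded members disagree are left alone in this sketch, on the assumption that a conflict should go back to a reviewer rather than be resolved automatically.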
To summarize, the more metadata that the producing party provides, the better. Therefore, ask up front for as much metadata as possible, and be specific about the technical format you require.
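One way to hold the other side to a specific format is to check each incoming load file against the list of fields you asked for. The Python sketch below assumes a simple delimited load file with a header row; the field names in REQUIRED_FIELDS are hypothetical examples, and the real list would be whatever the parties agreed on.

```python
import csv
import io

# Hypothetical field names; substitute the fields named in your request.
REQUIRED_FIELDS = {"DocID", "ExtractedText", "Custodian", "DateSent", "MD5Hash"}

def missing_fields(load_file_text, delimiter=","):
    """Return the required fields absent from the load file's header row."""
    reader = csv.reader(io.StringIO(load_file_text), delimiter=delimiter)
    header = next(reader, [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_FIELDS - present)

# Hypothetical sample production that omits two of the requested fields.
sample = "DocID,Custodian,DateSent\nABC0001,Smith,2010-01-05\n"
missing = missing_fields(sample)
print(missing)  # ['ExtractedText', 'MD5Hash']
```

Running a check like this when a production arrives makes it easier to raise gaps with the producing party before review begins.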