When Are Keyword Search Terms Risky Business?

Gregory L Fordham
January 2009

The use of computerized keyword search terms has become widespread as litigators sift through large volumes of data in today’s e-discovery cases.

Although keyword searches are powerful, there is growing recognition of their “process” risks and “capability” limitations.   

The process risks include things such as inadequate sampling and testing of the search terms to determine their suitability as predictors of responsive documents. 

They can also include the user’s unfamiliarity with the search engine and whether it is suitable for searching the kinds of electronic documents that are of interest.

Both kinds of issues can produce either an excessive volume of false positives, which wastes resources and squanders budget, or, even worse, false negatives, where significant documents are not identified at all.
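To make the idea of sampling and testing concrete, the following is a minimal sketch of how a set of search terms might be scored against a small sample that a human reviewer has already coded. The record layout, the function name and the sample data are all hypothetical and used only for illustration; they are not drawn from any of the cases discussed here or from any particular review tool.

```python
from dataclasses import dataclass


@dataclass
class SampledDoc:
    """One document from a hand-reviewed sample (hypothetical layout)."""
    doc_id: str
    hit: bool         # did the keyword search return this document?
    responsive: bool  # did a human reviewer judge it responsive?


def evaluate_terms(sample):
    """Count false positives and false negatives and derive precision and recall."""
    tp = sum(1 for d in sample if d.hit and d.responsive)
    fp = sum(1 for d in sample if d.hit and not d.responsive)
    fn = sum(1 for d in sample if not d.hit and d.responsive)
    # Precision: share of hits that were actually responsive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of responsive documents that the terms actually found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
    }


# Illustrative sample only; real samples would be far larger.
sample = [
    SampledDoc("A", hit=True, responsive=True),
    SampledDoc("B", hit=True, responsive=False),
    SampledDoc("C", hit=True, responsive=False),
    SampledDoc("D", hit=False, responsive=True),
    SampledDoc("E", hit=False, responsive=False),
]

result = evaluate_terms(sample)
print(result)
```

Low precision signals that the terms are sweeping in too many irrelevant documents, while low recall signals that responsive documents are being missed. Either result suggests the terms should be revised and tested again before they are relied upon.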

Some recent cases where these kinds of problems were encountered include U.S. v O’Keefe, 537 F. Supp. 2d 14, D.D.C. (2008); Victor Stanley, Inc. v Creative Pipe, 250 F.R.D. 251, D.Md (2008); and Rhoads Industries, Inc. v Building Materials Corp. of America, 2008 WL 4916026, E.D.Pa., (2008).

In terms of capability limitations, keyword search terms are suitable for answering only certain kinds of questions.  For example, keyword searches can answer whether a document exists that contains the keywords. 

They are not suited for answering other kinds of questions, however.  For example, they are not suitable for answering questions about whether a document was accessed, deleted, omitted or shared with others.

Certainly, finding a document with a keyword search means it exists, but does that mean it was accessed?  If “accessed” means possessed, then its identification by a keyword search would confirm possession.

On the other hand, if “accessed” means opened and viewed, then examination of other system metadata artifacts, such as link files and recently used lists, is required.

Similarly, if a document is not identified by keyword search terms, that does not mean the document has not been accessed.  It could simply indicate the existence of a “process” risk.

In addition, the failure of keyword searches to identify a document could indicate that it has been deleted.  Since this is the same outcome that one would experience if the document had never existed, a keyword search is not well suited for proving spoliation.

Rather, the real proof of deletion or spoliation would again come from the examination of system metadata such as deleted file system entries or unmatched link files, to name a few possibilities.
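The reason a keyword search cannot prove deletion can be shown with a small sketch. It assumes a toy search function and three made-up document collections, all hypothetical: one where the document of interest is present, one where it has been deleted, and one where it never existed.

```python
def keyword_search(corpus, terms):
    """Return the ids of documents containing every term (case-insensitive).

    `corpus` is a hypothetical mapping of document id to document text.
    """
    lowered = [t.lower() for t in terms]
    return sorted(
        doc_id
        for doc_id, text in corpus.items()
        if all(t in text.lower() for t in lowered)
    )


# Illustrative collections only.
before_deletion = {
    "memo1.txt": "Quarterly budget review",
    "plan.txt": "Pricing plan for Project Falcon",
}

# The same media after the document of interest has been deleted.
after_deletion = dict(before_deletion)
del after_deletion["plan.txt"]

# Media on which the document never existed in the first place.
never_existed = {"memo1.txt": "Quarterly budget review"}

terms = ["project falcon"]
hits_before = keyword_search(before_deletion, terms)
hits_after = keyword_search(after_deletion, terms)
hits_never = keyword_search(never_existed, terms)

print(hits_before, hits_after, hits_never)
```

The search finds the document only while it is present. Once it is gone, the result is indistinguishable from that of media on which the document never existed, which is why the distinction has to come from other artifacts rather than from the search itself.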

Another example of the limitations of keyword search terms is the question of whether a document was shared with others.  As with the question of document access, this is not a question that keyword search terms answer well.

Despite the various limitations of keyword searches, they are being used to answer these kinds of questions.  Some examples of cases where they were used in this fashion are Equity Analytics, LLC v Lundin, 248 F.R.D. 331, D.D.C. (2008) and Calyon v Mizuho Securities USA, Inc., 2007 WL 2618658, S.D.N.Y (2007).

These two cases demonstrate more than the improper use of keyword search terms, however.  Indeed, they also signal an emerging new defense tactic that is designed to frustrate discovery and prevent production of evidence.

In both cases the defendants used privacy concerns to restrict access to their computer media to their own experts, and then only for the purpose of performing keyword searches.

While judges are sensitive to the privacy concerns of the parties, the answers to questions about access, spoliation and sharing are unlikely to be revealed by keyword searches.  Rather, those answers are best obtained through the inspection of system metadata, which is unlikely to contain privileged information or other sensitive data.

So, there are risks and limitations to keyword searches.  Understanding those risks and limitations is important for users because excessive false positives or false negatives can adversely affect the outcome of a case.