The eDiscovery Paradigm Shift


Tuesday, July 31, 2012

Professionalism and eDiscovery: Going beyond ethical considerations

During the last few years, there has been much discussion, and even some interesting debates, about ethical eDiscovery issues.  Much focus has been on the topics of duties to preserve records, duties to disclose records, and the state Rules of Professional Conduct.  But, I believe it is not sufficient to consider only the ethical issues involved.  We must also focus on the professionalism of eDiscovery.  Some of these professionalism issues are raised in discovery generally, but others are unique to eDiscovery.

I believe it nearly universally true that the most professional and ethical lawyers are usually the best lawyers.  They have either long ago abandoned, or never acquired a taste for, unprofessional conduct.  They have mastered their craft and find no use for unprofessional behavior.  The same could be said for business leaders; if they are not professional, others would rather do a business deal with someone else.

Attorney Civility Rules

Some states have developed civility rules that are guidelines only.  These rules are not intended to be enforced against lawyer conduct the way that the Rules of Professional Conduct are enforced.  However, these are excellent guidelines for ensuring that lawyers maintain professionalism in eDiscovery.

New York’s Standards of Civility include an obligation to be “courteous and civil in all professional dealings with other persons.”  This includes a requirement that lawyers “should act in a civil manner regardless of the ill feelings that their clients may have toward others” and the reminder that “[l]awyers can disagree without being disagreeable.”

The New York Standards of Civility also state that “[a] lawyer should not use any aspect of the litigation process, including discovery and motion practice, as a means of harassment or for the purpose of unnecessarily prolonging or increasing litigation expenses.”  ESI requests are particularly prone to abuse in this area because they can be used to harass an opposing party and to increase litigation expenses.

Everything I Really Need to Know I Learned In Kindergarten
In Robert Fulghum’s popular essay about what he learned in kindergarten, he discussed a few basic principles that both lawyers and businesses should abide by.  Included among those are basic professional principles like “share everything,” “play fair,” “don’t hit people,” “clean up your own mess,” “don’t take things that aren’t yours,” “say you’re sorry when you hurt somebody,” and “live a balanced life.”  A healthy dose of these basic ideas would serve the lawyer well in eDiscovery practice.  Although the pressing matter may seem most important at the time, conduct will create a reputation, and an unprofessional reputation is difficult to lose once it is gained. You can play fair while vigorously representing your client.

What Professionalism Should Govern eDiscovery Practice?

In eDiscovery circles, there is much discussion taking place about “proportionality.”  Essentially, this is an issue of reasonableness.  I believe reasonableness is also an issue of professionalism.  Recall that the scope of discovery is what is “reasonably calculated to lead to the discovery of admissible evidence.” Fed. R. Civ. Proc. 26(b)(1).  Narrowly tailoring requests to what is reasonable will enhance eDiscovery professionalism.  eDiscovery costs should never be used as a way to bludgeon the opposing party into submission.  If the scope of an ESI request can be narrowed without harming a client’s case, then it should be narrowed.  The New York Rules of Civility state that “[a] lawyer should avoid discovery that is not necessary to obtain facts or perpetuate testimony or that is designed to place an undue burden or expense on a party.”

While many crack jokes about the professionalism and ethics of lawyers, most lawyers I know take both ethics and professionalism very seriously.  I believe that the best lawyers are not only ethical but highly professional as well.  Some clients act professionally as well, while others may push for unprofessional practices.  It is the lawyer’s job to rein in his or her client.  While a lawyer must zealously advocate for a client, no case or client is ever worth squandering one’s reputation.  Never allow a client to cause you to do something unethical or unprofessional.

Lawyers involved in eDiscovery should strive to meet not only the basic Rules of Professional Conduct but also the Rules of Civility.  By doing so, we serve the judicial system, our colleagues and our clients with integrity.


Wednesday, January 5, 2011

Linear Review is an Outdated Methodology

As we trudge through the first week of 2011, I am going through my list of blog posts that I wanted to comment on, and the December 28, 2010 post on linear review by Venkat Rangan, Clearwell Systems CTO, seemed like a good place to start.  The post, titled “Reinventing Review in Electronic Discovery,” discusses a topic that I am very familiar with and have been somewhat outspoken about over the past couple of years.  Review costs still comprise over 70% of the overall cost of eDiscovery; as an industry, we therefore need to find better ways to approach review and, more importantly, to reduce its cost.

Given my background in enterprise class applications development methodology and technology, I lived through the paradigm shift when that industry shifted from legacy waterfall methodology (i.e. linear) to rapid applications development (RAD) and now agile development methodology.  The increases in productivity were dramatic.

Mr. Rangan bases his blog post on an excellent paper, The Demise of Linear Review, by Bennett Borden of Williams Mullen.  Mr. Rangan states that the paper, citing factual data from various studies and drawing parallels to similar anachronisms of the past, makes excellent arguments for rethinking how legal review is performed in eDiscovery.

I hope that in 2011, the litigation market begins to understand and embrace both the practical and financial benefits of replacing linear review with newer and more effective review methodologies and technologies.

The full text of Mr. Rangan’s Blog post is as follows:

In a recent workshop that I attended, I had the privilege of sharing thoughts on the latest electronic discovery trends with other experts in the market. Especially interesting to me was discussing the provocatively titled paper, The Demise of Linear Review by Bennett Borden of Williams Mullen. The paper, citing several factual data from various studies, as well as drawing parallel to other similar anachronisms of the past, makes excellent arguments for rethinking how legal review is performed in e-discovery.

When linear review is mentioned, the first mental picture one conjures up is boredom. It has generally been associated with a mental state that is a result of repetitive and monotonous tasks, with very little variation. To get a sense for how bad this can affect performance, one only needs to draw upon several studies of boredom at the workplace, especially in jobs such as mechanical assembly of the 1920s and the telephone switchboard operators of the 1950s. In fact, the Pentagon sponsored study, Implications for the design of jobs with variable requirements, from Navy Personnel Research and Development Center, presents an excellent treatise on contributors for workplace fatigue, stress, monotony, and distorted perception of time. This is best illustrated in their paper:

Mechanical assembly, inspection and monitoring, and continuous manual control are the principal kinds of tasks most frequently studied by researchers investigating the relationship between performance and presumed boredom. On the most repetitive tasks, degradation of performance has typically been found within 30 minutes (Fox & Embry, 1975; Saito, Kishida, Endo, & Saito, 1972). The early studies of the British Industrial Fatigue Board (Wyatt & Fraser, 1929) concluded that the worker’s experience of boredom could be identified by a characteristic output curve on mechanical assembly jobs. The magnitude of boredom was inversely related to output and was usually marked by a sharp decrement in the middle of a work period.

How does this apply to linear review? Well, a linear review is most often performed using a review application or tool, simulating a person reading and classifying a pile of documents. The reviewer is asked to read the document and apply a review code, based on their judgment. While it appears easy, it can be one of the most stressful, boring, and thankless jobs for a well-educated, well-trained knowledge worker. Even with technology and software advances a reviewer is required to read documents in relatively constrained workflows. Just scrolling through pages and pages of a document, comprehending its meaning and intent in the context of the production request can make it stressful. To add to this, reviewers are often measured for their productivity based on the number of documents or pages they review per day or per hour. In cases where a large number of reviewers are involved, there are very direct comparisons of rates of review. Finally, the review effort is judged for quality without consideration for the very elements that impact quality. Imagine a workplace task where every action taken by a knowledge worker is monitored and evaluated to the minutest detail.

Given this, it is no wonder that study after study has found a straight plough-through linear review produces less than desirable results. A useful way to measure effectiveness of a review exercise is to submit the same collection of documents to multiple reviewers and assess their level of agreement on their classification of the reviewed documents in specific categories. One such study, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, finds that the level of agreement among human reviewers was only in the 70% range, even when agreement is limited to positive determination. As noted in the study, previous TREC inter-assessor agreement notes as well as other studies on this subject by Barnett et al., 2009 also show a similar and consistent result. Especially noteworthy from TREC is the fact that only 9 out of 40 topics studied had an agreement level higher than 70%, while remarkably, four topics had no agreement at all. Some of the disagreement is due to the fact that most documents fall on varying levels of responsiveness which cannot easily be judged on a binary yes/no decision (i.e., the “where do you draw the relevance line” problem). However, a significant source of variability is simply attributed to the boredom and fatigue that comes with repetitiveness of the task.
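To make the idea of measuring reviewer agreement concrete, here is a minimal Python sketch of my own; it is not taken from the study or from Mr. Rangan's post. It assumes agreement "limited to positive determination" means the overlap of the documents two reviewers both marked responsive, divided by the documents either of them marked responsive. The cited study may define its measure differently, and the reviewer data and document IDs below are invented for illustration.

```python
def positive_agreement(codes_a, codes_b):
    """Overlap of positive calls between two reviewers.

    codes_a / codes_b map a document ID to True (responsive) or False.
    Returns |both positive| / |either positive|, or None when neither
    reviewer marked any shared document as responsive.
    """
    shared = codes_a.keys() & codes_b.keys()  # only documents both reviewed
    pos_a = {d for d in shared if codes_a[d]}
    pos_b = {d for d in shared if codes_b[d]}
    either = pos_a | pos_b
    if not either:
        return None
    return len(pos_a & pos_b) / len(either)


# Hypothetical coding decisions for five documents.
reviewer_1 = {"DOC-001": True, "DOC-002": True, "DOC-003": False,
              "DOC-004": True, "DOC-005": False}
reviewer_2 = {"DOC-001": True, "DOC-002": False, "DOC-003": False,
              "DOC-004": True, "DOC-005": True}

print(positive_agreement(reviewer_1, reviewer_2))  # 2 shared positives out of 4
```

Under this definition the two hypothetical reviewers agree on only half of the documents that at least one of them considered responsive, even though they agree on three of the five documents overall.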

A further observation on reviewer effectiveness is available from the TREC 2009 Overview Report, which studied the appeals and adjudication process of that year’s Interactive Task. This study offers an excellent opportunity to assess the effectiveness of initial review and subsequent appeals and adjudication process. As noted in the study, the Interactive Task involves an initial run submission from participating teams which are sampled and reviewed by human assessors. Upon receiving their initial assessments, participating teams are allowed to appeal those judgments. Given the teams’ incentive to improve upon the initial results, they are motivated to construct an appeal for as many documents as they can, with each appeal containing a justification for re-classification. As noted in the study, the success rates of appeals were very high, with 84% to 97% of initial assessments being reversed. Such reversals were across the board and directly proportional to the number of appeals, suggesting that even the assessments that were not appealed could be suspect. Another aspect that is evidenced is that the appeals process requires a convincing justification from the appealing team, in the form of a snippet of the document, document summary, or a portion of the document highlighted for adjudication. This in itself biases the review and makes it easier for the topic assessor to get a clearer sense for the document on their attempt at adjudicating the appeal. This fact is also borne out by the aforementioned Computer Classification vs. Manual Review study where the senior litigator with the knowledge of the matter had the ability to offer the best adjudications.

Given that linear review is flawed, what are the remedies? As noted in Bennett’s paper, intelligent use of newer technologies along with a review workflow that leverages them can offer gains that are demonstrated in other industries. Let’s examine a few of them.

Response Variation

Response variation is a strategy for coping with boredom by attempting to build variety into the task itself. In mechanical assembly lines, response variation is added through innovative floor and task layouts, such as Cellular Layout. On some tasks, response variation may involve only simple alternation behaviors, such as reversing the order in which subtasks are performed; on others, the variety may take more subtle forms reflected in an inconsistency of response times. In the context of linear review, it can help to organize your review batches so that your review teams alternate classifying documents for responsiveness, privilege, confidentiality, etc. Another interesting approach would be to mix the review documents but suggest that each be reviewed for a specific target classification.
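As a rough illustration of alternating review targets across batches, here is a small Python sketch. It is my own and is not described in Mr. Rangan's post; the batch IDs, the task names and the `rotate_tasks` helper are all hypothetical.

```python
from itertools import cycle


def rotate_tasks(batch_ids,
                 tasks=("responsiveness", "privilege", "confidentiality")):
    """Pair each review batch with the next classification task in rotation."""
    return list(zip(batch_ids, cycle(tasks)))


# Hypothetical batch IDs for one reviewer's queue.
for batch, task in rotate_tasks(["B1", "B2", "B3", "B4"]):
    print(batch, "->", task)
```

A real review platform would also have to make sure every batch is eventually coded for every category; the sketch only shows the alternation itself.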

Free-Form Exploration

Combining aspects of early case assessments and linear review is one form of exploration that is known to offer both a satisfying experience and effective results. While performing linear review, the ability to suspend the document being reviewed and jump to other similar documents and topics gives the reviewer a cognitive stimulus that improves knowledge acquisition. Doing so offers an opportunity for the reviewer to learn facts of the case that would normally be difficult to obtain, and approach the knowledge levels of a senior litigator of the case. After all, we depend on the knowledge of the matter to be a guide for reviewers, so attempts to increase their knowledge of the case can only be helpful. Also, on a free-form exploration, a reviewer may stumble on an otherwise difficult to obtain case fact and the sheer joy of finding something valuable would be rewarding.

Expanding the Work Product

Besides simply judging the review disposition of a document, the generation of higher value output such as document summaries, critical snippets, and document meta-data that contribute to the assessment can both reduce the boredom of the current reviewer and contribute valuable insights to other reviewers. As noted earlier, being able to assist the review with such aids can be immensely helpful in your review process.

Review Technologies

Of course, fundamentally changing linear review with specific technologies that radically change the review workflow is an approach worth considering. While offering such aids, it must be remembered that human judgment is still needed and the process must incorporate both increasing reviewers' knowledge and their ability to apply judgment. We will examine these technologies in an upcoming post.


Friday, October 9, 2009

Standards Need to Emerge for Collecting and Processing Electronically Stored Evidence (ESE)

Most litigators and their litigation support staff that have been practicing over the past 5-10 years could probably teach a class on the process of preservation, collection, processing, review and production of paper evidence. Or, at least they could stand at a whiteboard and draw a basic workflow diagram of the basic steps.

However, with the dramatic and accelerating increase in the amount of Electronically Stored Information (ESI), which I like to call Electronically Stored Evidence (ESE), the resulting technical issues, and the associated changes to the Federal Rules of Civil Procedure (FRCP), very few, if any, of the same litigators and their staff can now describe even the most basic workflow for getting ESE ready for trial. Therefore, although many are talking about the importance of eDiscovery standards (myself included), standards of any substance are a long way off.

This is certainly not the fault of the lawyers, as they have never been required to have much of a true understanding of the technology of processing evidence in order to be successful litigators. However, the bar has now literally been raised, and litigators can't even provide adequate representation without an in-depth understanding of these new issues.

Maybe we should consider requiring a license or some type of certification to practice law when eDiscovery is involved? Or has ESE become so intertwined in our matters that there isn't a case without eDiscovery, and therefore every lawyer who wants to litigate anything should have to be certified?

As a place to start this discussion / debate, we need to start identifying the basic components of ESE and how it is stored, how to preserve it, how to extract it (the new word for collection), how to process it, how to review it, and how to produce it.

Wouldn't it be great if, 5 years from now, litigators could stand at a whiteboard and diagram and explain the basic "standard" components of the workflow for processing Electronically Stored Evidence (ESE)?

Eric P. Blank addresses these issues in an excellent article titled "The Need for E-Discovery Standards: A Call From the Trenches," posted on October 5, 2009 on the EDD Update Blog.

Eric P. Blank is the founder and managing attorney of Blank Law + Technology PS. His practice focuses on electronic discovery counseling, e-security response planning and implementation, investigations and computer forensics. Mr. Blank has conducted more than 300 investigations into computer and software-related torts and employee misconduct since 2001 and has frequently been a court-appointed special master or neutral in e-discovery matters.

The full text of Mr. Blank's post is as follows:

Most discussion about standards in electronic discovery focuses on the big-picture issues of scope, cost and cost shifting.

These are important questions eloquently argued in the courts. However, they overlook the mundane, pick-and-shovel e-discovery concerns that affect every case. I’m talking about the elementary technical issues of preservation, extraction, processing, review and production.

I’m talking about extracting data from electronic storage media, processing the data and its metadata into a document review software application platform, supporting the review and producing the data as discovery or evidence.

Outside the e-discovery world, the first stage of this process is known as Extract, Transform, Load (ETL). Identifying and overcoming the challenges of ETL have occupied computer scientists for decades. Principal obstacles to effective ETL include widely diverse and poorly documented storage repositories, asynchronous multimedia platforms, constantly evolving software, hardware and software anomalies, and human error, usually with respect to initial planning.

E-discovery vendors on the ground face those obstacles and more. Consider, as just a few of many examples, the following:

Mobile phones and PDAs: In some models, data can be extracted through forensic imaging. In others, such as many of those without SIM cards, data can only be pulled through live file extraction. (I have discussed the difference between forensic imaging and live file extraction in earlier blog posts.) In any case, the question is this: Should data extraction scope be defined by current technical capabilities, or should there be a single common standard – such as live files only – for those instances when mobile phones and PDAs are subject to e-discovery?

A multitude of file types: Extraction and processing applications address dozens, sometimes hundreds, of file types. These file types are usually associated with, and identified by, a particular file extension, such as .doc or .xls. However, custom extensions are easy to apply – documents I create might have a .epb file extension, for example – and it is also simple to apply a nonstandard extension to a particular file type (e.g., a .doc extension to a PDF file). These are often missed, or improperly processed, by extraction and processing software.

Computer forensics software in the hands of an experienced technician can reveal documents by file type without relying on extension format and such, but doing so is costly and time consuming. What checks should be done for mislabeled or unusual file extensions? When are such checks required?
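As one illustration of the kind of check being asked about, here is a minimal Python sketch, not taken from Mr. Blank's article, that compares a file's extension against the leading "magic" bytes of its content. The three signatures shown (PDF, ZIP container, OLE2 compound file) are well-known published values; the extension table, the `check_extension` helper and the sample file names are assumptions for illustration, and a production tool would cover far more formats.

```python
import os
from typing import Optional

# Leading byte signatures for a few common container formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),                          # .docx, .xlsx, plain .zip
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),   # legacy .doc, .xls
]

# Which container family each extension is expected to use (illustrative subset).
EXPECTED = {".pdf": "pdf", ".docx": "zip", ".xlsx": "zip",
            ".doc": "ole2", ".xls": "ole2"}


def sniff(header: bytes) -> Optional[str]:
    """Return the container family implied by the first bytes, if recognised."""
    for magic, family in SIGNATURES:
        if header.startswith(magic):
            return family
    return None


def check_extension(filename: str, header: bytes) -> str:
    """Classify a file as 'match', 'mismatch', or 'unknown' (needs manual review)."""
    ext = os.path.splitext(filename)[1].lower()
    detected, expected = sniff(header), EXPECTED.get(ext)
    if detected is None or expected is None:
        return "unknown"
    return "match" if detected == expected else "mismatch"


print(check_extension("memo.doc", b"%PDF-1.4\n"))   # a PDF carrying a .doc extension
print(check_extension("notes.epb", b"PK\x03\x04"))  # custom extension, flagged as unknown
```

Reading only the first few bytes of each file is cheap compared with a full forensic examination, but it only catches formats that appear in the signature table; anything returned as "unknown" would still need a technician's attention.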

Metadata: Most of us think of metadata in basic terms such as the putative author, creation date, modification date, last-access date and so forth. However, metadata varies widely across data types. Microsoft Office documents, for example, have more than 100 metadata fields. It is also possible to create custom fields with many document types. Nearly all of these, such as the ubiquitous P-size and L-size, are nearly never important in civil litigation.

“Nearly never” is not, however, the same as “never.” Such data can be extracted, but it is not, as a rule, supported by processing software, which renders it unavailable at the attorney review level. Is it possible to agree on which metadata fields should be preserved and processed? When they should be processed? Which fields are important forensically? When all fields should be preserved?

Rapid technological change: Software is updated all the time. This affects how metadata is produced and the appearance of electronic documents. Processing software hasn’t kept up. It’s also inconsistent. For example, the last-access date on a Word 2007 document running in Windows Vista is affected differently than an Office XP Word document running on Vista. Both documents, however, are processed the same, as if the metadata means the same, when it does not. How should inconsistencies like this be addressed? What should the typical approach be?

Webmail: Screenshots of Web-based email services such as Hotmail are a common and inexpensive workaround to downloading actual Hotmail files. Which method is preferred? Is either method not preferred? As third-party cloud data repositories multiply, what constitutes best practices with regard to extraction methods will become a critical question.

Capture rates: What percentage capture rate is acceptable for processing software? Many files are often not processed by even the best technology, and must be laboriously hand processed. In a million-item processing job, a 1 percent miss rate equals 10,000 documents not processed and available for review. Is 99 percent acceptable? Is 98 percent? Note: If you think that the processing rate for your document review software is 100 percent, you’re kidding yourself.
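The arithmetic behind those figures is simple enough to sketch in a few lines of Python. This is an illustration only; the `unprocessed` helper is hypothetical, and the job size matches the million-item example above.

```python
def unprocessed(total_items: int, capture_rate: float) -> int:
    """Number of items a processing run misses at a given capture rate (0.0-1.0)."""
    return round(total_items * (1 - capture_rate))


for rate in (0.99, 0.98):
    missed = unprocessed(1_000_000, rate)
    print(f"{rate:.0%} capture -> {missed:,} items left for hand processing")
```

At 99 percent the run leaves 10,000 items unprocessed, and at 98 percent it leaves 20,000, so each percentage point on a million-item job is another 10,000 documents that must be handled by hand or go unreviewed.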

Searching: Keyword searching, including keyword searching supported by “fuzzy” search techniques, is giving way to conceptual searching, which is the future of document search and review. Conceptual searching, however, involves proprietary algorithms and processes with a wide range of accuracy. What standards must conceptual searching meet to be accepted? How are these standards applied? When, if ever, is conceptual searching disallowed?

File format: In e-discovery today, most documents are produced in .Tiff format. Putting aside the larger question of whether .Tiff should be the standard for producing electronic documents, what about documents such as spreadsheets that don’t translate well into .Tiff files? In what format should presentation-type documents be produced? As slide shows? As workbook copies with notes and presenters’ comments? How are native files to be tracked and authenticated as a best practice?

Today, e-discovery consultants decide many of these questions on their own or after consulting with litigation counsel. In essence, a consultant decides when it is and isn’t practical to extract files from a system, whether to image a particular hard drive and whether to put aside as unreadable a back-up tape from a set of tapes that must be searched.

Much of the time, the consultant makes the “right” decision, as subsequently decided by the court, the client or the opposing party. It’s a rare consultant, however, who won’t admit that adopting e-discovery standards would bring enormous benefit to the practical challenges of data extraction, processing and production.

I'll be discussing these and other issues in the future. Any of the problems mentioned above could be an entire article. I look forward to working with the legal and technical community to address these “technical” standards – as opposed to the widely discussed “strategic” standards which may ultimately be addressed by changes in the Federal Rules of Civil Procedure.


Tuesday, October 28, 2008

Electronic Data Discovery Workflow and Best Practices

Over the past twenty four (24) months I have had the opportunity to meet with law firms, the litigation and IT departments of the Fortune 2000, regional litigation service providers, eDiscovery consultants and eDiscovery technology vendors to discuss the current state and technologies of eDiscovery.  Based upon these discussions, I have come to the conclusion that there are substantial differences in approach and a complete lack of standards in Electronic Data Discovery (EDD) workflow and best practices throughout the entire industry.

Even with the tremendous body of EDD research and information being disseminated by the Sedona Conference Institute and the other groups associated with the Sedona Conference, along with the excellent EDD guidelines and standards established by the Electronic Discovery Reference Model (EDRM), there is still a wide chasm between what these experts have established as the EDD standards and how the mainstream litigation market processes EDD.

As such, I am planning to spend the next two (2) - three (3) weeks soliciting detailed input on "real world" EDD workflow and best practices from all available sources and then publish the results.  Please note that this exercise is not intended to replace any of the work done by either the EDRM or the Sedona Conference.    However, I believe that we could lay the foundation for creating a mainstream guide to EDD.

As a place to start, I would like to offer the following questions for discussion:
  1. What tools are being used at each step of the EDRM model if any?
  2. Are these tools integrated or do you have to move data in between them?
  3. If you have to move data, is anyone really using the new EDRM XML standard?
  4. How many IT organizations have instituted data retention policies with litigation in mind?
  5. How many IT organizations are using technologies like Kazeon or Autonomy for in-place collection, processing and analysis, in an integrated workflow within a single application?
  6. Are the tools mentioned in # 5 able to collect all of the data?  And, if not, how is the remainder of the data being collected and integrated?
  7. How does litigation hold technology  from vendors such as Exterro fit into the process? And, are they tightly integrated or loosely coupled requiring manual management?
  8. Where do early case assessment technologies from vendors such as Clearwell Systems fit into the process?
  9. Is the market finding that the tools as listed in # 8 are sufficient for "flattening" all of the embedded files from the Electronically Stored Information (ESI) or are they only used to gather and produce a general file extension catalogue?
  10. Is the market using tools like Clearwell for early case assessment and then importing the data into standard EDD processing tools like LAW and iPro for production processing?
  11. Given the answer to # 9, is the market concerned that "smoking gun" ESI is not being discovered?
  12. Given the answers to # 9 - # 11, how are these tools being used in conjunction with  requirements and preparation for the 26(f) conferences?
  13. When is culling and de-dupe taking place and what technologies are being used?
  14. Are the keywords for culling and filtering being externally determined by the legal teams or in some way being internally generated by technology?
  15. Where do conceptual search technologies such as Orcatec and leading de-dupe technologies such as Equivio  fit into this process?
  16. What percentage of ESI is the culling and de-duping process removing from the data pool?
  17. How is the market completing the various review processes and what tools are they using?
  18. Is the market doing early case assessment with native files, loading the results into a review tool, completing an initial "quick peek" of the data to develop keywords and then loading the data back into an EDD processing solution for another culling and de-dupe pass?  Or, is this all happening in a single pass?
  19. Is the market concerned that opposing parties are not processing data effectively enough to produce adequate responses?
  20. Is the market actually using the 26(f) conference to define how opposing parties are going to produce responsive data?
  21. How is the market handling privileged ESI during this process?  What if the actual privileged information is only part of an ESI data set?
  22. Is the market still imaging ESI into TIFF and PDF for Review and final production?
  23. How do bates numbers fit into this process?
  24. How is the market integrating paper data into this process?
  25. How is the market handling complex coding?
  26. Is the market still using in house legacy client/server solutions for the most part to complete the various pieces of the EDD process?
  27. What does the market think about the new generation of SaaS based EDD solutions?
  28. Are there any single source end-to-end solutions?
  29. Are there any integrated end-to-end solutions?
  30. Is the market concerned with chain of custody issues?
  31. Have the courts ruled in any definitive way on any accepted way of processing EDD?
  32. Has the cost of eDiscovery gone down or gone up over the past six (6) months?  And, if it has gone down, why?
  33. Do corporate IT departments want control over the EDD process?  And, is that the right place for it to reside?
Answering these questions and the ones that these questions will prompt is certainly not going to fill in the chasm between the current best practices and standards as put forth by the experts and the rest of us mere mortals in the litigation market.  However, it might begin to provide the market with some real world examples of what works and what doesn't work.
I encourage everyone who has input on any one of these questions to respond. And then, stay tuned for an update in a couple of weeks.


Wednesday, September 24, 2008

The Global Financial Crisis of 2008 Will Provide a Windfall to eDiscovery Market

With the global financial crisis of 2008 in full swing, I predict that eDiscovery providers worldwide will see a windfall for years to come.    The very nature of the financial industry will provide billions of pages of paper and exabytes of electronically stored information (ESI) stored all over the world for eDiscovery professionals to uncover, extract, process, analyze and prepare for pre-trial conferences, trials and the inevitable litany of appeals.  With most law firms ill-equipped to deal with the new world of eDiscovery, they will have to rely on the new breed of eDiscovery consultants, technology vendors and electronic data discovery processing professionals to even get an initial understanding of the data involved in any of these matters.

Further, after all of the dust begins to settle, the global financial crisis of 2008 will fuel demand for the next generation of integrated eDiscovery/eCompliance/eGovernance technology to ensure more rigorous and meaningful oversight. This is, without a doubt, the paradigm shift that we have all been waiting for in the eDiscovery market.

Following are a couple of examples of the legal action that is already underway:

Fannie Mae and Freddie Mac Bailout
Plaintiffs’ firm Coughlin Stoia Geller Rudman & Robbins hasn’t wasted any time taking action in the wake of the Fannie and Freddie bailout.

Partners Samuel Rudman and David Rosenfeld filed a securities class action in the U.S. District Court in the Southern District of New York yesterday on behalf of those who held publicly traded securities of Fannie Mae between November 16, 2007 and last Friday, before the federal government took over the company.

Defendants in the suit are Stephen Ashley, chairman of Fannie’s board; Daniel Mudd, Fannie’s president and chief executive officer; Stephen Swad, Fannie’s ex-chief financial officer; and Robert Levin, formerly the company’s executive vice president and chief business officer.

The complaint alleges that Fannie’s publicly disclosed financial results misrepresented the financial health of the company, and that the defendants either made false statements or failed to disclose the truth to investors. As a result, says the complaint, the defendants’ “fraudulent scheme” was successful in deceiving the public, artificially inflating the prices of publicly traded Fannie Mae stock, and causing class members to buy Fannie stock at those inflated prices.

Class Action Lawsuit Filed against the Primary Fund by Stull, Stull & Brody
NEW YORK, Sep 23, 2008 (BUSINESS WIRE) -- Notice is hereby given that a class action amended complaint (the "Complaint") was filed September 19, 2008 in the U.S. District Court for the Southern District of New York by the law firm of Stull, Stull & Brody and its co-counsel, on behalf of plaintiff and a proposed class of purchasers of shares of The Reserve Fund's Primary Fund (NASDAQ: RFIXX) (the "Fund") during the period September 28, 2007 through September 16, 2008, inclusive (the "Class Period").
The Complaint alleges that the Fund and the Fund's underwriter, investment adviser, officers and trustees and the other related Defendants, violated Sections 11, 12 and 15 of the Securities Act of 1933 by making false and misleading statements and omissions concerning the lack of true diversification of the Fund's assets, safety of principal, access to liquidity and exposure to at least face value debt of $785 million of the now defunct Lehman Brothers Holdings, Inc., that the Fund's risk profile was not only "marginally higher" than cash, the high vulnerability of the money market fund to suddenly drop below $1 per share to as low as $.95 per share, and the fact that the net asset value of the money market fund ("ANAV") was speculative and inflated. Thus, the Fund's Registration Statement and Prospectus issued September 28, 2007, pursuant to which members of the proposed class purchased or acquired shares of the Fund during the Class Period, was materially false and misleading.
Plaintiff seeks to recover damages on his own behalf and on behalf of the Class and is represented by the law firm of Stull, Stull & Brody and its co-counsel Kantrowitz, Goldhamer & Graifman, P.C., firms with significant experience successfully prosecuting complex securities fraud class actions on behalf of defrauded investors.


Wednesday, September 10, 2008

The Cost of eDiscovery

As we approach the two (2) year anniversary of the December 2006 changes to the Federal Rules of Civil Procedure (FRCP), it has been interesting and somewhat educational to follow the tremendous confusion and the dramatic changes in the cost of processing Electronically Stored Information (ESI). As I have been claiming, the changes to the FRCP, and the subsequent requirement to use new technology to meet them, are a classic example of a market paradigm shift causing chaos and therefore the opportunity for the free market system to run wild. As George Socha and Tom Gelbmann report in the 2008 Socha-Gelbmann Survey, commercial expenditures on Electronic Data Discovery (EDD) topped $2.7 billion in 2007, up 43 percent from 2006. They predict growth of 21 percent, 20 percent and 15 percent in 2008, 2009 and 2010 respectively, which equates to a $4.5 billion market at the end of 2010. As a point of comparison, Gartner says "Worldwide IT Spending On Pace to Surpass $3.4 Trillion in 2008".
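
The $4.5 billion figure is simple compounding, and a quick sketch makes it easy to check. The dollar amount and growth rates are the ones quoted from the survey; the function and variable names are my own, purely for illustration.

```python
def project_market(base_billions, growth_rates):
    """Return the projected market size (in $B) after each year's growth."""
    sizes = []
    current = base_billions
    for rate in growth_rates:
        current *= 1 + rate  # compound on the previous year's total
        sizes.append(current)
    return sizes

# $2.7B in 2007, then 21%, 20% and 15% growth in 2008, 2009 and 2010.
projection = project_market(2.7, [0.21, 0.20, 0.15])
for year, size in zip((2008, 2009, 2010), projection):
    print(f"{year}: ${size:.2f}B")
```

The last line printed is the 2010 estimate, which lands just above $4.5 billion.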

Standard EDD Fees
In a traditional electronic data discovery (EDD) project, costs break down along the following lines: (1) the cost of collecting data; (2) the cost of processing data from its native format into a format that can be loaded into an electronic document review system; and (3) the cost of reviewing the data. Please note that according to the Fourth Annual Fulbright Litigation Trends Survey, document review accounted for 30 to 50 percent of all litigation costs. So, the current trend is to reduce the amount of data that has to be reviewed and to also reduce the actual review costs.

Manual Data Collection and Computer Forensics:
The cost of manually collecting data and performing computer forensics is billed at an hourly rate plus expenses. In 2007, rates ranged from $350 - $750 per hour and have since dropped to $250-$500 per hour or even lower, depending upon the complexity of the project.

EDD Processing: EDD Processing, or what is more commonly known as electronic data discovery (EDD), includes loading the data onto an EDD platform, extracting and "flattening" attachments and embedded files, converting the data to a readable format, reducing the volume of the data by culling it down based upon key word filters, de-duping and then creating a load file to enable the data to be loaded into electronic review software. In 2007, EDD processing costs were $1,500 - $2,000 per Gigabyte (GB) and have dropped to less than $1,500 per GB, depending upon whether or not the clients request key word filtering, de-duplication or other specialized processing. In addition, in the last six (6) months, some service providers and vendors have been charging less than $1,000 per GB for what they are calling "quick peek" EDD processing, which is nothing more than flattening and converting the Electronically Stored Information (ESI) into a format that can be read by one of the electronic document review platforms.
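
To make the de-duping and culling steps concrete, here is a minimal sketch. It is not any vendor's actual pipeline. It assumes the documents have already been extracted to plain text, uses an exact-content SHA-256 hash as the de-dupe test and a case-insensitive substring match as the key word filter. Both choices are my simplifying assumptions, and all names are hypothetical.

```python
import hashlib

def dedupe(documents):
    """Keep the first copy of each document, comparing by SHA-256 of the text."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def cull(documents, keywords):
    """Keep only documents containing at least one key word (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [d for d in documents if any(k in d.lower() for k in lowered)]

docs = [
    "Quarterly forecast attached for the merger review.",
    "Lunch on Friday?",
    "Quarterly forecast attached for the merger review.",  # exact duplicate
]
remaining = cull(dedupe(docs), ["merger", "forecast"])
print(f"{len(docs)} documents in, {len(remaining)} left for review")
```

Exact hashing only catches byte-identical copies; near-duplicate detection of the kind the specialist vendors sell is a much harder problem and is not attempted here.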

Manual Document Review: Document Review is the "time honored" ritual of paying a lawyer or paralegal an hourly rate to literally review every document associated with a matter to determine pertinence, privilege and responsiveness. In 2007, if done manually in the United States by a licensed lawyer, hourly rates for review ranged from $250-$500 per hour or higher. However, in response to the market's demand to reduce the cost of review, many of the larger US based law firms are setting up off-shore operations in India and Malaysia to take advantage of the lower cost of labor. There are also now several third party off-shore document review organizations that are putting downward pressure on hourly fees, with rates as low as $35 per hour. In addition, there are now several software solutions that contend that they can supplement human document review to reduce the costs even further.

Recently, several service providers, law firms and other interested third parties have introduced the concept of per-unit pricing for document review as opposed to the conventionally accepted billable hour. So, as with any market going through a paradigm shift, creative people will step in to fill a demand or void. Brett Burney does an excellent job of outlining some of the current trends in reducing the cost of document review in a June 23, 2008 article on the site, titled "Subdue the Costs of Document Review".

Document Review Platforms: There are several very popular legacy client/server document review platforms such as Concordance and Summation and several newer Online Review Tools (ORT) such as iConect, RingTail, ImageDepot and Lexbe. The cost to host data on these platforms runs about $50 per Gigabyte (GB) per month, with $100 - $200 per GB setup and load fees. However, several of the new Software-as-a-Service (SaaS) based ORTs, such as ImageDepot, have recently run promotions to offer hosting and review of unlimited amounts of data for $1 per month for 5 months with a twelve (12) month commitment.

In keeping with the standard dynamics of new technology being introduced into a market in a paradigm shift, these newer ORTs are using SaaS technology that is very scalable and very inexpensive to run and can actually make an acceptable margin at these lower prices. It will be interesting to watch how the legacy vendors respond. The new lower price points will force them to either develop their own SaaS based tools, acquire one of the newer SaaS based tools or go out of business.

Solutions that Reduce the Amount of ESI
As stated, the easiest way to reduce the overall cost of EDD is to reduce the amount of data as early in the process as possible. Not surprisingly, several advanced technology vendors have emerged to fill this demand, including Clearwell Systems and Kazeon.

Clearwell Systems: Clearwell Systems' E-Discovery Platform, for instance, is an appliance-based system that identifies duplicate email and document attachments. It filters email according to domain, sender and receiver to further winnow the amount of data that needs to be reviewed. Beyond that, it handles reporting and hosting (in your client's data center, of course) and offers privacy controls.
Standard pricing for Clearwell Intelligence Platform is $65,000 per 100 GB of managed data. Compared with processing fees of about $1,500 per gigabyte of data, Clearwell Intelligence Platform costs clients about $650 per gigabyte to reduce the data required for review. By reducing the amount of data to be reviewed (which, again, is more expensive per gigabyte than the processing work), companies can realize significant savings -- a minimum of $200,000 per 100 GB, assuming 1% is actually reviewed.

Kazeon contends that it revolutionizes the way companies perform eDiscovery by using the Kazeon Information Access Platform software to intelligently discover, search & index, classify and act on electronically stored information. Kazeon provides a full spectrum of proactive and reactive eDiscovery solutions in response to litigation, information security and privacy, corporate investigations, regulatory compliance and storage consolidation requirements. The Kazeon Information Server software automates eDiscovery functions from identification and collection through processing, preservation, analysis and review for corporations, service providers, and law firms. Through the development of unique search & indexing, analysis and workflow automation technology, Kazeon has been recognized by various industry forums, including Gartner's eDiscovery MarketScope and the recent Socha-Gelbmann eDiscovery Survey 2008 where Kazeon was recognized as a Top 5 eDiscovery Software Provider. Kazeon has established partnerships with leading companies, including Fujitsu, Siemens, Google, Network Appliance, Oracle and Symantec.

Kazeon's Claim of $4.30 per GB Processing Costs Stirs Up the Industry
On May 6, 2008, in a press release announcing that Attenex and Kazeon had formed an eDiscovery Alliance, the real news was buried in the boilerplate: Kazeon contended that it could process a Gigabyte (GB) of data for $4.30. This was not actually new. In an earlier press release titled "ESG Lab Finds Kazeon’s Information Server Delivers Fast and Cost Effective Information Access", Kazeon had made the same financial claim in a slightly less alarming way, stating that the ESG Lab verified impressive price/performance with a single Kazeon Information Server appliance able to index 2,500 documents per dollar at a cost as low as $4,300 per terabyte (which is approximately $4.19 per gigabyte).
However, even in the face of the overwhelming response from industry Blogs calling this claim completely misleading, Kazeon has not removed it from subsequent press releases and has in fact embraced it as the centerpiece of its marketing message. The best and most thoughtful response came from a posting on the eDiscovery 2.0 Blog by Kurt Leafstrand on May 6th, 2008 titled "eDiscovery Processing: You Get What You Pay For". Given that the eDiscovery 2.0 Blog is written and sponsored by Clearwell Systems, another player in the eDiscovery space, it may be a bit biased. However, Kurt did a great job of listing the various critical and required processing components that need to be considered when comparing the overall cost of Electronic Data Discovery (EDD).
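
As a side note, the $4.30 and $4.19 per GB figures look like the same $4,300 per terabyte price divided two different ways. That reading is my inference, not something either press release states. A two-line check:

```python
# Convert a per-terabyte price to per-gigabyte under both common definitions.
PRICE_PER_TB = 4300.0

per_gb_decimal = PRICE_PER_TB / 1000   # 1 TB = 1,000 GB
per_gb_binary = PRICE_PER_TB / 1024    # 1 TB = 1,024 GB

print(f"decimal: ${per_gb_decimal:.2f}/GB, binary: ${per_gb_binary:.4f}/GB")
```

Dividing by 1,000 gives exactly $4.30, while dividing by 1,024 gives a figure just under $4.20, which matches the "approximately $4.19" quoted above if the result is truncated rather than rounded.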

Example of the Cost of an EDD Project (Using Standard Technology)
Given all of the above as a general foundation for the fees associated with an EDD project, following is an example project:
Parameters and Assumptions
  1. Two (2) US based corporations
  2. Two (2) Sites for ESI Collections
  3. Approximately two (2) terabytes (TB) of data (all native ESI)
  4. 2X expansion of ESI during the initial flattening and extraction of attachments and embedded data
  5. 50% reduction during initial processing with key word filtering and de-duplication
  6. Document Review required at multiple locations
  7. Time from matter filing date to completion is estimated to be 3 years.
  8. Onshore legal review at 300 pages per hour
Collection Costs
The cost of collecting the data, including computer forensics work, travel expenses, storage hardware and hourly fees, is as follows:
Collection Fees = 10 hours x $300/hour = $3,000
Storage Hardware = $2,000
Travel Costs = $2,500
Computer Forensic Fees = 10 hours x $300/hour = $3,000

Total Collection Costs = $10,500

Initial Processing Costs (Flattening, Culling, De-duping)
4 TB of Data x $1,000/ GB = $4M (Reduced back down to 2 TB)

Hosting and Review Technology Costs
36 months of Hosting for 2 TB at $50/GB/Month = $3,600,000

Document Review
2 TB of data at 75,000 pages per GB = 150M pages
150M pages at 300 pages / hour = 500,000 hours at $150 / hour = $75M

Total Cost of Project
Collection = $10,500
Processing = $4M
Hosting = $3.6M
Review = $75M

Total Cost = $82,610,500
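
For anyone who wants to plug in their own numbers, the example above can be recomputed with a short script. This is only a sketch of the arithmetic in this post: it assumes decimal terabytes (1 TB = 1,000 GB), which is what the figures above imply, and the function and key names are mine.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes, as the arithmetic above implies

def standard_project_cost():
    """Recompute the standard-technology example line by line (all in dollars)."""
    collection = 10 * 300 + 2_000 + 2_500 + 10 * 300  # fees + storage + travel + forensics
    expanded_gb = 2 * GB_PER_TB * 2                   # 2 TB native, 2X expansion
    processing = expanded_gb * 1_000                  # $1,000 per GB
    hosted_gb = expanded_gb // 2                      # 50% reduction from culling and de-dupe
    hosting = hosted_gb * 50 * 36                     # $50 per GB per month for 36 months
    pages = hosted_gb * 75_000                        # 75,000 pages per GB
    review_hours = pages // 300                       # 300 pages per hour
    review = review_hours * 150                       # $150 per hour
    return {
        "collection": collection,
        "processing": processing,
        "hosting": hosting,
        "review": review,
    }

costs = standard_project_cost()
total = sum(costs.values())
for name, amount in costs.items():
    print(f"{name:<12}${amount:>14,}")
print(f"{'total':<12}${total:>14,}")
```

Changing any single assumption (the expansion factor, the reduction rate, the hourly rate) and re-running shows quickly which line items actually move the total.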

Alternative Approach
Obviously, $82M is cost prohibitive. So, where can we save money? Following is an alternative approach:
Changed Assumptions
  1. Use of new front-end processing technology
  2. 90% reduction in data during culling and de-dupe
  3. Use of off-shore review
Alternative Collection / Processing Costs
Although it may not be completely realistic, let's assume that both sites had technology to automatically gather, filter and de-dupe the data (e.g. Clearwell, Kazeon), and that no computer forensic work was required. The cost of processing 4 TB of data using the Kazeon model would then be 4 TB x $4,300 / TB = $17,200 (with the data reduced to 400 GB).

Alternative Hosting and Review Technology Costs
36 months of Hosting for 400 GB at $50/GB/Month = $720,000

Document Review
400 GB of data at 75,000 pages per GB = 30M pages
30M pages at 300 pages / hour = 100,000 hours at $40 / hour = $4M

Total Cost of Project
Collection / Processing = $17,200
Hosting = $720,000
Review = $4M

Total Cost = $4,737,200

Reducing ESI through culling and de-dupe is essential because it shrinks the amount of data that has to be hosted and reviewed. That reduction, together with a lower cost for the actual document review, is the key to reducing the overall cost of EDD.
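
Since review dominates the total in both scenarios, it helps to see how the review bill moves with the cull rate and the hourly rate. The sketch below reuses this post's assumptions (4 TB of expanded data, 75,000 pages per GB, 300 pages per hour); the function name and the decimal-terabyte convention are mine.

```python
def review_cost(expanded_gb, cull_percent, hourly_rate,
                pages_per_gb=75_000, pages_per_hour=300):
    """Dollar cost to review what is left after culling and de-duping."""
    remaining_gb = expanded_gb * (100 - cull_percent) / 100
    hours = remaining_gb * pages_per_gb / pages_per_hour
    return hours * hourly_rate

standard = review_cost(4_000, 50, 150)     # 50% cull, onshore review
alternative = review_cost(4_000, 90, 40)   # 90% cull, off-shore review

print(f"standard review:    ${standard:,.0f}")
print(f"alternative review: ${alternative:,.0f}")
print(f"savings:            ${standard - alternative:,.0f}")
```

Holding the hourly rate fixed and varying only `cull_percent` separates how much of the savings comes from data reduction and how much from cheaper reviewers.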


Friday, August 8, 2008

eDiscovery Confounds Companies and Their Lawyers

As an excellent adjunct to my Blog entries over the past several weeks regarding how corporate counsel within the Global 5000 and their associated outside counsel are dealing with the eDiscovery Paradigm Shift, Neil Roiter, Senior Technology Editor at recently wrote a great article titled "E-discovery still confounds companies and their lawyers".

Mr. Roiter does an excellent job of describing some of the more pressing issues facing the enterprise and their legal professional both inside and out with quotes from John Benson, an electronic discovery consultant for Kansas City law firm Stinson Morrison Hecker LLP.

I was particularly amused by this quote from John Benson: "The world left the legal profession in the dust years ago," Benson told a Black Hat audience Wednesday. "Attorneys are just coming to the realization that people have computers and have important information on them. I spend a good deal of time dragging attorneys kicking and screaming into the 20th century."

I believe that this comment was definitely representative of the industry in 2006 and 2007. And, it may still be true for a certain portion of the industry that is probably going to have a hard time being successful with the eDiscovery Paradigm Shift. However, I have seen the legal profession take a much more aggressive embrace of technology over the past 18 months.

Mr. Benson also points out that the major cost of eDiscovery is with the review process and that this piece could cost upward of $1,000 per GB. Again, I believe that this was true in 2007 and even early in 2008. However, with the mass roll out of email archiving technologies such as Kazeon, culling technology from the major EDD players, de-duping technology such as Equivio and the move to less expensive offshore review from organizations such as Integreon, the overall cost of review is beginning to fall. In addition, the legal market is becoming more aware of what eDiscovery should cost and of processing best practices, and as a result I am also starting to see "Quick Peek" EDD for an initial processing run on native files going for as little as $250-$300 per GB.

Benson also does an excellent job pointing out that eDiscovery security is an issue that the enterprise needs to be concerned with. Becoming known in the industry as "Chain of Custody", there are now several vendors that offer this as part of their overall offering such as "FirstLink" from Trial Solutions or as a stand-alone solution from organizations such as PSS Systems.

In summary, I think that Neil and John have pointed out some very important issues that are facing the legal community in regards to the eDiscovery Paradigm Shift. I just think that they are about 18 months behind in their view of where the industry, or at least the innovators and early adopters, actually are today.

The full article by Mr. Roiter is as follows:

E-discovery is incredibly expensive, time-consuming and fraught with error. If you botch it, your company may lose its case in court and be sanctioned with heavy fines for failing to produce all the required information. And your lawyers can get hauled before the bar association for ethical breaches if their client (that's you) fails to meet its legal obligations.

Federal Rules of Civil Procedure (FRCP) were amended in 2006 to clarify the requirements for e-discovery, said John Benson, an electronic discovery consultant for Kansas City law firm Stinson Morrison Hecker LLP, but the issues around e-discovery should have been resolved a long time ago, he said.

While companies have routinely been creating, distributing, storing, duplicating and re-duplicating information electronically for years, when it comes to e-discovery, most corporations, and what's more troubling, their lawyers, still don't get it.

"The world left the legal profession in the dust years ago," Benson told a Black Hat audience Wednesday. "Attorneys are just coming to the realization that people have computers and have important information on them. I spend a good deal of time dragging attorneys kicking and screaming into the 20th century."

Legal discovery is not a cookie-cutter process. Each corporate environment and case is different. E-discovery is expensive and will likely remain expensive. What's more, the e-discovery process itself is fraught with security issues; but companies can do a lot to minimize costs, strengthen their hand in court, and avoid sanctions while securing information. IT plays a critical role.
Companies and their lawyers typically overreact, attempting to preserve everything, for example, in backup tapes. This just adds to expenses -- IT folks reuse backup tapes for a reason -- and make it harder to sift through terabytes of information.

The greatest cost comes during the review process. Even with new search technologies, information still has to be eye-balled to ensure it's what you're looking for. And there's a lot of it. Mass storage is cheap, and employees can spread information among themselves and scatter it on servers, laptops, PDAs and smartphones, removable storage devices and home computers. Restoring data from backups and imaging files, and cleaning up metadata and OCR to produce documents in their final form for lawyers is costly.

Anticipate that the e-discovery process will cost about $1,000 per gigabyte. This is a fact of life, but you can control the cost, Benson said, by taking steps to identify data, lowering the volume to make it easy to secure and review when you need it, and centralizing storage.

Benson recommends fully documented and well enforced policies and procedures for handling and backing up data. There should be an established litigation response plan, including a formal litigation response team prepared to move into action as soon as the company sniffs the possibility of litigation.

That last point is important. Sometimes a company doesn't see a suit coming until it is served. But there's often much more lead time. You can anticipate possible litigation when there's a data breach or employee termination, for example. The sooner you move the better -- you have to take steps to preserve what's likely to be applicable data immediately. You're at the point when the lawyers, IT managers and other groups who might be involved need to decide what needs to be preserved -- from backup tapes to, possibly, an image of an employee's hard drive -- to avoid tampering.

"This is the most critical time to avoid sanctions and to avoid getting in trouble down the road with counsel and courts around preservation of data," Benson said.

Anticipation is key, he said. FRCP rules require the two sides in litigation to meet and discuss issues surrounding producing electronic information within 99 days of the start of litigation. That's not much time.

Be aware of e-discovery security issues, Benson warned. You're giving your data over to third parties -- your e-discovery processing vendor, your law firm, your opponent's law firm and its processing vendor. They all may be hacker targets, and it's a good bet security's not high on their priorities. There are a lot of new e-discovery vendors out there, Benson warned, vet both them carefully and take steps to make sure your law firm has solid data security policies and practices.
There's good news ahead, though, as technology gets better, and, we hope, companies get more savvy about dealing with electronic data.

"Technology will, over time, change the way legal system works," Benson said. "But that will only happen if there is good, meaningful communication between legal and IT communities. Through that communication, we'll drive the cost of litigation down. That's not necessarily a good thing for law firms, but it's certainly a good thing for corporations."


Saturday, May 10, 2008

In Search of Integrated Conceptual eDiscovery Search Technology

Over the past 6 months I have been investigating cost effective, integrated conceptual eDiscovery search technology delivered under a SaaS model. The basis for this investigation is to identify a way to extend the current capabilities of eDiscovery search through a forward thinking search technology that can be tightly integrated on the same Microsoft stack based eDiscovery platform with email archiving and other proactive data retention technology, Electronic Data Discovery (EDD) software and an Online Review Tool (ORT). My findings are that the current state of forward thinking search technology is such that it requires the support of a separate and proprietary database and therefore does not lend itself to integration with EDD and ORT platforms that sit on standard SQL Server solutions.

This current state of the market leaves the user with a choice: either move large amounts of data, or at least large amounts of index files and associated data, between platforms, or invest in a completely proprietary eDiscovery solution.

In the process of this investigation, I have found several outstanding articles that touch on the various topics incumbent in this discussion. The first article, found on, titled "In Search of Better E-Discovery Methods" by H. Christopher Boehning and Daniel J. Toal, does an excellent job of discussing some of the standard criteria for new search technology and whether or not it surpasses currently available keyword and Boolean search technology.

The second article is actually a Blog posting by Cher Devey, titled "Alternative Search Technologies - Too Good to be True" on her "eDiscovery Myth or Reality?" Blog. Ms. Devey discusses the concept and viability of human intervention into the search process. (Please note that the full text of Ms. Devey's Blog Post can be found at the bottom of this posting).

The full text of Mr. Boehning's and Mr. Toal's article is as follows:

As the burdens of e-discovery continue to mount, the search for a technological solution has only intensified. The holy grail here is a search methodology that will enable litigants to identify potentially relevant electronic documents reliably and efficiently.

In an effort to achieve these often competing objectives, litigants most commonly search repositories of electronic data for documents containing any number of defined search terms (keyword searches) or search terms appearing in a specified relation to one another (Boolean searches). These search technologies have been in use for years, both in litigation and elsewhere, and accordingly are well understood and widely accepted by courts and practitioners.

But keyword and Boolean searches are far from perfect solutions; they are blunt instruments. Such searches will identify only those electronic documents containing the precise terms specified. These methodologies therefore will not catch documents using words that are close, but not identical, to the specified search terms, such as abbreviations, synonyms, nicknames, initials and misspelled words.

On the other hand, using more search terms may reduce the risk that an electronic search will miss a relevant document, but only at the price of increasing -- often quite dramatically -- the number of irrelevant documents found in the search. This is a serious problem because counsel must manually review whatever documents the searches yield in order to sift out non responsive materials, make privilege determinations and designate confidential documents. Keyword and Boolean searches thus require a careful balance to be struck: Unduly restrictive searches may miss too many responsive documents while over broad searches threaten stratospheric discovery costs.
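
A small sketch illustrates both the exact-match limitation and the tradeoff between missing documents and over-collecting. The sample documents and search terms are invented for illustration, and the Boolean search here is only a simple AND; real tools use proper indexing and richer operators.

```python
import re

def tokens(text):
    """Lowercase a document and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(documents, terms):
    """Return documents containing ANY of the terms (exact word match)."""
    wanted = {t.lower() for t in terms}
    return [d for d in documents if tokens(d) & wanted]

def boolean_and_search(documents, terms):
    """Return documents containing ALL of the terms (a simple Boolean AND)."""
    wanted = {t.lower() for t in terms}
    return [d for d in documents if wanted <= tokens(d)]

docs = [
    "The merger agreement was signed on Monday.",
    "Pls review the mrgr docs before the call.",    # abbreviation
    "Board to discuss the acquisition agreement.",  # synonym
    "Parking agreement for the new garage.",        # irrelevant
]

narrow = keyword_search(docs, ["merger"])
broad = keyword_search(docs, ["merger", "agreement"])
both = boolean_and_search(docs, ["merger", "agreement"])

print(len(narrow), len(broad), len(both))
```

Searching for "merger" alone misses the abbreviated and synonym documents, while adding "agreement" picks up the acquisition document but also drags in the parking agreement, which someone would then have to review and discard.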

Against this backdrop, courts and litigants understandably have been intrigued by the claims of those promoting alternative search technologies, such as "concept searching." The vendors of such technologies suggest their search strategies are able to identify the overwhelming majority of responsive documents while virtually eliminating the need for lawyer involvement in the review process.

Such claims strike many in the legal community as too good to be true. And their skepticism is appropriately heightened because the precise methodologies that such vendors use often are shrouded in mystery, owing to their stated desire to safeguard their proprietary processes and techniques. But this also means their tantalizing claims cannot readily be subjected to independent scrutiny. The question thus posed -- and still largely unexplored -- is whether these alternative search technologies have anything to offer and, if so, how best to evaluate the competing technologies and the often sensational claims of their promoters.

To evaluate whether an alternative search technology might be helpfully employed in any particular case, it is first essential to understand how it works. Some of the principal alternative search technologies, which fall under the broad heading of "concept searching" methodologies, are as follows:

Clustering. Whereas keyword and Boolean searches mechanically apply certain logical rules to identify potentially relevant documents, clustering relies on statistical relationships, which results in documents containing similar words being clustered together in relevant categories. The clustering tool compares each document in a pool to "seed" documents, which have already been designated as relevant. The more words a document has in common with a seed document, the more likely it is to be about the same subject and therefore to be responsive. Moreover, clustering tools generally rank documents based on their statistical similarity to the seed documents.
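
To make the seed-comparison idea concrete, here is a minimal Python sketch. It assumes cosine similarity over raw word counts as the "statistical similarity" measure; commercial clustering tools do not disclose their methods, so this illustrates the general idea only, not any vendor's algorithm. The seed text, the pool documents and the function names are all invented for the example.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count lowercase word occurrences in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_by_seed_similarity(seeds, pool):
    """Score each pool document by its best match to any seed, highest first."""
    seed_vectors = [word_counts(s) for s in seeds]
    scored = []
    for doc_id, text in pool.items():
        vector = word_counts(text)
        scored.append((doc_id, max(cosine(vector, s) for s in seed_vectors)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical seed (already designated relevant) and review pool.
seeds = ["Marketing plan for cigarette advertising aimed at young smokers"]
pool = {
    "a": "Draft advertising plan targeting young smokers",
    "b": "Cafeteria menu for next week",
    "c": "Notes on cigarette marketing budget",
}

ranked = rank_by_seed_similarity(seeds, pool)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Note that document "b" still earns a small score because it shares the word "for" with the seed, which hints at why real tools need more than raw word overlap.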

Taxonomies and ontologies. A taxonomy tool is used to categorize documents containing words that are subsets of the topics relevant to a litigation. For example, if one of the topics of interest is "dogs," a taxonomy tool would capture documents that mention "golden retrievers," "poodles" and "chihuahuas." Ontology tools perform similar searches, but are not confined to identifying subset relationships. Building on the last example, an ontology tool would capture documents that mention "kennels" or "veterinarians."
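
The difference between the two tools can be sketched in a few lines of Python. The sketch assumes a hand-built lookup table for each relationship type and simple substring matching; the tables reuse the article's "dogs" example, while the sample documents and function names are hypothetical.

```python
# Hypothetical lookup tables built from the article's example.
TAXONOMY = {"dogs": ["golden retrievers", "poodles", "chihuahuas"]}  # subset terms
ONTOLOGY = {"dogs": ["kennels", "veterinarians"]}                    # related terms

def expand(topic, include_related=False):
    """Return the topic plus its subset terms, and optionally its related terms."""
    terms = [topic] + TAXONOMY.get(topic, [])
    if include_related:
        terms += ONTOLOGY.get(topic, [])
    return terms

def search(docs, terms):
    """Return ids of documents containing any of the terms (substring match)."""
    return [doc_id for doc_id, text in docs.items()
            if any(term in text.lower() for term in terms)]

docs = {
    "d1": "Our poodles won best in show.",
    "d2": "Invoice from the veterinarians' office.",
    "d3": "Parking garage closed Friday.",
}

taxonomy_hits = search(docs, expand("dogs"))
ontology_hits = search(docs, expand("dogs", include_related=True))
print(taxonomy_hits, ontology_hits)
```

Neither d1 nor d2 contains the word "dogs," so a plain keyword search on the topic itself would have returned nothing.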

Bayesian classifiers. Bayesian search systems use probability theory to make educated inferences about the relevance of documents based on the system's prior experience in identifying relevant documents in the particular litigation. The search results then would be ranked based on the predicted likelihood of their relevance to the litigation.
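
As a rough illustration, the following Python sketch implements a small naive Bayes classifier with add-one smoothing, one textbook way of turning "prior experience" into a relevance probability. It is an assumption that any given product works this way; the training documents, class name and method names are invented for the example.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesRelevance:
    """Toy two-class naive Bayes; needs at least one training doc per class."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, text, relevant):
        """Record one document that a reviewer has already coded."""
        self.doc_counts[relevant] += 1
        for word in tokens(text):
            self.word_counts[relevant][word] += 1
            self.vocab.add(word)

    def _log_score(self, text, label):
        # Log prior for the class plus smoothed log likelihood of each known word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total = sum(self.word_counts[label].values())
        for word in tokens(text):
            if word in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def probability_relevant(self, text):
        rel = self._log_score(text, True)
        irr = self._log_score(text, False)
        top = max(rel, irr)  # subtract the max before exponentiating, for stability
        p_rel, p_irr = math.exp(rel - top), math.exp(irr - top)
        return p_rel / (p_rel + p_irr)

model = NaiveBayesRelevance()
model.train("nicotine levels in cigarette blend", relevant=True)
model.train("youth marketing of cigarette brands", relevant=True)
model.train("cafeteria lunch menu", relevant=False)
model.train("parking garage schedule", relevant=False)

unreviewed = {"x": "cigarette nicotine study", "y": "lunch schedule"}
ranking = sorted(unreviewed,
                 key=lambda d: model.probability_relevant(unreviewed[d]),
                 reverse=True)
print(ranking)
```

The ranking improves only as reviewers code more documents, which is why such systems depend on the quality of the early review decisions.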

These alternative search technologies may sound promising in concept, and the claims about their efficiency and accuracy likely add to their allure, but the question remains whether these approaches outperform the standard search approach.

Keyword searching (including with the use of Boolean connectors), its acknowledged limitations notwithstanding, has secured such widespread acceptance for a reason. As an initial matter, the technology and search methodology are well understood and familiar to anyone who has used Westlaw, Lexis or similar search engines. It therefore can be easily discussed with both opposing counsel and judges. The simplicity of keyword searching also doubtless promotes negotiated resolution of discovery disputes because the parties have less reason to fear that ignorance about the technology will lead them to strike a bad bargain.

But the simplicity of keyword searching is also its principal weakness. Keyword searches capture only documents containing the precise terms designated, which virtually assures that such a search will miss relevant documents. And, on the other side of the equation, keyword searches will mechanically capture every document -- whether relevant or not -- containing any search term. This means keyword searches may be both substantially under- and over-inclusive. Concept searching systems, by contrast, are not dependent on a particular term appearing in a document and therefore may locate documents a Boolean search would not. But they may suffer from other infirmities.
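
A small Python sketch shows both failure modes at once. It assumes exact whole-word matching with AND, OR and NOT term lists, a simplification of the connectors offered by commercial search engines; the four memos and the function names are invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of documents that satisfy an AND / OR / NOT keyword query."""
    hits = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        if (all(t in words for t in all_of)
                and (not any_of or any(t in words for t in any_of))
                and not any(t in words for t in none_of)):
            hits.append(doc_id)
    return hits

docs = {
    "memo1": "Nicotine levels in the new cigarette blend were raised.",
    "memo2": "The nic. content of the cigs is up again this quarter.",    # relevant, abbreviated
    "memo3": "Cigarette break at 3pm, then the nicotine patch seminar.",  # not relevant
    "memo4": "Quarterly sales figures for the snack division.",
}

narrow = boolean_search(docs, all_of=("nicotine", "cigarette"))
broad = boolean_search(docs, any_of=("nicotine", "nic", "cigarette", "cigs"))
print(narrow)  # misses memo2, catches memo3
print(broad)   # finds memo2, still catches memo3
```

The narrow query is under-inclusive (it misses the abbreviated memo2) and over-inclusive (it returns the irrelevant memo3); broadening the terms cures only the first problem.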

So how does concept searching stack up? The best evidence to date comes from the Text REtrieval Conference, which in 2006 designed an independent research project to compare the efficacy of various search methods. In view of the prevalence of keyword and Boolean searches in litigation today, TREC was particularly interested in determining whether the alternative search methodologies outlined above were better than Boolean.

As its starting point, the TREC study used a test set of 7 million documents that had been made available to the public pursuant to a Master Settlement Agreement between tobacco companies and several state attorneys general. Attorneys assisting in the study then drafted five test complaints and 43 sample document requests (referred to as topics). The topic creator and a TREC coordinator then took on the roles of the requesting and responding counsel and negotiated over the form of a Boolean search to be run for each document request.

In addition to the Boolean searches, computer scientists from academia and other institutions attempted to locate responsive documents for each topic utilizing 31 different automated search methodologies, including concept searching. The results were striking. On average, across all the topics, the negotiated Boolean searches located 57 percent of the known relevant documents.

But none of the alternative search methodologies reliably performed any better. That is to say, for each topic, the Boolean search did about as well as the best alternative search methodology.

Interestingly, although the Boolean searches generally outperformed the alternative search protocols, the methods did not necessarily retrieve the same responsive documents. In fact, when all of the responsive documents found by the 31 alternative runs were combined, TREC discovered that the alternative search runs collectively had located, on average, an additional 32 percent of the responsive documents in each topic.

As a result, while the Boolean search generally equaled or outperformed any of the individual alternative search methods, those searches also captured at least some responsive documents that the Boolean search had missed.

This suggests that even if alternative search methodologies have not yet been shown to beat Boolean searches, their use to supplement Boolean searches might increase the number of responsive documents located. But at what cost? The potential benefits of locating any additional documents through use of an alternative search methodology would still have to be weighed against the cost, both in money and resources, required to locate them.

The relevant cost here is not just the price of using the alternative search technology, but also the number of false positives identified by the approach (i.e., documents that are retrieved by the search but turn out not to be responsive). Any automated search method -- whether a keyword or concept search -- will yield false positives, which counsel must review and filter out prior to production, and that review can be costly. It therefore is far from clear that use of an alternative search methodology in addition to a keyword or Boolean search will be appropriate in any particular case, a question the TREC study does not attempt to address.
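
The trade-off can be worked through with simple set arithmetic. In the Python sketch below, the 57 percent Boolean recall and the additional 32 percent found by the alternative runs come from the TREC figures quoted above; the false-positive counts (43 and 140) and the 100-document collection are invented purely to show how review volume can grow faster than recall.

```python
def recall(retrieved, relevant):
    """Share of the relevant documents that the search retrieved."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Share of the retrieved documents that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved)

relevant = set(range(1, 101))  # hypothetical: documents 1-100 are responsive

# Boolean run: 57 responsive documents plus 43 assumed false positives.
boolean_hits = set(range(1, 58)) | set(range(201, 244))

# Alternative runs: 32 responsive documents the Boolean run missed (58-89),
# 28 it also found (30-57), plus 140 assumed false positives.
alternative_hits = set(range(30, 90)) | set(range(301, 441))

combined = boolean_hits | alternative_hits

for name, hits in [("boolean", boolean_hits), ("combined", combined)]:
    print(name, len(hits), round(recall(hits, relevant), 2),
          round(precision(hits, relevant), 2))
```

Under these assumed numbers, recall rises from 57 to 89 percent, but the pile that counsel must review grows from 100 documents to 272, and fewer than one in three of them is responsive.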

For now, the available evidence suggests that keyword and Boolean searches remain the state-of-the-art and the most appropriate search technology for most cases. This seems particularly true when keyword or Boolean searches are used in an iterative manner, where litigants: (i) negotiate search terms and Boolean operators, (ii) run the agreed-upon searches, (iii) review the preliminary results, and (iv) adjust the searches through a series of meet-and-confers. This type of "virtuous cycle of iterative feedback" has been endorsed by courts and commentators alike.
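
The four-step cycle can be pictured as a loop. The Python sketch below is only a schematic: the `suggest_terms` callback stands in for the human review and meet-and-confer steps, and the documents, reviewer notes and function names are all hypothetical.

```python
import re

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def iterative_search(docs, initial_terms, suggest_terms, max_rounds=5):
    """Run, review and adjust a keyword search until a round finds nothing new."""
    terms = set(initial_terms)               # (i) the negotiated starting terms
    found = set()
    for _ in range(max_rounds):
        hits = {doc_id for doc_id, text in docs.items()
                if terms & tokenize(text)}   # (ii) run the agreed-upon search
        new_hits = hits - found              # (iii) review the preliminary results
        if not new_hits:
            break
        found |= new_hits
        terms |= suggest_terms(new_hits)     # (iv) adjust the search terms
    return found, terms

docs = {
    "a": "Nicotine yield report",
    "b": "Re: nic yield follow-up",
    "c": "The nic numbers for project sunrise",
    "d": "Project Sunrise budget",
    "e": "Holiday party",
}

# Stand-in for what reviewers notice in each newly found document.
reviewer_notes = {"a": {"yield"}, "b": {"nic"}, "c": {"sunrise"}}

def suggest_terms(new_hits):
    return set().union(*(reviewer_notes.get(d, set()) for d in new_hits))

found, final_terms = iterative_search(docs, {"nicotine"}, suggest_terms)
print(sorted(found), sorted(final_terms))
```

The single starting term reaches only document "a"; each round of review surfaces vocabulary that pulls in documents the original search could not have found.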

The intuition of the legal community that an iterative approach to electronic discovery promotes reliability and efficiency finds empirical support in the TREC study. As part of its study, TREC employed an expert tobacco document searcher who used an "interactive" search methodology.

TREC found that the expert searcher located, on average, an additional 11 percent of the relevant documents beyond those that had been located by the initial Boolean searches, which means that an interactive Boolean approach ultimately located 68 percent of the relevant documents -- far better than any of the alternative search methodologies.

It may be that alternative search methodologies eventually will surpass the performance of keyword and Boolean searches, but that day does not yet seem to have arrived.

The independent research conducted to date suggests that, for the time being at least, nothing beats Boolean, particularly when used as part of an iterative process.

That does not necessarily mean that alternative search technologies are not worth considering, either independently or along with Boolean or keyword searches. But practitioners would be well advised to carefully scrutinize the marketing claims of the purveyors of such technologies and to factor in often substantial direct and indirect costs of such approaches.

H. Christopher Boehning and Daniel J. Toal are litigation partners at Paul, Weiss, Rifkind, Wharton & Garrison. Associate Jason D. Jones and Aaron Gardner, the firm's discovery process manager, assisted in the preparation of this article.

The Full Text of Ms. Devey's Blog Posting is as follows:

It seems that the alternative search technologies (alternatives to the familiar keyword and Boolean searches) touted by vendors are considered ‘too good to be true’. See for yourself in "In Search of Better E-Discovery Methods" by H. Christopher Boehning and Daniel J. Toal, New York Law Journal, April 23, 2008.

The above legal article also mentions the Text Retrieval Conference (TREC) 2006 study, which was also examined by Will Uppington in the article "Better Search for E-Discovery," March 11, 2008.

What I find interesting in Will Uppington’s article is the finding: ‘One of the best ways to get better search queries is to commit human resources to improving them, by putting a “human-in-the-loop” while performing searches’.

Reading between the lines of these two ‘search themed’ articles, one from the legal side and the other from a technical perspective, highlights their contrasting findings and interpretations of the TREC 2006 study.

What else can we say about the ‘human-in-the-loop’, the ‘virtuous cycle of iterative feedback’ and the “interactive” search methodology?

Well, such phrases and concepts are not new. What is new is that awareness of the ‘human actions’ aspect is creeping into the ediscovery space. Knowledge researchers outside the ediscovery domain have been busily coming up with concepts such as the ‘concept searching’ methodologies. Such newer technologies, and how they hold up under real-world testing (or against adoption inertia), are clearly not well understood by the courts and practitioners (too good to be true?).

On human actions and computer programs, a beautiful quote comes from my friend, Roger C: “While computer programs can write other computer programs, they can’t write the first program”.


Tuesday, May 6, 2008

Kazeon Press Release Focuses Attention on the Cost of a Latte and the Falling Cost of eDiscovery

Like lots of other eDiscovery professionals this morning, I had to do a double take when I read today's press release from Kazeon announcing their new partnership with Attenex, titled "Attenex and Kazeon Announce eDiscovery Alliance". The double take was caused by Kazeon's announcement that they can deliver industry-leading price/performance for in-house processing of ESI in preparation for reactive and proactive eDiscovery matters for as low as $4.30 per gigabyte. This is actually not news: an earlier press release, titled "ESG Lab Finds Kazeon’s Information Server Delivers Fast and Cost Effective Information Access", presented the same pricing in a slightly less alarming way, stating that the ESG Lab had verified impressive price/performance, with a single Kazeon Information Server appliance able to index 2,500 documents per dollar at a cost as low as $4,300 per terabyte (approximately $4.20 per gigabyte at 1,024 gigabytes per terabyte, or $4.30 at 1,000).
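
For anyone who wants to check the unit conversion, here is the arithmetic as a short Python sketch. The $4,300 per terabyte and 2,500 documents per dollar figures are from the press releases quoted above; the implied average document size is my own back-of-the-envelope inference and assumes both figures describe the same workload.

```python
PRICE_PER_TB = 4300   # dollars per terabyte, from the ESG Lab press release
DOCS_PER_DOLLAR = 2500  # documents indexed per dollar, same release

per_gb_binary = PRICE_PER_TB / 1024    # 1 TB counted as 1,024 GB
per_gb_decimal = PRICE_PER_TB / 1000   # 1 TB counted as 1,000 GB

# Implied workload: documents per terabyte and average document size.
docs_per_tb = DOCS_PER_DOLLAR * PRICE_PER_TB
avg_doc_kb = (10**12 / docs_per_tb) / 1000   # decimal kilobytes per document

print(f"${per_gb_binary:.2f} per GB (binary), ${per_gb_decimal:.2f} per GB (decimal)")
print(f"{docs_per_tb:,} documents per TB, about {avg_doc_kb:.0f} KB each")
```

The decimal conversion reproduces the $4.30 headline number exactly, which suggests the two press releases are describing the same price.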

However, any way you crunch the numbers, position the cost or spin the offering, it is just flat alarming and bordering on unbelievable for both users and technology vendors in the eDiscovery market. Bottom line, whether or not you believe that Kazeon is comparing true eDiscovery apples with the rest of the apples in the market, it doesn't matter, as this is definitely the first shot across the bow of the rest of the eDiscovery vendors. Prices are coming down and the rest of the market is going to have to keep up. Unfortunately, many of them have very old technology that requires lots of manual manipulation and processing and therefore may not have the "legs" to stay in the race. I don't see any dramatic changes in 2008, as users will still be trying to figure out what they are getting, or not getting, for their investments from each of the vendors. However, once this is all common knowledge, 2009 may be the year of the changing of the vendor guard in eDiscovery.

All of that being said, the best comments on this press release came from Kurt Leafstrand of Clearwell on his eDiscovery 2.0 Blog. His posting, titled "eDiscovery Processing: You Get What You Pay For," does a really great job of questioning whether or not Kazeon can really support even the minimum requirements of the EDRM processing node for what basically amounts to less than the cost of my favorite Starbucks venti, breve, sugar-free hazelnut latte.

With all the talk of the cost of lattes, maybe the next big eDiscovery winner will be the vendor that announces a cross-marketing deal with Starbucks to include a free venti latte with every GB of data processed. Remember that you heard it first here on the eDiscovery Paradigm Shift Blog.

Because I believe that it is one of the more humorous eDiscovery posts that I have read in some time, I am including the entire contents of Kurt's blog post below:

Anyone reading today’s announcement from Kazeon could be forgiven for doing a double-take: did someone misplace the decimal point? Kazeon claims that it can perform “processing of ESI in preparation for eDiscovery matters as low as $4.30 per Gigabyte.” Assuming that’s not simply a typo, it begs an obvious question: If Kazeon really can process information at a tiny fraction of what e-discovery service providers are charging, how come every e-discovery service provider isn’t going out of business? Why wouldn’t everyone take this incredibly good deal?

The answer (in press releases, as in politics) lies in definitions. Exactly what sort of processing would you be getting for your four dollars and change?

You’ll have to ask Kazeon to get the answer to that one, but give a venti latte to a bleary-eyed e-discovery service provider who’s just pulled an all-nighter preparing for a meet-and-confer, and they’ll tell you all about the nuances, complexities, and risks inherent in e-discovery processing that may be difficult for enterprise search/information lifecycle management vendors to grasp. Quite likely, they will refer you to EDRM’s processing node overview, which outlines the basic goals of robust processing:

  1. Capture and preserve the body of electronic documents;
  2. Associate document collections with particular users (custodians);
  3. Capture and preserve the metadata associated with the electronic files within the collections;
  4. Establish the parent-child relationship between the various source data files;
  5. Automate the identification and elimination of redundant, duplicate data within the given dataset;
  6. Provide a means to programmatically suppress material that is not relevant to the review based on criteria such as keywords, date ranges or other available metadata;
  7. Unprotect and reveal information within files; and
  8. Accomplish all of these goals in a manner that is both defensible with respect to clients’ legal obligations and appropriately cost-effective and expedient in the context of the matter.
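
To give a flavor of what even one of these goals involves, here is a minimal Python sketch of duplicate elimination that still keeps track of custodians. It assumes exact duplicates are detected by hashing file contents with SHA-256; that is one common approach, not a description of how Kazeon, Clearwell or any other vendor does it, and the sample files and function name are invented.

```python
import hashlib
from collections import defaultdict

def deduplicate(files):
    """Keep one copy of each distinct content and record every custodian who held it.

    `files` is an iterable of (custodian, filename, content_bytes) tuples.
    """
    unique = {}                    # content hash -> first filename seen
    custodians = defaultdict(set)  # content hash -> custodians holding a copy
    for custodian, name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        unique.setdefault(digest, name)
        custodians[digest].add(custodian)
    return unique, custodians

collected = [
    ("alice", "q3_forecast.xls", b"forecast numbers"),
    ("bob", "Copy of q3_forecast.xls", b"forecast numbers"),  # same bytes, new name
    ("bob", "notes.txt", b"call the vendor"),
]

unique, custodians = deduplicate(collected)
for digest, name in unique.items():
    print(name, sorted(custodians[digest]))
```

Three collected files reduce to two for review, yet the record that both Alice and Bob held the forecast survives, which is the kind of detail that matters when a production has to be defended.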

And that’s just the high-level overview. After the caffeine from the latte starts to kick in, they’ll tell you it’s also absolutely critical to:

  1. Provide statistical count tie-outs that reconcile every incoming email, loose file, and attachment with the processed document set
  2. Automatically scan critical large container files (such as PSTs) for errors and problems prior to processing
  3. Automatically perform custodian mapping to track ownership of all documents
  4. Maintain detailed reports on every anomaly encountered during processing, down to the individual email, loose file, and attachment
  5. Automatically handle common metadata anomalies (with logging) so that the maximum number of documents are made available for review
  6. Provide robust and thorough handling for container files regardless of container format
  7. Support non-email content types such as contacts, calendar entries, tasks, and notes
  8. Robustly handle embedded objects
  9. Provide full visibility into exceptions encountered during processing, along with an integrated exception handling process to allow repaired/decrypted data to be easily added back into the document set

All that for under five bucks? That’s quite a deal! But remember, if you drive by your corner gas station tomorrow morning and they’re advertising regular unleaded for 20 cents a gallon: It may be cheap, but it’s probably not gas you’re getting.
