The eDiscovery Paradigm Shift


Saturday, June 14, 2008

eDiscovery Search Predictions for 2008

As I continued my education on eDiscovery search platforms over the past couple of weeks, and subsequently updated my eDiscovery Paradigm Shift Blog, I came across a really interesting update by Stephen E. Arnold on the current state of the search market in general titled "Search Rumor Round Up, Summer 2008". And, although it is not specific to search in the eDiscovery space, his overview of search is outstanding and very applicable to what we have to look forward to in eDiscovery.

As I pointed out in my post titled "eDiscovery Search Case Law Emerging", the courts are starting to catch up with regard to the value and impact of search in the eDiscovery process, along with the subtle nuances of search technology. And, as I talk to law firms and the legal departments of many of the Fortune 500 about their eDiscovery issues, I am finding that the topic of search, and more recently conceptual search, is being raised more often in the context of culling, de-duping, finding potentially responsive and privileged docs and gaining a better understanding of their data in general. However, I have also found a complete lack of understanding of current search technology and its applicability to the real needs of the litigators and their litigation services consultants.

Further, I am curious to understand the foundation for this week's acquisition of Attenex by FTI. Was this a fire sale because the conceptual search market has not matured and therefore Attenex has not been able to reach its full potential as a conceptual search based review platform? Or, was this a brilliant move by FTI to add yet another leading edge technology to its roster based upon accelerating market demand? Just for the record, I happen to think that it is the latter. But, there are some intriguing arguments for the former.

The observations and comments in Mr. Arnold's article that I believe are most applicable to eDiscovery are as follows:

Rumor 1: More Consolidation in Search
As eDiscovery technology matures and the obvious winners in technology and the appropriate strategy/formula for success begin to emerge, we are seeing the same basic consolidation in eDiscovery in general and will continue to see even more eDiscovery technology consolidation over the remainder of 2008 and into 2009.

Rumor 5: Search Will Become a Commodity
I believe this prediction to be true at the desktop / SaaS based eDiscovery platform level, where the user wants to search several hundred thousand docs in a short period of time. However, at the service center production level, where the document pool is terabytes of data, I believe that there is still room for several search technology leaders that can figure out how to get these massive search projects done in hours as opposed to days.

Rumor 6: Search Is a Component of Other Enterprise Software
I believe that there is no doubt that users are quickly going to expect sophisticated search to become a seamlessly integrated part of any eDiscovery platform.

Rumor 9: Key Word Search Is Dead
This is my favorite prediction for eDiscovery as I can visualize all the current eDiscovery vendors cringing.

Rumor 10: A Hardware Maker Will Put Search on a Chip
Coming from a background of integrating the appropriate and ripe software technologies into firmware / hardware solutions, I absolutely agree with this prediction. And, it fits really nicely with the "behind the firewall appliance" direction that some of the archiving vendors are going.

All of this being said, the full text of Mr. Arnold's article is as follows:

I am fortunate to receive a flow of information, often completely wacky and erroneous, in my redoubt in rural Kentucky. The last six months have been a particularly rich period. Compared to 2007, 2008 has been quite exciting.

I’m not going to assure you that these rumors have any significant foundation. What I propose to do is highlight several of the more interesting ones and offer a broader observation about each. My goal is to provide some context for the ripples that are shaking the fabric of search, content processing, and information retrieval.

The analogy to keep in mind is that we are standing on top of a jello dessert like this one.

The substance itself has a certain firmness. Try to pick it up or chop off a hunk, and you have a slippery job on your hands. Now, the rumors:

Rumor 1: More Consolidation in Search
I think this is easy to say, but it is tough to pull off in the present economic environment. Some companies have investors who have pumped millions into a search and content processing company. These kind souls want their money back. If the search vendor is publicly traded, the set up of the company or its valuation may be a sticky wicket. There have been some stunning buy outs so far in 2008. The most remarkable was Microsoft’s purchase of Fast Search & Transfer. SAS snapped up the little-known Teragram. But the wave of buy outs across the more than 300 companies in the search and content processing sector has not materialized.

Rumor 2: Oracle Will Make a Play in Enterprise Search
I receive a phone call or two a month asking me about Oracle SES10g. (When you access the Oracle Web site, be patient. The system was sluggish for me on June 14, 2008.) The drift of these calls boils down to one key point, “What’s Oracle’s share of the enterprise search market?” The answer is that its share can be whatever Oracle’s accountants want it to be. You see Oracle SES10g is linked to the Oracle relational database and other bits and pieces of the Oracle framework. Oracle’s acquisitions in search and retrieval, from Artificial Linguistics more than a decade ago to Triple Hop in more recent times, have given Oracle capability. As a superplatform, Oracle is a player in search. So far this year, Oracle has been moving forward slowly. An experiment with Bitext here and a deployment with Siderean Software there. Financial mavens want Oracle to start acquiring search and content processing companies. There are rumors, but so far no action, and I don’t expect significant changes in the short term.

Rumor 3: Microsoft Will Tidy Up Its Search Operations
This rumor suggests that Microsoft, a giant company with many barons and dukes controlling fiefdoms, can deploy one search solution. I don’t think that will happen quickly. The Certified Gold Partners who make better search systems than those available from Microsoft can rest easy for the foreseeable future. Search is too complicated in general and within Microsoft for a one-size-fits-all solution. I anticipate more search options, not fewer. Coveo, Exalead, ISYS Search Software, and others will benefit from the Microsoft approach to search for months, if not years.

Rumor 4: Semantic Search Will Unseat Google
Semantic technology is now within reach of almost any search and content processing vendor. The technology is relatively well known and the processing power is available at a reasonable cost. By itself, semantic search will not be enough to shift the market share that Google is amassing in the consumer search and enterprise markets. Google’s been chugging along for a decade, and it has yet to meet significant competition other than itself. Semantic technology is a component, not a Google killer in the hands of a competitor at this time.

Rumor 5: Search Will Become a Commodity
No, as I described in my Web log post on May 12, 2008, about the “search elephant”, search has too many different meanings for one solution to sweep the board. Each unit of a company has many different search and content processing needs. It is, therefore, difficult to convince the legal department to use the open source Lucene tool for eDiscovery. The legal eagles will want to use a service from Brainware or Stratify. Down the hall, the chemical engineers need to find chemical structure. Search consists of niches, and these will bump heads, overlap, and be quite confusing to sort out. In that confusion, consultants and different vendors thrive.

Rumor 6: Search Is a Component of Other Enterprise Software
This is a rumor related to “search will become a commodity”. True, enterprise software vendors will include more robust search and content processing systems in their software, market the heck out of the enhancements, and bundle it with whatever the client wants to buy. But enterprise applications open the door to point solutions that meet specific needs. So search certainly will become ubiquitous and the ecosystem will spawn new species of information access. Nope, search is going to be with us for a long, long time.

Rumor 7: The Google Search Appliance Doesn’t Work
False. The GOOG has more than 10,000 licensees, a fleet of partners, and the OneBox API that can make the Google Search Appliance work like Roy Rogers’ prescient horse, Trigger. The GOOG has had an impact on the enterprise search market. It’s easy to complain about the Google Search Appliance. It’s harder to explain how a company with such an interesting approach to sales can sell such a large number of units. Obviously a certain sector of the market wants these Google boxes.

Rumor 8: Social Search Will Revolutionize Enterprise Search
Nope. Social functions can be useful, but in regulated industries, there are some challenges associated with social search. Social search is the equivalent of a restaurant’s weekly special. If the customers gobble enough of the dish, the special could be promoted to a specialty. It’s early days for social search in an enterprise, but it’s not too soon for law enforcement and military intelligence people to embrace the concept. Social search is quite useful in certain areas, but one needs to have lots of social data to crunch to see the technology in full flower.

Rumor 9: Key Word Search Is Dead
Key word search is hard for many people. Alternatives and options are needed. But key word search is too useful in certain types of research to go the way of the dodo. Investors like to think that a whizzy interface without a search box is the next big thing in search. Interfaces are becoming more important by the hour. But an interface without a way to look for words and phrases won’t carry the day.

Rumor 10: A Hardware Maker Will Put Search on a Chip
What’s happening is research and investigation. The Exegy appliance could be boiled down to a smaller gizmo. At some point in the future, any search appliance could be reduced to firmware. I think the likelihood of search on a chip is high, but it’s not something you will be able to buy in 2008. Intel invested in Endeca for a reason. Intel had a brief love affair and then a messy divorce with search vendor Convera years ago. Other chip-centric outfits are poking around in this area as well. On the horizon, yes, but appliances will be about as close to search in a single package as we will have in 2008.

Most people don’t realize that search is like a giant jello dessert. There is a shimmery, attractive quality to the whole thing. When you start to pick it apart, the substance becomes slippery and tough to pin down. It’s easy to be fooled by surface changes like semantic search and social search, which are like a squirt of whipped topping on the jello. Do you have candidates for rumors you think I should have included in my round up? If so, use the comments section of this Web log to post your favorites. To avoid legal hassles, I may have to edit some of your inputs. Ah, life in the modern world is so rewarding.


Thursday, June 12, 2008

Concept Search Case Law Emerging

In a follow-up to my post regarding my investigation of a SaaS based conceptual search technology for the eDiscovery market, I came across an interesting article by Leonard Deutchman, Pennsylvania Law Weekly, titled "When E-Discovery Is Put to the Test, Will federal rules on expert testimony govern admission of search engine results?".

This outstanding article discusses the issues of how the courts view search technology in light of Disability Rights Council v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 (D.D.C. 2007) and United States v. O'Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008) and Equity Analytics v. Lundin, 2008 U.S. Dist. LEXIS 17407 (D.D.C. Mar. 7, 2008).

Please note that I am still evaluating search technology and will post my findings as soon as I have completed my investigations. However, I felt that this article raised so many pertinent and timely issues that I wanted to post it before my findings were complete.

The full article is as follows:

An influential federal district judge whose opinions on e-discovery are well respected may have set e-discovery on a path toward its most searching scrutiny yet.

In Disability Rights Council v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 (D.D.C. 2007), Judge John M. Facciola recommended "concept searching" -- the use of complex search engines that make use of linguistic or statistical patterning to locate responsive e-mails and electronic documents -- in order for a tardy producer of discovery to wade through voluminous electronically stored information quickly. Interestingly, Facciola made no mention of whether the use of concept searching tools should be subject to Federal Rule of Evidence 702, which governs the admission of scientific or expert testimony.

Recently, however, in United States v. O'Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008) and Equity Analytics v. Lundin, 2008 U.S. Dist. LEXIS 17407 (D.D.C. Mar. 7, 2008), Facciola held that any challenges to or defenses of search methodology in producing e-discovery must be scrutinized under Rule 702, and so ordered hearings under Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993).

These rulings give rise to the question of what a Daubert hearing for an e-discovery search engine would look like.

The first issue the court would address is how search engines search. The most direct approach is keyword searching, which takes three basic forms:

Direct searching for keywords, e.g., "Locate all files with 'Jones.'"
Boolean searching, e.g., "Locate 'Jones' or 'Smith'," "Locate 'Jones' but not 'Smith,'" and other combinations.
Proximity searching, e.g., "Locate 'Jones' within 25 words of 'Smith.'"

Often such searching is restricted by date range, e.g., "Locate all e-mails with 'Jones' created after January 1 but before July 1, 2007 only."
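For readers who want to see how little machinery these keyword forms need, here is a minimal Python sketch of direct, Boolean, proximity and date-restricted searching. It is only an illustration under stated assumptions: the sample e-mails, the field names and the helper functions `tokens`, `has_keyword` and `within` are all hypothetical, and no commercial eDiscovery engine is claimed to work this way.

```python
import re
from datetime import date

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_keyword(text, word):
    """Direct keyword search: is the word present as a whole token?"""
    return word.lower() in tokens(text)

def within(text, a, b, max_gap):
    """Proximity search: do a and b occur within max_gap words of each other?"""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

# Hypothetical e-mails, invented purely for illustration.
emails = [
    {"id": 1, "sent": date(2007, 2, 3), "body": "Jones approved the budget."},
    {"id": 2, "sent": date(2007, 3, 9), "body": "Smith met Jones about the audit."},
    {"id": 3, "sent": date(2007, 8, 1), "body": "Smith will call the vendor."},
    {"id": 4, "sent": date(2006, 12, 30), "body": "Jones is on leave."},
]

# Direct: "Locate all files with 'Jones'"
direct = [e["id"] for e in emails if has_keyword(e["body"], "Jones")]

# Boolean: "'Jones' or 'Smith'" and "'Jones' but not 'Smith'"
jones_or_smith = [e["id"] for e in emails
                  if has_keyword(e["body"], "Jones") or has_keyword(e["body"], "Smith")]
jones_not_smith = [e["id"] for e in emails
                   if has_keyword(e["body"], "Jones") and not has_keyword(e["body"], "Smith")]

# Proximity: "'Jones' within 25 words of 'Smith'"
proximity = [e["id"] for e in emails if within(e["body"], "Jones", "Smith", 25)]

# Date range: 'Jones' created after January 1 but before July 1, 2007
date_limited = [e["id"] for e in emails
                if has_keyword(e["body"], "Jones")
                and date(2007, 1, 1) < e["sent"] < date(2007, 7, 1)]
```

Each query is just a filter over the same token list, which is why keyword results are easy to explain and reproduce in front of a court.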

Concept searching, as has already been briefly discussed, takes a different approach. It targets information relating to a concept even if specific keywords are not present (e.g., a series of e-mails mentioning the words "Clinton," "McCain" and "Obama" would likely concern the 2008 U.S. presidential election, even if the phrase "presidential election" does not appear).

Some concept searching tools use "taxonomies" or "ontologies," that is, compilations of both commercially available data and data supplied by the client pertinent to the case collected from the lawyers and key players. Some concept searching uses linguistic analysis examining how the communicants discuss matters, while other approaches, such as "clustering" and "latent semantic indexing," use mathematical probabilities to determine whether a given file is related to a given concept. For an excellent discussion of concept searching, see "The Sedona Conference Best Practices Commentary on the Use of Search and Informational Retrieval Methods in E-Discovery."
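As a rough illustration of the taxonomy idea only, the Python sketch below scores a document by how many terms from a concept's term list it contains, so a document can match a concept without containing the concept phrase itself. The taxonomy contents, the `concept_score` and `concept_search` helpers, the 0.3 threshold and the sample documents are all assumptions made for this example; real concept search products use far richer linguistic or statistical models such as clustering or latent semantic indexing.

```python
import re

# Hypothetical taxonomy: concept -> related terms. In practice this would be
# compiled from commercial data and input from the lawyers and key players.
TAXONOMY = {
    "presidential election": {
        "clinton", "mccain", "obama", "primary", "ballot", "campaign",
    },
}

def concept_score(text, concept, taxonomy=TAXONOMY):
    """Fraction of the concept's terms that appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    terms = taxonomy[concept]
    return len(words & terms) / len(terms)

def concept_search(docs, concept, threshold=0.3):
    """Return ids of documents whose score meets the (assumed) threshold."""
    return [doc_id for doc_id, text in docs.items()
            if concept_score(text, concept) >= threshold]

# Hypothetical documents; none contains the phrase "presidential election".
docs = {
    "a": "Obama and Clinton are still splitting delegates; McCain is waiting.",
    "b": "Please send the quarterly invoice to accounting.",
    "c": "The campaign office moved downtown.",
}

hits = concept_search(docs, "presidential election")
```

Document "a" is returned because it mentions three of the six taxonomy terms, while "c" mentions only one and falls below the threshold. Choosing that threshold is exactly the kind of judgment a Daubert hearing would probe.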

Regardless of which approach the search engine takes, the actual Daubert hearing will prove difficult for two practical reasons, both stemming from the fact that search engine applications are proprietary. First, it simply will be hard to get the designer to appear at the hearing to testify as to how the engine works. Second, the designer will fight giving the best evidence of the efficacy of the engine, that is, the engine's source code, because that code is proprietary. Should the code be revealed, the design would lose its value, as anyone could use that code without having to obtain a license (i.e., a copy of the application) from the designer.

Proprietary applications can be validated without their source code revealed, but only under certain circumstances. The easiest is where a specific positive finding needs to be corroborated. For example, if a proprietary forensic search tool such as Guidance Software's EnCase reports that a file is found at a particular location on a hard drive, an examiner can use an "open source" tool, i.e., a tool whose source code is known and which has been validated, to confirm the finding. Such corroboration, however, does not validate the search tool, only the result of the use of that tool at a particular time. To validate a proprietary tool generally using open source tools requires months of work, thousands of hours by highly experienced analysts, such as the FBI put in when validating EnCase. Of course, each time a new version of a tool comes out, more hours of validation are needed. Thus, while this means may be reliable, it is hardly practical.

A second means would be to use another proprietary tool -- say, Access Data's FTK -- to run the same search as EnCase performed and compare results. This method, however, is not truly scientific, since identical search results are just as likely to confirm that the two engines are identically flawed as they are reliable.

A third means to validate a proprietary search engine without revealing its code would be to search test data sets with known test results and which contain the types of data that the engine would search when regularly deployed. Comparing the results of the searches by the proprietary search engine to the known results should validate or invalidate the search engine. Again, however, such testing is extremely time-consuming and expensive. The designer would have to engage in such testing and publish its results; one could hardly expect the typical user of the search engine to engage in such studies.
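The comparison described here can be expressed with two standard information retrieval measures: recall (the share of known responsive documents the engine found) and precision (the share of the engine's hits that are actually responsive). The Python sketch below is a minimal illustration; the document ids, the `evaluate` helper and the test set are hypothetical and stand in for the much larger, carefully built data sets such testing would require.

```python
def evaluate(retrieved, relevant):
    """Compare engine output against a test set with known responsive documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    # Guard against empty sets so the ratios are always defined.
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "missed": sorted(relevant - retrieved),
    }

# Hypothetical test set: five documents are known to be responsive.
known_responsive = {"doc01", "doc02", "doc03", "doc04", "doc05"}

# Hypothetical output of the proprietary engine under test.
engine_hits = {"doc01", "doc02", "doc03", "doc09"}

report = evaluate(engine_hits, known_responsive)
```

Here the engine finds three of the five responsive documents (recall 0.6) and three of its four hits are correct (precision 0.75). The `missed` list speaks most directly to the "it did not miss anything" question, and it is only knowable because the test set's answers were established in advance.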

As previously stated, the second Daubert hearing issue is where the searching was done. Specifically, the issue would be whether only the files actively stored on a hard drive, for example, were searched, or whether deleted files, temporary files or file fragments in the "unallocated space" of a hard drive were also searched. When ESI is gathered, unless bit stream, forensic images (i.e., exact copies of every 1 and 0 on a piece of media) are made, the deleted files, etc., will not even be present to search. To search for such ESI, forensic tools must be used. Thus, in United States v. O'Keefe, for example, the defendant challenged the government's search results for potentially exculpatory evidence in its possession by arguing that by not looking "everywhere" on the drive for deleted files or file fragments, the government had not fully discharged its duty to search everywhere.

The problem with searching "everywhere," however, is not so much a Rule 702 problem as a practical one: forensic searches of every possible file fragment take impossibly long, and if many hard drives and servers are involved, the impossible becomes unthinkable. O'Keefe, however, raises another issue, one far more interesting and conceptually difficult: for search engines, passing the Daubert test may depend upon whether one is trying to prove that something is there or that something is not there.

Anyone who remembers examining scientific method when taking high school or college science classes will recall the question whether the absence of evidence that "x" is present means that "x" truly is not present or whether the test for finding "x" was simply insufficient. For example, while a PET Scan's positive finding for cancer is conclusive, a failure to detect cancer may mean the absence of cancer or that the PET Scan failed to detect cancer that was present.

Thus, the acceptance of a test as scientific proof under Daubert and Rule 702 is more likely when the test is to prove that something is present than absent. In Sanders v. Texas, 191 S.W.3d 272 (Ct. App. 2006), for example, the Texas Court of Appeals had no trouble affirming the trial court's findings that the expert's use of EnCase to create a bit stream, forensic image of the defendant's hard drive and his search of the drive to uncover child pornography -- both positive findings -- was scientifically valid. Since EnCase's findings can be corroborated by a tool other than the proprietary one used, the validity of the imaging and search is much easier to establish.

In O'Keefe, Equity Analytics v. Lundin and the prototypical e-discovery matter, the typical challenge is the opposite of the typical challenge in a criminal matter: the requesting party's typical challenge to e-discovery production is not that it is inauthentic but that it is incomplete. The Daubert challenge in e-discovery cases is to prove that the search results yielded "everything."

If the search engine in question were an open-source tool, the challenge could be more easily met: the search engine's methodology would be open for all to test, and it would either work when searching test sets with known results or produce anomalies or mistakes. However, if the search engine is proprietary, proving the negative (it did not miss anything) by proving the positive (this is how it searches) is not available to the tool's proponent. The "third means" discussed above -- subjecting the search tool to known test data to see whether it missed any "hits" -- could work, but that means is extremely time-consuming, expensive and beyond the capability of the typical user.

The Sedona Conference commentary provides an interesting method of "corroboration." It cites a study in which review attorneys, doing a "manual" review of discovery, were asked how much responsive data they were able to find. The attorneys guessed 75 percent, but a detailed analysis revealed that they had found only 20 percent. Using that study to illustrate what the Sedona Conference's commentary refers to as the "myth of perfection," i.e., that review attorneys slogging through e-documents and e-mails will catch responsive ESI that concept search engines will miss, the commentary makes the scientifically questionable but legally valid point that the validity of concept search tools must be determined by measuring concept search results against the actual results of review attorneys, not against results of a "perfect" search. If concept searching improves upon review practice as it now stands, it is a valid litigation tool.

In making its point, the Sedona Conference commentary returns to a touchstone of discovery practice: that when producing discovery, a "perfect review ... of information is not possible ... . The governing legal principles and best practices do not require perfection in making disclosures or in responding to discovery requests."

The Daubert challenge raised by Facciola, then, may be met not by judging the scientific validity of a search engine in an absolute way, but by judging how valid it is to suit the purposes of e-discovery production, an undertaking which involves many factors, such as the costs in time, money and energy to the producing party and their marginal benefit to the requesting party and the litigation, that have no bearing on the scientific validity of the search engine. In other words, the ultimate acceptance of an e-discovery search tool may be informed by its relative perfection but will ultimately depend, like so many other things in the law, upon the totality of circumstances.


Wednesday, June 11, 2008

Trial Solutions Offers SaaS Hosting for 5 Gigabytes for 5 Months for 5 Users for $5

As a rule, I don't often talk about Trial Solutions or its products and offerings. However, I believe that their decision to offer ImageDepot, its SaaS based Hosting and Review Platform, for $1 per Gigabyte represents a true paradigm shift in the eDiscovery market and therefore warrants mention on my Blog.

Trial Solutions, which has been very quietly offering ImageDepot for a number of years to its Service Provider Alliance Network, decided to begin distributing ImageDepot earlier this year to the eDiscovery market as part of its Litigation Support Ecosystem. With this aggressive new pricing, ImageDepot, the only true SaaS based production quality ORT in the industry based upon the Microsoft Technology Stack, could very quickly become an industry standard for online review. Or, at the very least, it will provide a very attractive alternative to any law firm or corporate legal department that is currently paying thousands of dollars per month for hosting with one of the legacy hosting platforms.

The Company reports that the new ImageDepot promotion will enable users to host and review 5 Gigabytes of data for 5 months for 5 users for $5. Or, an unlimited amount of data for 5 months for an unlimited number of users for $5 with a 12 month hosting commitment. This second option is available for both new projects and for transferring existing projects from another hosting platform. The Company notes that at the end of the initial promotional term, standard monthly hosting rates will apply. Further, the Company indicated that the 5 for 5 Promotional offer will only be available through authorized ImageDepot Resellers.

For more information, you can contact Trial Solutions at 877-595-6464 or visit the ImageDepot Website at


Tuesday, June 10, 2008

Interesting eDiscovery Technology Acquisition Opportunity

This entry marks the first time that I have posted information about a technology vendor that has eDiscovery technology and the desire to be acquired. It is important to point out that I am not a broker and I am only posting this as a source of information for the eDiscovery market.

The eDiscovery technology company believes they would be a good acquisition target for an email/data archiving company or a national litigation support/e-discovery provider that would like to have its own e-discovery processing/production and review platform. Currently, they are open to acquisition by a larger company that would provide sales and support infrastructure or a strategic investment that would allow them to build out their own infrastructure. The technology fits into the Processing, Review, Analysis and Production phases of the Electronic Discovery Reference Model (EDRM).

For more information about this company and transaction please go to the Litigation Support Industry: Business News and Information Blog.
