
Tuesday, January 12, 2010

Computer Aided Review Has Arrived

Over the past several months I have been blogging about the emergence of HAL 9000-like experiences in eDiscovery and Web 3.0. As such, I have been following the work of Dr. Herb Roitblat on the use of semantic and related technology to reduce the cost of document review through computer-aided document categorization.

In a posting on the Orcatec blog, "Information Discovery," on Monday, January 11, 2010, titled "Computer Assisted Document Categorization in eDiscovery," Dr. Roitblat announces that the January issue of the Journal of the American Society for Information Science and Technology, 61(1):1–11, 2010, carries an article by Roitblat, Kershaw, and Oot describing a study that compared computer classification of eDiscovery documents with manual review. The study found that computer classification was at least as consistent as human review in distinguishing responsive from non-responsive documents. If having attorneys review documents is a reasonable approach to identifying responsive documents, then any system that does as well as human review should also be considered a reasonable approach.

Dr. Roitblat reports in his blog posting that the study compared an original categorization, done by contract attorneys in response to a Department of Justice Second Request, with one done by two new human teams and two computer systems.

He further reported that the two re-review teams were employees of a service provider specializing in conducting legal reviews of this sort. Each team consisted of 5 reviewers who were experienced in the subject matter of this collection. The two teams independently reviewed a random sample of 5,000 documents. The two computer systems were provided by experienced eDiscovery service providers, one in California, and one in Texas.

The documents used in the study were collected in response to a "Second Request" concerning Verizon's acquisition of MCI. The documents were collected from 83 employees in 10 US states and together comprised 1.3 terabytes of electronic files in the form of 2,319,346 documents. The collection consisted of about 1.5 million email messages, 300,000 loose files, and 600,000 scanned documents. After eliminating duplicates, 1,600,047 items were submitted for review. The attorneys spent about four months, seven days a week and 16 hours per day, on the review at a total cost of $13,598,872.61, or about $8.50 per document. After review, a total of 176,440 items were produced to the Justice Department.

Accuracy was measured as agreement with the decisions made by the original review team. The level of agreement between the two human review teams was also measured. The two re-review teams identified a greater proportion of the documents as responsive than did the original review. Overall, their decisions agreed with the original review on 75.6% and 72.0% of the documents, and the two teams agreed with one another on about 70% of the documents.

About half of the documents that were identified as responsive by the original review were identified as responsive by either of the re-review teams. Conversely, about a quarter of the documents identified as nonresponsive by the original review were identified as responsive by the new teams. Although the original review and the re-reviews were conducted by comparable people with comparable skills, their level of agreement was only moderate.
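The posting does not spell out how these percentages were computed, but "agreement with the decisions made by the original review team" maps onto a straightforward pairwise comparison of labels. As a rough illustration only, the short Python sketch below computes an overall agreement rate and the share of originally responsive documents that a re-review also marks responsive, using invented toy labels (the function names and data are mine, not the paper's); it also reproduces the roughly $8.50 per-document cost from the figures above.

    # A minimal sketch only; the post does not publish the paper's exact
    # formulas, and these toy labels are illustrative, not study data.
    def agreement_rate(original, rereview):
        """Fraction of documents on which two reviews made the same call."""
        matches = sum(1 for a, b in zip(original, rereview) if a == b)
        return matches / len(original)

    def responsive_overlap(original, rereview):
        """Of the documents the original review called responsive ("R"),
        the fraction the re-review also called responsive."""
        pairs = [(a, b) for a, b in zip(original, rereview) if a == "R"]
        return sum(1 for a, b in pairs if b == "R") / len(pairs)

    # Toy labels: "R" = responsive, "N" = non-responsive.
    original = ["R", "N", "N", "R", "N", "R", "N", "N"]
    rereview = ["R", "N", "R", "N", "N", "R", "N", "R"]

    print(f"agreement: {agreement_rate(original, rereview):.1%}")
    print(f"responsive overlap: {responsive_overlap(original, rereview):.1%}")

    # Per-document cost implied by the figures quoted above (~ $8.50).
    print(f"cost per document: ${13_598_872.61 / 1_600_047:.2f}")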

Dr. Roitblat reported that they did not know whether this was due to variability in the original review or to some other factor, but that these results are comparable to those seen in other situations where people make independent judgments about the categorization of documents (for example, in the TREC studies). A senior attorney reclassified the documents on which the two teams disagreed. After this reclassification, the level of agreement between this adjudicated set and the original review rose to 80%. The two computer systems identified fewer documents as responsive than did the human review teams, but still somewhat more than were identified by the original review.

One system agreed with the original classification on 83.2% of the documents and the other on 83.6%. As with the human review teams, about half of the documents identified as responsive by the original review were similarly classified by the computer systems.
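Neither the posting nor this summary says how the two vendors' systems actually classify documents. Purely as a hedged illustration of what computer-assisted categorization can look like in general, the sketch below trains a simple text classifier on a handful of already-reviewed documents and then labels new ones; the choice of scikit-learn's TfidfVectorizer and LogisticRegression, and all of the sample text, are my assumptions rather than a description of either system in the study.

    # Generic illustration of computer-assisted categorization; not the
    # approach used by either vendor in the study. Requires scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # A few documents already labeled by attorney reviewers (toy examples).
    train_docs = [
        "merger agreement between Verizon and MCI, draft terms",
        "cafeteria lunch menu for next week",
        "antitrust analysis of the proposed acquisition",
        "holiday party planning and RSVP list",
    ]
    train_labels = ["responsive", "nonresponsive", "responsive", "nonresponsive"]

    # TF-IDF features feeding a logistic-regression classifier.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(train_docs, train_labels)

    # Unreviewed documents receive a predicted category.
    new_docs = [
        "integration plan following the MCI acquisition",
        "office supply order confirmation",
    ]
    print(model.predict(new_docs))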

Dr. Roitblat commented that, "As legal professionals search for ways to reduce the costs of eDiscovery, this study suggests that it may be reasonable to employ computer-based categorization. The two computer systems agreed with the original review at least as often as a human team did."

If this is true, the laborious and expensive task of document review could indeed be in the process of evolving to the next level of computer-aided automation. Add in some simple computer-aided voice technology, and a HAL 9000 experience may not be that far away.

For more information about this paper or to discuss computer aided document categorization, you may contact Dr. Roitblat at: herb@orcatec.com.



