The eDiscovery Paradigm Shift


Friday, July 13, 2012

Five Initial Steps to Meet the Governance, Risk and Compliance Obligations Brought on by Today's Big Data File Stores

The accelerating increase in the amount of unstructured Electronically Stored Information (ESI) is leaving IT organizations struggling with how to store and manage all of this new information. Aside from just providing the underlying storage infrastructure to host this amount of data, companies are also faced with the task of properly managing their Big Data file stores to meet existing governance, risk and compliance obligations. To do so, there are five steps they can take now to position their organization to meet them.


According to a 2010 report by IDC, the amount of information created, captured or replicated has exceeded available storage for the first time since 2007. The size of the digital universe this year will be tenfold what it was just five years earlier. According to this same IDC report, the volume of unstructured ESI is expected to grow at over 60% CAGR (compound annual growth rate).
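
As a quick sanity check on those two figures, here is a minimal Python sketch of the compound-growth arithmetic. The function names are my own, and the only inputs are the "tenfold in five years" and "60% CAGR" numbers quoted above; nothing here comes from the IDC report itself.

```python
def cagr(growth_multiple, years):
    """Compound annual growth rate implied by an overall growth multiple."""
    return growth_multiple ** (1 / years) - 1


def growth_after(rate, years):
    """Overall growth multiple after compounding `rate` for `years` years."""
    return (1 + rate) ** years


# Tenfold growth over five years implies a rate just under 60% per year.
tenfold_rate = cagr(10, 5)
# Conversely, 60% per year compounds to slightly more than tenfold in five years.
multiple_at_60 = growth_after(0.60, 5)

print(f"{tenfold_rate:.1%}")     # 58.5%
print(f"{multiple_at_60:.2f}x")  # 10.49x
```

In other words, the two quoted numbers are roughly consistent with each other: growth of about 60% per year is what it takes to multiply a data store tenfold in five years.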

According to Forrester Research, as reported in an article that appeared on the Forbes website last week:
  • The average organization will grow their data by 50 percent in the coming year
  • Overall corporate data will grow by a staggering 94 percent
  • Database systems will grow by 97 percent
  • Server backups for disaster recovery and continuity will expand by 89 percent
Overseeing the expansion of storage space and ensuring that the data is protected has become a minor part of the overall task of Big Data file storage and management. Business stakeholders and Information Technology (IT) organizations at enterprises of all sizes and across all industries must now face a list of Governance, Risk and Compliance (GRC) regulations with which they must legally comply, or else face financial penalties that are potentially fatal to the enterprise.

The most obvious laws to which they are subject include:
  • Sarbanes-Oxley (SOX)
  • Health Insurance Portability and Accountability Act (HIPAA)
  • Gramm-Leach-Bliley (GLBA)
  • Federal Information Security Management Act (FISMA)
  • Consumer Information Protection Laws
  • Federal Rules of Civil Procedure (FRCP)

Further, the list of new regulations is growing. The passage of The Patient Protection and Affordable Care Act (PPACA) will result in the US Government adding 159 new agencies, programs, and bureaucracies to assist with the compliance of over 12,000 pages of new regulations. Over the past ten years, in response to the threat of international terrorism, the US Department of Homeland Security (DHS) has added hundreds of new regulations. Finally, cyber terrorism, including acts of deliberate, large-scale disruption of enterprise computer networks, is now a reality that all businesses must face.

In the face of this, Big Data file storage and management vendors, along with the associated industry consultants, have developed a list of hardware and software requirements and associated value propositions to help enterprise buyers decide which Big Data file storage and management platforms to purchase.

But before they buy, there are five steps that buyers should take first to ensure they are prepared to meet the governance, risk and compliance obligations brought on by today's Big Data file stores:
  • Internal Collaboration: File management and Governance, Risk and Compliance (GRC) requirements affect business stakeholders from the boardroom to IT to the manufacturing floor and loading dock to the accounting office. The development of cross functional workgroups and the promotion of internal collaboration between functional experts is the key to successfully identifying, understanding and addressing all of the requirements and issues involved in Big Data file management across the entire enterprise.
  • Network Architecture Planning: Over the past 25 years, enterprise architectures grew with little or no planning, resulting in wasteful redundancy and limited access to the enterprise data that may be required to comply with today’s GRC requirements. The advent of the Internet and now cloud computing has brought these decades of poorly planned networks to light, and they have become more of an enterprise liability than an asset. The time is now for IT to hit the restart button and explore new options such as virtualization, hybrid cloud architectures and the use of cloud service providers (CSPs) that enable them to better leverage, manage and optimize their existing infrastructure.
  • Security: With the introduction and proliferation of portable storage devices, wireless Internet, mobile computing devices, enterprise Software-as-a-Service (SaaS) applications, cloud storage, blogs and social media such as Facebook, LinkedIn and Twitter, data theft and cyber attacks are real issues for which many (and arguably most) companies do not have a good answer. Now is the time for IT to take a serious look at their internal file access policies and move as quickly as possible to address any existing shortcomings.
  • Data Retention Policy Development and Implementation: Sarbanes-Oxley (SOX), the Health Insurance Portability and Accountability Act (HIPAA) and the Federal Rules of Civil Procedure (FRCP) all have very specific data retention guidelines for what types of ESI an enterprise has to keep and how long to keep it. Enterprises must investigate and document these requirements, develop data retention policies and acquire the appropriate software to ensure compliance.
  • Technology Vendors and Consulting Partners: Business stakeholders and IT management may be overwhelmed by the task of meeting the GRC obligations of big file storage and management. If this is the case, reach out to the hardware and software vendor community and ask how their solutions address these issues. If required, engage the services of vendor-independent consulting partners to act as trusted advisors to assist in the successful navigation of the required cultural transitions and the acquisition of the best technology platforms.
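
To make the data retention step a bit more concrete, here is a minimal Python sketch of how a documented retention schedule might be applied to individual records. Everything in it is an assumption for illustration: the record types, the retention periods and the legal hold flag are hypothetical placeholders, not the actual requirements of SOX, HIPAA or the FRCP, which each enterprise has to investigate for itself.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: record type -> retention period in days.
# These values are placeholders, not the periods required by any regulation.
RETENTION_DAYS = {
    "financial_record": 7 * 365,
    "general_email": 2 * 365,
}


def disposition(record_type, created, today, on_legal_hold=False):
    """Decide what to do with a record under the schedule above."""
    if on_legal_hold:
        # Assumption: a hold for pending litigation overrides the schedule.
        return "retain (legal hold)"
    period = RETENTION_DAYS.get(record_type)
    if period is None:
        # Unknown record types are flagged for a person to classify.
        return "review (no policy)"
    if today >= created + timedelta(days=period):
        return "eligible for disposal"
    return "retain"


today = date(2012, 7, 13)
print(disposition("general_email", date(2010, 1, 1), today))
print(disposition("financial_record", date(2010, 1, 1), today))
print(disposition("general_email", date(2010, 1, 1), today, on_legal_hold=True))
```

The point of the sketch is that the schedule lives in one documented table rather than in scattered scripts, so the cross-functional workgroup from the first step can review and change it without touching the logic.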

The accelerating increase in the amount of unstructured Electronically Stored Information (ESI) is putting IT organizations on the defensive as they struggle to figure out how to store and manage all of this new information. However, overseeing the expansion of storage space and ensuring that appropriate backups are completed has become a minor part of the overall task of big file storage and management.

Rather, business stakeholders and IT staff need to act now to first bring their infrastructure under control so they can get in front of the growing list of GRC regulations to which they are subject. By following the five steps outlined above, enterprises will have a good grasp of their true challenges by the time they purchase a product, and a high probability of bringing in one that addresses them.


Wednesday, July 11, 2012

BeyondRecognition Ranked as Top Disruptive eDiscovery Technology to Watch in 2012


As a follow-up to eDSG's list of the Top Five eDiscovery Technologies to Watch in 2012, today's blog post is a more detailed overview of BeyondRecognition, the image processing technology that I ranked as the number one eDiscovery technology to watch in 2012.

This past week, I had the opportunity to spend some time with John Martin, founder of BeyondRecognition ("BR") and long-time document conversion and litigation technology expert. First of all, to put BR's new technology into perspective, not much has changed in Optical Character Recognition (OCR) for over 30 years. Once you begin to understand what John has created, you will realize that it is much more than just OCR, but OCR is a good place to start the comparison.

OCR software electronically translates scanned images of handwritten, typewritten or printed text into machine-encoded text. This software is used to convert books and documents into electronic files, to computerize record-keeping systems in offices and to publish text onto websites.  And, with the accelerating rush to Electronically Stored Information (ESI), it would be easy to think that there just isn't that much paper to convert.  However, there are literally trillions of existing documents that will someday need to be converted with billions more yet to be created. The legal, healthcare, mortgage and government markets are currently the prime offenders for creating more paper.  In fact, in the five years to 2012, revenue for the Optical Character-Recognition Software industry is expected to increase at an annualized rate of 1.6% to $386.9 million.

And OCR software developers have not really upgraded OCR software for a very long time. State-of-the-art today is not much different than it was 5-10 years ago. That's why John and BR have the opportunity to disrupt the market with a completely new approach to the challenge of converting non-digital documents to searchable ESI.

BeyondRecognition Technology

BR’s core technology includes image-based (NOT text-based) document clustering, individual glyph (i.e. character) clustering for highly effective cascading text conversion, error correction, and document-type-specific data extraction functionality with accuracy rivaling or exceeding human coding.

Traditional Optical Character Recognition (“OCR”) analyzes each glyph or character in a linear fashion, treating each new glyph as a new issue, and optimizes images for conversion purposes at the page level. By contrast, BR clusters similar glyphs before trying to convert them to textual characters, optimizes the portion of the image around each glyph, and then converts the glyphs to characters using the most complete glyph from each glyph cluster. BR then provides cascading or persistent error correction in which characters with low confidence conversion scores are edited in words that failed spell checking. Correcting a single word not only corrects all the instances of that sequence of glyphs but also corrects other words where the same glyph was used, so long as the correction results in a word that is in the spelling dictionary.
This cascading effect permits editors to correct hundreds of thousands of words with a single keystroke or mouse click, and the error correction is persistent because future occurrences of that glyph will also be converted correctly.

Example of Cascading Corrections

In one example from a mortgage loan file project, correcting the word “thc” with one keystroke resulted in correcting 142,121 instances of the word “the” but also had the cascading effect of correcting yet others, resulting in correcting a total of 149,520 instances of incorrectly spelled words.

Here are just a few of the words impacted by the cascading effect in that example:

  • “cducation” to “education” (1209)
  • “Codc” to “Code” (744)
  • “qucstion” to “question” (702)
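
The cascading effect described above can be illustrated with a toy Python sketch. To be clear, this is my own simplified model and not BR's implementation: it assumes every word is stored as a sequence of glyph-cluster IDs, that one damaged cluster (labeled `E?` here, a made-up name) was first read as "c", and that a correction is applied to a word only when the result is in the spelling dictionary.

```python
from collections import Counter


def render(word, mapping):
    """Turn a sequence of glyph-cluster IDs into text."""
    return "".join(mapping[glyph] for glyph in word)


def cascade(words, mapping, dictionary, cluster, new_char):
    """Re-label one glyph cluster and apply it wherever a known word results."""
    corrected = {**mapping, cluster: new_char}
    output, fixes = [], Counter()
    for word in words:
        old_text = render(word, mapping)
        new_text = render(word, corrected)
        if cluster in word and new_text in dictionary:
            output.append(new_text)
            fixes[f"{old_text} -> {new_text}"] += 1
        else:
            # Leave the word alone if the change would not produce a known word.
            output.append(old_text)
    return output, fixes


E = "E?"  # hypothetical ID for the damaged glyph cluster
words = [
    ("t", "h", E), ("t", "h", E), ("t", "h", E),
    (E, "d", "u", "c", "a", "t", "i", "o", "n"),
    ("C", "o", "d", E),
    ("q", "u", E, "s", "t", "i", "o", "n"),
    ("x", E, "z"),
]
# Every clean cluster maps to its own character; the damaged one was read as "c".
mapping = {glyph: glyph for word in words for glyph in word}
mapping[E] = "c"
dictionary = {"the", "education", "Code", "question"}

# One correction ("thc" should be "the") re-labels the cluster for all words.
text, fixes = cascade(words, mapping, dictionary, E, "e")
print(text)
print(sum(fixes.values()), "words corrected")
```

Note that the genuine "c" in "education" is untouched because it belongs to a different cluster, and "xcz" is left as-is because "xez" is not in the dictionary.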

Example of Reconstituting Unreadable Text

Following is an example of how BR is able to optimize individual glyphs and essentially reconstitute a page image of old court opinions using the “best” glyph from individual glyph clusters to produce the most accurate text conversion.











BR’s success in converting previously “unreadable” images on things like decades-old court opinions and computer-output microfiche (“COM”) makes it tempting to pigeon hole BR as just an advanced text recognition company, but the functionality doesn’t stop there.

Potential to Compete with Off-shore Coding

By clustering like documents based on image similarity and then enabling users to rapidly build data extraction rules for each type or class of record, including location-based rules or rules based on non-textual elements, BR can create data extraction fields or metadata elements for each of those document types. The resulting index of specific types of data elements rivals or exceeds human coding.  And, that may be the real disruptive feature of BR as it is going to provide a much more accurate and financially compelling alternative to off-shore coding.
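
As a rough illustration of what a location-based extraction rule could look like, here is a small Python sketch. It is an assumption-laden toy, not BR's rule format: the `Token` and `ZoneRule` types, the `loan_application` document type, the field names and the page coordinates are all hypothetical, and the document type is assumed to have already been assigned by the image-based clustering step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    x: float  # horizontal position on the page, 0.0 (left) to 1.0 (right)
    y: float  # vertical position on the page, 0.0 (top) to 1.0 (bottom)


@dataclass(frozen=True)
class ZoneRule:
    field: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, token):
        """True if the token sits inside this rule's rectangle."""
        return (self.left <= token.x <= self.right
                and self.top <= token.y <= self.bottom)


# Hypothetical rules, keyed by document type (the cluster label).
RULES = {
    "loan_application": [
        ZoneRule("borrower_name", 0.10, 0.10, 0.60, 0.15),
        ZoneRule("loan_amount", 0.70, 0.10, 0.95, 0.15),
    ],
}


def extract_fields(doc_type, tokens):
    """Apply every zone rule for a document type to one page of tokens."""
    fields = {}
    for rule in RULES.get(doc_type, []):
        hits = sorted((t for t in tokens if rule.contains(t)),
                      key=lambda t: (t.y, t.x))
        fields[rule.field] = " ".join(t.text for t in hits)
    return fields


page = [
    Token("Jane", 0.12, 0.12),
    Token("Doe", 0.20, 0.12),
    Token("$250,000", 0.75, 0.12),
    Token("Signature", 0.12, 0.90),
]
result = extract_fields("loan_application", page)
print(result)
```

Because the rules are written once per document type, every document in the same cluster is coded the same way, which is where the comparison with manual coding comes from.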

Summary and Comments

Not much has changed in Optical Character Recognition (OCR) for over 30 years. However, BeyondRecognition is about to change that with a new image processing technology that is going to be very disruptive to the legal market and many other markets that have to convert and code documents. And that is why I ranked BeyondRecognition as the number one eDiscovery technology to watch in 2012.


Tuesday, July 10, 2012

Top Five eDiscovery Technologies to Watch in 2012

Over the past 12 months I have had the unique opportunity to seek out and review over 100 eDiscovery technologies.  As a result, I have had the pleasure of being exposed to some exciting new technologies that will have a disruptive impact on the eDiscovery market in the second half of 2012.

With over 100 technologies to choose from, culling the list down was not an easy task. Therefore, I had to rely upon some amount of objective criteria, such as platform technology stack and supported environments, along with a heavy dose of my subjective opinion in regard to how disruptive a technology could be within the paradigm shift of the eDiscovery market.

The objective, more technical criteria were easy, at least in regard to 4 of my 5 choices. The subjective criteria were a bit more complicated, as I took into consideration factors such as pricing, positioning, deployment flexibility, management team and uniqueness.

Following are my choices for the Top Five eDiscovery Technologies to Watch in 2012:


BeyondRecognition, BeyondRecognition

Optical Character Recognition (OCR), a foundational technology for litigation services and eDiscovery, hasn't changed much in 30 years. And vendors really haven't worked on making it more accurate or added any significant new bells and whistles. Therefore, after I had the opportunity to spend some time with John Martin from BeyondRecognition and review his new glyph-based image processing technology designed to replace OCR and more, it was an easy decision to rank BeyondRecognition as the technology that will have the most disruptive impact on the eDiscovery market in the second half of 2012.

With character and word identification and conversion accuracy at 99.5%+, single instance character and word correction, Logical Document Determination (LDD), document type classification, duplicate document detection, database field indexing and a cloud-based scalable architecture that can process terabytes of data per day, BeyondRecognition will most definitely have a disruptive impact on the eDiscovery market and any other markets that require large volumes of text-based materials to be processed and coded. For more information about BeyondRecognition, please visit: http://www.beyondrecognition.net/BeyondRecognition,_LLC/Overview.html.


X1 Social Discovery, X1 Discovery

With the rapid proliferation of social media, I predict there are going to be very few eDiscovery and Information Governance projects going forward that don't include potential evidence from social media sources such as Facebook, LinkedIn and Twitter. Therefore, I included X1 Social Discovery from X1 Discovery as one of my top five eDiscovery technologies to watch in 2012.

X1 Social Discovery™ is the industry's first investigative solution specifically designed to enable eDiscovery and computer forensics professionals to effectively address social media content and web content in one single interface. X1 Social Discovery provides a powerful platform to collect, authenticate, search, review and produce electronically stored information (ESI) from Facebook, Twitter, LinkedIn and other web sources. For more information about X1 Social Discovery, please visit: http://x1discovery.com/social_discovery.html.



X1 Rapid Discovery, X1 Discovery

With the accelerating volume of Electronically Stored Information (ESI) in the cloud, I predict there are going to be more and more eDiscovery and Information Governance projects going forward that require potential evidence to be extracted from the cloud and possibly even processed in the cloud. Therefore, I included X1 Rapid Discovery from X1 Discovery as one of my top five eDiscovery technologies to watch in 2012.

With X1 Rapid Discovery, organizations can quickly access, search, triage and collect their data in their existing cloud environments, without having to first export that data; thereby transforming how organizations address the challenges of search, collection and analysis of cloud-based data. While other eDiscovery products require migrating or even shipping data to the vendor tools, X1 Rapid Discovery is a hardware-independent software solution that uniquely installs and operates on demand where your data currently resides. For more information about X1 Rapid Discovery, please visit: http://x1discovery.com/rapid_discovery.html.


TunnelVision, Mindseye Solutions

The Early Case Assessment (ECA) tool landscape has changed dramatically over the past 12 months. New tools that cover a larger percentage of the EDRM model and are built upon newer, more flexible technologies have emerged with aggressive new pricing models. Therefore, I included TunnelVision from Mindseye Solutions, a representative of those new ECA platforms, as one of my top five eDiscovery technologies to watch in 2012.

TunnelVision was purpose-built by long time eDiscovery industry experts to address the challenges that organizations are facing when supporting eDiscovery and Information Governance. The technology is a simple yet flexible platform, designed to scale, and delivers full transparency. TunnelVision carries a predictable cost model and helps in managing risk, identifying exposure, and eliminating wasted time throughout the process. For more information about TunnelVision from Mindseye, please visit: http://www.mindseyesolutions.com/.


Equivio Zoom, Equivio

Predictive coding or Technology Assisted Review (TAR) has captured the imagination of the industry in the first half of 2012. As such, I wanted to include predictive coding technology in my list of top five eDiscovery technologies to watch in 2012 and therefore I chose Equivio Zoom from Equivio.

Equivio develops text analysis software for eDiscovery. Users include the DoJ, the FTC, KPMG, Deloitte, plus hundreds of law firms and corporations. Equivio offers Zoom, an integrated web platform for analytics and predictive coding. Zoom organizes collections of documents in meaningful ways. So you can zoom right in and find out what’s interesting, notable and unique. For more information about Equivio Zoom from Equivio, please visit: http://www.equivio.com/.



Summary and Comments

With over 100 technologies to choose from, culling the list down was not an easy task. I relied upon some amount of objective criteria, such as platform technology stack and supported environments, along with a heavy dose of my subjective opinion in regard to how disruptive a technology could be within the paradigm shift of the eDiscovery market. BeyondRecognition, X1 Social Discovery, X1 Rapid Discovery, TunnelVision and Equivio Zoom definitely met these criteria.

It will be interesting to review my list of Top Five eDiscovery Technologies to Watch in 2012 this time next year when I choose another list.  Some on the 2012 list will have had a big impact and some will not.  However, one thing is for sure, each of these technologies represents a major step forward for litigation and eDiscovery technology.


Additional Product Reviews

Over the coming weeks, I will be posting additional product reviews for each of the technologies listed in the eDSG Top Five eDiscovery Technologies to Watch in 2012.


