The eDiscovery Paradigm Shift


Monday, April 23, 2012

Virtual eDiscovery Arrives

On Monday April 17, 2012, X1 Discovery out of Pasadena, California announced a major release of X1 Rapid Discovery, its Early Case Assessment (ECA) and Search platform.  This latest version of X1 Rapid Discovery (X1RD) is actually a major rewrite of the architecture and user interface, resulting in the first virtual ECA platform that users can remotely deploy to wherever Electronically Stored Information (ESI) is located.

I had the chance to talk with Skip Lindsey, EVP Sales and Business Development at X1 Discovery, about the genesis of X1RD.  He indicated that they wanted to build on the success that they had seen providing advanced search and ECA to traditional enterprise clients with ESI behind a firewall by expanding support to the cloud.  However, they didn’t want to be just another “me too” ECA cloud platform.  Therefore, they decided to expand X1RD capabilities to run as a virtual application designed specifically to run in a private or public cloud environment in addition to its legacy support for traditional enterprise environments.  He also indicated that they wanted X1RD to be highly scalable, easy for users to install and manage remotely, easy for users to move the virtual components to the data and conducive to a very disruptive pricing model.  I think that they accomplished those goals and most certainly have earned first mover status in the ECA platform market.

In today’s complex enterprise environment, X1RD should be a very attractive multi-purpose ECA alternative for users that currently have ESI scattered around behind their corporate firewall, in private clouds and maybe even held captive with one of the Cloud Service Providers (CSPs).  The major paradigm shift is that X1RD enables users to move the ECA process to the ESI whereas current ECA technology requires that ESI be collected and moved to the ECA platform either manually or over slow Internet pipes. 

Current ECA Best Practices
Currently, an enterprise faced with responding to a legal matter has to identify where all potentially responsive ESI may be located, figure out what it needs to harvest and then collect and aggregate the ESI to a centralized physical location before it can even start the process of analysis and first pass review.  This process can take weeks or months and cost thousands if not millions of dollars.  Adding ESI stored with a Cloud Service Provider (CSP) has only exacerbated the situation by adding another layer of complexity, more time and more cost.  For a more detailed overview of virtualization, please see “Virtualization is the Key to Future eDiscovery Software”.

X1RD Changes the ECA Paradigm
X1RD completely changes the ECA paradigm.  X1RD users have the ability to remotely load the X1RD virtual application into the computing environment of their choice as long as they have the appropriate credentials to do so.  Once loaded, users can then remotely configure, through a drag-and-drop user interface, as many X1RD virtual ECA processing engines as may be required to collect, index and process a given set of ESI.  Please note that users will have the flexibility to configure X1RD on a single physical machine with multiple virtual machines or on multiple physical machines, each with multiple virtual machines.  Because X1RD has done such a good job of masking the complexities of configuring and managing this virtual environment, its value may be lost on many users.  However, this flexibility has tremendous operational and financial benefits.

As an example, because X1RD enables users to break up a large processing task into multiple smaller processing units that can run on very inexpensive machines, it doesn’t require users to purchase and/or provision high-end and expensive servers.  X1RD enables users to leverage the computing resources that they currently own, forgoing the requirement to invest in large expensive servers that will sit idle most of the time waiting for the next big ECA job.
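
To make the idea of breaking one large job into smaller units concrete, here is a minimal Python sketch. It is not how X1RD schedules work, which the announcement does not describe; the function name `partition_by_size`, the item names and the sizes are all hypothetical. It uses a simple greedy rule: take the items largest first and hand each one to the worker with the least load so far.

```python
import heapq


def partition_by_size(items, workers):
    """Greedily spread (name, size_gb) items across `workers` bins.

    Hypothetical helper for illustration only; not an X1RD API.
    Returns (bins, loads): the item names per worker and each worker's total.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Min-heap of (current_load_gb, worker_index): the lightest worker pops first.
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    bins = [[] for _ in range(workers)]
    # Placing the largest items first gives a better balance for this greedy rule.
    for name, size_gb in sorted(items, key=lambda it: it[1], reverse=True):
        load, idx = heapq.heappop(heap)
        bins[idx].append(name)
        heapq.heappush(heap, (load + size_gb, idx))
    # Read the final per-worker totals back out of the heap.
    loads = [0.0] * workers
    for load, idx in heap:
        loads[idx] = load
    return bins, loads
```

Calling `partition_by_size([("a", 40), ("b", 30), ("c", 20), ("d", 10)], 2)` splits 100 GB of made-up data into two 50 GB units, each small enough for a modest machine. A real platform would also have to account for file types, indexing cost and data locality, none of which this sketch models.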

X1RD installs in a matter of minutes and the user interface is attractive, easy to understand and provides everything that a litigation knowledge worker requires to search, analyze, review, tag and generate a load file in today’s eDiscovery environment.   Further, for those clients that don’t want to run their own X1RD environment, X1RD lends itself to a managed services model that enables Managed Service Providers (MSPs) to run X1RD for clients from a remote location. 

X1RD and the Amazon Cloud
As recently reported on the New York Times Bits Blog, in a post titled, “Amazon Creates a Software Rental Store”, Amazon.com’s Amazon Web Services (AWS) business, facing looming competition for its business of renting online data storage and computing, is introducing a store where customers will be able to rent business software from a number of third-party providers, including I.B.M., Microsoft and SAP.  And now, X1RD is the first virtual ECA platform listed in the Amazon Web Service (AWS) ISV catalog.

Although there are other ECA and Document Review platforms running in the Amazon Web Services (AWS) cloud, such as NextPoint, there is a very distinct difference: AWS clients will be able to remotely load X1RD into their AWS computing environment and configure it to collect, search, analyze, complete first pass review and generate a load file without having to move any ESI.  It’s the paradigm shift that I alluded to earlier that enables users to move X1RD to the ESI as opposed to the legacy approach of moving ESI to the ECA platform.

Currently, X1RD users will be able to provision and configure computing power, memory and storage from AWS as required for any ECA processing task large or small, provision X1RD from X1 Discovery and be processing ESI in a matter of hours.  Once the job is completed and the responsive ESI exported, users will be able to shut down this ECA environment and only pay for what they utilized.  It’s a much different model than the industry has ever seen.

With the announcement of the AWS partnerships with IBM, Microsoft and SAP, it will be interesting to see what AWS does with the X1RD opportunity from a marketing perspective.  As indicated in “Amazon is Overlooking the Financial Value of eDiscovery”, there is a latent demand for eDiscovery and Information Governance in the Cloud.  And, not providing such services may in fact be a barrier to entry for some enterprise clients.  I am very encouraged by the announcement that AWS has added X1RD to its ISV catalog and optimistic that AWS may understand that providing Information Governance, Search and eDiscovery may be one of the keys for Amazon to increase the $1.19 Billion in revenue that AWS posted for 2011 to a much higher level than could have ever been imagined with just IaaS or even standard PaaS services.

Disruptive Pricing
X1 is also pushing the pricing envelope by pricing X1RD at $1,000 per day or $5,000 per week for “all you can eat” or process.  As an alternative, they also offer the option of a $25,000 perpetual license with 20% annual maintenance.  This pricing model is very aggressive and is going to put downward pressure on the legacy model of pricing ECA processing on a per gigabyte basis.
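
To see roughly where the rental and perpetual options cross over, here is a small Python sketch using the figures above. The billing rules are my assumptions, not X1 Discovery's published terms: a week is taken as 7 calendar days, daily and weekly rates can be mixed freely, and maintenance is billed every year including the first. The function names are hypothetical.

```python
DAY_RATE = 1_000        # USD per day, from the announcement
WEEK_RATE = 5_000       # USD per week, from the announcement
LICENSE_FEE = 25_000    # USD perpetual license, from the announcement
MAINTENANCE_PCT = 0.20  # annual maintenance as a share of the license fee


def rental_cost(days):
    """Cheapest mix of weekly and daily rates for `days` of use.

    Assumes a 7-day week and that rates can be mixed freely.
    """
    weeks, leftover = divmod(days, 7)
    # Leftover days never cost more than one extra week.
    return weeks * WEEK_RATE + min(leftover * DAY_RATE, WEEK_RATE)


def perpetual_cost(years):
    """License fee plus maintenance, assuming maintenance is billed every year."""
    return LICENSE_FEE + round(LICENSE_FEE * MAINTENANCE_PCT) * years


def break_even_days(years=1):
    """Smallest number of rental days costing at least the perpetual option."""
    target = perpetual_cost(years)
    days = 0
    while rental_cost(days) < target:
        days += 1
    return days
```

Under these assumptions, `break_even_days(1)` returns 40: a client who expects fewer than about 40 days of processing in a year would spend less renting, while heavier users would reach the $30,000 first-year cost of the perpetual license. Different billing rules would shift that number.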

Final Observations and Comments
As indicated in “eDiscovery will follow the Cloud Computing Boom”, with the expanding volume of ESI in the cloud, the demand for Information Governance and eDiscovery is sure to follow.  However, just moving your ECA or Document Review platform to the Cloud is not necessarily the answer as it still requires users to move large amounts of ESI to that platform.  The most efficient model is to move the ECA application to the ESI.  And, it appears that with the release of the most recent version of X1RD, X1 Discovery has become a first mover in the race to capture this market.

In addition, in today’s complex enterprise environment, X1RD should also be a very attractive multi-purpose ECA alternative for users that currently have ESI scattered around behind their corporate firewall, in private clouds and maybe even held captive with one of the Cloud Service Providers (CSPs).

As has been reported on several other blogs about this subject, with X1RD forging the way, other ECA vendors are sure to follow.  However, that may be easier said than done.  With years of experience designing, developing and maintaining complex enterprise class applications, I would suggest that many current legacy ECA vendors have code bases that would be literally impossible to move to the cloud as self-provisioning and remotely configurable virtual apps that can run on almost any class of server.  The only option would be a complete rewrite, which in many cases is a very dangerous path for a software vendor with a large installed base as it opens up that base to explore alternatives such as X1RD.  As such, I am not convinced that many legacy ECA vendors will follow this trend.

One thing is for sure: the demand to process ESI that is in the cloud is going to continue to increase.  And, it appears that users are going to have two options to address this demand: moving the ESI to the ECA platform or moving the ECA platform to the ESI.

Either way, the next few years are certainly going to be exciting times in the eDiscovery in the Cloud market.


Tuesday, September 6, 2011

eDiscovery and Big Data Analytics

If cloud computing in general is the next challenge facing information governance and eDiscovery, then Big Data Analytics is one of the specific issues that information governance and eDiscovery technologists are going to have to conquer.

Cloud computing is an IT infrastructure choice, storage architecture and application delivery mechanism.  And, combined with mobile computing, it is a choice that will result in more Electronically Stored Information (ESI), or Electronically Stored Evidence (ESE) if you are a lawyer, than the total information from all previous generations.

And, whereas the eDiscovery industry has been struggling to collect, process and analyze terabytes of data in a reasonable amount of time for a reasonable cost, this new paradigm of cloud computing and its associated federated data stores is already producing petabytes and exabytes of data.  Hence, the term Big Data (it’s actually all a matter of perspective).  Of even more concern is the fact that while the eDiscovery market has been struggling to appropriately and accurately analyze structured data, the new paradigm of Big Data in the cloud is largely unstructured data and therefore largely left out of the eDiscovery equation.

Today, whether right or wrong, and due largely to a lack of understanding and the associated technological and financial constraints, very little unstructured data is even considered during 26(f) strategies and is therefore left out of most litigation.  From an eDiscovery standpoint, there is no doubt that there are "smoking guns" hiding in some Big Data store as unstructured data, which brings a whole new meaning to the phrase "looking for a needle in a haystack".

However, I believe that there is hope, as Big Data analytics do in fact exist and the technology is evolving.  As Srinivasan Sundara Rajan from HP points out in a September 6, 2011 article on the SOA World Magazine site titled "Traditional vs Big Data Analytics," "Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario."  This same technology will be useful for information governance and eDiscovery.  And, from a requirements standpoint, it may prove to be a very interesting and financially rewarding vertical for the technologists to address.

The full text of Srinivasan Sundara Rajan's article is as follows:

Big Data Analytics Convergence Among the Major IT Companies
Major IT companies acquiring analytics software and application providers has been the order of the day. We have seen the words ‘Big Data Analytics’ being used in many solutions for the enterprise.
‘Big Data’ is the general term used to represent massive amounts of unstructured data that are not traditionally stored in a relational form in enterprise databases. The following are the general characteristics of Big Data.
  • Data storage is measured in petabytes, exabytes and higher, well beyond the current storage limits in enterprises, which are in terabytes.
  • Generally it is considered unstructured data that does not really fall under the relational database design that enterprises have been used to
  • Data is generated using unconventional methods outside of data entry, such as RFID, sensor networks, etc.
  • Data is time sensitive and consists of data collected with relevance to the time zones
In the past, the term ‘Analytics’ has been used in the business intelligence world to provide tools and intelligence to gain insight into the data through fast, consistent, interactive access to a wide variety of possible views of information.

Very close to the concept of analytics, data mining has been used in enterprises to keep pace with the critical monitoring and analysis of mountains of data. The biggest challenge is how to unearth all the hidden information through the vast amount of data.

Traditional DW Analytics vs Big Data Analytics
What makes Big Data Analytics different from traditional data warehouse analytics is how enterprise data that accumulates over a period of time is analyzed for meaningful insights in that context.


| Traditional Data Warehouse Analytics | Big Data Analytics |
|---|---|
| Traditional analytics works on known data terrain, that is, data that is well understood. Most data warehouses have elaborate ETL processes and database constraints, which means the data loaded into a data warehouse is well understood, cleansed and in line with the business metadata. | The biggest advantage of Big Data is that it is targeted at unstructured data outside the traditional means of capturing data, which means there is no guarantee that the incoming data is well formed, clean and devoid of errors. This makes it more challenging, but at the same time it gives scope for much more insight into the data. |
| Traditional analytics is built on top of the relational data model: relationships between the subjects of interest have been created inside the system and the analysis is done based on them. | In the typical world, it is very difficult to establish relationships between all the information in a formal way, and hence unstructured data in the form of images, videos, mobile-generated information, RFID, etc. has to be considered in big data analytics. Most big data analytics databases are based on columnar databases. |
| Traditional analytics is batch oriented, and we need to wait for nightly ETL and transformation jobs to complete before the required insight is obtained. | Big Data Analytics is aimed at near-real-time analysis of the data using the support of the software meant for it. |
| Parallelism in a traditional analytics system is achieved through costly hardware like MPP (Massively Parallel Processing) systems and/or SMP systems. | While there are appliances in the market for Big Data Analytics, this can also be achieved through commodity hardware and a new generation of analytical software like Hadoop or other analytical databases. |


Use Cases for Big Data Analytics
Enterprises can understand the value of Big Data Analytics based on the use cases and how the traditional problems can be solved with the help of Big Data Analytics. The following are some of the usages.

Customer Satisfaction and Warranty Analysis: Probably this is the one big area that most product-based enterprises are worried about. As of today, there is not a clear way of gauging the issues with the products and the associated customer satisfaction, unless they come in a formal way in an electronic form.
  • Information regarding quality is collected through various external channels and most of the time the data is not clean
  • As the data is unstructured there is no way to relate the associated issues, so that a long-term fix can be given to the customer.
  • Classification and grouping of problem statements are missing, leaving enterprises unable to group the issues
From the above discussion, utilizing Big Data Analytics for customer satisfaction and warranty analysis will help enterprises gain much-needed insight into the customer mindset, solve their problems effectively and avoid them in their new product lines.

Competitor Market Penetration Analysis: In today's economy where the competition is high, we need to gauge the areas where the competitors are strong and their pain points through an analysis within the legal means. This information is available in a variety of web sites, social media sites and other public domains. Big data analytics on this data can provide an organization with much needed information about Strength, Weakness, Opportunities and Threats for their product lines.

Healthcare / Epidemic Research & Control: Epidemics and seasonal diseases like influenza start with certain patterns among the people and they spread to a larger section if they are not detected early and controlled. This is one of the biggest challenges for growing as well as developed nations. The current issue is that, most of the time, the symptoms vary between people and various health care providers treat them differently. There is also no common classification of symptoms across people. Adopting Big Data Analytics on this typically unstructured data will help local governments to effectively tackle outbreak situations.

Product Feature and Usage Analysis: Most product companies, especially those making consumer products, keep adding a lot of features to their product line. However, it may happen that some of the features are not really used by consumers while others are used more. Effective analysis of this data, captured by various mobile devices and other RFID-based inputs, can provide valuable insights to the product companies.

Future Direction Analysis: The trends in each business are analyzed by research groups and this information is available through industry-specific portals or even common web blogs. Constant analysis of this forward-looking data will help enterprises to anticipate future trends and bring them to their product lines.

Summary
Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario. However as evident from the use cases above, these analyses will go a long way in improving the operations of the organizations. We will see more convergence of the products and appliances in this space in the coming days.
