The eDiscovery Paradigm Shift

Monday, April 23, 2012

Virtual eDiscovery Arrives

On April 17, 2012, X1 Discovery, out of Pasadena, California, announced a major release of X1 Rapid Discovery, its Early Case Assessment (ECA) and Search platform. This latest version of X1 Rapid Discovery (X1RD) is actually a major rewrite of the architecture and user interface, resulting in the first virtual ECA platform that users can remotely deploy to wherever Electronically Stored Information (ESI) is located.

I had the chance to talk with Skip Lindsey, EVP Sales and Business Development at X1 Discovery, about the genesis of X1RD. He indicated that they wanted to build on the success they had seen providing advanced search and ECA to traditional enterprise clients with ESI behind a firewall by expanding support to the cloud. However, they didn’t want to be just another “me too” ECA cloud platform. Therefore, they decided to expand X1RD so that it can run as a virtual application designed specifically for private or public cloud environments, in addition to its legacy support for traditional enterprise environments. He also indicated that they wanted X1RD to be highly scalable, easy for users to install and manage remotely, easy for users to move (as virtual components) to the data, and conducive to a very disruptive pricing model. I think that they accomplished those goals and have most certainly earned first-mover status in the ECA platform market.

In today’s complex enterprise environment, X1RD should be a very attractive multi-purpose ECA alternative for users that currently have ESI scattered around behind their corporate firewall, in private clouds and maybe even held captive with one of the Cloud Service Providers (CSPs). The major paradigm shift is that X1RD enables users to move the ECA process to the ESI whereas current ECA technology requires that ESI be collected and moved to the ECA platform either manually or over slow Internet pipes.
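To put “slow Internet pipes” in perspective, here is a back-of-the-envelope sketch of how long it takes just to move a collection to a remote ECA platform. The collection sizes and link speeds are hypothetical inputs, not figures from X1 Discovery, and the calculation assumes decimal units (1 TB = 10^12 bytes), a fully saturated link and no protocol overhead, so real transfers would be slower.

```python
def transfer_hours(terabytes: float, link_mbps: float) -> float:
    """Estimate hours needed to move `terabytes` of ESI over a `link_mbps` link.

    Best-case assumptions (hypothetical): decimal units (1 TB = 10**12 bytes,
    1 Mbps = 10**6 bits/s), a fully saturated link and zero protocol overhead.
    """
    bits = terabytes * 10**12 * 8          # bytes -> bits
    seconds = bits / (link_mbps * 10**6)   # bits / (bits per second)
    return seconds / 3600                  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical collection sizes and link speeds, for illustration only.
    for tb in (1, 10):
        for mbps in (100, 1000):
            print(f"{tb:>3} TB over {mbps:>4} Mbps: {transfer_hours(tb, mbps):8.1f} hours")
```

Even in this best case, 10 TB over a 100 Mbps connection takes about 222 hours, or more than nine days, before any ECA work can begin. That transfer step is what moving the ECA engine to the data avoids.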

Current ECA Best Practices
Currently, an enterprise faced with responding to a legal matter has to identify where all potentially responsive ESI may be located, figure out what it needs to harvest, and then collect and aggregate the ESI in a centralized physical location before it can even start the process of analysis and first pass review. This process can take weeks or months and cost thousands if not millions of dollars. Adding ESI stored with a CSP has only exacerbated the situation by adding another layer of complexity, more time and more cost. For a more detailed overview of virtualization, please see “Virtualization is the Key to Future eDiscovery Software”.

X1RD Changes the ECA Paradigm
X1RD completely changes the ECA paradigm. X1RD users can remotely load the X1RD virtual application into the computing environment of their choice, as long as they have the appropriate credentials to do so. Once it is loaded, users can remotely configure, through a drag-and-drop user interface, as many X1RD virtual ECA processing engines as are required to collect, index and process a given set of ESI. Note that users have the flexibility to configure X1RD on a single physical machine with multiple virtual machines, or on multiple physical machines, each with multiple virtual machines. Because X1RD does such a good job of masking the complexity of configuring and managing this virtual environment, its value may be lost on many users. However, this flexibility has tremendous operational and financial benefits.

As an example, because X1RD enables users to break up a large processing task into multiple smaller processing units that can run on very inexpensive machines, it doesn’t require users to purchase or provision expensive, high-end servers. Instead, users can leverage the computing resources they already own, forgoing investment in large, expensive servers that would sit idle most of the time waiting for the next big ECA job.
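The idea of breaking one large processing task into smaller units can be sketched as a simple split-and-merge pattern. This is a generic illustration under stated assumptions, not X1RD’s actual implementation: the names `partition`, `index_unit` and `process_job` are hypothetical, a thread pool stands in for what would in practice be separate virtual or physical machines, and “processing” is reduced to counting terms.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items: list[str], unit_size: int) -> list[list[str]]:
    """Split one large job into fixed-size processing units."""
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def index_unit(docs: list[str]) -> Counter:
    """Hypothetical stand-in for one processing engine: term counts for its unit."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def process_job(docs: list[str], unit_size: int, engines: int) -> Counter:
    """Fan the units out to `engines` workers, then merge their partial results."""
    units = partition(docs, unit_size)
    total = Counter()
    # Threads only illustrate the pattern; real engines would run on separate machines.
    with ThreadPoolExecutor(max_workers=engines) as pool:
        for partial in pool.map(index_unit, units):
            total.update(partial)
    return total


if __name__ == "__main__":
    docs = ["Contract draft attached", "Please review the contract", "Lunch on Friday?"]
    print(process_job(docs, unit_size=1, engines=3)["contract"])
```

Because each unit is independent, the merged result is the same whether the job runs as one large unit on one engine or as many small units on many engines; only the elapsed time and the hardware required change.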

X1RD installs in a matter of minutes, and the user interface is attractive, easy to understand and provides everything that a litigation knowledge worker requires to search, analyze, review, tag and generate a load file in today’s eDiscovery environment. Further, for clients that don’t want to run their own X1RD environment, X1RD lends itself to a managed services model that enables Managed Service Providers (MSPs) to run X1RD for clients from a remote location.

X1RD and the Amazon Cloud
As recently reported on the New York Times Bits Blog, in a post titled “Amazon Creates a Software Rental Store”, Amazon.com’s Amazon Web Services (AWS) business, facing looming competition for its business of renting online data storage and computing, is introducing a store where customers will be able to rent business software from a number of third-party providers, including I.B.M., Microsoft and SAP. And now, X1RD is the first virtual ECA platform listed in the AWS ISV catalog.

Although there are other ECA and Document Review platforms running in the AWS cloud, such as NextPoint, X1RD differs in a very distinct way: AWS clients will be able to remotely load X1RD into their AWS computing environment and configure it to collect, search, analyze, complete first pass review and generate a load file without having to move any ESI. It’s the paradigm shift I alluded to earlier: users move X1RD to the ESI, as opposed to the legacy approach of moving ESI to the ECA platform.

Users can now provision computing power, memory and storage from AWS as required for any ECA processing task, large or small, provision X1RD from X1 Discovery, and be processing ESI within a matter of hours. Once processing is complete and the responsive ESI has been exported, users can shut down the ECA environment and pay only for what they used. It’s a much different model than the industry has ever seen.

With the announcement of the AWS partnerships with IBM, Microsoft and SAP, it will be interesting to see what AWS does with the X1RD opportunity from a marketing perspective. As indicated in “Amazon is Overlooking the Financial Value of eDiscovery”, there is latent demand for eDiscovery and Information Governance in the cloud. In fact, not providing such services may be a barrier to cloud adoption for some enterprise clients. I am very encouraged by the announcement that AWS has added X1RD to its ISV catalog, and optimistic that AWS may in fact understand that providing Information Governance, Search and eDiscovery may be one of the keys for Amazon to grow the $1.19 billion in revenue that AWS posted for 2011 to a much higher level than could ever have been imagined with just IaaS or even standard PaaS services.

Disruptive Pricing
X1 is also pushing the pricing envelope by pricing X1RD at $1,000 per day or $5,000 per week for “all you can eat” (or rather, all you can process). As an alternative, they also offer a $25,000 perpetual license with 20% annual maintenance. This pricing model is very aggressive and is going to put downward pressure on the legacy model of pricing ECA processing on a per-gigabyte basis.
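To make the rent-versus-buy trade-off concrete, the quoted prices can be compared in a few lines. The rates come from the announcement above; two details are assumptions of this sketch rather than published terms: that a leftover partial week can be billed as a full week whenever that is cheaper, and that the 20% maintenance is billed every year, including the first. No per-gigabyte comparison is attempted, since that would require a per-gigabyte price this post does not give.

```python
# Prices as quoted in the post (USD).
DAY_RATE = 1_000
WEEK_RATE = 5_000
LICENSE_FEE = 25_000
MAINTENANCE_PCT = 20  # annual maintenance, as a percentage of the license fee


def rental_cost(days: int) -> int:
    """Cheapest combination of weekly and daily rentals covering `days` days.

    Assumption: a billing week is 7 days, and leftover days are billed either
    per day or as one more week, whichever is cheaper.
    """
    weeks, extra_days = divmod(days, 7)
    return weeks * WEEK_RATE + min(extra_days * DAY_RATE, WEEK_RATE)


def perpetual_cost(years: int) -> int:
    """License fee plus maintenance over `years` years.

    Assumption: maintenance is billed for every year, including the first.
    """
    annual_maintenance = LICENSE_FEE * MAINTENANCE_PCT // 100
    return LICENSE_FEE + annual_maintenance * years


def weeks_to_break_even(years: int) -> int:
    """Fewest full rental weeks per year at which the perpetual license costs no more."""
    rental_for_one_week_per_year = WEEK_RATE * years
    # Integer ceiling division: ceil(a / b) == -(-a // b) for positive b.
    return -(-perpetual_cost(years) // rental_for_one_week_per_year)


if __name__ == "__main__":
    for days in (3, 5, 12, 30):
        print(f"{days:>2} days rented: ${rental_cost(days):,}")
    for years in (1, 3):
        print(f"{years} year(s): perpetual ${perpetual_cost(years):,}; "
              f"break-even at {weeks_to_break_even(years)} rental weeks per year")
```

Under these assumptions, five days at the daily rate already cost as much as a full week, and a team that would rent X1RD for six or more full weeks in a single year would do at least as well with the perpetual license. Spread over three years, the threshold drops to three rental weeks per year.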

Final Observations and Comments
As indicated in “eDiscovery will follow the Cloud Computing Boom”, with the expanding volume of ESI in the cloud, the demand for Information Governance and eDiscovery is sure to follow. However, just moving your ECA or Document Review platform to the cloud is not necessarily the answer, as it still requires users to move large amounts of ESI to that platform. The most efficient model is to move the ECA application to the ESI. And with the release of the most recent version of X1RD, it appears that X1 Discovery has become a first mover in the race to capture this market.

In addition, because X1RD runs behind the corporate firewall, in private clouds and with CSPs alike, it can serve as a single ECA platform for organizations whose ESI is scattered across all three.

As has been reported on several other blogs about this subject, with X1RD forging the way, other ECA vendors are sure to follow. However, that may be easier said than done. Drawing on years of experience designing, developing and maintaining complex enterprise-class applications, I would suggest that many legacy ECA vendors have code bases that would be all but impossible to move to the cloud as self-provisioning, remotely configurable virtual apps that can run on almost any class of server. The only option would be a complete rewrite, which is often a very dangerous path for a software vendor with a large installed base, because it opens up that base to exploring alternatives such as X1RD. As such, I am not convinced that many legacy ECA vendors will follow this trend.

One thing is for sure: the demand to process ESI that is in the cloud is going to continue to increase. And it appears that users will have two basic options to address this demand: moving the ESI to the ECA platform or moving the ECA platform to the ESI.

Either way, the next few years are certainly going to be exciting times in the eDiscovery in the Cloud market.

Labels: Analytics, Cloud Computing, Cloud Service Providers, Early Case Assessment, ECA, eDiscovery, IBM, SAP

Tuesday, September 6, 2011

eDiscovery and Big Data Analytics

If cloud computing in general is the next challenge facing information governance and eDiscovery, then Big Data analytics is one of the specific issues that information governance and eDiscovery technologists are going to have to conquer.

Cloud computing is an IT infrastructure choice, a storage architecture and an application delivery mechanism. Combined with mobile computing, it is a choice that will result in more Electronically Stored Information (ESI), or Electronically Stored Evidence (ESE) if you are a lawyer, than the total information from all previous generations.

And whereas the eDiscovery industry has been struggling to collect, process and analyze terabytes of data in a reasonable amount of time for a reasonable cost, this new paradigm of cloud computing and its associated federated data stores is already producing petabytes and exabytes of data. Hence the term Big Data (it's actually all a matter of perspective). Of even more concern is that, while the eDiscovery market has been struggling to analyze structured data appropriately and accurately, Big Data in the cloud is largely unstructured and is therefore largely left out of the eDiscovery equation.

Today, rightly or wrongly, and due largely to a lack of understanding and the associated technological and financial constraints, very little unstructured data is even considered in Rule 26(f) strategies, so it is left out of most litigation. From an eDiscovery standpoint, there is no doubt that "smoking guns" are hiding as unstructured data in some Big Data store, which brings a whole new meaning to the phrase "looking for a needle in a haystack".

However, I believe that there is hope, as Big Data analytics do in fact exist and the technology is evolving. As Srinivasan Sundara Rajan from HP points out in a September 6, 2011 article on the SOA World Magazine site titled "Traditional vs Big Data Analytics," "Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario." This same technology will be useful for information governance and eDiscovery and, from a requirements standpoint, may prove to be a very interesting and financially rewarding vertical for technologists to address.
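The Hadoop-style tools mentioned in the article below are built around the MapReduce pattern: a map step runs independently over each document, and a reduce step combines the partial results. A minimal single-machine sketch of that pattern, applied to the eDiscovery task of finding which documents in an unstructured collection hit a set of search terms, might look like the following. The function names, sample documents and search terms are hypothetical, and a real deployment would distribute the map step across a cluster rather than run it in a Python loop.

```python
import re
from collections import defaultdict
from functools import reduce


def map_document(doc_id: str, text: str, terms: set[str]) -> list[tuple[str, str]]:
    """Map step: emit a (term, doc_id) pair for each search term found in one document."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [(term, doc_id) for term in terms if term in words]


def reduce_hits(acc: defaultdict[str, set[str]], pair: tuple[str, str]) -> defaultdict[str, set[str]]:
    """Reduce step: group document IDs under the term they matched."""
    term, doc_id = pair
    acc[term].add(doc_id)
    return acc


def search(collection: dict[str, str], terms: set[str]) -> dict[str, set[str]]:
    """Return {term: {doc_ids}} for every term that hits at least one document."""
    normalized = {t.lower() for t in terms}
    mapped = (pair
              for doc_id, text in collection.items()
              for pair in map_document(doc_id, text, normalized))
    return dict(reduce(reduce_hits, mapped, defaultdict(set)))


if __name__ == "__main__":
    collection = {  # hypothetical unstructured items
        "email-001": "Shred the warranty reports before the audit.",
        "chat-042": "lol see you at lunch",
        "memo-007": "Audit schedule attached.",
    }
    print(search(collection, {"shred", "audit"}))
```

Because each map call looks at only one document, the map step can be spread over as many commodity machines as the collection requires, which is the property that makes this approach attractive for collections too large for a single review platform.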

The full text of Srinivasan Sundara Rajan's article is as follows:

Big Data Analytics Convergence Among the Major IT Companies
Major IT companies acquiring analytics software and application providers has been the order of the day. We have seen the words ‘Big Data Analytics’ used in many solutions for the enterprise.
‘Big Data’ is the general term used to represent massive amounts of unstructured data that are not traditionally stored in relational form in enterprise databases. The following are the general characteristics of Big Data.

Data volumes on the order of petabytes, exabytes and beyond, far higher than the current storage limits of enterprises, which are measured in terabytes.
Data that is generally considered unstructured and does not fit the relational database designs enterprises have been used to.
Data generated by methods other than conventional data entry, such as RFID and sensor networks.
Data that is time sensitive and is collected with relevance to time zones.

In the past, the term ‘analytics’ has been used in the business intelligence world to describe tools and intelligence for gaining insight into data through fast, consistent, interactive access to a wide variety of possible views of information.

Very close to the concept of analytics, data mining has been used in enterprises to keep pace with the critical monitoring and analysis of mountains of data. The biggest challenge is how to unearth all the information hidden in that vast amount of data.

Traditional DW Analytics vs Big Data Analytics
Analyzing enterprise data for meaningful insight into information as it exists over a period of time, and in that context, is what makes Big Data analytics different from traditional data warehouse analytics.

Traditional Data Warehouse Analytics	Big Data Analytics
Traditional analytics works on known data terrain, with data that is well understood. Most data warehouses have elaborate ETL processes and database constraints, which means the data loaded into a data warehouse is well understood, cleansed and in line with the business metadata.	The biggest advantage of Big Data is that it targets unstructured data outside the traditional means of capturing data. This means there is no guarantee that the incoming data is well formed, clean and free of errors. That makes it more challenging, but at the same time it offers scope for much more insight into the data.
Traditional analytics is built on top of the relational data model: relationships between the subjects of interest have been created inside the system, and the analysis is based on them.	In the real world, it is very difficult to establish relationships among all the information in a formal way, so unstructured data in the form of images, videos, mobile-generated information, RFID and so on has to be considered in Big Data analytics. Most Big Data analytics databases are based on columnar databases.
Traditional analytics is batch oriented, and we need to wait for nightly ETL and transformation jobs to complete before the required insight is obtained.	Big Data analytics is aimed at near-real-time analysis of the data, using software designed for that purpose.
Parallelism in a traditional analytics system is achieved through costly hardware such as MPP (Massively Parallel Processing) and/or SMP systems.	While there are appliances on the market for Big Data analytics, it can also be achieved with commodity hardware and a new generation of analytical software such as Hadoop or other analytical databases.

Use Cases for Big Data Analytics
Enterprises can understand the value of Big Data analytics through use cases that show how traditional problems can be solved with its help. The following are some of those uses.

Customer Satisfaction and Warranty Analysis: This is probably the one big area that most product-based enterprises worry about. As of today, there is no clear way of gauging product issues and the associated customer satisfaction unless the issues are reported formally, in electronic form.

Information regarding quality is collected through various external channels, and most of the time the data is not clean.
As the data is unstructured, there is no way to relate the associated issues so that a long-term fix can be provided to the customer.
Classification and grouping of problem statements is missing, leaving enterprises unable to group the issues.

As the points above show, using Big Data analytics for customer satisfaction and warranty analysis will help enterprises gain much-needed insight into the customer mindset, solve customers' problems effectively and avoid those problems in new product lines.

Competitor Market Penetration Analysis: In today's highly competitive economy, enterprises need to gauge, through analysis within legal means, where competitors are strong and where their pain points lie. This information is available on a variety of web sites, social media sites and other public domains. Big Data analytics on this data can provide an organization with much-needed information about the strengths, weaknesses, opportunities and threats (SWOT) for its product lines.

Healthcare / Epidemic Research & Control: Epidemics and seasonal diseases like influenza start with certain patterns among people and spread to a larger section of the population if they are not detected early and controlled. This is one of the biggest challenges for developing and developed nations alike. The current problem is that symptoms often vary from person to person, various health care providers treat them differently, and there is no common classification of symptoms. Adopting Big Data analytics on this typically unstructured data will help local governments tackle outbreak situations effectively.

Product Feature and Usage Analysis: Most product companies, especially consumer products companies, keep adding features to their product lines, yet some features may hardly be used by consumers while others are used heavily. Effective analysis of usage data captured by mobile devices and other RFID-based inputs can provide valuable insights to product companies.

Future Direction Analysis: Research groups analyze the trends in each business, and this information is available through industry-specific portals and even ordinary web blogs. Continuous analysis of this forward-looking data will help enterprises anticipate the future and bring those trends into their product lines.

Summary
Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario. As evident from the use cases above, these analyses will go a long way toward improving the operations of organizations. We will see more convergence of products and appliances in this space in the coming days.

Labels: Analytics, Big Data, Early Case Assessment, ECA, eDiscovery
