The eDiscovery Paradigm Shift: eDiscovery and Big Data Analytics

If cloud computing in general is the next challenge facing information governance and eDiscovery, then Big Data Analytics is one of the specific issues that information governance and eDiscovery technologists are going to have to conquer.

Cloud computing is an IT infrastructure choice, a storage architecture and an application delivery mechanism. Combined with mobile computing, it is a choice that will result in more Electronically Stored Information (ESI), or Electronically Stored Evidence (ESE) if you are a lawyer, than the total information from all previous generations.

And whereas the eDiscovery industry has been struggling to collect, process and analyze terabytes of data in a reasonable amount of time for a reasonable cost, this new paradigm of cloud computing and its associated federated data stores is already producing petabytes and exabytes of data. Hence the term Big Data (it's actually all a matter of perspective). Of even more concern is the fact that while the eDiscovery market has been struggling to analyze structured data appropriately and accurately, Big Data in the cloud is largely unstructured data and is therefore largely left out of the eDiscovery equation.

Today, whether rightly or wrongly, and due largely to a lack of understanding and to associated technological and financial constraints, very little unstructured data is even considered in Rule 26(f) strategies, and it is therefore left out of most litigation. From an eDiscovery standpoint, there is no doubt that "smoking guns" are hiding as unstructured data in some Big Data store, which brings a whole new meaning to the phrase "looking for a needle in a haystack".
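
To make the needle-in-a-haystack problem concrete, here is a minimal sketch of the most basic tool for it: a case-insensitive keyword and phrase search over unstructured text that returns each hit with some surrounding context. The corpus, document IDs and the `search_terms` function are hypothetical; real eDiscovery platforms build indexing, deduplication, stemming and review workflows on top of anything this simple, and at Big Data volumes a linear scan like this would not scale.

```python
import re


def search_terms(documents, terms, context=30):
    """Return (doc_id, term, snippet) for each case-insensitive whole-word hit.

    documents: dict mapping a document ID to its raw text.
    terms: search terms (single words or phrases).
    context: characters of surrounding text to keep on each side of a hit.
    """
    hits = []
    for doc_id, text in documents.items():
        for term in terms:
            # \b anchors whole words; re.escape neutralizes special characters
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append((doc_id, term, text[start:end]))
    return hits


# Hypothetical mini-corpus of unstructured ESI
corpus = {
    "email-001": "Lunch on Friday? Also, please shred the Q3 pricing memo.",
    "chat-017": "Sensor log uploaded. Nothing unusual today.",
    "note-042": "Reminder: the pricing memo must not leave the building.",
}

for doc_id, term, snippet in search_terms(corpus, ["pricing memo", "shred"]):
    print(doc_id, term, "->", snippet)
```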

However, I believe there is hope, as Big Data analytics does in fact exist and the technology is evolving. As Srinivasan Sundara Rajan from HP points out in a September 6, 2011 article on the SOA World Magazine site titled "Traditional vs Big Data Analytics," "Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario." This same technology will be useful for information governance and eDiscovery and, from a requirements standpoint, may prove to be a very interesting and financially rewarding vertical for technologists to address.
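
The quoted point about data "rejected by the data cleansing routines" can be sketched as follows, assuming a hypothetical three-field warehouse schema: a strict load step accepts only records with exactly the expected fields and types, and everything else falls into a reject pile. The schema and the `validate` and `load` functions are illustrative assumptions, not any particular ETL product's behavior.

```python
from datetime import date

# Hypothetical warehouse schema: field name -> required Python type
SCHEMA = {"order_id": int, "customer": str, "order_date": date}


def validate(record):
    """True if the record has exactly the schema fields with the right types."""
    return set(record) == set(SCHEMA) and all(
        isinstance(record[field], expected) for field, expected in SCHEMA.items()
    )


def load(records):
    """Split incoming records into (loaded, rejected), like a strict ETL step."""
    loaded, rejected = [], []
    for record in records:
        (loaded if validate(record) else rejected).append(record)
    return loaded, rejected


incoming = [
    {"order_id": 1, "customer": "Acme", "order_date": date(2011, 9, 6)},
    {"order_id": "2?", "customer": "Beta", "order_date": date(2011, 9, 6)},
    {"note": "Customer called, angry about the recall, wants a refund"},
]

loaded, rejected = load(incoming)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

For eDiscovery purposes the interesting part is the reject pile: the free-text note fails validation, yet it is exactly the kind of unstructured content that may matter in litigation.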

The full text of Srinivasan Sundara Rajan's article follows:

Big Data Analytics Convergence Among the Major IT Companies
Major IT companies acquiring analytics software and application providers has been the order of the day. We have seen the words ‘Big Data Analytics’ used in many solutions for the enterprise.
‘Big Data’ is the general term used to represent massive amounts of unstructured data that are not traditionally stored in relational form in enterprise databases. The following are the general characteristics of Big Data.

Data volumes are measured in petabytes, exabytes and beyond, far above the terabyte-scale storage limits of current enterprises.
The data is generally considered unstructured and does not fit the relational database designs that enterprises are used to.
Data is generated by unconventional methods outside of data entry, such as RFID, sensor networks, etc.
Data is time-sensitive and is collected with reference to time zones.
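
Because this data is time-sensitive and collected across time zones, timestamps generally have to be normalized before events from different sources can be compared, which matters when reconstructing a timeline of events. A minimal sketch with Python's standard `datetime` module, assuming hypothetical sources and fixed UTC offsets (a real pipeline would use `zoneinfo` time zone names so daylight saving time is handled for each date):

```python
from datetime import datetime, timedelta, timezone

# UTC offsets in force on the hypothetical dates below (September, so
# New York is on EDT and London on BST). Production code would use
# zoneinfo.ZoneInfo("America/New_York") etc. so DST is handled automatically.
SGT = timezone(timedelta(hours=8), "SGT")
EDT = timezone(timedelta(hours=-4), "EDT")
BST = timezone(timedelta(hours=1), "BST")

# Hypothetical events, each stamped in its source's local time
events = [
    ("RFID gate, Singapore", datetime(2011, 9, 6, 8, 45, tzinfo=SGT)),
    ("Email server, New York", datetime(2011, 9, 5, 20, 30, tzinfo=EDT)),
    ("Sensor, London", datetime(2011, 9, 6, 2, 0, tzinfo=BST)),
]

# Ordering by local wall-clock time alone (ignoring offsets) is misleading
naive_order = [src for src, ts in sorted(events, key=lambda e: e[1].replace(tzinfo=None))]

# Normalizing to UTC gives the true sequence of events
timeline = sorted((ts.astimezone(timezone.utc), src) for src, ts in events)

print("Local-clock order:", naive_order)
for ts_utc, src in timeline:
    print(ts_utc.isoformat(), src)
```

Sorted by local clock time, the Singapore RFID read appears to happen last; once normalized to UTC it actually precedes the London sensor reading.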

In the past, the term ‘Analytics’ has been used in the business intelligence world to provide tools and intelligence to gain insight into the data through fast, consistent, interactive access to a wide variety of possible views of information.

Closely related to the concept of analytics, data mining has been used in enterprises to keep pace with the critical monitoring and analysis of mountains of data. The biggest challenge is how to unearth all the information hidden in the vast amount of data.

Traditional DW Analytics vs Big Data Analytics
What makes Big Data Analytics different from traditional data warehouse analytics is the way it analyzes enterprise data to gain meaningful insight into information as it exists over a period of time and in context.

Traditional Data Warehouse Analytics	Big Data Analytics
Traditional analytics operates on known data terrain, and on data that is well understood. Most data warehouses have elaborate ETL processes and database constraints, which means the data loaded into a data warehouse is well understood, cleansed and in line with the business metadata.	The biggest advantage of Big Data is that it targets unstructured data captured outside traditional means. This means there is no guarantee that the incoming data is well formed, clean and free of errors. This makes it more challenging, but at the same time it offers scope for much more insight into the data.
Traditional analytics is built on top of the relational data model: relationships between the subjects of interest are created inside the system, and the analysis is based on them.	In the real world, it is very difficult to establish relationships between all the information in a formal way, so unstructured data in the form of images, videos, mobile-generated information, RFID, etc. has to be considered in big data analytics. Most big data analytics databases are based on columnar databases.
Traditional analytics is batch-oriented: we need to wait for nightly ETL and transformation jobs to complete before the required insight is obtained.	Big Data Analytics aims at near-real-time analysis of the data, using software designed for that purpose.
Parallelism in a traditional analytics system is achieved through costly hardware such as MPP (Massively Parallel Processing) and/or SMP (Symmetric Multiprocessing) systems.	While there are appliances on the market for Big Data Analytics, parallelism can also be achieved through commodity hardware and a new generation of analytical software such as Hadoop or other analytical databases.
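
The last row's claim, that commodity hardware plus software like Hadoop can provide the parallelism, rests on the MapReduce programming model. Below is a single-machine sketch of that model in plain Python, not Hadoop's API: the corpus is split into chunks, each chunk is mapped to partial word counts in a worker thread (standing in for a cluster node), and a reduce step merges the partial counts. The documents and function names are hypothetical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(docs):
    """Map step: count lower-cased word tokens in one chunk of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z0-9']+", doc.lower()))
    return counts


def word_count(documents, workers=2):
    """Split the corpus across workers, map each chunk, then reduce (merge)."""
    chunks = [documents[i::workers] for i in range(workers)]
    # Threads stand in for cluster nodes here; Hadoop would ship each chunk
    # to a separate machine instead
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: Counter addition sums the per-word counts
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical documents
docs = [
    "Warranty claim: battery overheats",
    "Battery replaced under warranty",
    "Screen cracked, no warranty",
]
print(word_count(docs).most_common(2))
```

Because each map call only sees its own chunk and merging counters is associative and commutative, the result does not depend on how many workers are used; that property is what lets MapReduce frameworks spread work across many inexpensive machines.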

Use Cases for Big Data Analytics
Enterprises can understand the value of Big Data Analytics through its use cases, that is, how traditional problems can be solved with its help. The following are some of these use cases.

Customer Satisfaction and Warranty Analysis: This is probably the biggest area of concern for most product-based enterprises. Today there is no clear way of gauging product issues and the associated customer satisfaction unless the feedback arrives formally, in electronic form.

Information regarding quality is collected through various external channels, and most of the time the data is not clean.
As the data is unstructured, there is no way to relate the associated issues so that a long-term fix can be given to the customer.
Classification and grouping of problem statements are missing, leaving enterprises unable to group the issues.

From the above discussion, utilizing Big Data Analytics for customer satisfaction and warranty analysis will help enterprises gain insight into the much-needed customer mindset, solve customers' problems effectively and avoid those problems in new product lines.
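
As a minimal sketch of the missing classification and grouping step, the example below tags free-text complaints with issue categories by keyword and then counts each group. The categories, keywords and complaints are hypothetical, and keyword rules are only a starting point; text classifiers or clustering would normally replace them at scale.

```python
import re
from collections import Counter

# Hypothetical issue categories and the keywords that signal them
CATEGORIES = {
    "battery": {"battery", "charge", "overheat", "overheats"},
    "display": {"screen", "display", "pixel", "cracked"},
    "connectivity": {"wifi", "bluetooth", "signal"},
}


def classify(complaint):
    """Return every category whose keywords occur in the complaint text."""
    words = set(re.findall(r"[a-z]+", complaint.lower()))
    matched = [name for name, keywords in CATEGORIES.items() if words & keywords]
    return matched or ["uncategorized"]


# Hypothetical free-text complaints gathered from external channels
complaints = [
    "Phone overheats when I charge it.",
    "Screen cracked after a week",
    "Battery dies fast, screen flickers",
    "Box arrived damaged",
]

# Group the complaints by category and count each group
tally = Counter(cat for text in complaints for cat in classify(text))
print(tally.most_common())
```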

Competitor Market Penetration Analysis: In today's highly competitive economy, we need to gauge where competitors are strong, and where their pain points are, through analysis by legal means. This information is available on a variety of websites, social media sites and other public domains. Big data analytics on this data can provide an organization with much-needed information about the Strengths, Weaknesses, Opportunities and Threats (SWOT) for its product lines.
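
A minimal sketch of mining public posts for competitor strengths and weaknesses: score each post against a small sentiment lexicon and average the scores per competitor. The lexicon, competitor names and posts are hypothetical, and lexicon scoring is a deliberately crude stand-in for the sentiment models a real deployment would use.

```python
import re
from collections import defaultdict

# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative ones
LEXICON = {"love": 1, "great": 1, "reliable": 1, "slow": -1, "broken": -1, "expensive": -1}


def score(post):
    """Sum lexicon weights over the post's lower-cased word tokens."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z]+", post.lower()))


def sentiment_by_competitor(posts):
    """Average sentiment score per competitor from (competitor, post) pairs."""
    totals, counts = defaultdict(int), defaultdict(int)
    for competitor, post in posts:
        totals[competitor] += score(post)
        counts[competitor] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Hypothetical public posts
posts = [
    ("CompetitorA", "Love the new model, very reliable"),
    ("CompetitorA", "Great support but expensive"),
    ("CompetitorB", "App is slow and the sync is broken"),
]
print(sentiment_by_competitor(posts))
```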

Healthcare / Epidemic Research & Control: Epidemics and seasonal diseases like influenza start with certain patterns among people and spread to a larger section of the population if they are not detected early and controlled. This is one of the biggest challenges for growing as well as developed nations. The current issue is that, most of the time, symptoms vary between people and various health care providers treat them differently. There is also no common classification of symptoms across people. Adopting Big Data Analytics on this typically unstructured data will help local governments tackle outbreak situations effectively.
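
Once symptom reports have been mapped to a common classification, one simple way to flag a possible outbreak is to compare each week's case count with the preceding weeks and alert when it exceeds the baseline mean by k standard deviations. The sketch below assumes hypothetical weekly counts, a four-week baseline and k = 2; it illustrates the idea and is not a validated surveillance method.

```python
from statistics import mean, stdev


def flag_outbreaks(weekly_counts, baseline_weeks=4, k=2.0):
    """Return indices of weeks whose count exceeds mean + k * stdev of the
    preceding `baseline_weeks` weeks (a simple moving-baseline rule)."""
    alerts = []
    for week in range(baseline_weeks, len(weekly_counts)):
        baseline = weekly_counts[week - baseline_weeks:week]
        threshold = mean(baseline) + k * stdev(baseline)
        if weekly_counts[week] > threshold:
            alerts.append(week)
    return alerts


# Hypothetical weekly counts of reports classified as influenza-like illness
counts = [12, 15, 11, 14, 13, 16, 41, 58]
print(flag_outbreaks(counts))
```

The surge inflates the baseline for later weeks (the threshold for week 7 is far higher than for week 6), which is one reason practical systems use longer baselines or exclude the most recent weeks from them.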

Product Feature and Usage Analysis: Most product companies, especially in consumer products, keep adding features to their product lines. However, some features may hardly be used by consumers while others are used heavily, and effective analysis of the usage data captured by mobile devices and other RFID-based inputs can provide valuable insights to product companies.
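
A minimal sketch of feature usage analysis, assuming devices report hypothetical `(device_id, feature)` usage events: count the distinct devices that used each shipped feature and list the features nobody used.

```python
from collections import defaultdict

# Hypothetical catalogue of features shipped in the product
FEATURES = ["voice_control", "night_mode", "eco_mode", "remote_start"]

# Hypothetical usage events reported by devices: (device_id, feature)
events = [
    ("dev-1", "night_mode"), ("dev-2", "night_mode"), ("dev-1", "night_mode"),
    ("dev-3", "eco_mode"), ("dev-2", "remote_start"),
]


def adoption(features, events):
    """Return {feature: number of distinct devices that used it}."""
    devices = defaultdict(set)
    for device_id, feature in events:
        devices[feature].add(device_id)
    return {feature: len(devices[feature]) for feature in features}


usage = adoption(FEATURES, events)
unused = [feature for feature, n in usage.items() if n == 0]
print(usage)
print("Unused:", unused)
```

Counting distinct devices rather than raw events keeps one enthusiastic user (dev-1 used night mode twice) from overstating a feature's adoption.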

Future Direction Analysis: The trends in each business are analyzed by research groups, and this information is available through industry-specific portals or even common web blogs. Constant analysis of this forward-looking data will help enterprises anticipate the future and bring those trends into their product lines.
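
One simple way to operationalize constant analysis of portals and blogs is to compare each term's share of mentions between two periods and flag terms whose share rises sharply. The posts, quarters and the 0.1 threshold below are hypothetical; real trend detection would also filter common words and look at more than two periods.

```python
import re
from collections import Counter

# Hypothetical portal and blog posts, grouped by quarter
posts_by_period = {
    "2011-Q1": ["Data warehouse upgrades", "ETL tooling and data warehouse tuning"],
    "2011-Q2": [
        "Hadoop pilots begin",
        "Cloud storage and Hadoop clusters",
        "Hadoop for log analysis",
    ],
}


def term_shares(posts):
    """Fraction of all word tokens in the posts accounted for by each term."""
    counts = Counter(w for post in posts for w in re.findall(r"[a-z]+", post.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def rising_terms(previous, current, min_increase=0.1):
    """Terms whose share grew by at least `min_increase` between two periods."""
    before, after = term_shares(previous), term_shares(current)
    return sorted(t for t in after if after[t] - before.get(t, 0.0) >= min_increase)


print(rising_terms(posts_by_period["2011-Q1"], posts_by_period["2011-Q2"]))
```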

Summary
Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario. As is evident from the use cases above, these analyses will go a long way in improving the operations of organizations. We will see more convergence of products and appliances in this space in the coming days.

Tuesday, September 6, 2011