
Tuesday, September 6, 2011

eDiscovery and Big Data Analytics

If cloud computing in general is the next challenge facing information governance and eDiscovery, then Big Data analytics is one of the specific issues that information governance and eDiscovery technologists are going to have to conquer.

Cloud computing is an IT infrastructure choice, a storage architecture and an application delivery mechanism. Combined with mobile computing, it is a choice that will result in more Electronically Stored Information (ESI), or Electronically Stored Evidence (ESE) if you are a lawyer, than the total information from all previous generations.

And, whereas the eDiscovery industry has been struggling to collect, process and analyze terabytes of data in a reasonable amount of time for a reasonable cost, this new paradigm of cloud computing and its associated federated data stores is already producing petabytes and exabytes of data. Hence the term Big Data (it's actually all a matter of perspective). Of even more concern, while the eDiscovery market has been struggling to appropriately and accurately analyze structured data, the new paradigm of Big Data in the cloud consists largely of unstructured data, which is therefore largely left out of the eDiscovery equation.
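To put the jump from terabytes to petabytes and exabytes in perspective, here is a rough back-of-the-envelope sketch. The sustained rate of 100 MB/s per node and the ten-node cluster are purely illustrative assumptions, not measurements from any eDiscovery platform, and the function name is hypothetical. Decimal (SI) units are used throughout.

```python
# Back-of-the-envelope estimate of how long one pass over a collection takes.
# The throughput and node count used below are illustrative assumptions only.

UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18}  # decimal (SI) bytes


def processing_days(size: float, unit: str, mb_per_sec_per_node: float, nodes: int) -> float:
    """Return the days needed to read `size` units once at the given rate."""
    total_bytes = size * UNITS[unit]
    bytes_per_sec = mb_per_sec_per_node * 10**6 * nodes
    return total_bytes / bytes_per_sec / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    for unit in ("TB", "PB", "EB"):
        days = processing_days(1, unit, mb_per_sec_per_node=100, nodes=10)
        print(f"1 {unit}: {days:,.2f} days")
```

Under these assumed figures, a terabyte takes roughly 17 minutes, a petabyte about 11.6 days, and an exabyte more than 30 years for a single pass, which is why the change in scale is a change in kind for collection and processing.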

Today, rightly or wrongly, and due largely to a lack of understanding and the associated technological and financial constraints, very little unstructured data is even considered during Rule 26(f) strategies, and it is therefore left out of most litigation. From an eDiscovery standpoint, there is no doubt that there are "smoking guns" hiding as unstructured data in some Big Data store, which brings a whole new meaning to the phrase "looking for a needle in a haystack".

However, I believe there is hope: Big Data analytics do in fact exist, and the technology is evolving. As Srinivasan Sundara Rajan of HP points out in a September 6, 2011 article on the SOA World Magazine site titled "Traditional vs Big Data Analytics," "Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario." This same technology will be useful for information governance and eDiscovery and, from a requirements standpoint, may prove to be a very interesting and financially rewarding vertical for technologists to address.

The full text of Srinivasan Sundara Rajan's article follows:

Big Data Analytics Convergence Among the Major IT Companies
Major IT companies acquiring analytics software and application providers has been the order of the day. We have seen the words 'Big Data Analytics' being used in many solutions for the enterprise.
'Big Data' is the general term used to represent massive amounts of unstructured data that are not traditionally stored in relational form in enterprise databases. The following are the general characteristics of Big Data:
  • Data storage is measured in petabytes, exabytes and beyond, far above the terabytes that mark the current storage limits in enterprises.
  • The data is generally considered unstructured and does not fall under the relational database design that enterprises have been used to.
  • The data is generated by unconventional methods other than data entry, such as RFID and sensor networks.
  • The data is time sensitive and consists of data collected with relevance to the time zones.
In the past, the term 'Analytics' has been used in the business intelligence world to provide tools and intelligence to gain insight into the data through fast, consistent, interactive access to a wide variety of possible views of information.

Very close to the concept of analytics, data mining has been used in enterprises to keep pace with the critical monitoring and analysis of mountains of data. The biggest challenge is how to unearth all the hidden information from the vast amount of data.

Traditional DW Analytics vs Big Data Analytics
Both approaches analyze enterprise data to gain meaningful insight into information that has accumulated over a period of time. The context in which that analysis happens is what makes Big Data analytics different from traditional data warehouse analytics.


Traditional data warehouse analytics compared with Big Data analytics:

  • Traditional data warehouse analytics: Analyzes a known data terrain, and only data that is well understood. Most data warehouses have elaborate ETL processes and database constraints, which means the data loaded into a data warehouse is well understood, cleansed and in line with the business metadata.
  • Big Data analytics: The biggest advantage of Big Data is that it is targeted at unstructured data outside the traditional means of capturing data, which means there is no guarantee that the incoming data is well formed, clean and free of errors. This makes it more challenging, but at the same time it gives scope for much more insight into the data.

  • Traditional data warehouse analytics: Built on top of the relational data model. Relationships between the subjects of interest are created inside the system, and the analysis is done based on them.
  • Big Data analytics: In the typical world it is very difficult to establish relationships between all the information in a formal way, and hence unstructured data in the form of images, videos, mobile-generated information, RFID and so on has to be considered. Most Big Data analytics databases are based on columnar databases.

  • Traditional data warehouse analytics: Batch oriented. We need to wait for nightly ETL and transformation jobs to complete before the required insight is obtained.
  • Big Data analytics: Aimed at near-real-time analysis of the data, with the support of software meant for it.

  • Traditional data warehouse analytics: Parallelism is achieved through costly hardware such as MPP (Massively Parallel Processing) systems and/or SMP systems.
  • Big Data analytics: While there are appliances on the market for Big Data analytics, parallelism can also be achieved through commodity hardware and a new generation of analytical software such as Hadoop or other analytical databases.
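The commodity-hardware approach mentioned in the comparison rests on splitting work into independent pieces that can run on many inexpensive machines. As a rough illustration, here is a single-process sketch of the map, shuffle and reduce pattern that Hadoop popularized. It is not the Hadoop API; the function names are hypothetical, and word counting is simply an assumed example task.

```python
# Single-process sketch of the map / shuffle / reduce pattern.
# This is NOT the Hadoop API: on a real cluster, each map and reduce call
# could run on a different commodity machine.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over a collection of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))
```

Because each call to `map_phase` depends only on its own document, the documents can be spread across as many machines as are available, which is the property that lets this style of analysis scale without specialized MPP hardware.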


Use Cases for Big Data Analytics
Enterprises can understand the value of Big Data Analytics by looking at use cases and at how traditional problems can be solved with its help. The following are some of the uses.

Customer Satisfaction and Warranty Analysis: This is probably the one big area that most product-based enterprises worry about. Today there is no clear way of gauging product issues and the associated customer satisfaction unless they arrive formally in electronic form.
  • Information regarding quality is collected through various external channels, and most of the time the data is not clean.
  • Because the data is unstructured, there is no way to relate the associated issues so that a long-term fix can be given to the customer.
  • Classification and grouping of problem statements are missing, leaving enterprises unable to group the issues.
Given the above, using Big Data Analytics for customer satisfaction and warranty analysis will help enterprises gain much-needed insight into the customer mindset, solve customers' problems effectively and avoid those problems in new product lines.
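As a very small illustration of the classification and grouping step described above, the sketch below sorts free-text complaints into categories by keyword. The categories, keywords and function names are all hypothetical, and simple substring matching is a stand-in for the far more capable text analytics a real system would use.

```python
# Minimal keyword-based grouping of free-text complaints.
# Categories and keywords are hypothetical examples, not a real taxonomy.
from collections import defaultdict
from typing import Dict, List

CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "battery": ["battery", "charge", "charging"],
    "screen": ["screen", "display", "crack"],
    "shipping": ["late", "delivery", "shipping"],
}


def classify(complaint: str) -> str:
    """Return the first category whose keyword appears in the complaint."""
    text = complaint.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


def group_complaints(complaints: List[str]) -> Dict[str, List[str]]:
    """Group complaints by category so related issues can be reviewed together."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for complaint in complaints:
        groups[classify(complaint)].append(complaint)
    return dict(groups)
```

Note that the first matching category wins and that substring matching will produce false positives (for example, "late" inside "plate"), so this only shows the shape of the problem, not a production approach.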

Competitor Market Penetration Analysis: In today's economy, where competition is high, we need to gauge the areas where competitors are strong, as well as their pain points, through analysis within legal means. This information is available on a variety of web sites, social media sites and other public domains. Big Data analytics on this data can provide an organization with much-needed information about the Strengths, Weaknesses, Opportunities and Threats for its product lines.

Healthcare / Epidemic Research & Control: Epidemics and seasonal diseases like influenza start with certain patterns among people, and they spread to a larger section of the population if they are not detected early and controlled. This is one of the biggest challenges for growing as well as developed nations. The current issue is that, most of the time, symptoms vary between people, and various health care providers treat them differently. There is also no common classification of symptoms across people. Adopting Big Data Analytics on this typically unstructured data will help local governments tackle outbreak situations effectively.

Product Feature and Usage Analysis: Most product companies, especially consumer product companies, keep adding features to their product lines. However, some of those features may not really be used by consumers, while others are used more. Effective analysis of the usage data captured by various mobile devices and other RFID-based inputs can provide valuable insights to the product companies.

Future Direction Analysis: The trends in each business are analyzed by research groups, and this information is available through industry-specific portals or even common web blogs. Constant analysis of this forward-looking data will help enterprises anticipate the future and bring those trends into their product lines.

Summary
Big data analytics provide new ways for businesses and government to analyze unstructured data which so far have been rejected by the data cleansing routines in a typical enterprise data warehouse scenario. However, as is evident from the use cases above, these analyses will go a long way toward improving the operations of organizations. We will see more convergence of the products and appliances in this space in the coming days.
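To make the point about data cleansing routines concrete, here is a minimal sketch contrasting a warehouse-style loader that rejects incomplete records with a tolerant loader that keeps everything and simply notes what is missing. The required fields and function names are hypothetical and do not describe any particular product.

```python
# Contrast between strict, warehouse-style cleansing and tolerant ingestion.
# REQUIRED_FIELDS is a hypothetical schema chosen for illustration.
from typing import Any, Dict, List, Tuple

REQUIRED_FIELDS = ("id", "date", "amount")

Record = Dict[str, Any]


def strict_load(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Accept only records with every required field present and non-empty."""
    accepted: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        if all(record.get(field) for field in REQUIRED_FIELDS):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


def tolerant_load(records: List[Record]) -> List[Record]:
    """Keep every record, tagging each with the required fields it lacks."""
    return [
        {**record, "_missing": [f for f in REQUIRED_FIELDS if not record.get(f)]}
        for record in records
    ]
```

With the strict loader, a free-text note that lacks a date and an amount never reaches the analysts; with the tolerant loader it is retained, and any insight it holds remains available for later analysis or, in the eDiscovery context, for review.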
