The eDiscovery Paradigm Shift: eDiscovery XML

In the object-oriented software development world I come from, and more recently in the Software-as-a-Service (SaaS) development world, XML, or eXtensible Markup Language, is a mainstay. In my new pond, eDiscovery technology, however, XML is just another acronym that everyone has to learn. And although we eDiscovery technology gurus may never catch up to the SaaS nerds, I suspect that XML will have to become a big part of our pond as we all seek to exchange and integrate the massive amounts of electronically stored information (ESI) that we have so eloquently generated.

With that in mind, I found the following article, "E-Discovery Guru Not Yet Wed to XML," by Craig Ball, published on the Law Technology News site on March 25, 2008, to be extremely informative and timely. The text of Mr. Ball's article is as follows:

I want to love XML. I want to embrace it with the passion of my wiser colleagues, excited by its schemas, titillated by its well-formed code, flushed from its pull-parsing. I want to love XML as much as the cool kids do. So why does it leave me cold?

I want XML the dragon slayer: all the functionality of native electronic evidence coupled with the ease of identification, reliable redaction and intelligibility of paper documents. The promise is palpable; but for now, XML is just a clever replacement for load files, those clumsy Sancho Panzas that serve as squire to addled TIFF image productions. Maybe that's reason enough to love XML.

XML is eXtensible Markup Language, an unfamiliar name for a familiar technology. Markup languages are coded identifiers paired with text and other information. They can define the appearance of content, like the reveal-codes screen of Corel Inc.'s WordPerfect documents. They also serve to tag content to distinguish whether 09011957 is a birth date (09/01/1957), a phone number (0-901-1957) or a Bates number. Plus, markup languages allow machines to talk to each other in ways humans understand.
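
As an illustration outside the article's own text, the tagging idea can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. The element names (Record, BirthDate, Phone, BatesNumber) are invented for this example and are not drawn from any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same eight digits appear three times,
# and only the surrounding tag says what each occurrence means.
SAMPLE = """
<Record>
  <BirthDate>09011957</BirthDate>
  <Phone>09011957</Phone>
  <BatesNumber>09011957</BatesNumber>
</Record>
"""

def describe(xml_text):
    """Return a {tag: value} mapping for each child of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

fields = describe(SAMPLE)
for tag, value in fields.items():
    print(f"{tag}: {value}")
```

The digits are identical in all three fields; a receiving program can tell them apart only because each one arrives with a label.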

Internet surfers rely on a markup language called HyperText Markup Language or HTML that forms the pages of the World Wide Web. There's a good chance the e-mail you send or receive is HTML, too. If you've tried to move documents between WordPerfect and Microsoft Corp.'s Word, or synchronize information across different programs, you know success hinges on how well one application understands the data of another.

Something as simple as importing day-first European date formats to month-first U.S. systems causes big headaches if the recipient doesn't know what it's getting. Standardized markup languages alleviate problems by tagging data to describe it, constraining data by imposing conditions (e.g., restricting dates to U.S. formats) and supporting hierarchic structuring of information (e.g., nesting a value such as 01/09/1957 inside its parent element).
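
As an illustration outside the article's own text, the following Python sketch shows the date ambiguity and one way tagged, constrained, nested data removes it. The element names, the format attribute and the client name are all assumptions made up for the example, and the regular expression is only a stand-in for the kind of restriction a real schema would express.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

RAW = "01/09/1957"

# Untagged, the same string yields two different dates.
as_us = datetime.strptime(RAW, "%m/%d/%Y").date()        # month first
as_european = datetime.strptime(RAW, "%d/%m/%Y").date()  # day first

# Hypothetical markup: nesting gives hierarchy, the attribute names the format.
SAMPLE = """
<Client>
  <Name>Jane Doe</Name>
  <BirthDate format="MM/DD/YYYY">01/09/1957</BirthDate>
</Client>
"""

# A simple constraint standing in for schema validation:
# month 01-12, day 01-31, four-digit year.
US_DATE = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}")

def read_birth_date(xml_text):
    """Parse the tagged birth date, rejecting values that break the constraint."""
    node = ET.fromstring(xml_text).find("BirthDate")
    if node.get("format") != "MM/DD/YYYY" or not US_DATE.fullmatch(node.text):
        raise ValueError(f"unexpected date value: {node.text!r}")
    return datetime.strptime(node.text, "%m/%d/%Y").date()

print(as_us, as_european, read_birth_date(SAMPLE))
```

Without the tag, the recipient has to guess between January 9 and September 1; with it, a value that breaks the declared format is rejected instead of being silently misread.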

There are so many kinds of data and metadata unique to applications and industries that a universal tagging system would be absurdly complex and couldn't keep pace with technology and business. Accordingly, XML is extensible; that is, anyone can create tags and set their descriptions and parameters. Then, just as persons with different native tongues can agree to converse in a language both speak, different computer systems can communicate using an agreed-upon XML implementation. It's Esperanto for electrons.

In e-discovery, we deal with information piecemeal, such as native documents and system metadata or e-mail messages and headers. We even deconstruct evidence by imaging it and stripping it of searchability, only to have to reconstruct the lost text and produce it with the image. Metadata, header data and searchable text tend to be produced in containers called load files housing delimited text, meaning that values in each row of data follow a rigid sequence and are separated by characters like commas, tabs or quotation marks. Using load files entails negotiating their organization or agreeing to employ a structure geared to review software such as CT Summation or LexisNexis Concordance. Conventional load files are unforgiving. Deviate from the required sequence, or omit, misplace or include an extra delimiter, and it's a train wreck.
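
As an illustration outside the article's own text, the fragility of positional load files can be reproduced with Python's csv module. The four-field layout and the sample row are hypothetical and do not represent the actual Summation or Concordance formats.

```python
import csv
import io

# Hypothetical field order that both sides agreed on in advance.
FIELDS = ["BatesNumber", "Custodian", "DateSent", "Subject"]

GOOD_ROW = 'ABC0001,"Smith, J.",03/25/2008,Quarterly report\n'
# Same record, but the quotation marks around the custodian were dropped,
# so the comma inside the name becomes an extra delimiter.
BAD_ROW = "ABC0001,Smith, J.,03/25/2008,Quarterly report\n"

def load_row(line):
    """Pair each value with a field name purely by position."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(FIELDS, values))

good = load_row(GOOD_ROW)
bad = load_row(BAD_ROW)

print(good["DateSent"])  # the date, as intended
print(bad["DateSent"])   # a fragment of the custodian's name
print(bad["Subject"])    # the date, now filed under the wrong field
```

Nothing raises an error in the second case. The values simply land in the wrong fields and the real subject line is dropped, which is why a misplaced delimiter in a load file tends to be discovered late.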

By tagging each value to identify its content and connection to the evidence, XML brings intelligence and resilience to load files. More importantly, XML fosters the ability to move data from one environment to another simply by matching the tags to proper counterparts.
Like our multilingual speakers using a common language, as long as two systems employ the same XML tags and organization (typically shared as an XML Schema Definition or XSD file), they can quickly and intelligibly share information. Parties and vendors exchanging data can fashion a common schema custom tailored to their data or employ a published schema suited to the task.
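
As an illustration outside the article's own text, here is a minimal Python sketch of moving data between systems by matching tags to their counterparts. Both vendor formats, every tag name and the mapping table are hypothetical; in practice an agreed schema would define them.

```python
import xml.etree.ElementTree as ET

# Two hypothetical producers describe the same document with different
# tag names and in a different order.
VENDOR_A = "<Doc><Bates>ABC0001</Bates><Author>Smith, J.</Author><Sent>2008-03-25</Sent></Doc>"
VENDOR_B = "<Item><DateSent>2008-03-25</DateSent><DocID>ABC0001</DocID><From>Smith, J.</From></Item>"

# Mapping from each producer's tags to the receiving system's field names.
TAG_MAP = {
    "Bates": "doc_id", "DocID": "doc_id",
    "Author": "author", "From": "author",
    "Sent": "date_sent", "DateSent": "date_sent",
}

def import_record(xml_text):
    """Translate a tagged record into the receiver's fields, ignoring order."""
    record = {}
    for child in ET.fromstring(xml_text):
        target = TAG_MAP.get(child.tag)
        if target is not None:  # unknown tags are skipped rather than misfiled
            record[target] = child.text
    return record

# Both inputs produce the same record despite different names and ordering.
assert import_record(VENDOR_A) == import_record(VENDOR_B)
print(import_record(VENDOR_B))
```

Because each value carries its own label, field order stops mattering and an unexpected extra element can be ignored without shifting everything after it.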

There is no standard e-discovery XML schema in wide use, but consultants George Socha and Tom Gelbmann are promoting one crafted as part of their groundbreaking Electronic Discovery Reference Model project. Socha (a member of LTN's Editorial Advisory Board) and Gelbmann have done an impressive job securing commitments from e-discovery service providers to adopt EDRM XML as an industry lingua franca. See http://edrm.net.

A mature e-discovery XML schema must incorporate and authenticate native and nontextual data and ensure that the resulting XML stays valid and well-formed. It's feasible to encode and incorporate binary formats using MIME (the same way they travel via e-mail), and to authenticate by hashing; but these refinements aren't yet a part of the EDRM schema.
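
As an illustration outside the article's own text, the approach described above (encode binary content the way e-mail does, and authenticate by hashing) might look like the following Python sketch. It assumes Base64, the transfer encoding MIME commonly uses for binary attachments, and SHA-256 as the hash; the element and attribute names are invented and are not part of the EDRM schema.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

def wrap_native(file_name, data):
    """Embed binary data as Base64 text plus a SHA-256 digest of the original bytes."""
    doc = ET.Element("NativeFile", name=file_name)  # hypothetical element names
    ET.SubElement(doc, "Hash", algorithm="SHA-256").text = hashlib.sha256(data).hexdigest()
    ET.SubElement(doc, "Content", encoding="base64").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(doc, encoding="unicode")

def unwrap_native(xml_text):
    """Decode the payload and confirm it still matches the recorded digest."""
    doc = ET.fromstring(xml_text)
    data = base64.b64decode(doc.findtext("Content"))
    if hashlib.sha256(data).hexdigest() != doc.findtext("Hash"):
        raise ValueError("payload does not match recorded hash")
    return data

# Bytes that could not appear as raw XML text, including a NUL byte.
original = b"\x00\x01binary <not> xml & not text\xff"
xml_text = wrap_native("sample.bin", original)
assert unwrap_native(xml_text) == original
```

Base64 output uses only letters, digits and a few punctuation characters, so the resulting XML stays well-formed whatever the file contains. A hash stored alongside the payload detects corruption in transit; guarding against deliberate alteration would additionally require the hash to be recorded somewhere the other party cannot change.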

So stay tuned. I don't love XML yet, but it promises to be everyone's new best friend.

Craig Ball, a member of the editorial advisory boards of both LTN and Law.com Legal Technology, is a trial lawyer and computer forensics/EDD special master based in Austin, Texas.

Tuesday, March 25, 2008
