Wednesday, February 28, 2007

Three Stages of XML Migration and the Challenge to OpenDocument

  1. Conversion Fidelity - the billions of binaries problem
  2. Round Trip Fidelity - the MSOffice bound business processes, line of business integrated apps, and assistive technology type add-ons
  3. Application Interop - the cross platform, inter application, cross information domain problem

First it was the EU's controversial Valoris Report that centered on a fictional open standard XML universal file format known as “OpenDocument”.

This was followed by Massachusetts's pilot study and mandate for the real thing, the ISO/OASIS XML OpenDocument file format. The States of Texas and Minnesota weighed in with similar legislative proposals, once again describing an open standard XML universal file format in terms that only the ISO/OASIS OpenDocument can measure up to.

And now comes the crusher. The State of California, touting what would be the world's eighth largest economy if it were a nation, steps forward into the breach with a legislative proposal describing in such detailed parameters what can only be the OpenDocument universal file format that one can almost see the butterfly tattoo on the specification's buttocks.

No doubt the great migration to XML is upon us, with XML universal file formats leading the way. Deciding to go with OpenDocument is the easy part.
Getting there is something else. A quote from Peter Quinn, the legendary CIO of Massachusetts who threw that great Commonwealth into the breach, says it all:
"Open document formats: I get it! But how do I get there? Discuss."
Sam Hiser has provided us with another fine commentary, “Open Standards Mandatory in Denmark”, that's loaded with insight. He discusses the recent announcement by Denmark of its intention to mandate support for two conflicting, contradictory, and irreconcilable XML file formats: the ISO/OASIS OpenDocument and the Microsoft Ecma Office Open XML proposal. For the life of me I don't see how mandating both ODF and OOXML will be helpful to any government. It's true that both are “XML” by design, but beyond that the promise and expectations of XML are broken by the proprietary application, platform and system specific dependencies that make up MS Ecma OOXML. In fact, the quality of XSL transformation filters between the two XML file formats is so bad it might as well be zero.

Although Microsoft assured Massachusetts that such a transformation would be trivial, nearly two years later they have nothing to show for their braggadocio.

Which means Denmark will be hoist on the petard of having their documents in two file formats that are not interchangeable. And except for the MSOffice ODF Plugins from Sun and the Foundation, application interoperability might as well be zero.

A Moving Target – MSOffice 2007 Plugin Architecture:

It's important to keep in mind that both daVinci and ACME 376, the Foundation's Plugins, were working fine in every beta release of MSOffice 2007, but broke in the final public version. I suspect Sun had the very same problem we did. It seems that Microsoft altered the system default for “zip” file format packages such that whenever a “zip” package like ODf is double clicked, the OOXML conversion engine automatically starts to convert the package to the MSOffice 2007 in-memory-binary-representation. Of course, the OOXML conversion engine knows nothing about ODf. So the problem we now have is reverse engineering what looks to be a low level system “default” setting that loads the OOXML conversion engine instead of the daVinci conversion engine.

Microsoft has a long history of dirty tricks and an ever moving APi, effectively protecting their monopoly from would be poachers and thieves, er “competitive application and service providers working the Windows – Vista platform”. And here we go again. This first act of aggressively blocking the ODf Plugins should signal governments loud and clear that moving into the Vista Stack of desktop, servers, devices and services is going to be the equivalent of exclusively mandating OOXML.

We've known this to be the case in the past, the primary example being the Exchange/SharePoint Hub and developer platform which is optimized for MSOffice 2003 MSXML. So much so that the E/S Hub has to be considered an ODf killer. Yet even those governments exclusively mandating ODf are boldly going forward with E/S Hub purchases, totally unaware of the consequences. As I've said many times, Massachusetts is just one E/S Hub Court Docket System away from revoking their ODf requirement and standardizing on OOXML. (Watch carefully now, the hand is quicker than the eye: ViSTO 2005, which was released with MSOffice 2007, dropped support for MSXML entirely in favor of the MS version of OOXML. I mention this because there is clear evidence that MOOXML, legacy MOOXML, and now MOOXML Binary InfoSet for Excel all include eXtensions and dependencies that differ from the Ecma 376 version submitted to ISO/IEC.)

Remember our Peter Quinn quote? "Open document formats: I get it! But how do I get there? Discuss." The simple truth is that the ODf Community is not providing the means to get to ODf. There is no bridge from the legacy installations of MSOffice and the billions of binary documents, to ODf ready applications and services.
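
Returning to the “zip” package collision described in this section: both ODf and OOXML files are zip packages, but they can be told apart by their contents. An ODf package carries a “mimetype” entry whose value starts with application/vnd.oasis.opendocument., while an OOXML package carries a “[Content_Types].xml” entry. The Python sketch below shows one way a handler could sniff the difference before choosing a conversion engine. It is an illustration only and says nothing about how MSOffice 2007 actually dispatches packages; the function names sniff_package and make_package are made up for this example.

```python
import io
import zipfile

# Prefix shared by the ODf media types (text, spreadsheet, presentation, ...).
ODF_MIMETYPE_PREFIX = "application/vnd.oasis.opendocument."


def sniff_package(data: bytes) -> str:
    """Classify a blob as 'odf', 'ooxml', 'zip' (some other zip) or 'unknown'."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "unknown"
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = package.namelist()
        # ODf packages declare their type in a plain-text "mimetype" entry.
        if "mimetype" in names:
            mimetype = package.read("mimetype").decode("ascii", "replace").strip()
            if mimetype.startswith(ODF_MIMETYPE_PREFIX):
                return "odf"
        # OOXML packages list their part types in "[Content_Types].xml".
        if "[Content_Types].xml" in names:
            return "ooxml"
    return "zip"


def make_package(entries: dict) -> bytes:
    """Hypothetical helper: build an in-memory zip from {entry name: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, content in entries.items():
            package.writestr(name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    odf = make_package({
        "mimetype": "application/vnd.oasis.opendocument.text",
        "content.xml": "<placeholder/>",
    })
    print(sniff_package(odf))
```

A handler that checks the package contents first, rather than assuming every zip package is OOXML, would at least know which conversion engine the document is asking for.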

Get to XML:

What most ODF-OOXML warriors forget is that the real issue for workgroup and workflow oriented consumers is getting their binary documents into XML. Which XML, ODf or OOXML, is a secondary consideration. The one thing everyone understands is that the way to connect important information domains, systems and architectures is through the skillful use of open XML and open Internet technologies. Important information domains and architectures include:
  • Desktop Productivity Environments (the Office Suites)

  • Enterprise publication, content management and archive systems

  • SOA – Service Oriented Architectures

  • SaaS – Software as a Service systems

  • The Internet – Web 2.0 and beyond

You can't efficiently exchange information across these domains if it's trapped in application – platform specific and unstructured binary formats. Conversion fidelity requirements will break a binary bound exchange process every time. Even Microsoft realizes the need to move to XML, although they will of course strive to control that XML through continued use of application and platform specific dependencies and artificially contrived implementation constraints.

XML Migration: The Three Stages

The problem as we saw in Massachusetts, Munich, and Bristol is that there are three stages of XML file format migration. The three stages for migrating to XML file formats are inextricably linked and must be tackled in exactly this order:
  1. Conversion Fidelity - the billions of binaries problem
  2. Round Trip Fidelity - the MSOffice bound business processes, line of business integrated apps, and assistive technology type add-ons
  3. Application Interop - the cross platform, inter application, cross information domain problem
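
The difference between stage one and stage two can be made concrete with a toy model. In the Python sketch below a document is reduced to a set of feature names, and a converter simply drops whatever its target cannot represent. Conversion fidelity is then the share of features surviving one conversion, and round trip fidelity is the share surviving the trip there and back. The feature names and the supported sets are invented for illustration and do not describe any real application.

```python
from typing import Callable, FrozenSet

Document = FrozenSet[str]
Converter = Callable[[Document], Document]


def make_converter(supported: FrozenSet[str]) -> Converter:
    """Build a converter that keeps only the features its target supports."""
    def convert(doc: Document) -> Document:
        return doc & supported
    return convert


def fidelity(original: Document, result: Document) -> float:
    """Fraction of the original features still present in the result."""
    if not original:
        return 1.0
    return len(original & result) / len(original)


def round_trip(doc: Document, to_target: Converter, to_source: Converter) -> Document:
    """Convert to the target format and straight back again."""
    return to_source(to_target(doc))


# Invented feature sets, purely for illustration.
original = frozenset({"text", "tables", "fields", "macros"})
to_xml = make_converter(frozenset({"text", "tables", "fields"}))
to_binary = make_converter(frozenset({"text", "tables", "macros"}))

conversion_score = fidelity(original, to_xml(original))
round_trip_score = fidelity(original, round_trip(original, to_xml, to_binary))

if __name__ == "__main__":
    print(conversion_score, round_trip_score)
```

In this toy model the round trip can never score higher than the single conversion, which is one way to see why a workflow that bounces documents back and forth is far less forgiving than a one time migration.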

Sadly, the ODf Community is near entirely focused on stage three, leaving the critically important first and second stages up to customers.

Interestingly, OOXML nails all three stages perfectly, with the proviso being that all interop applications be Microsoft application – platform specific. This is what they mean when they say, “interoperability by design”; applications designed to speak fluent OOXML, making exclusive internal use of .NET 3.0 system dependencies.

Note well that since the starting point of XML migration is that of MSOffice bound binaries and business processes, OOXML is today the only XML file format proposal that can perfectly answer all three stages of XML migration requirements. The OpenDocument Foundation does have three products in the works that are designed to also perfectly answer these same XML requirements, but the Foundation's approach has met with considerable resistance, argument, doubt and continued outrage from the ODf Community at large.

Note also that these three stages are only important to workgroup and workflow situations: situations where MSOffice bound business processes are critical to day to day operations. People who are involved in simple document exchange can easily move to ODf through an OpenOffice download and some simple conversion artifact fixes as documents are converted as needed. Workgroups and workflows require a very high level of round trip fidelity. Which is why mixed application environments present workgroups with a near insurmountable problem, even where the mix is limited to that of different MSOffice versions!

Of course no one wants to see the life span of MSOffice eXtended indefinitely, which is exactly the conclusion most come to regarding the daVinci Plugin. daVinci makes it far too easy to stay with MSOffice since there is zero disruption to MSOffice bound business processes, line of business integrated apps, and the functioning of assistive technology add-ons.

The ODf community doesn't realize three things.

  • First, it's an absolute real world requirement that the three stages of XML migration be met perfectly by any file format contestant.

  • Second, Microsoft has handed the ODf Community the opportunity of a lifetime regarding the most difficult stage for ODf; stage two, the migration of MSOffice bound business processes.

  • Third, consumers migrating to XML have no stomach, tolerance or funds for the highly disruptive and costly “rip out and replace” approach most in the ODf community favor.

The Massachusetts RFi:

Massachusetts ITD deserves credit for figuring this out. The three stages are the reason behind their ODf Plugin for MSOffice RFi. Which is about as desperate a cry for help from the ODf Community as I've ever seen. There is no other way to describe the events in Massachusetts except to say that the ODf Community refused to listen, and worse, made no attempt whatsoever to understand the exact nature of the problem that consumers wanting to migrate to ODf face. Hence the unprecedented RFi (Request for Information). So what's going on here? It's the reality of being caught in the clutches of a monopolist for over ten years.

In 1995 Microsoft won the Office Suite application wars, spending the next two years mopping up as they paved the way for the real crusher, MSOffice 97. The office suite wars were marked by the vicious marketing of comparative feature sets and competitive price inducements. The feature wars ended in 1995 and the outcome was set by 1997, a period during which a new movement had begun: the use of the ubiquitous MSOffice as a development platform for business processes, line of business integrated applications, and a proliferation of add-ons like those for assistive technologies.

In short, by 1997 it was no longer about “features”. It was all about MSOffice bound business processes. And the binding occurs at two levels: the MSOffice application layer, and the binary file format layer. This creates a binding trifecta of MSOffice, MS Binary Documents, and MSOffice bound business processes.

The real battle today is over how to migrate or participate in the MSOffice bound business processes developed since 1995. This is the core of the problem every organization trying to migrate to ODf must confront. Massachusetts, to their everlasting credit, came up with a rather magical solution: the idea of using ODf Plugins for MSOffice to solve the problems of stages one and two. The stage three ODf problem of application interoperability is a work in progress, and I would direct everyone's attention to the ODf 1.2 progress of the three OASIS ODf sub committees: Accessibility, Open Formula, and Metadata RDF/XML. A great leap in application interop is brewing there, and expectations for ODf 1.2 run high with good reason.

The Problem With Rip Out and Replace:

Many people mistakenly believe that migrating to ODf is as easy as downloading OpenOffice 2.0 for free. Which is true if your collaborative work is that of simple document exchange. For workgroups and workflows currently based on MSOffice bound business processes and binaries though, the disruption and re-engineering cost of moving those processes to ODf ready OpenOffice is impossibly high. While it's true that Novell is working on an automated means of converting bound business process VBA scripts to an OpenOffice footing, it's not there yet. But that's just a fraction of the problem. These business processes include compound documents for templates, forms, formula bound spreadsheets, and reports, often involving system dependent data bindings.

Rather than trying to unwind these MSOffice bound business processes and rewrite them to somehow become application independent and ODf ready, we believe there is another approach demanding consideration. Instead of migrating to an ODf ready desktop MSOffice alternative, we believe the ODf Community should instead be focused on the Microsoft migration of these same business processes to the OOXML enhanced Exchange/SharePoint/Groove Hub.

The Great Opportunity: The Vista Information Processing Chain

Briefly the great opportunity is this: Microsoft has been very busy migrating those MSOffice bound business processes to the Exchange/SharePoint Hub, which will soon be joined with Groove Collaboration servers. XML Hubs are interesting in that they are an indispensable core to any SOA effort. The Hubs are a very effective point for the access, aggregation and repurposing of XML services and information streams connecting to such things as backend data, transaction, inventory and billing processing, or content and archival management systems. As a universal transformation layer able to connect to many disparate and purpose specific backend legacy systems, XML is unparalleled.

But what if these same XML Hubs are integrated as a way station connecting desktops and devices to Internet server systems as well as the legacy backends? And there my friends is the magic of an Exchange/SharePoint/Groove Hub. On the desktop side E/S Hubs integrate directly with the MSOffice productivity environment using OOXML (versions Ecma 376, MOOXML, legacy MOOXML, and MOOXML Binary InfoSet) as both the XML container and transport. The E/S Hub provides the integration point where eMail, documents, workflows, project participants (people), and information resources (web services, XHRequest streams, media streams, and message streams) can be aggregated, sorted, re purposed, published, managed and scheduled. At the E/S Hub it's easy to bind data from MS SQL Server systems, MS CRM, MS ERP and MS Live to the portable OOXML document transport.

The most important point about XML Hubs is that migrating business processes to them always results in extraordinary productivity gains. Yes, the E/S Hub represents a new lock-in point for MSOffice bound workgroup customers. The immediate productivity gains however will far outweigh the long term cost of having your business locked in to MS only applications and services for the next fifteen years. So where's the opportunity you ask?

By intercepting this migration at the head point, MSOffice, converting the documents to ODf, and provisioning ODf ready Hubs the ODf Plugins could potentially walk off with the entire monopoly base of over 500 million MSOffice bound desktops. The ODf Plugin route is infinitely cheaper in that there is no need for desktop upgrades. And, the cost of long term MS lock-in is completely negated.

The ODf Plugin alone however isn't enough, which is exactly why the Foundation began development of the lightweight, efficient and highly portable ODf InfoSet Engine and APi. We need ODf ready Hubs to complete the ODf Information Processing Chain alternative: ODf Hubs that can compete with the E/S Hub juggernaut.

C'mon Alfresco, Lotus, and Zimbra - the whole world is waiting.

One more thing about these ODf Hubs and ODf ready applications that span desktop, servers and device platforms. The Internet has ushered in a new world of universal connectivity, exchange and collaborative computing. Some ODf ready applications will be designed to participate in specific information processing chains, so they will be built to act as efficient routers of information.

Other applications will profess to be ODf ready, but continue to act in the legacy traditions of information end points, ignoring the ODf document needs as defined by other information processing chains and dropping objects and data binding mechanisms they don't understand or lack the feature sets needed to make use of. Or, they might add value by asserting application or platform specific dependencies that would otherwise corrupt the document's use in other information processing chains.

Application Interoperability

The problems of application interoperability are far more difficult for ODf than for OOXML because ODf was designed for a very different objective.

OOXML was designed exactly for MSOffice bound binary and business process compatibility (stages I & II of the XML migration requirement), including application and platform specific dependencies.

ODf was designed to be an application and platform independent universal file format, dependent on open XML and other Internet technologies available to any application or service.

For ODf, proprietary or platform specific dependencies are an interop killer. For OOXML, proprietary and platform specific dependencies are the monopolist's lifeblood of “interoperability by design”.

The world is going to move to XML, and from there to RDF/XML. That's a given. OOXML has the incredible advantage of meeting the three “migration to XML” requirements. But the migration comes at the cost of long term business process lock-in, with interoperability demands limiting applications and services to the Vista Stack of:

MSOffice <> OOXML <> IE <> ViSTO .NET <> E/S/G Hub <> MS Active Directory <> MS SQL Server <> MS CRM <> MS ERP <> MS Live etc. etc. etc.

The core of this chain is MSOffice <> OOXML <> Exchange/SharePoint Hub.

ODf can similarly meet the three “migration to XML” requirements, and do so just as well as OOXML, in spite of MS claims otherwise. Although they anger and upset the rip out and replace bent ODf community crowd, the ODf plugins for MSOffice from Sun and the Foundation are the only way ODf can crack and perfect these critically important “migration to XML” requirements. Danes please take note!

And of course, there is today no ODf Stack or ODf Information Processing Chain comparable to the one Microsoft has unleashed. There is some great stuff cooking with Lotus Notes, but they are obviously lacking a strategy for stages I & II of the “migration to XML” requirements. Alfresco is hard charging, but they have that same problem. Zimbra and Google Hubs are still mired in the application as end points approach, where they see conversion of documents as a one way process in which continuing and persistent loss of fidelity and the dropping of feature related objects or bindings is the expected and unavoidable collateral damage cost, so get over it.

Meanwhile, the ISO/IEC consideration of Ecma 376 is a diversion from where the real action is; the optimized for MSOffice OOXML Exchange/SharePoint juggernaut.

It's high noon for XML. Do you know where your ODf information processing chain is?


Notes: The OpenDocument Foundation is working on three products that we believe to be essential to the development of an ODf Information Processing Chain that can connect desktops, servers, devices and Internet systems transitioning information based on the highly interactive and collaborative portable XML document/data model/streaming media model.

These products are still in development. What follows are the design goals and objectives:

  • daVinci ODf 1.2 Plugin for MSOffice

  • InfoSet Engine & APi

  • Interop Wizard for OpenOffice

  • ACME 376 XML-RTF Plugin for MSOffice

The daVinci ODf 1.2 Plugin:

To the best of our knowledge, daVinci continues to be the only ODf plugin for MSOffice designed to work “internally”. What happens is that daVinci triggers the internal conversion process Microsoft uses to do MSOffice conversions.

DaVinci triggers the internal process, intercepting an undocumented internal super structure we call “MS Universal RTF”, for lack of a better name. This undocumented super structure accounts for the extraordinary near perfect conversion fidelity achieved by daVinci and demonstrated by the ACME 376 plugin.

We do not believe similar conversion fidelity results can be achieved by either the MCN XSL Translator plugin approach, or the traditional “external” binary file reverse engineering approach used by OpenOffice and the Sun ODf Plugin. Which is perhaps why Microsoft themselves use this same internal process to convert legacy MS binary documents to MS OOXML.

The daVinci conversion process follows this “internal” sequence (and its reverse):

imbr <> MS Universal RTF <> daVinci InfoSet <> ODf 1.2

imbr :: Microsoft in-memory-binary-representation, which becomes a MS Binary (dump) on save (the reverse on load).

MS Universal RTF :: This is a very special structure that MSOffice uses to do all conversions from imbr to MSXML, OOXML, and RTF (the RTF the rest of the world uses).

daVinci InfoSet :: A super InfoSet structure daVinci imposes on the MS Universal RTF structure.

ODf 1.2 :: daVinci maps from InfoSet to ODf 1.0 (first daVinci version) and ODf 1.2 to produce the ODf file. The mapping mechanism could be redirected to Chinese UOF or even a subset of OOXML with some work.
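
The shape of that sequence, a chain of stages that can be walked in either direction, can be sketched in Python. This is a structural illustration only: the real stages are internal to MSOffice and daVinci and are not reproduced here. Each stage below just wraps or unwraps a string tag, and the names Stage, tagging_stage and run are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    """One hop in the chain, with a forward and a backward mapping."""
    name: str
    forward: Callable[[str], str]
    backward: Callable[[str], str]


def tagging_stage(name: str, tag: str) -> Stage:
    """Placeholder stage: forward wraps the payload in tag(...), backward unwraps it."""
    prefix, suffix = f"{tag}(", ")"

    def forward(payload: str) -> str:
        return f"{prefix}{payload}{suffix}"

    def backward(payload: str) -> str:
        if not (payload.startswith(prefix) and payload.endswith(suffix)):
            raise ValueError(f"payload is not in {tag} form: {payload!r}")
        return payload[len(prefix):-len(suffix)]

    return Stage(name, forward, backward)


# The three hops named in the sequence above, with stand-in behavior.
PIPELINE: List[Stage] = [
    tagging_stage("imbr <> MS Universal RTF", "rtf"),
    tagging_stage("MS Universal RTF <> daVinci InfoSet", "infoset"),
    tagging_stage("daVinci InfoSet <> ODf 1.2", "odf"),
]


def run(stages: List[Stage], payload: str, reverse: bool = False) -> Tuple[str, List[str]]:
    """Walk the chain forward (save) or backward (load); return the result and the hops taken."""
    ordered = list(reversed(stages)) if reverse else list(stages)
    trace = []
    for stage in ordered:
        step = stage.backward if reverse else stage.forward
        payload = step(payload)
        trace.append(stage.name)
    return payload, trace


if __name__ == "__main__":
    saved, _ = run(PIPELINE, "doc")
    loaded, _ = run(PIPELINE, saved, reverse=True)
    print(saved, loaded)
```

Because the final hop is just another stage in the list, swapping it for a different target mapping leaves the earlier hops untouched, which mirrors the remark above that the mapping mechanism could be redirected with some work.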

The ODf 1.2 metadata RDF/XML model provides daVinci with the descriptive flexibility needed to maintain a high level of interoperability. The first version of daVinci provided to Massachusetts in June of 2006 targeted ODf 1.0 and high interoperability with OpenOffice. This came at the expense of perfect conversion fidelity and much needed round trip fidelity with the legacy of billions of MS binary documents. In essence, the 85% conversion fidelity one expects from OpenOffice was exactly the same conversion fidelity delivered by daVinci ODf 1.0.

ODf 1.2 enables daVinci to hit near perfect conversion fidelity without compromising on the high level of application interoperability users expect.

The ACME 376 plugin is available for public download and testing (the ODf 1.2 metadata RDF/XML proposal will not be completed until April of 2007). Although ACME 376 perfects an XML encoding of RTF, it demonstrates the extraordinary near perfect conversion fidelity provided by the daVinci engine.

The daVinci Plugin design objectives include these features:

  • daVinci plugin for MSWord, Excel and PowerPoint :: MSOffice versions 1998, 2000, XP Office, XP 2003 and 2007 (note- unexpected problems with final release of MSOffice 2007)

  • Accessibility Interface installed with daVinci (tagging graphic & media objects in MSOffice with ODf 1.1 accessibility eXtensions)

  • PDF/ODF w/Digital Signatures (XML-XForms sig model) :: a combination PDF/ODF file that can be read in Acrobat, but edited in any ODf ready application given proper digital rights :: excellent for data binding, transport and extraction

  • XForms Interface for MSOffice installed with daVinci

  • Enhanced document library storage search, re use, and re purposing through the advanced ODf 1.2 metadata RDF/XML model

  • Enhanced cross platform/cross information domain application interoperability through the ODf 1.2 metadata RDF/XML model

InfoSet Engine & APi:

Based on our knowledge and experience of working with the MSOffice internal conversion process, we began development on an ODf InfoSet Engine with a developer's APi. The objective is to provide developers, IDEs, Server-Device-Desktop Application providers and cross platform run time engines with a complete ODf conversion and layout engine that is light weight, portable, and easy to embed. Work on the “layout” engine portion has not yet begun.

The Interop Wizard for OpenOffice:

This is a plugin for OpenOffice that enables workgroup users to set either the individual document or default document settings to provide perfect interop with MSOffice desktops.

The issue here is that mixed environment workgroups (ODf Plugin MSOffice & OpenOffice desktops) will face a near insurmountable problem of round trip fidelity corruption. This has nothing to do with file formats or conversion fidelity processing, and everything to do with an issue known as layout engine impedance mismatch.

OpenOffice has a very sophisticated and complex layout engine that is optimized for the advanced presentation and use of “styles”. MSOffice on the other hand has a comparatively simple layout engine design and does not attempt to do common presentation tasks such as “table within a table”.

The layout engine mismatch limits and defines certain feature sets that make it structurally impossible to map documents between the two application suites. The “structural” problems are well known, and fall into these five categories: lists, fields, sections, page breaks, and tables.

The Interop Wizard shuts off the advanced feature sets of OpenOffice, limiting functionality to mimic near exactly the structural features of MSOffice. When the wizard is on, the advanced features are not available unless the end user overrides the warning.
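
The gating behavior described above can be sketched as a small Python model. The Interop Wizard is still in development, so this is not its implementation; the class name, method names and override mechanism are assumptions made for illustration. Only the five structural categories and the “on unless the user overrides the warning” rule come from the description.

```python
from dataclasses import dataclass, field
from typing import Set

# The five structural problem categories named above.
STRUCTURAL_CATEGORIES = frozenset(
    {"lists", "fields", "sections", "page breaks", "tables"}
)


@dataclass
class InteropWizard:
    """Hypothetical model of the wizard's feature gating."""
    enabled: bool = True
    overrides: Set[str] = field(default_factory=set)

    def override_warning(self, category: str) -> None:
        """Record that the end user chose to override the warning for a category."""
        self.overrides.add(category)

    def allows(self, category: str, advanced: bool) -> bool:
        """Return True if a feature in this category may be used right now."""
        # Basic features, and everything when the wizard is off, pass through.
        if not self.enabled or not advanced:
            return True
        # Advanced features outside the five categories are not restricted.
        if category not in STRUCTURAL_CATEGORIES:
            return True
        # Restricted advanced features need an explicit user override.
        return category in self.overrides


if __name__ == "__main__":
    wizard = InteropWizard()
    # An advanced table feature such as "table within a table" is blocked...
    print(wizard.allows("tables", advanced=True))
    # ...until the user overrides the warning.
    wizard.override_warning("tables")
    print(wizard.allows("tables", advanced=True))
```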
