Tuesday, January 30, 2007

Yankees in the Court of King Arthur, with a Microsoft Agenda

ANSI/INCiTS has completed its review of Ecma 376, and is ready to cast its ISO/IEC Contradiction Review Phase Fast Track Ballot in favor of Ecma 376 being rammed through ISO. As Sam Hiser points out in his PlexNex blog, not only are the findings of contradictions, inconsistencies, and proprietary dependencies pouring into public view, there's not much an American can do about it. ANSI/INCiTS has determined that no contradictions exist.

Hi Sam,

As a fellow American, prepare yourself for humiliation and shame. ANSI/INCiTS has decided that they will not object to MS Ecma 376 on the ISO/IEC Fast Track Ballot. This in spite of the massive list of contradictions and inconsistencies compiled by GrokDoc, whose members raced through the weekend just to have the document ready for the ANSI/INCiTS meetings.

Rather than confront the clear evidence of contradictions and inconsistencies, the brave hearts at ANSI/INCiTS chose to narrow the definition of what a contradiction is. And narrow it they did.

They decided that one standard contradicts another standard only if the proposed standard causes the existing standard not to work.

Sounds good, doesn't it? Au contraire, mon ami. This is the kind of self-serving maneuvering only a bureaucrat could love.

Using the analogy of the Chinese WAPI WiFi networking standard that was defeated last year because the protocol caused radio interference with existing 802.11 networks, our standards champions came up with their mechanical interference measure of “contradiction”.

Because both files can physically exist on the same disk without interfering with each other, our champions determined that OOXML did not contradict ODF.

Maybe they thought this would go unnoticed, but as one disheartened friend of open standards pointed out, “this argument can be used for every XML format, every programming language, every operating system, in fact every software standard, since software is ultimately data, and data can be segregated on disks. So they essentially chose a definition so narrow that it nullified the concept of "contradiction" for most of what JTC1 has authority over”.

So narrow a definition of “contradiction” that you can drive a fleet of monopolist trucks through the hole they've carved out of the ISO/IEC standards process.

So here we are. The champions appointed by our National Institute of Standards and Technology to represent our interest in the International Standards process have carved out a dangerous and possibly enduring loophole in the ISO/IEC fast track process. A loophole designed to serve the interest of a single proprietary monopolist seeking to control mankind's digital future.

Sam, where do we hide?

To our friends abroad fighting desperately to preserve the integrity of international open standards, with our digital freedom and the future of open Internet on the line, we ask that they dig in their heels and fight this to the end. Those Yankees you see striding into the Court of King Arthur are not from Connecticut, and the Camelot of ISO had better beware. Stalking standards to the tune of that ugliest of all Americans, the corporation from Redmond, ANSI/INCiTS has sold our souls that the world might be fodder for Vista.

Dig in your heels friends,

~ge~

Thursday, January 25, 2007

Brace Yourselves! ACME 376 XML Is Here!

The ACME 376 Proposal for International Standardization
The following letter is to Rob Weir, who somehow managed to get his own personal XML language approved as a standard. We too would like our personal XML language and file format approved. But we don't have Rob's connections. We also don't pack the massive checkbook he carries! Still, our XML is just as worthy of international consideration. We also are providing the standards bodies members with an easy to download ACME 376 Compatibility Kit for versions of MSOffice 97-2007 so that they can test and use our personal XML language. We need help Rob!



Dear Rob,
We were very inspired by your "ECMA Weirish" effort. We too have ideas, thoughts and business needs so unique that we need our own XML language. So we invented ACME 376. We even have an ACME 376 Plugin for MSOffice so that coyotes everywhere can immediately improve their road runner catching productivity using our own XML language, "ACME 376". Now coyotes, plumbers and amateur bomb-makers everywhere can finally join the XML revolution, converting their billions of legacy binary documents to perfect, 100% high-fidelity ACME 376!

It's a great day for coyotes the world over. Woe to the hapless road runners who dissed us in the past.

So Rob, how did you get ECMA to rubber stamp "ECMA Weirish"? We need the same no-questions-asked rubber stamping for ACME 376. And from there we need to push it through ISO/IEC, but in a way so that the road runners of the world are caught unawares of our nefarious plot for their destruction.

Could you please review the following proposal and advise,

~ge~

Wanted: A Standards Body Willing To Rubber Stamp
Inspired by the blog of Rob Weir, who intended to standardize “Weirish” as an international standard, we thought that if Rob has his own standard then we need one too.

So, we invented the “ACME 376” file format. It meets all the requirements of the ECMA TC 45, namely:

  • it is XML! Actually “.acme” files are ZIP files which contain pure XML.
  • it is compatible with Microsoft Office (2000, XP and 2003).
  • it can convert those billions of binary documents with perfect fidelity
  • it is great!

Make your Microsoft Word desktop productivity environment “ACME 376” ready today. You can download the installer here. This is not a joke! If you have problems, leave a comment here.

Standards bodies of the world! ACME 376 needs your consideration and approval! We need your vote for ISO/IEC approval now. We made sure not to use any other existing standard so a review for contradiction is not necessary.

Make ACME 376 an ISO standard today!

Download the installer for ACME 376 Compatibility Kit for Microsoft Office.

Monday, January 22, 2007

Running On OpenDocument Inside of Microsoft Office

Perfect Conversion Fidelity & The daVinci ODF Plugin for Microsoft Office

By now it's clear that Microsoft's Ecma-approved Office Open XML file format specification is filled with contradictions to existing ISO/IEC standards. Beyond the traditions of international standards consideration, there is a second, perhaps even more important concern: How do we reasonably migrate from a world where Microsoft Office bound business processes drive critically important economic, governmental and organizational concerns? How do we migrate these processes to OpenDocument ("ODF"), which is an international standard designed for interoperability? And is there any possibility of converting with acceptable fidelity the billions of binary documents trapped in Microsoft's proprietary file formats?

The world wants to move to ODF XML. But the question is, "Can this be done?" And further, "Can this be done without costly disruption to our day to day business processes?" Microsoft has long claimed that only their proprietary Office Open XML could convert those billions of binaries to XML without loss of fidelity (data loss, or "lossiness"). They claimed that ODF was inadequate and unable to handle the rich feature set of Microsoft Office. This is a strange claim in that the "X" in XML stands for eXtensible. Since XML formats are eXtensible, of course ODF can handle anything Microsoft Office or those billions of binary documents have to throw at it.

The real truth about this issue is that no one has ever been able to crack the secret code of those Microsoft binaries that hold so much of the world's documents. And Microsoft is not about to disclose the specifications for those formats. Years of reverse engineering efforts by non-Microsoft developers have brought us within range, but the binaries are an ever-moving target for interoperability. With each new version of Microsoft Office applications, the binary formats change arbitrarily. For example, there is no single DOC format; there are a host of DOC formats that all use the same file extension, DOC.
Do not kid yourself. Microsoft is not about to be the one that converts all those binary documents to the open standard ODF. Instead, they set out to convert those billions of binaries to their own proprietary XML. So, the next question is of course, "Is there any possibility of converting those binaries to Microsoft's EOOXML and from there a transformation to ODF?" After all, easy and perfect transformation is the promise of XML.

Microsoft's Steve Ballmer answers that question for us when he claims that conversions between EOOXML and ODF can be done, but that Microsoft's plugin will never provide full fidelity conversions between EOOXML and ODF; in other words, only a core set of features will be converted and the conversions of documents implementing other features will be lossy, resulting in data loss in largely unpredictable situations, depending on the differences in particular documents. So Microsoft's solution will not allow the automated conversion of those billions of binary documents to ODF. Not without data loss. We'll have transformation processes between EOOXML and ODF; it just won't be worth doing unless you are willing to manually compare documents when rendered both on Microsoft's applications and on an application that fully supports ODF, just to ensure that no crucial data was lost in the conversion.

Essentially Microsoft is claiming that only they can convert the billions of binary documents to XML with the fidelity needed, perfect fidelity (no lossiness), and only to Microsoft's flavor of XML, EOOXML. And Microsoft wants the ISO to award it a monopoly in converting its legacy formats to XML by making its own personal XML its own personal international standard.

Microsoft is also funding an open source project to perfect conversions (more properly, "transformations" in XML lingo) between EOOXML and ODF, maybe just to prove they are good guys who can be trusted with our information, and that they are not out to "replace" ODF, but rather to perfect the conversion of those legacy billions of binaries to XML. One objective of the Microsoft-Novell Translator Project is to provide an easy to install EOOXML <> ODF plugin for both Microsoft Office and OpenOffice. Betas for both of these plugins are expected to be released this month. Because this work is somewhat in the open, we are well aware of the intransigence of continuing conversion fidelity problems in both versions. (The Novell work on the OpenOffice.org Writer Translator plugin is complete, but not open to the public, or contributed back to OpenOffice.org -- yet). Note that the Translator Project is based on an XSL Transformation process. So it's expected to be both application and platform independent. The only question is, "Can they achieve the quality of fidelity needed to be of any use?" Steve Ballmer says no. And he's funding the project.

There's also the consideration that a scorpion can't help but continue to act like a scorpion. This famous quote from Bill Gates continues to haunt the technology industry to this day. He might as well have been referencing the Microsoft binary file formats:
"I doubt they [Digital Research] will be able to clone Windows. It is very difficult to do technically, we have made it a moving target and we have some visual copyright and patent protection. I believe people underestimate the impact DR-DOS has had on us in terms of pricing." (May 18, 1989 - Bill Gates)

The ODF Plugins Appear::

When Massachusetts announced their Request for Information concerning the possibility of an ODF Plugin for Microsoft Office, there were many responses, each with a different approach to the problem. Microsoft responded to the RFI with the promise of their XSLT based Translator Plugin. Sun provided two different plugin designs: the first based on an OpenOffice server side conversion, the second based on a C# routine connecting Microsoft Office functions to a locally installed OpenOffice conversion. In both cases it was the OpenOffice conversion engine we know and love that was doing the work of converting MS binary formats to ODF and back.

Importantly, neither the Microsoft nor the Sun plugins allow ODF to be set as the default file save format in Microsoft Office, what is known as "native support" for a file format, a situation sure to produce lots of accidental non-ODF files. Imagine what it would be like to have to train yourself never to hit the "Save file" option or keystroke shortcut in Microsoft Office. Instead, you must open a special menu option to save a file as ODF. Do you think you just might accidentally use the normal file save commands every now and then? An unpredictable mixture of file formats in a network can have unpredictable consequences, particularly when automated processes are involved.

A third ODF Plugin for Microsoft Office was proposed and submitted by the OpenDocument Foundation. This plugin conversion process was based on internally triggering a Microsoft Office native conversion process; one the Foundation believes is the same as or similar to that which is used when the EOOXML Compatibility Pack is installed, and the Microsoft Office in-memory-binary representation of the document ("IMBR") is converted to EOOXML. Unlike the other plugins, the Foundation plugin adds full native support for the ODF file formats to Microsoft Office. (The present version adds that support to Microsoft Word; later versions will add it to Excel and PowerPoint.)
You can use the normal file save dialogs and commands. Moreover, you will not need to study/rewrite all of your existing scripts to ensure that files are saved to the right format. Just set the default file save format to ODF.

There were other submissions, but these are the three we know the most about. They each represent a different approach to the problem of converting those billions of binary documents to XML. They each offer a different quality of conversion fidelity. Only one allows ODF to be set as the default file save format in Microsoft Office. The real question is whether or not any of the three provide file conversion fidelity acceptable enough so that there would be little or no disruption to existing Microsoft Office bound business processes, line of business dependencies and the functioning of assistive technology add-ons. In short, without near-perfect conversion fidelity, there is no measure of "interoperability" worth talking about.

That is the reality of Microsoft's EOOXML blackmail attempt. For years they have withheld from competitors and customers the secret binary file format details needed for perfect conversions, reserving that advantage for themselves. Only Microsoft holds the key to unlock your information from the Microsoft applications it remains bound to. So because they alone hold the key to so much of our binary-bound information, they insist that the world must adopt their proprietary, self-serving, application and platform bound, monopoly leveraging XML as an international standard. We are being blackmailed by the problems of converting those legacy billions of binary documents to XML.

I have a counter offer that ISO/IEC might consider: give us the keys to those legacy binaries and the documentation for the new MSXML InfoSet binaries that first appeared in Microsoft Office Excel 2007, and we'll give you international standardization for EOOXML. A fair trade I think, because it will break the monopolist's grip, level the competitive playing field, and restore competition wherever desktop, server and device systems need to interconnect and exchange information.

Three Conversion Approaches to Consider::

What we have then are three conversion methods, all enabled through an easy to install plugin model, and each with a different level of conversion fidelity:
  • The MS Translator XSLT method (EOOXML <> ODF) :: note that this initially was an application and platform-independent approach. In the strict sense, this is a "transformation", not a "conversion"
  • The Sun external OpenOffice.org conversion engine method (MS Binary Files <> ODF) :: note that the OOo conversion engine is based on years of reverse engineering needed to understand the secret structure of those mysterious and enigmatic billions of binary documents. A secret only Microsoft can unlock with perfect fidelity.
  • The Foundation's daVinci "internal" conversion process (MS in-memory-binary representation <> ODF) :: note that this process harnesses the internal conversion methods of Microsoft Office applications in much the same way as the EOOXML Compatibility Pack.

A brief description of the daVinci internal process:

So how does daVinci do this magic of triggering an internal process and letting the native resources of Microsoft Office perfect the conversion? Well, the first problem was getting inside Microsoft Office applications and working "natively". The know-how for this was provided by Microsoft themselves when they mounted their first MSXML add-on to Microsoft Office 2003. And what a great job they did! Once inside Microsoft Office and working natively, the entire view of how to best convert MS binaries to ODF changes radically. Rather than trying to crack the intransigent and enigmatic binaries externally, on the inside you simply let the Microsoft Office applications do it for you. We don't know for sure, but there is every indication that daVinci works very similarly to how the EOOXML Compatibility Pack works. There is no doubt that without the public information Microsoft has provided concerning the early versions of MSXML, we would not have had the series of breakthrough discoveries that make daVinci possible.

So the key to daVinci is in letting Microsoft Office apps handle the billions of binary documents, especially their conversion to IMBR (the Microsoft Office apps in-memory-binary representation). Internally, when a conversion process of any sort is triggered, Microsoft Office apps follow pretty much the same routine. There is a point where this internal conversion process can be intercepted and routed (mapped) to a different, non Microsoft, file format structure. Imagine if an internal conversion process from IMBR <> EOOXML is triggered, and you intercept the process the moment before mapping to EOOXML begins. And you reroute by mapping to ODF. That's daVinci. DaVinci triggers, intercepts, and maps to ODF. It could just as easily be configured to map to Chinese UOF. Or to EOOXML (well, forget the easy part regarding EOOXML; the haphazard and sprawling structure of EOOXML makes this mapping difficult - but as the MS Compatibility Pack proves, it is possible :).
daVinci could even be configured to map to Romanian XML or Oracle XML. The conversion quality of the daVinci process really depends on the flexibility of the XML schemas it is mapping to. Let me say that again, "The conversion quality of the daVinci process really depends on the flexibility of the XML schemas it is mapping to". Since EOOXML was made expressly and specifically for mapping Microsoft Office IMBR to, you better get perfect fidelity.

What about ODF? Yes, you can get the same perfect fidelity. The flexibility is there, and has been there since the February 2003 addition of the <foreign element> tags, section 1.5 of the ODF v1.0 standard (casually referred to as the <microsoft tags> because of what they can do). So yes, if you can break the secret of the proprietary IMBR, understand their hidden structure and function, you absolutely can get perfect-fidelity conversions to ODF and EOOXML. This is an incredible achievement for the OASIS ODF Technical Committee ("TC"). ODF was designed to be a universal file format, totally application and platform-independent, and it has the built-in flexibility to easily handle anything the enigmatic billions of binaries might throw at it. Tapping into the Microsoft Office IMBR just makes it easier for daVinci to see what's actually happening inside those unspecified binary blocks that blanket the billions of binary documents we're trying to convert to XML. As Rob Weir has remarked: "If Microsoft supported ODF 1.0 in Office today, using the foreign attribute support already specified in ODF 1.0, they could achieve backwards compatibility with their legacy documents. There is nothing that prevents them from adding a "DoItLikeWord95" attribute to an ODF document."
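The foreign attribute idea in Rob Weir's remark can be sketched in a few lines of Python. This is only an illustration, not daVinci's or any vendor's code: the `urn:example:legacy-word` namespace and the `build_paragraph` and `strip_foreign` helpers are made up for the example, while the text namespace URI is the one ODF defines. It shows a producer attaching a `DoItLikeWord95` attribute to a paragraph, and a consumer that does not understand the attribute dropping it while keeping the paragraph text.

```python
import xml.etree.ElementTree as ET

# Namespace of ODF text content (defined by the ODF specification).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical vendor namespace, invented for this sketch.
LEGACY_NS = "urn:example:legacy-word"


def build_paragraph():
    """Producer side: an ODF paragraph carrying one foreign attribute."""
    p = ET.Element(f"{{{TEXT_NS}}}p")
    p.text = "Quarterly report"
    p.set(f"{{{LEGACY_NS}}}DoItLikeWord95", "true")
    return p


def _namespace(name):
    """Return the namespace URI of a '{uri}local' name, or '' if none."""
    return name[1:].split("}")[0] if name.startswith("{") else ""


def strip_foreign(elem, known=(TEXT_NS,)):
    """Consumer side: copy elem, keeping only names in known namespaces."""
    out = ET.Element(elem.tag)
    out.text, out.tail = elem.text, elem.tail
    for name, value in elem.attrib.items():
        if _namespace(name) in known:
            out.set(name, value)
    for child in elem:
        if _namespace(child.tag) in known:
            out.append(strip_foreign(child, known))
        else:
            # Drop the foreign element but keep the text that followed it.
            tail = child.tail or ""
            if len(out):
                out[-1].tail = (out[-1].tail or "") + tail
            else:
                out.text = (out.text or "") + tail
    return out


if __name__ == "__main__":
    original = build_paragraph()
    print(ET.tostring(original, encoding="unicode"))
    print(ET.tostring(strip_foreign(original), encoding="unicode"))
```

The producing application can read its own attribute back on reload; every other application sees an ordinary paragraph. That is the fidelity gain, and also the interoperability cost, of the foreign tags.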

Blanketed with Unspecified Binary Objects - The dark spots ::

The real problems of converting those billions of binary documents, or of working as a near native file format within Microsoft Office, have nothing to do with either EOOXML or ODF, and everything to do with the secret, enigmatic binary file formats. Microsoft is busily spinning the world to convince us otherwise, but it only takes one demonstration of daVinci to set things straight. We shouldn't give in to blackmail. Especially blackmail designed to leverage the Microsoft desktop monopoly deep into our future of converged and highly interoperable multi platform systems. Control of the file formats, and keeping them bound to proprietary applications and platforms, is control of everyone's information and information processes.

So Microsoft Office will do a great conversion of those billions of binaries to IMBR for us. And when triggered, IMBR will set things up for daVinci to intercept an internal conversion structure and map to ODF. Because there are billions of binary documents out there, with years of file level application feature tweaking and enhancements by independent LOB - business process developers and assistive technology add-ons to deal with, there's no telling what kind of unspecified binary objects daVinci will encounter and have to map to ODF. daVinci needs the mapping flexibility in the XML target structure to place these unspecified anomalies, otherwise called "dark objects". The thing is that these binary object anomalies are unspecified on both ends of the conversion equation. They are unspecified with regard to the historical annals of reverse engineering, which itself is based on the cryptic, enigmatic, and often misleading documentation Microsoft has provided for RTF and the MS binaries. And they are unspecified by the XML structures at the other end of the conversion equation. Like ODF.

The Skinny on daVinci inside::

When a user loads a binary document (or creates a binary document in Microsoft Office applications), the apps themselves convert the binary documents to IMBR (the Microsoft Office apps in-memory-binary representation). The user works the document in IMBR mode. This means all application features, business process adaptations, assistive technology add-ons, whatever, are available and cooking without disruption or change. When an internal MS conversion process of IMBR is triggered, daVinci intercepts the results, and maps to ODF. The ODF version is saved to file. An internal conversion process is triggered whenever functions like save, save as, open, or open most recent are called.
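The trigger, intercept and map flow described above can be sketched as a small Python model. Everything here is an assumption made for illustration: `InMemoryDoc` is a toy stand-in for the IMBR (whose real structure is not public), `on_save` stands in for the intercepted save hook, and the `ext:dark-object` wrapper is a made-up element name. Namespace declarations are omitted, so the output is a simplified ODF-like fragment rather than a valid ODF file.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from xml.sax.saxutils import escape


@dataclass
class InMemoryDoc:
    """Toy stand-in for the application's in-memory representation."""
    paragraphs: List[str]
    dark_objects: List[bytes] = field(default_factory=list)


def map_to_odf_like(doc: InMemoryDoc) -> str:
    """Map known content to text:p and wrap unknown blobs as hex."""
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in doc.paragraphs)
    dark = "".join(
        f"<ext:dark-object>{blob.hex()}</ext:dark-object>"
        for blob in doc.dark_objects
    )
    return f"<office:text>{body}{dark}</office:text>"


def map_to_plain(doc: InMemoryDoc) -> str:
    """A second target, showing that the route is pluggable."""
    return "\n".join(doc.paragraphs)


# Registry of target formats; the interception point only picks a mapper.
MAPPERS: Dict[str, Callable[[InMemoryDoc], str]] = {
    "odf-like": map_to_odf_like,
    "plain": map_to_plain,
}


def on_save(doc: InMemoryDoc, target: str = "odf-like") -> str:
    """Hypothetical save hook: intercept the document and reroute it."""
    try:
        mapper = MAPPERS[target]
    except KeyError:
        raise ValueError(f"no mapper registered for {target!r}") from None
    return mapper(doc)


if __name__ == "__main__":
    doc = InMemoryDoc(["Budget < forecast"], [b"\x01\xff"])
    print(on_save(doc))
    print(on_save(doc, "plain"))
```

The point of the registry is the one the post makes: once the process is intercepted before mapping begins, the target format is a configuration choice, and the quality of the result depends on how much the target structure can hold.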

Conversion Fidelity & Interoperability ::

We fully believe that ODF version 1.0 provides daVinci with the flexibility we need to hit the same quality of fidelity of conversion of those billions of binary documents that EOOXML promises. Which is to say that ODF 1.0 has long offered Microsoft the same opportunity to convert everything to ODF and back. There is no technical reason for Microsoft not to have implemented ODF. And there is no technical reason for them to now ask that ISO/IEC consider a second universal file format specification as an International standard.

But what's beyond the issue of conversion fidelity? Inter application and cross platform interoperability; the ability to transport and exchange documents across many different kinds of information domains without loss of fidelity or structural compromise. Interop is a tall order. Especially after years of living with application bound file formats that only the application and platform vendors can transport and exchange effectively.

In their EOOXML pitch, Microsoft promises something called "Interoperability by design". Translated, this means that all Microsoft applications will be designed to work perfectly with EOOXML. Most likely we will also see MS applications able to handle the binary extensions of EOOXML that showed up in Excel 2007. This includes desktop, server and device systems written to .NET 3.0 and the Vista platform. To make certain this happens, Microsoft has provided us with a new version of VSTO 2005 where they drop support for MSXML and introduce support for EOOXML. They make it easy. The thing is that if you're a non Microsoft application, most likely you won't be able to fully implement EOOXML. Definitely you won't have access to the binary InfoSet extensions of EOOXML. Leastways not without a price, and never if you're a competitor like Oracle, IBM or Sun.

ODF Interoperability is open and freely available to anyone wanting to implement ODF. Participation in the OASIS ODF TC specification process is open and affordable.
There are no application or platform specific dependencies, or licensing restrictions, or patent encumbrances - legal risks holding anyone back. Universal file format interoperability is a given with ODF. Application interoperability is another matter. Especially existing applications that might have layout engines developed long before ODF became available. With new ODF applications this won't be a problem since they can develop directly to the specification. This is one of the reasons so much work is going into ODF 1.2, to accommodate the differences of traditional layout engines as they implement ODF. We don't have the power or authority of a Microsoft to rewrite every application to work perfectly with ODF. Nor do we have a similar command of the marketplace to force a user base of over 550 million desktops to upgrade to a Vista platform of Microsoft Office 2007 - VSTO -IE 7.0 -Exchange/SharePoint/Groove - MS SQL Server, MS Active Directory Server, etc. So instead, we have ODF 1.2 waiting in the wings. That's where our solution to universal application interop lies.

About the flexibility of ODF 1.0 - The Interop of ODF 1.2 ::

It is true that ODF has had, since February of 2003, an extremely flexible set of tags in the specification. They are called the <foreign element> and <alien attribute> tags, and were designed exactly to handle the billions of unspecified and conversion defiant binary object anomalies known to comprise years of Microsoft proprietary binary file format use. When you're mapping from an IMBR conversion process to ODF, you have to have something to put the unknowns in. You have to map to an existing tag. Since the very nature of these dark objects is that they are "unspecified" and previously unknown, they are also outside the range of ODF. Using these tags, daVinci can get perfect fidelity between the billions of binary documents and ODF. And get it right every time, with every Microsoft Office version from 1997 to 2007.

But this "perfect fidelity" comes at a high cost of interoperability with other ODF ready applications. Simply put, the other ODF applications have no idea what to do with the daVinci <foreign elements>! A Microsoft Office with daVinci knows. But for all other ODF ready applications these dark objects are still a mystery. In many ways the <foreign element> tags are the equivalent of what EOOXML does with the same volumes of unspecified legacy tags. They're there, but no one excepting Microsoft Office Compatibility Pack enabled installations knows what to do with them. The same is true with daVinci ODF. Only daVinci knows what to do with these dark objects. In other words, from day one ODF has had the exact same means of wrapping in proper XML an unspecified binary object or processing instruction as that which EOOXML is now parading about as something absolutely necessary (and unique to EOOXML) for converting those billions of binary documents to XML.

With ODF 1.2, daVinci gains the flexibility to map whatever dark objects it finds in ways that will dramatically improve interop with other ODF 1.2 ready applications.
(see the <interop eXtensions> proposal submitted by Florian Reuter to the ODF Metadata SC). Using the new metadata model, daVinci can then proceed to fully describe everything known and intuited about the dark object. Keep in mind that daVinci has an inside view. What daVinci sees is the IMBR context and conversion structure that is missing from a binary file format as well as EOOXML's cryptic tags. This descriptive model will provide every other ODF 1.2 ready application a much better chance to handle and render the dark objects. With ODF 1.0, we were limited in how we describe for interop purposes these unspecified creatures. With ODF 1.2, daVinci can field these objects on the fly, and give other ODF 1.2 ready applications a fighting chance to properly render them.

Sticking a binary object into an XML wrapper is just kicking the can forward. It's passing the problem on to someone else. Yes, it solves the momentary problem of an XML file format plugin running inside Microsoft Office (EOOXML and ODF). No problemo for those users. But it punts the problem of roundtrip interoperability with other ODF ready applications. They are left hanging. With the generic <interop eXtension> approach, and the metadata descriptive model, my guess is that ODF 1.2 ready applications will handle upwards of 98% of these problems instead of having to ignore the entire binary block. Over time of course, we will come to understand, specify properly, and map directly these binary objects.

Years of reverse engineering have brought us to upwards of 85% conversion fidelity. Now we need to nail that remaining and highly elusive 15%. Uncompromising demands from Massachusetts and the EU have forced Microsoft to come out in the open with their proprietary XML. They are fighting tooth and nail to keep their application bound binary secrets secret. And with good reason.
If we crack that last 15%, and do it in a way that provides users with a totally non-disruptive migration path to ODF, the monopoly will have been cracked open. Sometimes I wonder if the ISO/IEC JTC-1 members realize that they have it in their power to do what no government has thus far been able to do - stop the Microsoft monopoly from illegally leveraging their control into other markets, and restore open competition to technology marketplaces. The daVinci ODF 1.2 and ODF 1.0 plugin demonstrations will be made available to ISO/IEC members as positive and irrefutable proof that "it can be done". Hopefully we can get a video demonstration to walk them through daVinci so they can see for themselves. Hope this helps, ~ge~

Notes:

Microsoft joined the original OASIS Open Office XML effort in November 2002 (now OpenDocument or "ODF"). But they refused to participate or comment, instead quietly observing the work of the ODF Technical Committee for the next four years. Meanwhile, they began work on a proprietary XML file format designed specifically and solely to meet the "XML" needs of their Microsoft Office applications and emerging Vista platform of desktop, server and device systems. In 2004, Microsoft presented their proprietary XML effort to the European Union in response to a famous study known as the "Valoris Report". The report recommended that EU governments and organizations mandate information technology purchase requirements based on a far reaching but uncompromising infrastructure of SOA, Open Standards, and Open XML mandates. Valoris also recommended the development of a universal XML file format that was application and platform independent, able to service the portable document needs of an SOA infrastructure stretching over desktops, servers, devices and across the Internet. The universal file they envisioned was tagged "OpenDocument". When Massachusetts followed the EU with a clear mandate for Open Standards and open XML file formats that were recognized by Open Standards bodies, Microsoft formed the MS Ecma 45 workgroup with the objective of developing an open standard XML file format perfectly compatible with the existing Microsoft Office XML file formats. Meaning the end result was in the hands of the Ecma 45 workgroup before they even began work. All that needed to be done was some massive documentation of what in essence is a Microsoft Office binary dump into XML. Or, if you prefer, an XML encoding of Microsoft Office proprietary binary file formats.

OpenDocument as the perfect Microsoft Office file format

How to:

Add native file support for OpenDocument to Microsoft Office

What is the da Vinci plugin for Microsoft Word? Where did it come from? How does it work? And can it really convert files generated by Microsoft Word versions 97-2007 into fully compliant, OpenDocument format-ready applications? Can it do so without disruption to bound business processes, dependent line of business applications, and assistive technology add-ons?

Let's begin from a simple truth, discussed in more detail below. The OpenDocument format ("ODF") is able to handle anything Microsoft Office can throw at it, and handle it at least as well as Microsoft's new EOOXML file formats. That includes the bound business processes, assistive technology add-ons, line of business dependencies and advanced feature sets unique to various versions of Microsoft Office. The converse however can not be said of Ecma Office Open XML -- the new Microsoft EOOXML file formats -- as evidenced by Microsoft's own ODF Translator Plug-in Project difficulties.

In April of 2006, the Commonwealth of Massachusetts Information Technology Division broke ranks with IT tradition and put out an open Request for Information concerning the feasibility of an ODF Plugin for Microsoft Office. It is not that Massachusetts ITD had rejected the many current software offerings of ODF vendors and open source community efforts. But they lacked a critical tool to perfect a migration to ODF -- from years of Microsoft Office business processes and binary document dependencies to ODF-ready solutions -- without impossibly costly disruptions to their day-to-day business activities.

Massachusetts ITD correctly reasoned that ODF plugins for the Microsoft Office major apps would solve this problem, enabling a mixed environment of interoperable desktop solutions, giving their business processes much needed time to be migrated for the right reasons at a manageable pace. The right reasons are productivity gains and efficiency enhancements, as opposed to merely reactive efforts to squelch the costly insanity of file format madness. The manageable pace is variable, depending on human and other resource availability, rather than on a disruptive rip-and-replace of most software across entire networks to achieve interoperability.

Imagine what your own office would be like if you came to work one day and discovered that all of the software staff normally used had been replaced on every machine by different software. Could your office afford to shut down long enough for everyone to learn all of the new software and procedures? Is that kind of trauma something you would like to go through? People need to be able to exchange documents. That means software interoperability is key, and it simply is not workable for people to exchange files in formats that are incompatible with software used by the recipients.

Now imagine how much worse the problem would be if your office is dependent on fully-automated business processes where processing is entirely dependent on and bound to specific applications, and those applications spanning the process pipeline have a secret language only they can speak. Additional workstations must have the right software if they are to participate in the process. Business process change becomes an issue for the application vendors since they alone can tweak the language-specific applications. That's expensive. So much so that many a business lets much-needed process changes lapse until they simply are no longer competitive or profitable, and the cost of rewriting the applications to re-engineer the business process must be paid, at any cost. Still, a driving and determining factor is that any new adjustments must be compatible with the information that drove prior iterations of that business process. Anything less than full fidelity in converting legacy documents to new formats greatly limits the changes and productivity enhancements that can be gained, and ensures that your applications and service continue to come from the same vendors for years and years to come, no matter what the cost.

Full fidelity is a term used by the file conversion cognoscenti. It refers to the quality of conversions from one file format to another. Full fidelity means the conversion from one format to another cannot result in data loss. If data loss is possible, for example depending on what software features are invoked in a given document, then we refer to the conversion as lossy.
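To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: documents are modelled as simple dictionaries of features, and the converter functions are toy stand-ins, not anything da Vinci or Microsoft Office actually does. The point it shows is that a lossy converter can still round-trip some documents cleanly, which is why a single clean conversion never proves full fidelity.

```python
def round_trips_cleanly(document, to_target, from_target):
    """True if converting to the target format and back reproduces the document."""
    return from_target(to_target(document)) == document


# Hypothetical toy converters; a "document" is a dict of feature name -> value.
def lossless_export(doc):
    return dict(doc)


def lossy_export(doc):
    # The target format has no place for formulas, so they are silently dropped.
    return {name: value for name, value in doc.items() if name != "formula"}


def plain_import(doc):
    return dict(doc)


with_formula = {"text": "Closing statement", "formula": "=SUM(A1:A3)"}
text_only = {"text": "Closing statement"}

print(round_trips_cleanly(with_formula, lossless_export, plain_import))
print(round_trips_cleanly(with_formula, lossy_export, plain_import))
print(round_trips_cleanly(text_only, lossy_export, plain_import))
```

The third check passes even though the converter is lossy, because that document never invokes the feature the converter drops. Full fidelity is a claim about every document the source application can produce, not about the ones you happened to test.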

For example, there is the always problematically lossy and always one way-only conversion of a Microsoft Word 97 binary file format (.doc) to a Microsoft Word 2003 binary file format (.doc). There is also the common file format conversion of a Microsoft Word binary .doc to Rich Text Format ("RTF") and back. For our purposes, we are concerned with the fidelity of file conversions between OpenDocument and binary files bound to Microsoft Word versions from Word 97 to Word 2007. Microsoft is able to perfect a full fidelity conversion of those same binary documents to and from EOOXML, using the Microsoft XML compatibility pack plugin. We argue that the same full fidelity of conversion is possible between Microsoft Word and OpenDocument, using the daVinci plugin for Microsoft Word. In fact, the process of how this is done is very similar to that used by the MS XML compatibility pack.

Automated business processes are key to increased productivity and profit, but they are still relatively few and far between, a situation not expected to last long. Automated business process systems are regarded as one of the next Big Things in the software industry. Legacy business process systems often require much end user interaction such as manual management of information, verification of data, tracking of data as it transits the process, etc. Many are dependent on particular legacy software applications' proprietary binary file formats.

Basically, an automated system provides exacting interfaces so that the system handles the data, workflow and intelligent routing of documents, and users concentrate on the decision points and architecture of the process itself. The Workflow Management Coalition defines workflow, in general, as "the automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules," where participant can mean either a human being or another automated process.

To get from the partial automation of legacy systems to the full automation of future systems, information transfer must be accelerated systematically and human participants must be provided with interfaces that enable rapid reorganizing of people, resources, documents, eMail, messaging, wiki collaborative documents, direct access to data bindings within those documents, and the ability to push change into advanced workflows managed by "the system". There is simply no practical way to do this without having a universal file format based on the XML portable document with system-bound data model. The universal XML file formats exist.

The systems-level XML hubs are coming on strong as evidenced by the Exchange/SharePoint/Groove Hub, Workplace/Lotus Notes/Domino, Scalix, and Zimbra. This is where data and various information streams merge with the portable document model, with information streams collapsing into a single easy to schedule and manage aggregate; the merge occurs in the context of end user project and workflow management interfaces.

Full fidelity conversions of the information that drives our current business processes to universal portable XML document models is the only way we can make that productivity leap to fully automated business processes. Every time there is a loss of fidelity in data conversion we basically lose a node in the process. We have to go manual and fix a problem that should have been handled by the system. That impedes any potential productivity gains.

The major problem with anything less than full fidelity conversion is that human intervention to fix the artifacts is required. This is both disruptive and costly. Imagine a situation where an important binary document is worked in MSOffice and NOT converted to ODF at the headpoint by daVinci. That binary document then flows through the system and is converted to ODF and further processed at an OpenOffice workstation. There is some loss of presentation, but no big deal for the OpenOffice wiz. From there it goes to a Workplace enabled workstation where it is processed and responded to. No problemo, WorkPlace can read and write ODF perfectly. Including an ODF-XForms document. From there it goes back to an MSOffice workstation for review and data extraction. The individual sends it back to the WorkPlace desktop and asks for a Word 2000 .doc format. If it is an ODF-XForms document, there is no way to convert it back to Word 2000 binary .doc. If it is a compound document of any complexity, there will be loss in the conversion. No matter what happens, this process simply isn't going to work. In fact, no one in their right mind would ever set up or allow for such a poorly mixed environment in the first place.

Enter daVinci. The conversion of binary documents occurs at the headpoint of the process, Microsoft Word, and it is full fidelity. No need to waste expensive human attention fixing conversion artifacts. Once the process is in ODF, these documents can be passed through a mixed environment without loss of that original fidelity.

Let's take for example a more extensive and document centric business process. One that handles a trillion dollars per year in transaction-after-transaction volumes; the real estate transaction industry. For each and every transaction many professional participants must come together, exchange hundreds of documents, work under a strict and unforgiving schedule, and do this with a perfection and data exactness that, if it fails, could entirely wipe out the lifetime savings and asset accumulations of their clients. Today the entire real estate transaction is driven by paper documents, just as it was 50 years ago. The only difference is that today much of the data and document content is provided by back end data processing systems. Almost none of which are able to talk to each other.

The documents involved must be continually reviewed and rewritten to make certain the data and information is current. The cost and availability of a mortgage changes by the hour. A bank might pre-approve the buyer and the property to be purchased. Even lock in the rates. But there is always the possibility that secondary mortgage market evaluations, purchases, and constantly changing loan ratios can take a lender out of the market on the very day they are scheduled to deliver a loan product. It's all an incredibly time-sensitive process.

The average real estate transaction brings together many professionals: buyer and seller representative real estate agents, their brokers and transaction coordinators, the buyers and sellers, their attorneys, a lender, a title company, inspection companies, insurance companies, appraisers, contractors filing contingency removal estimates, and key information systems providers such as the Multiple Listing Service, county records, probate courts, and courts of law recording such things as liens and encumbrances. All in all, every real estate transaction has at least twelve professionals, each having to deliver specific documents and reports, and each having a back end data processing system different and unable to speak with any other back end system. It is a mess. It is impossible to automate this process without a universal digital file format that can be produced in and translated back into the many proprietary back end systems involved. And do so with absolutely perfect fidelity of document conversion and data extraction.

So here we have a paper driven process that costs consumers upwards of 12% of the actual transaction. And it's a trillion dollar a year industry once all the before and after sale costs are factored in.

A common situation in the real estate industry is that of the professionals having to exchange documents electronically. Faxing is great for paper, but there is a problem of version control as the volume of documents approaches the final closing. Most of the industry relies on eMail attachments as the primary exchange method. What a nightmare. To exchange documents, participants first have to synchronize the versions of Microsoft Office used to produce these usually compound and data centric transaction instruments. They call it version madness in that there is no way to exchange Microsoft Office documents without first having everyone involved using exactly the same version. Meaning, the quality of fidelity conversion between different versions of MSOffice is a killer. Data loss occurs unpredictably. It is unacceptable.

How do you force everyone who must participate in a real estate transaction to all use the exact same versions of software? Well, the same way governments routinely force taxpayers to use the same versions of Microsoft Office. You either get the right versions or you don't get to participate.

There are other problems with eMail attachments of proprietary binary documents. Let's say you are exchanging a sales report spreadsheet with inventory, pricing, availability, special discounts or price cuts, sales, customer name and ID, etc. Every time that spreadsheet changes hands, the data has to be verified and updated. The more rapid and wide the exchange, the more intense the data management - data extraction problem becomes. It is a mess. A dangerous and risky mess.

Those spreadsheets could have important formulas and data bindings. Using the XForms or Jabber P2P data binding models, an ODF spreadsheet document can manage itself. Every time the sales people connect to the Internet, the inventory, discounts, sales totals, etc fields can be updated by "the system" with exacting precision. The end user no longer has to worry about data verification and integrity. The system takes care of it, no matter how fast and furious the changes occur. It's automated within the document. Bullet proof. The same can be done with a Microsoft EOOXML document.

However, try to convert one of these documents, and risk dropping formulas, data bindings, rules based routing, or workflow scripting, and the whole process will come to a screeching halt. As it should. This is but another way some proprietary vendors lock in their customers. If you can't do a full fidelity conversion, and be confident of the results, the only alternative is to turn to a single vendor and year after year pay the price.

Enter the OpenDocument Foundation. Since early 2003, a loose gaggle of ODF developers had been seeking methods to improve the OpenOffice.org ("OOo") conversion of Microsoft Office binary formats to what later evolved into ODF. By December of 2003, the OOo community actually launched an official project to develop an ODF plugin for Microsoft Office. The difficulties were enormous, and at the time there was no perceived “business case” for the effort that might have justified allocation of sufficient resources from the Sun Microsystems StarOffice group, the primary developers for OpenOffice.org.

However, as Microsoft began its own long march to XML formats, openly discussing important MS binary format to XML conversion issues, these “difficulties” began to be understood. In 2005 there were a series of breakthroughs based partly on released information from Microsoft. The breakthroughs also sprang from our own intuition of what was really behind some of the oddities in the Microsoft approach. Here's a quick list of reference materials Microsoft provided that really helped:

When Massachusetts put out their request for information, as it happened we were only an installation program away from our first ODF 1.0 compatible plugin for Microsoft Word, the most heavily-used application in Microsoft Office. In June Massachusetts sent us our first set of 150 primary importance test documents. We sent them their first plugin install, which was dialed for perfect interoperability with the ODF 1.0-compliant reference implementation, OOo 2.0. In August, we sent Massachusetts the first da Vinci version of the plugin, which was dialed for perfect, 100% round trip conversion fidelity with the legacy of Microsoft Word binaries.

A lot happened in Massachusetts between June and August of 2006. The short form is that as Massachusetts came to understand how our plugin worked, we came to understand the true nature of the challenges they faced. We saw up close and personal the challenges of the Microsoft Office bound business processes and the need for perfect fidelity “round-tripping” of documents between different formats within automated business processes. The Microsoft assistive technologies were never an issue for us because the plugin works natively within Word. Microsoft Word handles the assistive technologies, and all the plugin does is the conversions from Word's in-memory binary representation to ODF and back.

The situation with the MSOffice bound business processes was such that everyone agreed we had to have a much higher fidelity converting those binaries than the extraordinary 85 per cent fidelity credited to the OpenOffice.org file conversion engine. We had to have perfect fidelity because there was no reasonable expectation of ever successfully migrating those business processes to a Microsoft Office alternative like ODF-ready OpenOffice.org, StarOffice, WorkPlace or Novell Office. Such a re-engineering of the business processes would be costly and beyond disruptive.

If however we could achieve full fidelity conversions of legacy Microsoft binary documents to ODF, and were able to guarantee the roundtrip process of these newly christened ODF documents in a mixed desktop environment, one comprised of ODF-enabled Microsoft Office, OpenOffice.org, Novell Office, WorkPlace, and KOffice, the existing MSOffice bound business processes could continue being used even as new ODF ready workstations were added to the workflow. Massachusetts could migrate to non-Microsoft Office software in manageable phases, restoring competition -- and sanity -- to the Commonwealth's software procurement program.

If on the other hand, there is no full fidelity conversion to ODF of legacy documents available at the head point of the migration -- Microsoft Office -- then the business process will break under the weight of users having to stop everything to fix and repair the artifacts of lossy file conversions. What Massachusetts discovered is that users will immediately revert to a Microsoft-only process wherever the business process system breaks down due to conversion fidelity problems. It is a productivity killer and a show stopper for migration to ODF-supporting software.

Most people believe that full fidelity file conversion is impossible, largely influenced by the poor fidelity generally experienced in converting documents from one vendor's binary file formats to another vendor's binary file formats. The major reason some people transfer such beliefs to the daVinci plugin is that they do not realize the plugin works natively inside Microsoft Word. The Word program itself performs the conversions using its internal processes for native support of a file format. The da Vinci plugin triggers the native Microsoft Word conversion process, intercepts the output at precisely the right moment, and maps it to ODF. When opening an ODF document, da Vinci simply reverses the process (although there is nothing simple about what da Vinci does or how da Vinci does this).

Think of it this way. The Microsoft Compatibility Pack is a EOOXML plugin for older versions of Microsoft Office. If these versions of Microsoft Office can use a plugin to perfect a conversion process between legacy binary formats and EOOXML, the same process can be used for ODF. In fact, it's actually easier to perfect a similar conversion process to and from ODF instead of EOOXML, particularly for Microsoft. Easier in that the ODF specification is lean and clean by comparison and very carefully structured. Even easier if you have the blueprints for those binary file formats.

My educated guess is that it would take all of two weeks for Microsoft to write a Microsoft Word ODF plugin. Indeed, Microsoft developers reportedly told Massachusetts that it would be "trivial" for them to implement ODF in Microsoft Office. If they can do it for something as complex and convoluted as EOOXML, ODF will be a snap. The trick is not in XML. It's in having the secret legacy binary format blueprints.

There's a reason we call our plugin “da Vinci”. Yeah, you guessed it. There is in fact a secret code needed to unravel the remaining mysteries of the legacy binaries; the elusive 15% just beyond the reach of the OpenOffice.org conversion engine.

The da Vinci plugin is conformant with version 1.2 of the ODF specification still working its way through the OASIS ODF sub-committees and is dependent on features of the draft specification that may conceivably change. Massachusetts was well aware of this fact, and knew we were treading on dangerous ground because ODF 1.2 was not yet final. They gave us the go-ahead anyway. Such is the importance of round trip high fidelity conversions with the legacy binaries.

Why ODF 1.2? Microsoft is notorious for frequently changing its file formats. There is even a Microsoft document jarred loose in the antitrust litigation where they brag about maintaining a "moving target" for other vendors' efforts to achieve interoperability with Microsoft software. For example, there is no single "DOC" file format. There are versions after versions of just the word processing format, all different. And their specifications are still secret. Imagine the volumes of unspecified binary objects, application version-specific or add-on specific processing instructions, and aging system dependencies buried in those billions of legacy binary files. There is no telling when or where they will pop up and da Vinci will have to somehow handle what are otherwise one unmappable black hole after another.

The easy solution is to simply wrap (enclose) these dark objects in foreign element tags. While this method is perfectly legal ODF because of ODF's strong interoperability features, see e.g., ODF v. 1.1 section 1.5, it comes at the cost of interoperability with other ODF-ready applications. Sure, inside da Vinci-enabled Microsoft Word the conversions and handling are perfect. Absolutely perfect. Send one of these dark object loaded documents to OpenOffice though and the application has no idea what to do with it.
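A rough sketch of what "wrapping" means, using Python's standard library. The ODF text namespace below is the real one; the foreign namespace, the element name `blob`, and the base64 encoding are assumptions made up for this illustration and are not how da Vinci actually stores anything.

```python
import base64
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"  # real ODF text namespace
DARK_NS = "urn:example:dark-objects"  # hypothetical foreign namespace


def wrap_dark_object(parent, blob):
    """Append a foreign element that carries the opaque bytes as base64 text."""
    element = ET.SubElement(parent, f"{{{DARK_NS}}}blob")
    element.text = base64.b64encode(blob).decode("ascii")
    return element


def unwrap_dark_object(element):
    """Recover the original bytes from a wrapped element."""
    return base64.b64decode(element.text)


legacy_bytes = b"\x00\x01legacy-bytes\xff"
paragraph = ET.Element(f"{{{TEXT_NS}}}p")
wrapped = wrap_dark_object(paragraph, legacy_bytes)

# Serialize and parse again, as if the file had been saved and reopened.
reopened = ET.fromstring(ET.tostring(paragraph))
restored = unwrap_dark_object(reopened[0])
print(restored == legacy_bytes)
```

The application that wrote the wrapper gets its bytes back exactly. Any other application sees only an element in a namespace it has never heard of, which is the interoperability cost described above.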

ODF 1.2 provides da Vinci with two important advantages. The first is that ODF 1.2 ready applications “must” preserve tags whether they understand them or not. This is important to round-tripping. ODF 1.2 applications become routers of documents, instead of document end points.
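The difference between a "router" and an "end point" can be sketched like this. It is a toy model under stated assumptions: the processing step (uppercasing text), the set of "known" tags, and the `x:blob` element are all invented for illustration, and nothing here is taken from the ODF 1.2 draft text itself.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"  # real ODF text namespace
KNOWN_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}span"}

# Hypothetical document: one element the application understands, one it does not.
SOURCE = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:x="urn:example:dark-objects">'
    "<text:span>hello</text:span><x:blob>AAEC</x:blob></text:p>"
)


def process(root, preserve_unknown):
    """Edit the elements we understand; keep or drop the ones we do not."""
    for child in list(root):  # iterate over a copy so removal is safe
        if child.tag in KNOWN_TAGS:
            if child.text:
                child.text = child.text.upper()
            process(child, preserve_unknown)
        elif not preserve_unknown:
            root.remove(child)
    return root


router = process(ET.fromstring(SOURCE), preserve_unknown=True)
end_point = process(ET.fromstring(SOURCE), preserve_unknown=False)
print(len(router), len(end_point))
```

Both applications did their own work on the text. Only the router handed the unknown element on intact, so the next application in the chain (or Microsoft Word on the return trip) still has something to work with.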

The second aspect of ODF 1.2 is that of metadata XML/RDF based flexibility. Instead of wrapping these dark objects in a non-descriptive tag, ODF 1.2 will enable da Vinci to wrap with a loosely descriptive generic element tag, but then nail into the more flexible attribute model everything da Vinci can describe about the blob. This gives ODF 1.2-ready applications like OpenOffice or KOffice a better than even shot at reading and rendering dark objects that would otherwise have been dropped. Over time, the elusive 15% of lossage during their conversions of Microsoft binary formats will disappear.
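As a loose illustration of why a described blob beats an undescribed one, consider the sketch below. The attribute names (`media-type`, `source-app`, `byte-length`), their values, and the namespace are all hypothetical; the actual ODF 1.2 metadata model is RDF-based and is not reproduced here.

```python
import xml.etree.ElementTree as ET

DARK_NS = "urn:example:dark-objects"  # hypothetical namespace for this sketch


def describe(element, facts):
    """Record everything the producer knows about the blob as attributes."""
    for name, value in facts.items():
        element.set(f"{{{DARK_NS}}}{name}", str(value))
    return element


def can_render(element, supported_media_types):
    """A consumer decides from the description alone whether it can render the blob."""
    return element.get(f"{{{DARK_NS}}}media-type") in supported_media_types


blob = ET.Element(f"{{{DARK_NS}}}blob")
describe(blob, {"media-type": "image/x-wmf", "source-app": "Word 97", "byte-length": 2048})

print(can_render(blob, {"image/x-wmf", "image/png"}))
print(can_render(blob, {"image/png"}))
```

With only a tag name, the consuming application has to guess or give up. With a description attached, it can at least make an informed decision, and over time learn to handle more of what it is handed.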

This is a far better way of handling dark objects than what EOOXML tries to do. Under that specification, the implementing application simply wraps the object and passes it along. There is no effort to describe these objects in XML-speak except for the tag name. The specifics needed to render these objects are left out. Which means, only those applications with access to the secret binary blueprint can properly render these dark spots.

As a proof of concept, da Vinci provides a most valuable service. First of all it proves that ODF can handle anything EOOXML can handle. ODF can handle everything and anything Microsoft Word can throw at it. Second, da Vinci proves that Microsoft developers could at least as easily have written an ODF plugin for Microsoft Word as they did an EOOXML plugin. For Microsoft, knowing the secret binary blueprint as they do, this simply cannot be seen as a technical challenge. It was purely a business decision. Third, da Vinci demonstrates that it is possible to perfect a migration to ODF at minimal cost of disruption to existing business processes and application bound routines. The migration to ODF is possible. The starting point of this migration is exactly where Massachusetts ITD thought it would be – with ODF plugins for Microsoft Office.

We have not yet developed plugins for the other major applications in Microsoft Office; Excel and PowerPoint. However, we have investigated the involved issues enough to have confidence that those tasks are also achievable. The proof of concept and working prototype of Excel is completed. Only a proof of concept for PowerPoint exists.

How to:

Interact with Microsoft business processes using OpenDocument

One of the interoperability challenges the OpenDocument format faces is that of the developers of existing non-Microsoft applications trying to implement a file format that follows after the fact of the design and development of the specific layout engines. The tradition of productivity applications is that the application was written with the file format an afterthought based on the in-memory binary representation needs of the application. Every application had a perfect binary dump of these in-memory-representations, and from there conversions to other file formats began. Early and open cooperation between existing application providers in the specification development process is essential for later interoperability.

One hope for ODF is that a new generation of ODF-ready applications will appear. And these applications will be geared from the git-go to produce perfect, highly interoperable, and round trip ready ODF.

To that end the da Vinci developers began work on the InfoSet Engine and API. This is designed to be a lightweight conversion (and someday layout) engine useful to mobile devices, server side applications, rapid development desktop productivity applications, and the easy ODF enabling of IDE's. The principle is the same as da Vinci except that the InfoSet Engine must do the MS binary conversions without the native assistance of the Microsoft Office applications. That is a step da Vinci relies on Microsoft Word to perform natively.

So it's safe to say that the more we watch da Vinci do its work, the more we learn about how Microsoft Office perfects its own conversions. This knowledge is then passed to the InfoSet conversion algorithms. Going native gives one a whole new perspective on an enormous problem that spans decades.

The InfoSet API is intended to provide developers with easy-to-call interfaces that do all the conversions for their applications and services. We have not yet begun work on the InfoSet layout engine, but so far this is one of the more promising ODF projects we've worked on, maybe more important and certainly more lasting than da Vinci. We believe that da Vinci will have its day, cracking open Microsoft Office. But that day will only last until the Microsoft Office bound business processes and their related binaries are migrated to ODF-ready information processing chains. After that, (perhaps a three-year process) the day belongs to InfoSet and apps fully supporting ODF such as OpenOffice.org.

InfoSet is important because the world is moving towards Internet-enabled information processing chains that mesh desktops, servers and device-based information systems. We need ODF-ready chains, and the only way to get there is to make it as easy as possible for application developers to provide interop-perfect ODF.

To understand this urgency, one need only to look at the Microsoft Vista Stack, an information processing chain where every application speaks perfect EOOXML. The head point of the Vista chain is of course Microsoft Office. But the core of this chain is the Exchange/SharePoint/Groove Hub. The Hub is where information is aggregated, sorted, managed, scheduled, repurposed and bound into intelligent collaborative workflows. The driving force is that of the portable XML document, with data being bound to interactive document objects, extracted and worked as the business process flow demands.

These chains are extremely productive. So much so that I believe every Microsoft Office business process will migrate as soon as possible to either the Vista – EOOXML chain, or to an ODF ready alternative.

Some interesting thoughts here. The da Vinci plugin and those on the development road map for the other major Microsoft Office apps will be able to convert the Microsoft Office head point to an ODF pump. This opens the way for ODF ready Hubs. Zimbra comes to mind. An InfoSet-enabled version of Zimbra (just an example :). Once the Microsoft Office bound business process moves to the enhanced productivity of an ODF ready Hub, the desktop workgroup space opens up for any ODF enabled alternative to Microsoft Office. That means, for example, any Linux desktop running OpenOffice.org or KOffice can fully participate in these Hub-hosted workflows.

Many will no doubt disagree. But I personally believe the above information processing chain scenario is the secret to Linux desktops running wild and finally replacing the 485 million Microsoft Office bound desktops that dominate business.

Related information

The da Vinci plugin and InfoSet Engine – API have been presented at technology conferences and to interested governments serious about migration to ODF. This includes:

  • EU IDABC Experts Group Contacts: Barbara Held, Peter Strickx, Elmar Geese

  • Commonwealth of Massachusetts ITD Contacts: Louis Gutierrez, Timothy Vaverchak, Claudia Bowman

  • State of California ITD Contacts: Bill Welty, CIO of the Air Resources Board

The Massachusetts Plugin Feature Priority List:

  • Microsoft Word perfect conversion fidelity plugin (ODF 1.2)
  • Plugin Accessibility Enhanced Feature Set (ODF 1.1 compatible)
  • MSExcel perfect conversion fidelity plugin (ODF 1.2 with formulas)
  • MSPowerPoint plugin
  • Roundtrip Interoperability with ODF 1.2 version of OpenOffice.org
  • XForms Interface for da Vinci enabled Microsoft Word
  • XForms Interface for MSExcel and MSPowerPoint
  • PDF - ODF Digital Signature Interface for Microsoft Office

OASIS Open Office XML Application Vendors:

The interoperability of any universal file format very much depends on the open cooperation of existing application vendors. New applications written to the universal file format specification will of course have near perfect interoperability, but there are serious application layout and feature impedance difficulties for existing applications trying to implement a universal file format, no matter how open or consensus driven the specification development process happens to be. ODF was most fortunate to have a number of desktop and enterprise application developers participating in the process. Although Microsoft was part of the original group of founding members, and continues to maintain that membership in the OASIS ODF XML TC, they have never actively participated. Their "observer" status did however give Microsoft direct access to all discussions, meetings, conference minutes, eMail exchanges, listservs, documents and proposals. Clearly if Microsoft had participated in the ODF process, the stubbornly defiant interoperability problems we see today between ODF and Microsoft's own proprietary XML could have been resolved.

Resources:

Friday, January 12, 2007

Microsoft EOOXML Hits the ISO Contradiction Wall

EOOXML -- What is a 'contradiction' at ISO and what are its procedures?



Here we are just six days into the ISO Contradiction Review Phase, and the legendary GrokLaw legal expert Marbux comes stampeding across the windswept steppes of standardization, across endless arrays of routers pulsing with the soft wave dance of light streaming through fiber, and hordes of angry netizens shouting for freedom behind him.

So what has so riled the keepers of the Open Internet? Well, they sit at the threshold of a new age, the age of collaborative computing, where truly open and unencumbered standards, open source community participation, and the open Internet meet. And they see dark clouds blowing in hard from Redmond. They see a threat to their beloved universal XML file format, OpenDocument.

The issue here is the ISO Contradiction Review Phase for consideration of Microsoft's EOOXML file format specification now under way - with twenty four days left on the review calendar. The clock started ticking on January 5th, 2007, and ends on February 5th, 2007. During this short review period, NBs (National Standards Body Members) of ISO JTC1 must review the 6,000 plus page EOOXML specification and submit their contradiction concerns.

It's actually much more than 6,000 pages. Microsoft was kind enough to pad their submission with volumes of highly persuasive, professionally cut presentation and marketing materials. All of which the legendary Marbux covers in another post: Microsoft/Ecma's submissions to ISO for Ecma Office Open XML

With these two posts, the magnificent Marbux provides everything an NB needs to quickly work their way through the muck and mire of EOOXML. And he gives the rest of the world all we need to provision our National representatives with contradiction comments and insights.

Thanks Marbux. This is serious business. The future of mankind's digital civilization is at stake. And you've perhaps saved the Open Internet with this prescient intervention. And not a moment too soon!

The MNC ODF-EOOXML Translator Project: What's not to like? And what does this have to do with the ISO Contradiction Phase?

In spite of the mighty Marbux's great work, i'm still confused about the relationship between the woefully obvious problems Microsoft anticipates with the ISO Contradiction Review Phase, and the work being done at the MS-Novell-CleverAge Translator Project.

How come our friends in Redmond think the MNC Translator Project is going to pull them through the rough waters of the Contradiction Review Phase?

I don't get it. If the MNC Translator works (which it doesn't) then it proves EOOXML is a duplicate of ODF, and therefore in contradiction with an existing ISO product. If, as is the case, the Translator Project bombs, then that would prove EOOXML to be a rather poor and inconsistent subset implementation of ODF. Worse than a duplicate contradiction.

We know that Novell fully intends on implementing the MNC EOOXML/ODF Translator as a plugin for OpenOffice. The first iteration for OOo Writer is promised for release any day now. What a challenge. Except that the Translator plugin transformation process is crap. Translator tries to do an XSLT conversion of ODF to EOOXML. Since EOOXML is non-compliant with XSLT, with presentation styles damn defiant of the structural integrity XPath requires, this is near impossible to do with any halfway acceptable measure of fidelity. It's so bad that the poor folks at the Microsoft sponsored CleverAge are resorting to C# routines to get the MSOffice version of the translator plugin working.
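To make the shape of the problem concrete, here is a minimal sketch of a structural translation in Python. It is not the Translator's XSLT and not anyone's real plugin code; the function name and the sample document are made up for illustration. It maps each ODF text:p paragraph onto a WordprocessingML w:p/w:r/w:t run and, tellingly, has to throw the inline styling away to do it.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for ODF text content and WordprocessingML.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def odf_paragraphs_to_wordml(odf_xml: str) -> ET.Element:
    """Hypothetical helper: map each ODF text:p to a w:p/w:r/w:t run.

    Everything that is not paragraph text is dropped, which is exactly
    the kind of loss a real translator has to avoid.
    """
    source = ET.fromstring(odf_xml)
    body = ET.Element(f"{{{W_NS}}}body")
    for para in source.iter(f"{{{TEXT_NS}}}p"):
        w_p = ET.SubElement(body, f"{{{W_NS}}}p")
        w_r = ET.SubElement(w_p, f"{{{W_NS}}}r")
        w_t = ET.SubElement(w_r, f"{{{W_NS}}}t")
        # itertext() flattens nested spans, so inline styling is lost here.
        w_t.text = "".join(para.itertext())
    return body


# Illustrative input only; not taken from any real document.
SAMPLE_ODF = (
    f'<text:section xmlns:text="{TEXT_NS}">'
    '<text:p>Hello <text:span text:style-name="T1">styled</text:span> world</text:p>'
    '<text:p>Second paragraph</text:p>'
    '</text:section>'
)

converted = odf_paragraphs_to_wordml(SAMPLE_ODF)
texts = [t.text for t in converted.iter(f"{{{W_NS}}}t")]
print(texts)
```

The paragraph text survives, the span and its style name do not. Carrying the styles across is where the fidelity battle is actually fought.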

Here's the thing: Even though EOOXML and ODF are designed to do the same thing, the interoperability and quality of transformation between the two is beyond awful. It's unacceptable. Not only does EOOXML contradict ODF, the inconsistencies it brings to the table are embarrassing to ISO.

Microsoft will of course argue that ODF was designed for OpenOffice features and is inadequate to meet the demands of the more advanced and feature loaded MSOffice (pull the string on your Alan Yates interoperability by design doll and this incessant apply directly to your forehead message will replay itself. Beware, once triggered the Yates nonsense can not be turned off: Microsoft’s Alan Yates about ODF).

This is of course not true. The Yates interoperability by design doll does this drive-by shooting of ODF and then adroitly shifts to proof of these file format inadequacy claims by comparing OpenOffice features to MSOffice features. Wait a minute, isn't he supposed to be comparing ODF to EOOXML? Or proving that ODF can't handle everything the legacy of MSOffice operations has to throw at it?

Yes he's right about one thing in his sleight of hand prestidigitations: the feature sets of OOo and MSOffice are not the same. But what does that have to do with ODF and MSOffice?

The truth is that ODF can handle everything OpenOffice throws at it. And, it can handle everything MSOffice throws at it. Everything! Can you say "universal"?

The proof of this can be found with any ODF plugin running natively within MSOffice. The OpenDocument Foundation's daVinci plugin proves my statement. Within OpenOffice, ODF works perfectly. Within MSOffice, ODF works perfectly. Including the legacy binaries. And on all MSOffice versions from 1997 to 2007. No problemo.

We have yet to see how perfectly the Novell Translator Project Plugin works inside OpenOffice, (yuk yuk), but that would be proof of just how well EOOXML handles whatever the feature rich OpenOffice can throw at it. From what i've heard, the results are dismal, bordering on a 65% fidelity rate. (OpenOffice conversion filters routinely hit a magnificent 85% fidelity rate).

We'll assume for the sake of argument that EOOXML can handle MSOffice and the legacy binaries because the original EOOXML charter specifically stated that as the sole objective of the effort. But so can ODF! When running natively in MSOffice, ODF hits a 100% fidelity rate with all of the legacy binaries and evolving proprietary XML InfoSet binaries.
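A word on what a "fidelity rate" could even mean in practice. The figures above are as reported to me; the sketch below is not how they were measured. It is one hypothetical way to score a round trip, assuming fidelity is the share of (tag, attributes, text) triples that survive the trip. The function names and sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def content_signature(xml_text: str) -> Counter:
    """Count (tag, sorted attributes, stripped text) triples in a document."""
    signature = Counter()
    for elem in ET.fromstring(xml_text).iter():
        key = (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),
        )
        signature[key] += 1
    return signature


def roundtrip_fidelity(original: str, roundtripped: str) -> float:
    """Hypothetical metric: fraction of original triples still present."""
    before = content_signature(original)
    after = content_signature(roundtripped)
    kept = sum((before & after).values())  # multiset intersection
    total = sum(before.values())
    return kept / total if total else 1.0


# Illustrative documents: the round trip lost one style and one note.
ORIGINAL = '<doc><p style="a">one</p><p style="b">two</p><p>three</p><note>n</note></doc>'
ROUNDTRIPPED = '<doc><p style="a">one</p><p>two</p><p>three</p></doc>'

score = roundtrip_fidelity(ORIGINAL, ROUNDTRIPPED)
print(f"{score:.0%}")
```

A metric like this is crude, since it ignores element order and layout, but it makes the point: anything short of 100% means something the user put into the document did not come back out.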

Maybe i'm just having a bad day, but it seems to me that the MNC Translator Project is of no help to Microsoft as they try to shuck, shuffle and jive their way through the ISO Contradiction Review gauntlet. If MNC Translator works, it's a clear clean contradiction. If the MNC Translator fails, it is further proof of an inept contradiction. Personally, i would rather fail the ISO Contradiction Review based on a clear clean contradiction. The alternative is embarrassing. Both to ISO and to Microsoft - ECMA.

Some days you get the bear. And some days the bear gets you. Munch munch Microsoft.

~ge~

  • - post by garyedwards
  • Links to ISO JTC1 National Body Representatives

    The voting "P" member national bodies of JTC1 are listed here.


The national bodies' contradictions must be received by the Secretariat of JTC1 by the deadline on February 5, 2007. The current secretariat is held by Mrs. Lisa Rajchel of ANSI. Her contact information is here.

Full reference to the documents Microsoft submitted to ISO can be found here, again with many thanks to the legendary Marbux: Microsoft/Ecma's submissions to ISO for Ecma Office Open XML

And this blurb on the MNC EOOXML/ODF Translator Project: Tectonic: Novell to boost document interoperability

"As far as anyone can tell, Novell has yet to make any contributions to the MS-CleverAge Translator Project. They simply mounted the existing translator work as a Novell OpenOffice plugin..."

Microsoft's New Binary File Formats: The EOOXML XML Binary InfoSet

PlexNex: Analyzing the Microsoft Office Open XML License

  • Whoa! Check out the Marbux commentary. This has already happened. In his blog titled, "The Formats of Excel 2007",

    XML expert Rob Weir demonstrates for us that MSOffice 2007 Excel has a new file format. Rob demonstrates that there are four file format choices in Excel: EOOXML, Legacy XLS binary, and two new binary extensions of EOOXML: "Excel Macro-Enabled Workbook" - xlsm, and "Excel Binary Workbook" - xlsb.

    The new binaries are proprietary extensions to EOOXML. xlsb in particular looks to be something known as an XML Binary InfoSet. XBiS is a compressed form of an XML file used in situations where bandwidth and device cpu constraints demand such an extreme. We can't be sure about xlsb, but it looks like a duck, walks like a duck, quacks like a duck and therefore....

    This must be some kind of record. EOOXML isn't yet 30 days old and Microsoft has eXtended it with a proprietary binary representation not available to the rest of the world. And XBiS was designed so that implementations would be open and application and platform independent. But that's not what we see with Microsoft's xlsb.

    What Marbux is pointing out here is that only Microsoft has the legal rights to do this proprietary eXtension of EOOXML. Beat the drums. Sound the alarms. Hide the women and children. Nothing has changed. The longboats are fancier, there are more of them. The swords of the pillagers remain just as sharp. Their determination and drive just as strong.

    Some quick background references:

    Compression, XML, Binary Infoset - O'Reilly Digital Media Blog
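For anyone who wants to poke at these files themselves, a first step is simply telling the containers apart. The sketch below assumes the usual packaging: legacy .xls is an OLE2 compound file, while .xlsx, .xlsm and .xlsb are ZIP packages whose parts can be listed. The helper names are mine, and the demo package and its part name are illustrative stand-ins, not a real workbook. Nothing here decodes what is inside an xlsb part.

```python
import io
import zipfile

# Signature bytes at the start of an OLE2 compound file (legacy binary formats).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data: bytes) -> str:
    """Classify a file by its leading bytes only."""
    if data.startswith(OLE_MAGIC):
        return "ole-binary"
    if data.startswith(ZIP_MAGIC):
        return "zip-package"
    return "unknown"


def list_package_parts(data: bytes) -> list:
    """List the part names inside a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


def make_demo_package(part_name: str, payload: bytes) -> bytes:
    """Build a tiny in-memory ZIP so the example runs without real files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(part_name, payload)
    return buffer.getvalue()


demo = make_demo_package("xl/workbook.xml", b"<workbook/>")
print(sniff_container(demo), list_package_parts(demo))
print(sniff_container(OLE_MAGIC + b"\x00" * 8))
```

Run against real files, the interesting question is whether the listed parts are XML you can read or binary you cannot.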

Wednesday, January 10, 2007

Game Time for OpenDocument


The contradictions between ODF and EooXML are beyond real. They are beyond repair. There are two questions concerning these intractable contradictions. The first is, what will ISO do with EooXML? As of the January 5th, 2007 submission, EooXML has the critical 30 day period known as the ISO Contradiction Review Phase. Any allegation of a contradiction between EooXML and other ISO products (ODF) and it's back to the drawing board for MSECMA. Second, given the serious disparity of these contradictions, how does the marketplace resolve these contradictions? How do users get from where they are today, to where they need to be tomorrow?

Today they are stuck with a vast investment in legacy MSOffice productivity environments, their information bound to MSOffice proprietary formats, their information processes and critical day to day business activities strapped, bound and locked into that same application – API dependent base. Moving to ODF isn't a file format switch or simple application swap out affair. No, it's taking your digital business, information processing and daily workflows out of Microsoft's control, and putting your future into your own hands.

Lots of stuff happening this week. ECMA-376 is only a few weeks old, and the forces of Redmond were quick to throw down the gauntlet with as smug a sneer as one would expect. Three articles caught my attention:

... originally posted as "MS Winning Office Doc Battle"

Great stuff. But what interests me most is not the obvious contradictions between our two universal file format contenders, ODF and EOOXML. No, what interests me most is how to break the monopolist's iron fisted grip on our digital lives. Mandating ODF isn't enough. There has to be a bridge between where we are today, locked into the MSOffice productivity environment, and where we really want to be tomorrow – the free flow of information between loosely coupled applications that adhere to open standards and the universal access and exchange of the open Internet.

Given that objective, it is Matt Asay's InfoWorld Blog posting "The Future of Lock-in" that has caught my attention today.

May 17, 2006 :: The Future of Lock-in

Hi Matt,

Right on! You nailed it. Although i tend to think in terms of an information processing chain where EooXML is both the file format and the transport. Microsoft gets the portable document model, where content, data and streaming media traverse in highly exchangeable XML document containers.

The Microsoft chain runs MSOffice <> EooXML <> VSTO <> IE 7.0 <> the Exchange/SharePoint Hub. From the E/S Hub, it splashes out into a galaxy of server and device side services such as MSLive, MS SQL Server, and things like MS ERP.

The core of the chain is that everything speaks EooXML, even applications written with VSTO 2005, (they dropped MSXML in favor of EooXML).

What most people fail to understand is that there are two barriers to entry/migration from MSOffice. The first barrier is that of the billions of binary file formats bound to MSOffice.

The second barrier is that of MSOffice bound business processes, and it's near impossible to overcome. Most people don't even get this far, giving up in frustration with non interoperable file formats and lossy conversions that plague the first barrier. However, for those who do get this far, which they did in Massachusetts, the second barrier is near impossible. The barrier of MSOffice bound LoB's (Line of Business), business processes, and assistive add-ons seems impregnable.

Since 1995, MSOffice has evolved to become the platform of critical day to day business processes. You can't replace these workgroup – workflow processes with OpenOffice. Nor can OOo "participate" in existing processes unless there is perfect fidelity of file conversions - otherwise known as perfect roundtrip fidelity.

The good news is that in the very near future every one of these MSOffice bound business processes is going to migrate to an Internet "XML ready" Hub of some sort. The bad news is that Microsoft is killing everything in sight with the MSXML-EooXML E/S Hub. Including Lotus Notes.

As you might guess, the E/S Hub has great "integration" with the MSOffice desktop productivity environment. A level of integration that can't be touched by any other vendor, be they Oracle, IBM, CA, BEA, SalesForce.com, JBoss, or Apache OSS. What makes the migration of business processes to E/S Hubs inevitable is the incredible bump in productivity any process gets when moved to an integration Hub.

Interestingly, the MSOffice point of integration in this emerging processing chain is that of the XML file format; EooXML. Because the portable document model is so critically important to these Internet enabled processing chains (you can't do this kind of sprawling data binding - workflow routing with binaries), many governments are seeking an ODF plugin for MSOffice.

The idea behind the ODF plugins for MSOffice is to turn MSOffice into an ODF pump instead of a pump for EooXML. The advantages are twofold - both the first and second barriers to migration are broken. And broken without any disruption to the current business process flow, or cost of re engineering to an MSOffice alternative (if that were even possible).

Once MSOffice is converted, and an ODF pump is in the anchor position, those MSOffice bound business processes can be migrated to anything that can speak ODF. All the server side services (hello IBM and Oracle) that get cut out of the Microsoft chain can cut into an ODF one without a problem. To get "great" integration and perfect interoperability though, applications in the chain have to break with the long standing traditions of being information "end points", and become routers of information - adding value but not breaking the flow.

Sadly, few applications understand the emerging processing chains and the new demands for "routing" information. For instance, Google Writely supports ODF, but only as an "end point". Information might go into Writely, but it doesn't come out in a useful ODF structure. The law of these emerging processing chains seems to be that of perfect "roundtrip" fidelity on transformation. Applications must assume that the information flows they link into never stop. Even enterprise publication, content and archive management systems must kneel to the law of interoperability if they are to compete against the MSOffice <> EooXML <> E/S Hub Juggernaut.

So what i think we need most at this moment in time is ODF ready Hubs; where content, eMail, scheduling, workflow management, data binding, workgroup management, and project management merge with the ODF – XForms document model.

If we can intercept the migration of MSOffice bound business processes, using ODF plugins and ODF ready Hubs, the final step to breaking the monopolist grip is within reach. Once the business processes are in ODF and residing at the Hubs, where all kinds of server and device systems can integrate as needed, it's easy to replace MSOffice on the desktop with an OpenOffice – Mozilla one two punch. Yes, the desire of the plugin makers is to eventually replace themselves with OOo. Before that can be done though, we have to work through a period of mixed applications whose only point of interoperability is that of speaking perfect ODF.

You are right about the longterm lockin. I watched this happen last year with the real estate industry. It's a good story, but too long for a blog comment. The take away point however is that the real estate industry bought into the E/S Hub big time, and many vertical applications of extraordinary productivity swept into the marketplace. The E/S verticals quickly replaced near every desktop productivity shrinkwrap app, even those with over fifteen years of dominant marketshare. Vertical vendors fought to replace their own shrinkwrap stuff with their own E/S products. It was the only way to survive. The E/S Hubs automagically converted binary bound documents to MSXML, making them unreadable with anything other than MSOffice 2003 and IE 7.0. Every Windows 2000 desktop was EOL'd one fine Friday afternoon last year when the mothership sent out a security upgrade to E/S. An upgrade that required IE 7.0, for which there is no W2K version available. A Friday afternoon in real estate. Fry's was hopping.

The Realtors moved to E/S Hubs en masse because the productivity gains truly are extraordinary. One could easily argue that that industry is locked in for the next 15 years.

Even though Massachusetts was mandating ODF, they were buying E/S servers, thinking perhaps it's just an easy to administer eMail system. Right. The Commonwealth is just one court docket system written to E/S away from having to convert all those ODF docs back to EooXML. Oh wait, with the MS Translator, E/S will automagically do that for you :)

The Linux desktop:

Another point to make is that the ODF processing chain opens the way for Linux Desktops. As the MSOffice bound business processes are first moved to an ODF footing (the plugin) and then migrated to ODF ready Hubs (Zimbra/Alfresco), the way is cleared for ODF ready Linux desktops to crack the monopolist's stronghold.

It's this lifting of the MSOffice bound business processes into the Internet that opens up the way for Linux desktops. Most Linux distro providers are totally unaware that the second barrier is also the "Linux desktop barrier".

Furthermore, the ODF processing chain opportunity is a "replacement - upgrade - next workstation to be purchased" opportunity. It's not about wiping out existing Windows installs as much as it's about the next purchase made for a particular work group.

This fact ought to impact how Linux providers like Linspire go about the business of selling their systems. First of all, it's a box business opportunity, not one based on rip out and replace. Second, the desktops must be ODF - XForms - Java ready, with OOo and Mozilla holding down the application core. Third, the Linspires of the world really need the plugin and ODF Hubs in place and the migration underway before the monopoly base is truly up for grabs. All 485 million of them.

This raises the question of why Linux Desktop and Server providers missed the high stakes involved with the ODF - EooXML choice. IMHO, they missed this primarily due to the fact that Linux users are rarely if ever "workgroup - workflow" bound. They tend instead to be individuals working outside the tight constraints of information exchange and perfect roundtrip interop workflows. And if they do participate in a "workflow", it's more like that of the GrokLaw process. A process based on forum publication of simple HTML. As long as they can get their information into HTML, most Linux users have no problem entering the workflows important to them.

Funny, but for Linspire to get their breakthrough moment, the ODF processing chain must first succeed in breaking Microsoft's grip on those MSOffice bound business processes.

Thanks Matt for bringing this issue forward,

~ge~

Background on the above comments ::::

Over the years Microsoft has perfected many ways to lock digital consumers into their Windows and MSOffice application platform. The forced march - upgrade treadmill Redmond has perfected is perhaps the most extraordinary profit machinery ever unleashed. The most innovative mesh of lockin schemes ever conceived. At the core of this machinery is the user digital dilemma of user owned information and information processes being bound to the Microsoft applications and platform services used to "work" that information.

When the OASIS Open Office XML file format, now known as ODF or OpenDocument, was first ratified as an international standard, many thought it was the beginning of the end for the Redmond monopolist and the great treadmill machinery they had established. XML is after all the language of the future Internet. It's the way data, content, and streaming media traverse the Web 2.0, with a surrounding galaxy of solutions and services from SaaS to SOA to desktop productivity environments, to backend transaction and information processing systems, to enterprise publication, content and archive management systems.

The promise of XML is that of application and platform independent global transport, exchange, and crystal clean transformation. In this maw of emerging universal connectivity and collaborative computing solutions, the OpenDocument XML standard promised to merge the legacy of desktop productivity information and processes into the larger stream of Internet ready services and systems. The future looked good.

Microsoft however was not to be denied or left behind. The threat of the Internet to their monopolist enterprise was and is very real. First they stopped Netscape from establishing the ubiquitous cross platform browser as an Internet application platform. Then they stopped Java from becoming an Internet - enterprise systems application platform. It took a few years, much litigation, some time served, but finally Microsoft is ready to make their run at the many Internet interlopers who dare challenge their empire. Brace yourselves for a whole new era of massive lockin. A stack based lockin machinery that, if successful, will bind business processes to the Vista stack of desktop, server and device systems for years to come.

Matt Asay's article is one of the first to signal the alarm and describe the early parameters of what's at stake. He cites the rising dominance of the Microsoft SharePoint server, pointing out that behind it all is a new lockin strategy at work. One far more insidious and breathtaking in its reach than anything ever imagined with binary file formats. File formats bind information to specific applications and platform services. SharePoint is designed to bind information processes to the Vista Stack.

At the heart of the Vista Stack is the recently approved MS – ECMA Open Office XML file format. Oooops. That's MS-ECMA Office Open XML file format, not to be confused with the OASIS Open Office XML file format now known as OpenDocument; ODF and EooXML for short.

What Microsoft has done is to fully leverage their application control over the billions of legacy binary documents locked into the applications used to work this user owned content. EooXML was written to provide backwards compatibility with the BoBs (billions of binary documents). In 2002, when asked why they didn't join the OASIS Open Office XML standardization effort, Microsoft responded that OpenDocument (then Open Office XML) was unable to handle the application specific vagaries of the BoBs. ODF was said by Redmond to be inadequate. And that was before the technical committee work on ODF even began.

Without Microsoft's participation, most felt it would be impossible for ODF to “accommodate” and fully reflect the application specific BoBs. After all, only Microsoft understood the years of arbitrary and treadmill accelerating changes to the binary file formats causing users to upgrade to ever new versions of MSOffice applications. The version madness we are all so familiar with. They, and they alone had the blueprint able to unlock the BoBs.

So Microsoft set out to develop a whole new generation of application/platform bound XML file formats. The Vista generation. OpenDocument was of course developed as an application and platform independent XML file format. Internal document dependencies are based on other open and internationally recognized XML standards (XHTML, CSS, XForms, SVG, SMIL, XSLT, etc.). MSXML and MS-ECMA OOXML on the other hand were developed in the great treadmill traditions of being bound to Microsoft applications and platforms, bound through a cascading entanglement of system specific and very proprietary dependencies. Nothing new to report here except that the documentation for this mash is over 6,000 pages.

The short description for EooXML is that it's an XML wrapper of binary encodings unique to the legacy of Microsoft Office applications.

Still, many held out hope that with EooXML, Microsoft would finally release the secret blueprint and fully document the conversion of BoBs to clean XML. With the EooXML ratification and release in December of 2006, perfectly timed by the way to coincide with the release of Vista, MSOffice 2007, VSTO 2005, and a new version of the Exchange/SharePoint Hub, we can now see that this was anything but full disclosure and documentation. With EooXML, Microsoft concedes nothing while leveraging the BoBs into a launch of a future lockin strategy that is breathtaking in scope and ambition.

Here's something interesting. We now know that EooXML wraps the difficult to transform binary encodings of BoBs in exactly the same way as the OASIS Open Office XML Technical Committee described in February of 2003! In February of 2003, Phil Boutros, the legendary reverse engineer expert from Stellent, submitted to the technical committee the XML binary and processing instruction wrapping model affectionately known as the tags. To avoid any hint of application specific references, in the ODF specification (section 1.5) these tags are referred to as . Other casual references include and tags.

Another interesting point, courtesy of the inexhaustible Rob Weir, is that MSOffice 2007 has the unique ability to produce two kinds of EooXML. I've never seen an application do this before, and one has to wonder why?

What Rob discovered is that if you import a legacy BoB into MSOffice 2007, the application will convert it to EooXML fully preserving the originating application binary encodings – even doing so within laughably and descriptively colorful named XML tags. Fine. We can easily do that with ODF using the infamous tag model. No inadequacy to be found here. (Damn, if only we had patented that technique. Phil Boutros must be lapping this up :) One of the examples Rob pointed out is the use of the long since deprecated VML encoding. Good work MSOffice 2007!

Next Rob re-created that same legacy document “natively” in MSOffice 2007. Exactly the same! Saved it as EooXML. Then examined the XML, comparing the two EooXML files. Well well well. They are substantially different! Same application. Same file format. Same document content and presentation. Different EooXML! Interestingly, for one thing there is no VML encoding. It's been replaced by the proprietary application/platform dependent but forward Vista ready DrawingML.
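This kind of comparison is easy to repeat. The sketch below is not Rob's method, just one simple way to do it, assuming you have pulled the same XML part out of each saved file: list the namespace each element belongs to and diff the two sets. The function name is mine, and the two toy inputs are stand-ins that borrow the VML and DrawingML namespace URIs, not excerpts from real documents.

```python
import xml.etree.ElementTree as ET


def namespaces_used(xml_text: str) -> set:
    """Return the namespace URI of every namespaced element in the document."""
    uris = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified tags as "{uri}localname".
        if elem.tag.startswith("{"):
            uris.add(elem.tag[1:].split("}", 1)[0])
    return uris


# Toy stand-ins for "imported legacy document" and "natively created document".
IMPORTED = '<doc xmlns:v="urn:schemas-microsoft-com:vml"><v:rect/></doc>'
NATIVE = (
    '<doc xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
    '<a:prstGeom/></doc>'
)

only_in_imported = namespaces_used(IMPORTED) - namespaces_used(NATIVE)
only_in_native = namespaces_used(NATIVE) - namespaces_used(IMPORTED)
print("only in imported:", sorted(only_in_imported))
print("only in native:", sorted(only_in_native))
```

Two files that claim the same format but draw on different vocabularies will show up immediately in a diff like this.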

Some will argue that this is the only way to preserve backward compatibility. I would argue that this will result in an information nightmare. Only one of the EooXML files is backwards compatible. The other is ready for the Vista bound information processing chain centered on the Exchange/SharePoint Hub. How are organizations going to keep things straight?

IMHO, the better way of handling this backward compatibility dilemma is to write a plugin for the legacy applications. Let the plugin do the nasty dark work of conversions, and do so without disruption or confusion to end users. All the files should of course be converted to clean, open and highly transportable/transformable XML, and stored in that state. (uh, that would be ODF instead of EooXML :) This is the optimal workflow and exchange “state”. Let the plugins do the application specific “user interface” only conversion back to the proper in-memory-binary-representation when needed.

So why did Microsoft choose this strange dual XML file format role for MSOffice 2007? Why not provide a useful binary <> EooXML plugin for legacy applications instead of mashing up an XML file format specification with legacy application version specific madness?

IMHO, Matt is right. SharePoint is the future lockin point. The Vista Stack and Vista information processing chain model will probably convert without exception all those legacy specific EooXML files to Vista specific EooXML. You may not have to upgrade MSOffice 97, 98, 2000, or 2003 to MSOffice 2007, unless and until you find yourself participating in an Exchange/SharePoint Hub based process. So maybe this isn't a traditional forced march to MSOffice 2007? Although it remains to be seen, this may in fact be a lockin and upgrade strategy designed to strongly encourage users to move from Win2k and XP to Vista. A move designed to demand both the desktop and server versions of Vista applications and systems.

For more information, try this web site: Office Business Applications Developer Portal, with special attention to this document: White Paper: Building Office Business Applications

Brace yourselves my freedom loving friends. It's game time.

~ge~