Tuesday, October 23, 2007

CDF and Grand Convergence

This is a lengthy response to "ODF calls time on da Vinci coding .... Still seeking OASIS approval", by Lucy Sherriff of The Register.

Well, for starters, we are no longer waiting for anything from OASIS. That ended in April of 2007. We've moved on to the W3C's Compound Document Format in hopes of solving the same market requirements that we were unable to use ODF for.

It does look like The Register got lost in the acronym alphabet soup. Nevertheless, da Vinci still lives and breathes. We're no longer trying to hammer the OpenDocument format into a task it was not designed for though. A task outside the ODF Charter, and the scope of ODF's purpose. This news should cheer up our friends at OASIS. That we are refactoring da Vinci to support the W3C's Compound Document Formats ("CDF") instead of trying to eXtend ODF to perform a challenge it was never intended for has certainly made our work enjoyable again.

So, what should people understand about this change? ODF continues to do exactly what it was intended to do. MS-OOXML continues to do exactly what it was intended to do. And it looks to us as if CDF can bridge the gap between the two while opening up the World Wide Web beyond.

There are some bumps of misunderstanding that must be addressed.

For instance, people continue to insist that if only Microsoft would implement ODF natively in MSOffice, we could all hop on down the yellow brick road, hand in hand, singing kumbaya to beat the band.

Sadly, life doesn't work that way. Wish it did.

Sure, Microsoft could implement ODF - but only with the addition of application specific eXtensions to the current ODF specification.

The MSOffice specific eXtensions might be based on the sprawl of feature sets and business development efforts that distinguish MSOffice from other desktop office suites such as OpenOffice. This approach might result in a thousand pages of application specific element/attribute pairs that are stuck into the ODF spec, but would still not be implementable by other ODF ready applications.

Or hey, maybe either MSOffice or OpenOffice would be willing to throw out their current layout - implementation model and adopt that of the other? With 550 million desktops expecting some sort of continuity and backwards compatibility, this is a tall order for Microsoft. Even taller for the 550 million desktops we would be asking to live through the transition.

On the flip side of the coin, Sun has already made it clear at the OASIS ODF TC that they are not going to compromise (or degrade) the new and innovative features and implementation model of OpenOffice just to be compatible with the existing 550 million MSOffice desktops. Their solution is to point out that it's far easier and more sensible to have the 550 million MSOffice desktops download OpenOffice and replace MSOffice. Instant interoperability, with innovative features; as long as everyone is running the same version of OOo.

OK. So we're stuck between a rock and a hard place, leaving the file format as the only place to turn to. And even there, concessions and accommodations must be made. Everyone has to give up something if the markets are to get what they really want - a single universal file format useful to all kinds of applications while being bound to none.

Let's dig deeper.

Beneath the feature set - business process development - accessibility add-on layers of MSOffice, what we're really dealing with is basic document structures. The bottom line is that MSOffice and OpenOffice differ at the feature set, feature set implementation model, and business process development models. These feature set – layout engine differentials are expressed through the file format implementations of basic document structures such as lists, tables, fields, sections and page dynamics. ODf is perfectly designed for OpenOffice features and layout engine. And since MS-OOXML is a reflection of the in-memory-binary representation of MSOffice documents, it's also “near” perfectly designed for MSOffice features and layout methods.

A file format designed specifically for one application's features and layout methods is not going to work for another application without some kind of serious flexibility at the document structure layer.

The challenge is to find the simplest method of bridging these fundamental differentials with a single file format. For us, the answer was in a simple set of five generic elements for lists, fields, tables, sections and page dynamics. We called these the ODf iX “interoperability enhancements” - or iX for short. (Note, MS-OOXML uses generics to maintain backwards compatibility wherever deprecated legacy issues threaten to shred interop).

These basic document structures account for nearly all “MS-Binary <> MS-OOXML <> ODF” conversion fidelity problems. Although there are many ways to go about eXtending and enhancing ODf with this generic model, our favorite approach was to use the new ODf Metadata RDF feature as the implementation means.

Between July of 2006 and April of 2007, there were five comprehensive proposals submitted to OASIS ODf members for discussion and consideration. Some parts of iX made it to formal proposal vote. Most didn't. We gave up on ODf iX in April of 2007 with the defeat of the “List Enhancement Proposal”, and the gutting of W3C RDF.

Note that ODf iX would only enable us to convert (with a high level of “round trip” fidelity), MS binaries and xml to ODf iX. Application interoperability would still depend on ODf ready applications also implementing iX. This is the only way these apps would be enabled to properly read and write ODf iX documents within workgroup oriented business processes that include (and are most likely driven by) MSOffice desktops.
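To make the "round trip" idea concrete, here is a toy sketch in Python. It is not da Vinci or ODf iX code. The element names (`doc`, `p`, `list`), the `generic` wrapper element and all function names are hypothetical, invented only for illustration. It compares a converter that silently drops structures the target format does not know with one that parks them inside a generic element so they can be restored on the way back.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the only structures the target format understands natively.
TARGET_KNOWN = {"doc", "p"}

def lossy_convert(src):
    """Copy the tree, silently dropping child structures the target does not know."""
    out = ET.Element(src.tag, src.attrib)
    out.text = src.text
    for child in src:
        if child.tag in TARGET_KNOWN:
            out.append(lossy_convert(child))
    return out

def generic_convert(src):
    """Copy the tree, wrapping unknown structures in a generic element
    that carries the original markup as text."""
    out = ET.Element(src.tag, src.attrib)
    out.text = src.text
    for child in src:
        if child.tag in TARGET_KNOWN:
            out.append(generic_convert(child))
        else:
            wrapper = ET.SubElement(out, "generic", {"kind": child.tag})
            wrapper.text = ET.tostring(child, encoding="unicode")
    return out

def convert_back(tgt):
    """Rebuild the source tree, unpacking any generic wrappers."""
    out = ET.Element(tgt.tag, tgt.attrib)
    out.text = tgt.text
    for child in tgt:
        if child.tag == "generic":
            out.append(ET.fromstring(child.text))
        else:
            out.append(convert_back(child))
    return out

def round_trips(src_xml, convert):
    """True if source -> target -> source reproduces the original markup."""
    src = ET.fromstring(src_xml)
    back = convert_back(convert(src))
    return ET.tostring(back, encoding="unicode") == ET.tostring(src, encoding="unicode")

# Toy document with no mixed content (the sketch ignores tail text).
SAMPLE = '<doc><p>Hello</p><list style="legal"><item>One</item></list></doc>'

print(round_trips(SAMPLE, lossy_convert))    # False: the list is lost
print(round_trips(SAMPLE, generic_convert))  # True: the list survives the trip
```

The same caveat as above applies to the sketch: the wrapper only preserves the structure. Another application still has to understand the generic element before it can do anything useful with what is inside.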

We believe that integrating into those MSOffice bound business processes is the primary barrier blocking the use of OpenOffice and Linux desktops in enterprises, smb's and governments.

In short, we can't crack the MSOffice workgroup barrier without ODf iX. End of story.

To understand why we turned to the W3C's CDF “Compound Document Format”, and continue to fight, one has to grasp how seriously important this MSOffice bound business process issue is to the future of collaborative computing. If we can't neutralize and re purpose MSOffice, the future will belong to MS-OOXML and the MS Stack. Note the MS Stack noticeably replaces W3C Open Web technologies with Microsoft's own embraced “enhancements”. Starting with MS-OOXML/Smart Tags as a replacement for HTML-XHTML-RDF Metadata.

HTML and the Open Web are the targets here. ODf is being used as a diversion from the real end game – the taking of the Internet.

Before issuing their RFi, “Request for information concerning the feasibility of a ODf plug-in for MSOffice”, the Commonwealth of Massachusetts had conducted a year long Pilot Study on implementing ODf. The RFi was nothing short of a cry for help. Massachusetts had already mandated ODf, but the Pilot Study fully demonstrated how difficult and costly it would be to implement ODf. Hence the RFi.

Keep in mind that all the big ODf vendors participated in the Pilot, and were unable to implement ODf short of the costly replacement of MSOffice. There were two aspects to this “cost” factor. The first was the disruption cost caused by critical day to day business processes bound to MSOffice. The second cost was that of re engineering business processes for OpenOffice based replacements. The RFi was a clear signal to all that Massachusetts was unwilling to pay those costs.

So after a year long Pilot Study, Massachusetts had come to the conclusion that the only way they could realistically implement ODf was through a clone of the MS-OOXML Compatibility Pack plug-in for existing MSOffice workgroup desktops. Non MSOffice workgroups and HTML based processes could easily move to OpenOffice ODf. (It turns out there were only 24 of those desktops in the Massachusetts government).

The big ODf vendors insisted that such a plug-in clone was impossible. Massachusetts countered that there was no other way forward. Enter da Vinci, first demonstrated to Massachusetts on June 19th, 2006 at an event hosted by IBM. On July 4th and 5th, da Vinci was publicly demonstrated at an EU-IDABC conference in Brussels, Belgium.

Here's a hard fact. Although the ODf 1.0 version of da Vinci was optimized for compatibility with OpenOffice ODf 1.0, it did not meet the Massachusetts “requirements” as determined by the year long Pilot Study. CIO Louis Gutierrez and his Massachusetts ITD group knew exactly what they needed, and were able to explain to us why the requirements were so demanding. We took those requirements and cross referenced them with other interested governments in California, Belgium, Denmark and the EU-IDABC to find just how universal this MSOffice bound workgroup – business process problem was. The Massachusetts position was fully confirmed, and this led us to revise our “market requirements” for the second version of da Vinci - which had to include the essential ODf iX.

Briefly stated, the Massachusetts market requirements are:

  • Compatibility with existing documents - file formats :: including the volumes of MS binary documents.
  • Interoperability with existing applications :: including the over 500 million MSOffice bound workgroups.
  • Grand Convergence of desktop, server, device, and web systems as fluid and highly interoperable routers of documents, data, and media.

The first two requirements, “compatibility – interoperability” involve the conversion of existing documents, applications and processes to XML. The third “grand convergence” requirement is all about a W3C Web ready XML.

This migration – conversion to XML is currently a primary driver in enterprise, smb and government decisions. The only question is, “Which XML?”

Take a look at the July statement from Sun's Jon Bosak, made when casting Sun's ANSI – ISO vote in favor of ISO approval of MS-OOXML:

“We wish to make it completely clear that we support DIS 29500 becoming an ISO Standard and are in complete agreement with its stated purposes of enabling interoperability among different implementations and providing interoperable access to the legacy of Microsoft Office documents.”

Sounds like our government driven “market requirements” doesn't it? There is no way to interpret this statement other than to say that Sun agrees with Microsoft that ODf does not meet the same implementation requirements as MS-OOXML.

In October of 2007, at the GOSCON Conference in Portland, Oregon, Sun once again sided with Microsoft on the issue of multiple file formats. IBM's Arnaud Le Hors was passionately making the case for a single file format standard useful for many purposes. Microsoft's Jason Matusow of course argued in favor of multiple file formats, pointing out that ODf and MS-OOXML fulfilled different purposes. The surprise was that Sun jumped in with both feet in support of Microsoft's argument, opposing IBM's single file format theory.

If Sun agrees that only MS-OOXML is able to meet these compatibility – interoperability with legacy Microsoft document and application market requirements, and ODf cannot, what's the point of arguing this any further? Sun controls both the OASIS ODf TC process and the OpenOffice reference implementation.

ODf was not designed to fulfill our three market requirements. And efforts to adapt ODf to those needs have failed. That leaves us with only one option; apologize for trying to make something of ODf that it was not intended to be, and move on.

We apologize and move on.

The fact remains though that the market requirements are real, and they are not going away. The migration to XML is going to happen with, or without us. If we can't convert existing MS documents, applications and processes to ODf, then the market has no other choice but to transition to MS-OOXML. It's that simple.

And at this point, ISO approval of MS-OOXML matters little. With upwards of 85% or more marketshare for MS-OOXML ready MSOffice desktops (using the easy to download and install Compatibility Pack plug-in), the stage is set for a very non disruptive and low cost transition. At the other end of the portable document pipeline sits the Exchange/SharePoint developers Hub, with upwards of 65% marketshare and climbing.

The E/S Hub automagically converts all MS binary attachments to MS-OOXML, with Compatibility Pack enabled MSOffice desktops able to automagically convert them back into application useful binaries. This routine has been in place for two years with most E/S Hub users blissfully unaware of what is going on behind the scenes. Is there anything ISO can do that will cause governments to rip out and replace this burgeoning pipeline?

I mean, that's what it really comes down to. Microsoft is expert at commercializing interoperability to meet their monopolistic needs, and then garnering so much marketshare that it becomes too painful and punishing to consumers for governments to rectify the damage.

The MSOffice <> MS-OOXML <> E/S developers Hub juggernaut is happening right before our eyes. We believe that the only way to stop this Juggernaut is to neutralize and re purpose MSOffice, enabling E/S alternatives like Zimbra, Alfresco, Plone and a host of other open source web applications to have a shot. The name of the game for many Office 2.0 initiatives is to perfect a non disruptive integration into existing business processes such that “collaboration” is a value added feature and not a process splitting thorn. For most workgroups, this means cracking into MSOffice with a web ready integration “re purposing”.

Ripping out and replacing MSOffice is a disruptive cost beyond the threshold of what markets will bear. The events in Massachusetts, California and the EU-IDABC prove this. So why not take a page out of the SAMBA / WiNE / Zimbra (Outlook) interoperability with Microsoft technologies book, and make a fight for the world of opportunities that will be within our reach if we act now to break the monopolist's iron grip?

The simple truth is that ODf was not designed to be compatible – interoperable with existing Microsoft documents, applications and processes. Nor was it designed for grand convergence. And as we found out in our five years participation at the OASIS ODf TC, there is an across the board resistance to eXtending ODf to be compatible with Microsoft documents, applications and processes.

CDF on the other hand was designed exactly for grand convergence. No problemo there. The only question for us was the challenge of using CDF to solve our first two “compatibility – interoperability” with existing Microsoft documents, applications and processes requirements. We are very confident we can do it, but it's involved to say the least.

The primary usefulness of converting existing MS documents, applications and processes to CDF is a web ready, W3C approved grand convergence possibility not driven and directed by Microsoft or the MS-OOXML specific MS Stack. This is not a desktop application to desktop application exchange process that falls within the design range of ODf. It really is about neutralizing and re purposing the MSOffice installed base of desktops (all 550 million) to become full participants in an Open Stack desktop, server, device, and web grand convergence model.

We intend to distribute CDF da Vinci as a free download, no strings attached. Hopefully this will help crack the monopolist's iron grip, enabling the many Zimbra's, Alfresco's, Web 2.0, Office 2.0, Enterprise 2.0, SaaS and SOA efforts to integrate into MSOffice bound business processes with the same fidelity and round trip advantages applications in the MS Stack enjoy. We might not be able to undo the MS desktop monopoly, but maybe we can slow down Microsoft's insidious leveraging of that monopoly into servers, devices, and the web.

If anyone thinks they can do this with ODf, we urge them to get out of the blogs and out of the standards committees, and get busy doing it. There is no time to waste. Pragmatism rules the migration to XML. Meaning this is a fight that will be won or lost in the real world trenches of converting existing documents, applications and processes to XML. And doing so without disruption or re engineering cost.

Hope this helps,

~ge~ Hey buddy, could you spare me a garage?

Tuesday, July 10, 2007

An ODF Quest Grounded by Hard Reality

One of the hard realities is that this particular blog post has been reviewed by a major publication for reprint. Stay tuned.

~ge~
~marbux~

Thursday, June 21, 2007

Connecting the dots

Is the Failure of ODF in Massachusetts related to the question, “Is Open Source Dying?”

eWeek has posted an interesting series of articles, “Is Open Source Dying from 1,000 Cuts?” Among these is a good “connect the dots” story from Michael Hickins, “Is Open Source Dying?”

Michael makes the interesting comment: “State and local governments have latched onto the idea that having their documents hostage to a single vendor, no matter how well-intentioned, might not be such a good idea. Dell recently jumped on the Linux bandwagon and is offering Ubuntu on its PCs. And Microsoft's attempt to have its partly-proprietary OOXML (Office Open XML) format rubber-stamped by a friendly standards body hasn't gone as smoothly as expected.”

“But behind the scenes, things are not quite as rosy. The Commonwealth of Massachusetts ..... broke important ground by mandating that state agencies switch to open-source platforms. There's just one problem: They can't seem to manage the transition. Sources close to the situation tell me that former state CIO Peter Quinn's resignation happened at least in part because of delaying tactics by vendors who publicly support open source but do their best to scuttle it behind the scenes”.

I wonder if Michael knows just how close he's coming to discovering a dark ODf suspicion that seems to grow by the day. Are big ODf vendors supporting ODf? Or are they trading away ODf Interoperability and implementation for sweet sweet deals with Microsoft?

Could the same thing be happening throughout Open Sourcedom?

I'm hardly alone in my thinking that once again, Sir Richard Stallman has been proven right. The big vendors are trouble. If not today, then likely at some point in the future when the interests of open source communities come into conflict with corporate interests.

Whether we're talking LiNUX, OpenDocument, or patent indemnification - interoperability deals with Microsoft, big vendors are willing to trade the efforts of open source communities as if they were just strategic corporate assets. It's not that big vendors mean open source communities any harm. It's simply that they treat collaborative efforts in the same way they regard all corporate asset investments. Something that can be used for corporate advantage when the opportunity arises.

Every one of the big ODF vendors, with the exception of Google and Oracle, now has or is involved in an interoperability deal with Microsoft.

Sun cut their deal in 2004, and ODF interoperability has never been the same. Novell cut their deal in 2006, and now LiNUX vendors everywhere are being offered plump deals that require them to support and ship the Novell OfficeOpenXML Translator plugin for OpenOffice. The MS-Novell deal promises to put IBM hardware back into competition with Sun, and could be an explosive boon to the PowerPC 6 line of processors. The ability to run Microsoft applications is that important. Even if it is through a VM over SuSE LiNUX.

So far the tech media has focused on the patent indemnification – son of SCO aspects of the Microsoft interoperability deals. The patent threats however involve both LiNUX and LiNUX desktop darling OpenOffice – which is also the OpenDocument champion. Time to connect the dots indeed.

The 2004 MS-Sun deal indemnified StarOffice, while leaving OpenOffice, the open source version of that same code base, hanging out to dry. Fast forward to the 2006 MS-Novell-IBM deal and we find that the Novell Office version of OpenOffice has been indemnified! Including both Windows and LiNUX versions. Very cool. That ends any thoughts Sun might have had about monetizing their $175 Million investment in the OpenOffice/StarOffice code base by invoking MS patent claims against OpenOffice as a pretext for migrating the entire OOo install – ODf user base to StarOffice.

The interesting thing to note is that with the continuing wave of Microsoft interop deals, we have all these big ODF - LiNUX vendors now supporting the full conversion of OpenOffice to an OfficeOpenXML application.

There's a list of links documenting Novell and Sun support for OfficeOpenXML here.

Meanwhile, many of these same big ODF vendors declined to support the ODF Plugin for MSOffice that the future of ODF in Massachusetts was riding on! After conducting a year long ODf Pilot Study, Massachusetts concluded that ODf plugins for MSOffice were the only way they could make the transition to ODf.

No doubt the plugin approach Massachusetts proposed was in serious conflict with the big vendor business plans behind the many desktop alternatives that bombed in the Pilot Study. So where Massachusetts was trying to figure out how to overcome the MSOffice bound business process barrier to successfully implement ODf, the big ODf vendors were more worried about their heavily invested corporate assets and plans. The thought of leaving MSOffice on the desktop, even if the documents and business processes were in fully compliant ODf, was too much for the big ODf vendor business plans to bear.

With Microsoft bringing on the political pressure, and eventually succeeding in cutting off his budget, Massachusetts CIO Louis Gutierrez had no choice but to turn to the big ODF vendors for support. Thinking that there was no way these vendors would ever let ODF fail in Massachusetts, Louis believed he could overcome the budget problem. He was wrong about the big ODf vendors. They left him hanging.

Shortly after Louis Gutierrez was notified by his selected group of big ODF Vendors that they would not support his community effort to develop an internal ODF plugin for MSOffice, Louis resigned. Of course, the ODf plugin was a comparatively small part of projects impacted by the lack of an approved IT budget. But the impact was big. They had serious HomeLand Security projects hanging in the balance because of Microsoft's determination to stop ODf.

The failure of ODF in Massachusetts has had global ramifications. In California the CIO's asked the question as to whether or not it's even possible to implement ODF? They all know about Massachusetts and the Pilot Study. In Europe, the IDABC has set out to create their own non big vendor - non big vendor standards consortia file format; a highly interoperable fork of ODF called ODEF; "OpenDocument Exchange Format". No big vendors, no ISO, no OASIS, no Ecma, no big vendor standards consortia of any kind were invited to the ODEF table.

So here we have the strange situation where the same big ODF vendors who declined to support an internal ODF plugin that would be able to convert 500 million MSOffice bound desktops to ODf, and save ODf in Massachusetts, are supporting OfficeOpenXML solutions. These same big vendors are making sweet sweet deals with Microsoft, agreeing to provide MS OfficeOpenXML ready versions of OpenOffice.

On the one hand they are against an ODf plugin for MSOffice. And on the other hand they are rushing to provide the OOXML plugin for their shipments of OpenOffice.

Doesn't make sense until you realize two things. The first is that the Microsoft anti trust settlement allowed Microsoft to commercialize the very same interoperability critically important to the goal of creating competitive markets for information technologies. It's hardly surprising that sooner or later Microsoft would be weaving that interoperability into the push and pull deal making fabric of these marketplaces exactly to stifle any threat of competition. First up; ODf and LiNUX. Once the first big vendor fell, the rest were doomed.

2004. Not a good year for ODf and Open Source.

~ge~

Friday, June 08, 2007

Politics is only part of the problem

ComputerWorld: Microsoft legislatively TKO’s open document formats. At least stateside. by ZDNet's David Berlind -- ComputerWorld has a five-page report this morning detailing how Microsoft has managed to score a technical knockout of open document formats (not necessarily the OpenDocument Format) in five out of six states. The story sheds a bit of light on how closely (and in some cases quickly) vendors are working with legislators to sway public [...]

Sorry David, politics is only a small part of the problem.

The question we should be asking is why State CIO's and IT divisions are not backing the legislative proposals? It's not the lobbying that is killing ODF. It's the lack of support from IT departments responsible for the challenge of implementing ODF solutions if the legislation is ever approved. The silence of the CIO's is deafening.

One has to wonder why? I've had more than a few conversations with CIO's and their IT warriors, and there is no lack of enthusiasm for ODF. They would implement ODF in a heartbeat if they could. No legislative mandate necessary.

Of course, it seems not a week goes by without a major ODF vendor announcing some sweet interoperability - patent protection deal with Microsoft. But where's the beef?

Even at the ISO-OASIS ODF Technical Committee level, where the essence of interoperability with MS Office lies within the broader structural enhancement requirements of "compatibility with existing file formats and application interoperability", these issues have been pushed aside as being "out of bounds", "out of scope", "outside the charter", and "that's a problem for converters and translators-let them solve it".

The marketplace of CIO's and IT warriors in California, Massachusetts, Belgium, Denmark, NATO, and throughout the EU IDABC are consistent in their demands. Provide them the interoperability tools they need to migrate existing documents and business processes to ODF, and it will be done. No legislative mandates needed.

There are three quotes i've seen batted about that pretty much say it all:
  • ...... "Interoperability isn't just a feature. It's the basic requirement for getting your XML file format and applications considered"
  • ...... "The challenge is that of migrating our existing documents and business processes to XML. The question is which XML? OpenDocument or OpenXML?"
  • ....... "Under those conditions, is it even possible to implement OpenDocument?"

The challenge for OpenDocument isn't at the legislative level. Nor is it at the International Standards level. No, the challenge for ODF is at the implementation level where there is a serious lack of MS Office compatibility tools and solutions needed to make that difficult transition to ODF.

So we are left with the real world question of whether or not the "demand side" of the information technology equation can get from where they're at today, 500 million desktops bound to MS Office workgroup-workflow processes, to where they would like to be tomorrow? Which is ODF.

Sadly, ODF doesn't get to start with the world as a clean slate. Otherwise this decision would be simple and done. With no amount of political lobbying able to stop real world uptake. ODF wouldn't even need legislative proposals.

But that's not the case. Hardly. With upwards of 500 million workgroup desktops bound to MSOffice bound business processes, we are a long way from the 1995 office suite "feature set" wars of yesteryear. All the new and innovative feature sets in the world aren't going to help ODF office suite applications crack into those MS Office bound business processes. What's needed instead is an all out - no compromise focus on compatibility, interop, and convergence issues.

The documents and business processes that make these critical day to day workgroup-workflow tasks possible are going to transition to XML. The question is, "Which one? ODF or MS OOXML?"

Where the rubber meets the road, the challenge for ODF is that of matching doc for doc, proc for proc, the non disruptive cost Microsoft offers with their OfficeOpenXML plugin.

The real world doesn't have the luxury of evaluating OpenDocument and OfficeOpenXML based on the expert level of proper XML, open standards governance, IPR encumbrances, or reuse of existing XML standards. No, instead they have a bigger problem of getting existing documents and business processes into XML. And from there they can move into SOA, SaaS, and the Web 3.0.

So CIO's are forced by the everyday reality of MSOffice bound business processes to demand from ODF solutions three primary characteristics:

  • .... Compatibility with existing file formats (MS Binaries/HTML/XHTML/RTF)
  • .... Interoperability (application level - including existing apps like MSOffice!)
  • .... Convergence (the portable XML document as the end user interface into information systems that span desktop-server-device-web)..... If you're tied to the desktop, you're dead. But if you're an XML file format tied instead to something like the Vista Stack, well, you've got a shot as long as the other guy remains a no show.
  • .... Harmonization (the worst of all compromises; the successful implementation of the above three characteristics, but on terms dictated by MOOXML).

These are serious questions the ODF community has to come to terms with if the demand side of the equation is to have some sort of choice other than OfficeOpenXML.

That's not to say that the demand side is sitting still. No way. Check out the EU IDABC "Advanced eGovernment" Conference held February 28th-March 1st, 2007 in Berlin. They've got their own "optimized for interoperability" proposal known as ODEF - The Open Document Exchange Format.

You've got to read this stuff to believe it. No big vendors need apply. No ISO either! I take that as a swipe at both the big vendor standards consortia, OASIS and ECMA.

The EU IDABC has even gone so far as to identify the "interoperability break points" that can be found throughout ODF and OfficeOpenXML: the "optional" methods of implementation.

There is a famous quote from the infamous Doc Searls that goes like this, "Open source is where the demand side has taken over their own supply".

Could it be that we are now witnessing the same thing with Open Standards? One has to wonder.

Meanwhile though, The OpenDocument Foundation has made the decision to go with the marketplace - to stick with the demand side. If the EU gives us the ODEF spec, we'll provide them with an ODEF version of our da Vinci plugin for MSOffice. As applications (and application converters :)) move to ODEF, i have no doubt OpenDocument will jump to make whatever compatibility-interop-convergence changes needed. And that would be a good thing.

A very good thing,
~ge~

Wednesday, February 28, 2007

Three Stages of XML Migration and the Challenge to OpenDocument

  1. Conversion Fidelity - the billions of binaries problem
  2. Round Trip Fidelity - the MSOffice bound business processes, line of business integrated apps, and assistive technology type add-ons
  3. Application Interop - the cross platform, inter application, cross information domain problem

First it was the EU's controversial Valoris Report that centered on a fictional open standard XML universal file format known as “OpenDocument”. This was fictional open standard XML universal file format known as “OpenDocument”.

This was followed by the Massachusetts's pilot study and mandate for the real thing, the ISO/OASIS XML OpenDocument file format. The States of Texas and Minnesota weighed in with similar legislative proposals once again describing a open standard XML universal file format in measures that only the ISO/OASIS OpenDocument can measure up to.

And now comes the crusher. The State of California, touting what would be the world's eighth largest economy if it were a nation, steps forward into the breach with a legislative proposal describing in such detailed parameters what can only be the OpenDocument universal file format that one can almost see the butterfly tattoo on the specification's buttocks.

No doubt the great migration to XML is upon us, with XML universal file formats leading the way. Deciding to go with OpenDocument is the easy part. Getting there is something else. A quote from Peter Quinn, the legendary CIO of Massachusetts who threw that great Commonwealth into the breach, says it all:

"Open document formats: I get it! But how do I get there? Discuss."

Sam Hiser has provided us with another fine commentary, “Open Standards Mandatory in Denmark”, that's loaded with insight. He discusses the recent announcement by Denmark of their intentions to mandate support for two conflicting, contradictory, and irreconcilable XML file formats; the ISO/OASIS OpenDocument and the Microsoft Ecma Office Open XML proposal. For the life of me i don't see how mandating both ODF and OOXML will be helpful to any government. It's true that both are “XML” by design, but beyond that the promise and expectations of XML are broken by the proprietary application, platform and system specific dependencies that make up MS Ecma OOXML. In fact, the quality of XSL transformation filters between the two XML file formats is so bad it might as well be zero.

Although Microsoft assured Massachusetts that such a transformation would be trivial, nearly two years later they have nothing to show for their braggadocio.

Which means Denmark will be hoist on the petard of having their documents in two file formats that are not interchangeable. And except for the MSOffice ODF Plugins from Sun and the Foundation, application interoperability might as well be zero.

A Moving Target – MSOffice 2007 Plugin Architecture:

It's important to keep in mind that both daVinci and ACME 376, the Foundation's Plugins, were working fine in every beta release of MSOffice 2007, but broke in the final public version. I suspect Sun had the very same problem we did. It seems that Microsoft altered the system default for “zip” file format packages such that whenever a “zip” package like ODf is double clicked, the OOXML conversion engine automatically starts to convert the package to MSOffice 2007 in-memory-binary-representation. Of course, the OOXML conversion engine knows nothing about ODf. So the problem we now have is reverse engineering what looks to be a low level system “default” setting that loads the OOXML conversion engine instead of the daVinci conversion engine.

Microsoft has a long history of dirty tricks and an ever moving APi, effectively protecting their monopoly from would be poachers and thieves, er “competitive application and service providers working the Windows – Vista platform”. And here we go again. This first act of aggressively blocking the ODf Plugins should signal governments loud and clear that moving into the Vista Stack of desktop, servers, devices and services is going to be the equivalent of exclusively mandating OOXML.

We've known this to be the case in the past, the primary example being the Exchange/SharePoint Hub and developer platform which is optimized for MSOffice 2003 MSXML. So much so that the E/S Hub has to be considered an ODf killer. Yet even those governments exclusively mandating ODf are boldly going forward with E/S Hub purchases, totally unaware of the consequences. As i've said many times, Massachusetts is just one E/S Hub Court Docket System away from revoking their ODf requirement and standardizing on OOXML. (Watch carefully now, the hand is quicker than the eye; ViSTO 2005, which was released with MSOffice 2007, dropped support for MSXML entirely in favor of the MS version of OOXML. i mention this because there is clear evidence that MOOXML, legacy MOOXML, and now MOOXML Binary InfoSet for Excel all include eXtensions and dependencies that differ from the Ecma 376 version submitted to ISO/IEC.)

Remember our Peter Quinn quote, "Open document formats: I get it! But how do I get there? Discuss."? The simple truth is that the ODf Community is not providing the means to get to ODf. There is no bridge from the legacy installations of MSOffice and the billions of binary documents, to ODf ready applications and services.
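Since both ODf and OOXML files are “zip” packages, the extension or the zip signature alone cannot tell a handler which conversion engine should get the file; something has to look inside the package. The following is a minimal sketch of that idea, not a description of how MSOffice or daVinci actually dispatch files. It relies on two package conventions: an ODf package carries a `mimetype` entry naming an OpenDocument media type, and an OOXML package carries a `[Content_Types].xml` part. The function names `sniff_package` and `make_package` are hypothetical, introduced only for illustration.

```python
import io
import zipfile

# Media types for OpenDocument files start with this prefix.
ODF_MIMETYPE_PREFIX = "application/vnd.oasis.opendocument."


def sniff_package(data: bytes) -> str:
    """Classify a zip package as 'odf', 'ooxml', or 'unknown' by its contents.

    Hypothetical helper for illustration only.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "unknown"
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        names = pkg.namelist()
        # ODf packages declare their media type in a 'mimetype' entry.
        if "mimetype" in names:
            declared = pkg.read("mimetype").decode("ascii", errors="replace").strip()
            if declared.startswith(ODF_MIMETYPE_PREFIX):
                return "odf"
        # OOXML (Open Packaging Conventions) packages carry this part.
        if "[Content_Types].xml" in names:
            return "ooxml"
    return "unknown"


def make_package(entries: dict) -> bytes:
    """Build a small in-memory zip package from {entry name: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as pkg:
        for name, content in entries.items():
            pkg.writestr(name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    odf = make_package({
        "mimetype": "application/vnd.oasis.opendocument.text",
        "content.xml": "<office:document-content/>",
    })
    print(sniff_package(odf))
```

A handler built this way would route on what the package says it is, rather than on whichever engine happens to be registered as the system default for zip files.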

Get to XML:

What most ODF-OOXML warriors forget is that the real issue for workgroup and workflow oriented consumers is getting their binary documents into XML. Which XML, ODf or OOXML, is a secondary consideration. The one thing everyone understands is that the way to connect important information domains, systems and architectures is through the skillful use of open XML and open Internet technologies. Important information domains and architectures include:
  • Desktop Productivity Environments (the Office Suites)

  • Enterprise publication, content management and archive systems

  • SOA – Service Oriented Architectures

  • SaaS – Software as a Service systems

  • The Internet – Web 2.0 and beyond

You can't efficiently exchange information across these domains if it's trapped in application – platform specific and unstructured binary formats. Conversion fidelity requirements will break a binary bound exchange process every time. Even Microsoft realizes the need to move to XML, although they will of course strive to control that XML through continued use of application and platform specific dependencies and artificially contrived implementation constraints.

XML Migration: The Three Stages

The problem as we saw in Massachusetts, Munich, and Bristol is that there are three stages of XML file format migration. The three stages for migrating to XML file formats are inextricably linked and must be tackled in exactly this order:
  1. Conversion Fidelity - the billions of binaries problem
  2. Round Trip Fidelity - the MSOffice bound business processes, line of business integrated apps, and assistive technology type add-ons
  3. Application Interop - the cross platform, inter application, cross information domain problem

Sadly, the ODf Community is near entirely focused on stage three, leaving the critically important first and second stages up to customers.

Interestingly, OOXML nails all three stages perfectly, with the proviso being that all interop applications be Microsoft application – platform specific. This is what they mean when they say, “interoperability by design”; applications designed to speak fluent OOXML, making exclusive internal use of .NET 3.0 system dependencies.

Note well that since the starting point of XML migration is that of MSOffice bound binaries and business processes, OOXML is today the only XML file format proposal that can perfectly answer all three stages of XML migration requirements. The OpenDocument Foundation does have three products in the works that are designed to also perfectly answer these same XML requirements, but the Foundation's approach has met with considerable resistance, argument, doubt and continued outrage from the ODf Community at large.

Note also that these three stages are only important to workgroup and workflow situations – situations where MSOffice bound business processes are critical to day to day operations. People who are involved in simple document exchange can easily move to ODf through an OpenOffice download and some simple conversion artifact fixes as documents are converted as needed. Workgroups and workflows require a very high level of round trip fidelity. Which is why mixed application environments present workgroups with a near insurmountable problem; even where the mix is limited to that of different MSOffice versions!

Of course no one wants to see the life span of MSOffice eXtended indefinitely, which is exactly the conclusion most come to regarding the daVinci Plugin. daVinci makes it too easy to stay with MSOffice since there is zero disruption to MSOffice bound business processes, line of business integrated apps, and the functioning of assistive technology add-ons.

The ODf community doesn't realize three things.

  • First, it's an absolute real world requirement that the three stages of XML migration be met perfectly by any file format contestant.

  • Second, Microsoft has handed the ODf Community the opportunity of a lifetime regarding the most difficult stage for ODf; stage two, the migration of MSOffice bound business processes.

  • Third, consumers migrating to XML have no stomach, tolerance or funds for the highly disruptive and costly “rip out and replace” approach most in the ODf community favor.

The Massachusetts RFi:

Massachusetts ITD deserves credit for figuring this out. The three stages are the reason behind their ODf Plugin for MSOffice RFi. Which is about as desperate a cry for help from the ODf Community as i've ever seen. There is no other way to describe the events in Massachusetts except to say that the ODf Community refused to listen, and worse, made no attempt whatsoever to understand the exact nature of the problem that consumers wanting to migrate to ODf face. Hence the unprecedented RFi (Request for Information). So what's going on here? It's the reality of being caught in the clutches of a monopolist for over ten years.

In 1995 Microsoft won the Office Suite application wars, with the next two years spent mopping up as they paved the way for the real crusher, MSOffice 97. The office suite wars were marked by the vicious marketing of comparative feature sets and competitive price inducements. The feature wars ended in 1995 and the outcome was settled by 1997, a period during which a new movement had begun – the use of the ubiquitous MSOffice as a development platform for business processes, line of business integrated applications, and a proliferation of add-ons like those for assistive technologies.

In short, by 1997 it was no longer about “features”. It was all about MSOffice bound business processes. And the binding occurs at two levels; the MSOffice application layer, and the binary file format layer. This creates a binding trifecta of MSOffice, MS Binary Documents, MSOffice bound business processes.

The real battle today is over how to migrate or participate in the MSOffice bound business processes developed since 1995. This is the core of the problem every organization trying to migrate to ODf must confront. Massachusetts, to their everlasting credit, came up with a rather magical solution; the idea of using ODf Plugins for MSOffice to solve the problems of stages one and two. The stage three ODf problem of application interoperability is a work in progress, and i would direct everyone's attention to the ODf 1.2 progress of the three OASIS ODf sub committees; Accessibility, Open Formula, and Metadata RDF/XML. A great leap in application interop is brewing there, and expectations for ODf 1.2 run high with good reason.

The Problem With Rip Out and Replace:

Many people mistakenly believe that migrating to ODf is as easy as downloading OpenOffice 2.0 for free. Which is true if your collaborative work is that of simple document exchange. For workgroups and workflows currently based on MSOffice bound business processes and binaries though, the disruptive and re engineering cost of moving those processes to ODf ready OpenOffice is impossibly high. While it's true that Novell is working on an automated means of converting bound business process VBA scripts to an OpenOffice footing, it's not there yet. But that's just a fraction of the problem. These business processes include compound documents for templates, forms, formula bound spreadsheets, and reports - often involving systems dependent data bindings. Rather than trying to unwind these MSOffice bound business processes and re write them to somehow become application independent and ODf ready, we believe there is another approach demanding consideration. Instead of migrating to an ODf ready desktop MSOffice alternative, we believe the ODf Community should instead be focused on the Microsoft migration of these same business processes to the OOXML enhanced Exchange/SharePoint/Groove Hub.

The Great Opportunity: The Vista Information Processing Chain

Briefly the great opportunity is this: Microsoft has been very busy migrating those MSOffice bound business processes to the Exchange/SharePoint Hub, which will soon be joined with Groove Collaboration servers. XML Hubs are interesting in that they are an indispensable core to any SOA effort. The Hubs are a very effective point for the access, aggregation and repurposing of XML services and information streams connecting to such things as backend data, transaction, inventory and billing processing or content – archival management systems. As a universal transformation layer able to connect to many disparate and purpose specific backend legacy systems, XML is unparalleled.

But what if these same XML Hubs are integrated as a way station connecting desktops and devices to Internet server systems as well as the legacy backends? And there my friends is the magic of an Exchange/SharePoint/Groove Hub. On the desktop side E/S Hubs integrate directly with the MSOffice productivity environment using OOXML (versions Ecma 376, MOOXML, legacy MOOXML, and MOOXML Binary InfoSet) as both the XML container and transport. The E/S Hub provides the integration point where eMail, documents, workflows, project participants (people), and information resources (web services, XHRequest streams, media streams, and message streams) can be aggregated, sorted, re purposed, published, managed and scheduled. At the E/S Hub it's easy to bind data from MS SQL Server systems, MS CRM, MS ERP and MS Live to the portable OOXML document transport.

The most important point about XML Hubs is that migrating business processes to them always results in extraordinary productivity gains. Yes, the E/S Hub represents a new lock-in point for MSOffice bound workgroup customers. The immediate productivity gains however will far outweigh the long term cost of having your business lock-in to MS only applications and services for the next fifteen years. So where's the opportunity you ask?

By intercepting this migration at the head point, MSOffice, converting the documents to ODf, and provisioning ODf ready Hubs the ODf Plugins could potentially walk off with the entire monopoly base of over 500 million MSOffice bound desktops. The ODf Plugin route is infinitely cheaper in that there is no need for desktop upgrades. And, the cost of long term MS lock-in is completely negated.

The ODf Plugin alone however isn't enough - which is exactly why the Foundation began development of the lightweight, efficient and highly portable ODf InfoSet Engine and APi. We need ODf ready Hubs to complete the ODf Information Processing Chain alternative. ODf Hubs that can compete with the E/S Hub Juggernaut.

C'mon Alfresco, Lotus, and Zimbra - the whole world is waiting.

One more thing about these ODf Hubs and ODf ready applications that span desktop, servers and device platforms. The Internet has ushered in a new world of universal connectivity, exchange and collaborative computing. Some ODf ready applications will be designed to participate in specific information processing chains, so they will be built to act as efficient routers of information.

Other applications will profess to be ODf ready, but continue to act in the legacy traditions of information end points - ignoring the ODf document needs as defined by other information processing chains; dropping objects and data binding mechanisms they don't understand or lack the feature sets needed to make use of. Or, they might add value by asserting application or platform specific dependencies that would otherwise corrupt the document's use in other information processing chains.

Application Interoperability

The problems of application interoperability are far more difficult for ODf than for OOXML because ODf was designed for a very different objective.

OOXML was designed exactly for MSOffice bound binary and business process compatibility (stage I & II of the XML migration requirement), including application and platform specific dependencies.

ODf was designed to be an application and platform independent universal file format, dependent on open XML and other Internet technologies available to any application or service.

For ODf, proprietary or platform specific dependencies are an interop killer. For OOXML, proprietary and platform specific dependencies are the monopolist's life blood of “interoperability by design”.

The world is going to move to XML, and from there to RDF/XML. That's a given. OOXML has the incredible advantage of meeting the three “migration to XML” requirements. But the migration comes at the cost of long term business process lock-in, with interoperability demands limiting applications and services to the Vista Stack of:

MSOffice <> OOXML <> IE <> ViSTO .NET <> E/S/G Hub <> MS Active Directory <> MS SQL Server <> MS CRM <> MS ERP <> MS Live etc. etc. etc.

The core of this chain is the MSOffice <> OOXML <> Exchange/SharePoint Hub

ODf can similarly meet the three “migration to XML” requirements, and do so just as well as OOXML – in spite of MS claims otherwise. Although they anger and upset the rip out and replace bent ODf community crowd, the ODf plugins for MSOffice from Sun and the Foundation are the only way ODf can crack and perfect these critically important “migration to XML” requirements. Danes please take note!

And of course, there is today no ODf Stack or ODf Information Processing Chain comparable to the one Microsoft has unleashed. Some great stuff cooking with Lotus Notes, but they are obviously lacking a strategy for stage I & II of the “migration to XML” requirements. Alfresco is hard charging, but they have that same problem. Zimbra and Google Hubs are still mired in the application as end points approach: they see conversion of documents as a one way process in which continuing and persistent loss of fidelity and the dropping of feature related objects or bindings is the expected and unavoidable collateral damage cost, so get over it.

Meanwhile, the ISO/IEC consideration of Ecma 376 is a diversion from where the real action is; the optimized for MSOffice OOXML Exchange/SharePoint juggernaut.

It's high noon for XML. Do you know where your ODf information processing chain is?

~ge~

Notes: The OpenDocument Foundation is working on three products that we believe to be essential to the development of an ODf Information Processing Chain that can connect desktops, servers, devices and Internet systems transitioning information based on the highly interactive and collaborative portable XML document/data model/streaming media model.

These products are still in development. What follows are the design goals and objectives:

  • daVinci ODf 1.2 Plugin for MSOffice

  • InfoSet Engine & APi

  • Interop Wizard for OpenOffice

  • ACME 376 XML-RTF Plugin for MSOffice

The daVinci ODf 1.2 Plugin:

To the best of our knowledge, daVinci continues to be the only ODf plugin for MSOffice designed to work “internally”. What happens is that daVinci triggers the internal conversion process Microsoft uses to do MSOffice conversions.

DaVinci triggers the internal process, intercepting an undocumented internal super structure we call “MS Universal RTF”, for lack of a better name. This undocumented super structure accounts for the extraordinary near perfect conversion fidelity achieved by daVinci and demonstrated by the ACME 376 plugin.

We do not believe similar conversion fidelity results can be achieved by either the MCN XSL Translator plugin approach, or the traditional “external” binary file reverse engineering approach used by OpenOffice and the Sun ODf Plugin. Which is perhaps why Microsoft themselves use this same internal process to convert legacy MS binary documents to MS OOXML.

The daVinci conversion process follows this “internal” sequence (and its reverse):

imbr <> MS Universal RTF <> daVinci InfoSet <> ODf 1.2

imbr :: Microsoft in-memory-binary-representation which becomes a MS Binary (dump) on save (the reverse on load).

MS Universal RTF :: This is a very special structure that MSOffice uses to do all conversions from imbr to MSXML, OOXML, and RTF (the RTF the rest of the world uses).

daVinci InfoSet :: A super InfoSet structure daVinci imposes on the MS Universal RTF structure.

ODf 1.2 :: daVinci maps from InfoSet to ODf 1.0 (first daVinci version) and ODf 1.2 to produce the ODf file. The mapping mechanism could be redirected to Chinese UOF or even a subset of OOXML with some work.
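The chain above can be pictured as a list of reversible stages. The sketch below is purely illustrative: the stage functions are toy stand-ins that just wrap and unwrap a string, every name in it is hypothetical, and nothing here reflects the actual internals of daVinci or MSOffice. What it shows is the shape of the idea: run the stages forward on save, run them in reverse on load, and check that a document survives the round trip.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    """One reversible step in a conversion chain (hypothetical model)."""
    name: str
    forward: Callable[[str], str]
    backward: Callable[[str], str]


def wrapping_stage(name: str, tag: str) -> Stage:
    """Toy stage: 'converting' means wrapping the payload in tag(...)."""
    prefix, suffix = f"{tag}(", ")"

    def forward(doc: str) -> str:
        return f"{prefix}{doc}{suffix}"

    def backward(doc: str) -> str:
        if not (doc.startswith(prefix) and doc.endswith(suffix)):
            raise ValueError(f"{name}: input is not in the expected form")
        return doc[len(prefix):len(doc) - len(suffix)]

    return Stage(name, forward, backward)


# imbr <> MS Universal RTF <> daVinci InfoSet <> ODf 1.2
PIPELINE: List[Stage] = [
    wrapping_stage("imbr <> MS Universal RTF", "rtf"),
    wrapping_stage("MS Universal RTF <> daVinci InfoSet", "infoset"),
    wrapping_stage("daVinci InfoSet <> ODf 1.2", "odf"),
]


def convert(doc: str) -> str:
    """Run every stage forward (the 'save as ODf' direction)."""
    for stage in PIPELINE:
        doc = stage.forward(doc)
    return doc


def convert_back(doc: str) -> str:
    """Run every stage backward, last stage first (the 'load' direction)."""
    for stage in reversed(PIPELINE):
        doc = stage.backward(doc)
    return doc


def round_trips(doc: str) -> bool:
    """Round trip fidelity check: load(save(doc)) must equal doc."""
    return convert_back(convert(doc)) == doc


if __name__ == "__main__":
    print(convert("hello"))
    print(round_trips("hello"))
```

Because only the last stage knows about the target format, swapping that one stage is what redirecting the mapping to another format would look like in this toy model.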

The ODf 1.2 metadata RDF/XML model provides daVinci with the descriptive flexibility needed to maintain a high level of interoperability. The first version of daVinci provided to Massachusetts in June of 2006 targeted ODf 1.0 and high interoperability with OpenOffice. This came at the expense of perfect conversion fidelity and much needed round trip fidelity with the legacy of billions of MS binary documents. In essence, the 85% conversion fidelity one expects from OpenOffice was exactly the same conversion fidelity delivered by daVinci ODf 1.0.

ODf 1.2 enables daVinci to hit near perfect conversion fidelity without compromising on the high level of application interoperability users expect.

The ACME 376 plugin is available for public download and testing - (the ODf 1.2 metadata RDF/XML proposal will not be completed until April of 2007). Although ACME 376 perfects an XML encoding of RTF, it demonstrates the extraordinary near perfect conversion fidelity provided by the daVinci engine.

The daVinci Plugin design objectives include these features:

  • daVinci plugin for MSWord, Excel and PowerPoint :: MSOffice versions 1998, 2000, XP Office, XP 2003 and 2007 (note- unexpected problems with final release of MSOffice 2007)

  • Accessibility Interface installed with daVinci (tagging graphic & media objects in MSOffice with ODf 1.1 accessibility eXtensions)

  • PDF/ODF w/Digital Signatures (XML-XForms sig model) :: a combination PDF/ODF file that can be read in Acrobat, but edited in any ODf ready application given proper digital rights :: excellent for data binding, transport and extraction

  • XForms Interface for MSOffice installed with daVinci

  • Enhanced document library storage search, re use, and re purposing through the advanced ODf 1.2 metadata RDF/XML model

  • Enhanced cross platform/cross information domain application interoperability through the ODf 1.2 metadata RDF/XML model

InfoSet Engine & APi:

Based on our knowledge and experience of working with the MSOffice internal conversion process, we began development on an ODf InfoSet Engine with a developer's APi. The objective is to provide developers, IDEs, Server-Device-Desktop Application providers and cross platform run time engines with a complete ODf conversion and layout engine that is light weight, portable, and easy to embed. Work on the “layout” engine portion has not yet begun.

The Interop Wizard for OpenOffice:

This is a plugin for OpenOffice that enables workgroup users to set either the individual document or default document settings to provide perfect interop with MSOffice desktops.

The issue here is that mixed environment workgroups (ODf Plugin MSOffice & OpenOffice desktops) will face a near insurmountable problem of round trip fidelity corruption. This has nothing to do with file formats or conversion fidelity processing, and everything to do with an issue known as layout engine impedance mismatch.

OpenOffice has a very sophisticated and complex layout engine that is optimized for the advanced presentation and use of “styles”. MSOffice on the other hand has a comparatively simple layout engine design and does not attempt to do common presentation tasks such as “table within a table”.

The layout engine mismatch limits and defines certain feature sets that make it structurally impossible to map documents between the two application suites. The “structural” problems are well known, and fall into these five categories: lists, fields, sections, page breaks, and tables.

The Interop Wizard shuts off the advanced feature sets of OpenOffice, limiting functionality to mimic near exactly the structural features of MSOffice. When the wizard is on, the advanced features are not available unless the end user overrides the warning.
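As a rough illustration of that behavior, the sketch below models the wizard as a switch over the five structural categories named above, with a per-category override standing in for the end user dismissing the warning. The class and method names are hypothetical; this is a toy model under those assumptions, not the design of the actual plugin.

```python
from dataclasses import dataclass, field
from typing import Set

# The five structural problem areas named in the text.
STRUCTURAL_CATEGORIES = ("lists", "fields", "sections", "page breaks", "tables")


@dataclass
class InteropWizard:
    """Toy model of the wizard: gates advanced features per category."""
    enabled: bool = True
    overrides: Set[str] = field(default_factory=set)

    def override(self, category: str) -> None:
        """Record that the user chose to override the warning for a category."""
        if category not in STRUCTURAL_CATEGORIES:
            raise ValueError(f"unknown structural category: {category}")
        self.overrides.add(category)

    def advanced_allowed(self, category: str) -> bool:
        """Return True if advanced features in this category may be used."""
        if category not in STRUCTURAL_CATEGORIES:
            return True  # the wizard only restricts the five structural areas
        if not self.enabled:
            return True  # wizard off: nothing is restricted
        return category in self.overrides


if __name__ == "__main__":
    wizard = InteropWizard()
    print(wizard.advanced_allowed("tables"))
    wizard.override("tables")
    print(wizard.advanced_allowed("tables"))
```

The design choice worth noting is that the restriction is the default and the override is explicit, which matches the idea that a mixed workgroup should have to opt out of interop-safe behavior rather than opt in.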

Monday, February 05, 2007

God Save the Queen!

The British Standards Institute, which represents the UK with the International Standards Organization, has issued a "contradiction" to Microsoft's specification.

And it is just one of many national bodies that had until today to contradict the application, which was being fast-tracked following its endorsement by the European Computer Manufacturers' Association (ECMA). Reports are that Malaysia will follow suit with their own encyclopedia of contradictions.

"Microsoft standards bid faces failure"

"Microsoft's bid to get its Open XML formats recognised as an international standard faces a delay for at least three months and could fail altogether, it emerged today."

Tuesday, January 30, 2007

Yankees in the Court of King Arthur, with a Microsoft Agenda

ANSI/INCiTS has completed their review of Ecma 376, and is ready to cast their ISO/IEC Contradiction Review Phase Fast Track Ballot in favor of Ecma 376 being rammed through ISO. As Sam Hiser points out in his PlexNex blog, not only are the findings of contradictions, inconsistencies, and proprietary dependencies pouring into the public view, there's not much an American can do about it. ANSI/INCiTS has determined that no contradictions exist.

Hi Sam,

As a fellow American, prepare yourself for humiliation and shame. ANSI/INCiTS has decided that they will not object to MS Ecma 376 on the ISO/IEC Fast Track Ballot. This in spite of the massive compilation of contradictions and inconsistencies compiled by GrokDoc, whose members raced through the weekend just to have the document ready for the ANSI/INCiTS meetings.

Rather than confront the clear evidence of contradictions and inconsistencies, the brave hearts at ANSI/INCiTS chose to narrow the definition of what a contradiction is. And narrow it they did.

They decided that one standard contradicts another standard only if the proposed standard causes the existing standard not to work.

Sounds good doesn't it? Au contraire mon ami. This is the kind of self serving maneuvering only a bureaucrat could love.

Using the analogy of the Chinese WAPI WiFi networking standard that was defeated last year because the protocol caused radio interference with existing 802.11 networks, our standards champions came up with their mechanical interference measure of “contradiction”.

Because both files can physically exist on the same disk without interfering with each other, our champions determined that OOXML did not contradict ODF.

Maybe they thought this would go unnoticed, but as one disheartened friend of open standards pointed out, “this argument can be used for every XML format, every programming language, every operating system, in fact every software standard, since software is ultimately data, and data can be segregated on disks. So they essentially chose a definition so narrow that it nullified the concept of "contradiction" for most of what JTC1 has authority over”.

So narrow a definition of “contradiction” that you can drive a fleet of monopolist trucks through the hole they've carved out of the ISO/IEC standards process.

So here we are. The champions appointed by our National Institute of Standards and Technology to represent our interest in the International Standards process have carved out a dangerous and possibly enduring loophole in the ISO/IEC fast track process. A loophole designed to serve the interest of a single proprietary monopolist seeking to control mankind's digital future.

Sam, where do we hide?

To our friends abroad fighting desperately to preserve the integrity of international open standards, with our digital freedom and the future of open Internet on the line, we ask that they dig in their heels and fight this to the end. Those Yankees you see striding into the Court of King Arthur are not from Connecticut, and the Camelot of ISO had better beware. Stalking standards to the tune of that ugliest of all Americans, the corporation from Redmond, ANSI/INCiTS has sold our souls that the world might be fodder for Vista.

Dig in your heels friends,

~ge~

Thursday, January 25, 2007

Brace Yourselves! ACME 376 XML Is Here!

The ACME 376 Proposal for International Standardization
The following letter is to Rob Weir, who somehow managed to get his own personal XML language approved as a standard. We too would like our personal XML language and file format approved. But we don't have Rob's connections. We also don't pack the massive checkbook he carries! Still, our XML is just as worthy of international consideration. We also are providing the standards bodies' members with an easy to download ACME 376 Compatibility Kit for versions of MSOffice 97-2007 so that they can test and use our personal XML language. We need help, Rob!



Dear Rob,
We were very inspired by your "ECMA Weirish" effort. We too have ideas, thoughts and business needs so unique that we need our own XML language. So we invented ACME 376. We even have an ACME 376 Plugin for MSOffice so that coyotes everywhere can immediately improve their road runner catching productivity using our own XML language, "ACME 376". Now coyotes, plumbers and amateur bomb-makers everywhere can finally join the XML revolution, converting their billions of legacy binary documents to perfect, 100% high-fidelity ACME 376!

It's a great day for coyotes the world over. Woe to the hapless roadrunners who dissed us in the past.

So Rob, how did you get ECMA to rubber stamp "ECMA Weirish"? We need the same no-questions-asked rubber stamping for ACME 376. And from there we need to push it through ISO/IEC, but in a way so that the road runners of the world are caught unawares of our nefarious plot for their destruction.

Could you please review the following proposal and advise,

~ge~

Wanted: A Standards Body Willing To Rubber Stamp
Inspired by Rob Weir’s blog post proposing to standardize “Weirish” as an international standard, we thought that if Rob has his own standard then we need one too.

So, we invented the “ACME 376” file format. It meets all the requirements of the ECMA TC 45, namely:

  • it is XML! Actually “.acme” files are ZIP files which contain pure XML.
  • it is compatible with Microsoft Office (2000, XP and 2003).
  • it can convert those billions of binary documents with perfect fidelity.
  • it is great!

Make your Microsoft Word desktop productivity environment “ACME 376” ready today. You can download the installer here. This is not a joke! If you have problems, leave a comment here.

Standards bodies of the world! ACME 376 needs your consideration and approval! We need your vote for ISO/IEC approval now. We made sure not to use any other existing standard so a review for contradiction is not necessary.

Make ACME 376 an ISO standard today!

Download the installer for ACME 376 Compatibility Kit for Microsoft Office.

Monday, January 22, 2007

Running On OpenDocument Inside of Microsoft Office

Perfect Conversion Fidelity & The daVinci ODF Plugin for Microsoft Office

By now it's clear that Microsoft's Ecma-approved Office Open XML file format specification is filled with contradictions to existing ISO/IEC standards. Beyond the traditions of international standards consideration, there is a second perhaps even more important concern: How do we reasonably migrate from a world where Microsoft Office bound business processes drive critically important economic, governmental and organizational concerns? How do we migrate these processes to OpenDocument ("ODF"), which is an international standard designed for interoperability? And is there any possibility of converting with acceptable fidelity the billions of binary documents trapped in Microsoft's proprietary file formats?

The world wants to move to ODF XML. But the question is, "Can This Be Done?". And further, "Can this be done without costly disruption to our day to day business processes?"

Microsoft has long claimed that only their proprietary Office Open XML could convert those billions of binaries to XML without loss of fidelity (data loss, or "lossiness"). They claimed that ODF was inadequate and unable to handle the rich feature set of Microsoft Office. This is a strange claim in that the "X" in XML stands for eXtensible. Since XML formats are eXtensible, of course ODF can handle anything Microsoft Office or those billions of binary documents have to throw at it.

The real truth about this issue is that no one has ever been able to crack the secret code of those Microsoft binaries that hold so much of the world's documents. And Microsoft is not about to disclose the specifications for those formats. Years of reverse engineering efforts by non-Microsoft developers have brought us within range, but the binaries are an ever-moving target for interoperability. With each new version of Microsoft Office applications, the binary formats change arbitrarily. For example, there is no single DOC format; there are a host of DOC formats that all use the same file extension, DOC.
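The point about the DOC extension is easy to demonstrate: a converter cannot trust the extension and has to look at the bytes. The sketch below checks the leading bytes of a file against three well known signatures (the OLE2 compound file header used by the Word 97-2003 era binaries, the `{\rtf` opening of an RTF file, and the local file header of a zip package). The function name `sniff_legacy_file` and the returned labels are hypothetical, and this is only a first-pass sniff: recognizing an OLE2 container says nothing yet about which Word version wrote the streams inside it.

```python
# Leading-byte signatures for container formats commonly found behind ".doc".
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header
RTF_MAGIC = b"{\\rtf"                           # RTF files open with {\rtf
ZIP_MAGIC = b"PK\x03\x04"                       # zip local file header


def sniff_legacy_file(head: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    Hypothetical helper for illustration; labels are made up for this sketch.
    """
    if head.startswith(OLE2_MAGIC):
        return "ole2-compound-file"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(ZIP_MAGIC):
        return "zip-package"
    return "unknown"


if __name__ == "__main__":
    # Three files that could all be named "report.doc" on disk.
    samples = {
        "binary word file": OLE2_MAGIC + b"\x00" * 8,
        "rtf saved as .doc": b"{\\rtf1\\ansi Hello}",
        "zip package renamed": ZIP_MAGIC + b"\x14\x00",
    }
    for label, head in samples.items():
        print(label, "->", sniff_legacy_file(head))
```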
Do not kid yourself. Microsoft is not about to be the one that converts all those binary documents to the open standard ODF. Instead, they set out to convert those billions of binaries to their own proprietary XML. So, the next question is of course, "Is there any possibility of converting those binaries to Microsoft's EOOXML and from there a transformation to ODF?" After all, easy and perfect transformation is the promise of XML. Microsoft's Steve Ballmer answers that question for us when he claims that conversions between EOOXML and ODF can be done, but that Microsoft's plugin will never provide full fidelity conversions between EOOXML and ODF; in other words, only a core set of features will be converted and the conversions of documents implementing other features will be lossy, resulting in data loss in largely unpredictable situations, depending on the differences in particular documents. So Microsoft's solution will not allow the automated conversion of those billions of binary documents to ODF. Not without data loss. We'll have transformation processes between EOOXML and ODF, it just won't be worth doing unless you are willing to manually compare documents when rendered both on Microsoft's applications and on an application that fully supports ODF, just to ensure that no crucial data was lost in the conversion. Essentially Microsoft is claiming that only they can convert the billions of binary documents to XML with the fidelity needed, perfect fidelity (no lossiness), and only to Microsoft's flavor of XML, EOOXML. And Microsoft wants the ISO to award it a monopoly in converting its legacy formats to XML by making its own personal XML its own personal international standard. Microsoft is also funding an open source project to perfect conversions (more properly, "transformations" in XML lingo) between EOOXML and ODF. 
Maybe just to prove they are good guys who can be trusted with our information, and that they are not out to "replace" ODF, but rather to perfect the conversion of those legacy billions of binaries to XML. One objective of the Microsoft-Novell Translator Project is to provide an easy to install EOOXML <> ODF plugin for both Microsoft Office and OpenOffice. Betas for both of these plugins are expected to be released this month. Because this work is somewhat in the open, we are well aware of the intransigence of continuing conversion fidelity problems in both versions. (The Novell work on the OpenOffice.org Writer Translator plugin is complete, but not open to the public, or contributed back to OpenOffice.org -- yet). Note that the Translator Project is based on an XSL Transformation process. So it's expected to be both application and platform independent. The only question is, "Can they achieve the quality of fidelity needed to be of any use?" Steve Ballmer says no. And he's funding the project. There's also the consideration that a scorpion can't help but continue to act like a scorpion. This famous quote from Bill Gates continues to haunt the technology industry to this day. He might as well have been referencing the Microsoft binary file formats:
"I doubt they [Digital Research] will be able to clone Windows. It is very difficult to do technically, we have made it a moving target and we have some visual copyright and patent protection. I believe people underestimate the impact DR-DOS has had on us in terms of pricing." (May 18, 1989 - Bill Gates)

The ODF Plugins Appear::

When Massachusetts announced their Request for Information concerning the possibility of an ODF Plugin for Microsoft Office, there were many responses, each with a different approach to the problem. Microsoft responded to the RFI with the promise of their XSLT based Translator Plugin. Sun provided two different plugin designs; the first based on an OpenOffice Server side conversion, the second based on a C# routine connecting Microsoft Office functions to a locally installed OpenOffice conversion. In both cases it was the OpenOffice conversion engine we know and love that was doing the work of MS binary formats to ODF and back. Importantly, neither the Microsoft nor the Sun plugins allow ODF to be set as the default file save format in Microsoft Office, what is known as "native support" for a file format, a situation sure to produce lots of accidental non-ODF files. Imagine what it would be like to have to train yourself never to hit the "Save file" option or keystroke shortcut in Microsoft Office. Instead, you must open a special menu option to save a file as ODF. Do you think you just might accidentally use the normal file save commands every now and then? An unpredictable mixture of file formats in a network can have unpredictable consequences, particularly when automated processes are involved. A third ODF Plugin for Microsoft Office was proposed and submitted by the OpenDocument Foundation. This plugin conversion process was based on internally triggering a Microsoft Office native conversion process; one the Foundation believes is the same as or similar to that which is used when the EOOXML Compatibility Pack is installed, and the Microsoft Office in-memory-binary representation of the document ("IMBR") is converted to EOOXML. Unlike the other plugins, the Foundation plugin adds full native support for the ODF file formats to Microsoft Office. (The present version adds that support to Microsoft Word; later versions will add it to Excel and PowerPoint.)
You can use the normal file save dialogs and commands. Moreover, you will not need to study/rewrite all of your existing scripts to ensure that files are saved to the right format. Just set the default file save format to ODF. There were other submissions, but these are the three we know the most about. They each represent a different approach to the problem of converting those billions of binary documents to XML. They each offer a different quality of conversion fidelity. Only one allows ODF to be set as the default file save format in Microsoft Office. The real question is whether or not any of the three provide file conversion fidelity acceptable enough so that there would be little or no disruption to existing Microsoft Office bound business processes, line of business dependencies and the functioning of assistive technology add-ons. In short, without near-perfect conversion fidelity, there is no measure of "interoperability" worth talking about. That is the reality of Microsoft's EOOXML blackmail attempt. For years they have withheld from competitors and customers the secret binary file format details needed for perfect conversions, reserving that advantage for themselves. Only Microsoft holds the key to unlock your information from the Microsoft applications it remains bound to. So because they alone hold the key to so much of our binary-bound information, they insist that the world must adopt their proprietary, self serving, application and platform bound, monopoly leveraging XML as an international standard. We are being blackmailed by the problems of converting those legacy billions of binary documents to XML. I have a counter offer that ISO/IEC might consider: give us the keys to those legacy binaries and the documentation for the new MSXML InfoSet binaries that first appeared in Microsoft Office Excel 2007, and we'll give you international standardization for EOOXML.
A fair trade, I think, because it will break the monopolist's grip, level the competitive playing field, and restore competition wherever desktop, server and device systems need to interconnect and exchange information.

Three Conversion Approaches to Consider::

What we have then are three conversion methods, all enabled through an easy to install plugin model, and each with a different level of conversion fidelity:
  • The MS Translator XSLT method (EOOXML <> ODF) :: note that this initially was an application and platform-independent approach. In the strict sense, this is a "transformation", not a "conversion"
  • The Sun external OpenOffice.org conversion engine method (MS Binary Files <> ODF) :: note that the OOo conversion engine is based on years of reverse engineering needed to understand the secret structure of those mysterious and enigmatic billions of binary documents. A secret only Microsoft can unlock with perfect fidelity.
  • The Foundation's daVinci "internal" conversion process (MS in-memory-binary representation <> ODF) :: note that this process harnesses the internal conversion methods of Microsoft Office applications in much the same way as the EOOXML Compatibility Pack.

A brief description of the daVinci internal process:

So how does daVinci do this magic of triggering an internal process and letting the native resources of Microsoft Office perfect the conversion? Well, the first problem was getting inside Microsoft Office applications and working "natively". The know-how for this was provided by Microsoft themselves when they mounted their first MSXML add-on to Microsoft Office 2003. And what a great job they did! Once inside Microsoft Office and working natively, the entire view of how to best convert MS binaries to ODF changes radically. Rather than trying to crack the intransigent and enigmatic binaries externally, on the inside you simply let the Microsoft Office applications do it for you. We don't know for sure, but there is every indication that daVinci works very similarly to how the EOOXML Compatibility Pack works. There is no doubt that without the public information Microsoft has provided concerning the early versions of MSXML, we would not have had the series of breakthrough discoveries that make daVinci possible. So the key to daVinci is in letting Microsoft Office apps handle the billions of binary documents, especially their conversion to IMBR (the Microsoft Office apps in-memory-binary representation). Internally, when a conversion process of any sort is triggered, Microsoft Office apps follow pretty much the same routine. There is a point where this internal conversion process can be intercepted and routed (mapped) to a different, non Microsoft, file format structure. Imagine if an internal conversion process from IMBR <> EOOXML is triggered, and you intercept the process the moment before mapping to EOOXML begins. And you reroute by mapping to ODF. That's daVinci. DaVinci triggers, intercepts, and maps to ODF. It could just as easily be configured to map to Chinese UOF. Or to EOOXML (well, forget the easy part regarding EOOXML; the haphazard and sprawling structure of EOOXML makes this mapping difficult - but as the MS Compatibility Pack proves, it is possible :).
daVinci could even be configured to map to Romanian XML or Oracle XML. The conversion quality of the daVinci process really depends on the flexibility of the XML schemas it is mapping to. Let me say that again, "The conversion quality of the daVinci process really depends on the flexibility of the XML schemas it is mapping to". Since EOOXML was made expressly and specifically for mapping Microsoft Office IMBR to, you had better get perfect fidelity. What about ODF? Yes, you can get the same perfect fidelity. The flexibility is there, and has been there since the February 2003 addition of the <foreign element> tags, section 1.5 of the ODF v1.0 standard (casually referred to as the <microsoft tags> because of what they can do). So yes, if you can break the secret of the proprietary IMBR, understand their hidden structure and function, you absolutely can get perfect-fidelity conversions to ODF and EOOXML. This is an incredible achievement for the OASIS ODF Technical Committee ("TC"). ODF was designed to be a universal file format, totally application and platform-independent, and it has the built in flexibility to easily handle anything the enigmatic billions of binaries might throw at it. Tapping into the Microsoft Office IMBR just makes it easier for daVinci to see what's actually happening inside those unspecified binary blocks that blanket the billions of binary documents we're trying to convert to XML. As Rob Weir has remarked: "If Microsoft supported ODF 1.0 in Office today, using the foreign attribute support already specified in ODF 1.0, they could achieve backwards compatibility with their legacy documents. There is nothing that prevents them from adding a "DoItLikeWord95" attribute to an ODF document."
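To make the foreign attribute idea a little more concrete, here is a minimal Python sketch. It is not daVinci code and not anything taken from the ODF specification text. The "urn:example:legacy-word" namespace and the helper names are made up purely for illustration; only the ODF text namespace URI is the real one. The sketch tags an ODF paragraph with a "DoItLikeWord95" hint in a foreign namespace, and then shows, in simplified form, how an application that does not understand the hint might skip it while keeping the ordinary content.

```python
import xml.etree.ElementTree as ET

# Real ODF text namespace; the "legacy" namespace below is hypothetical.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
LEGACY_NS = "urn:example:legacy-word"  # assumption, for illustration only

ET.register_namespace("text", TEXT_NS)
ET.register_namespace("legacy", LEGACY_NS)


def namespace_of(qualified_name):
    """Return the namespace URI of an ElementTree '{uri}local' name, or ''."""
    if qualified_name.startswith("{"):
        return qualified_name[1:].split("}", 1)[0]
    return ""


def make_paragraph(content, legacy_hints):
    """Build a text:p element carrying foreign-namespace hint attributes."""
    paragraph = ET.Element(f"{{{TEXT_NS}}}p")
    paragraph.text = content
    for name, value in legacy_hints.items():
        paragraph.set(f"{{{LEGACY_NS}}}{name}", value)
    return paragraph


def strip_foreign(elem, known_namespaces):
    """Copy elem, dropping attributes and children outside known namespaces.

    This is a deliberate simplification of what a real consumer would do.
    """
    clean = ET.Element(elem.tag)
    clean.text = elem.text
    clean.tail = elem.tail
    for key, value in elem.attrib.items():
        ns = namespace_of(key)
        if ns == "" or ns in known_namespaces:
            clean.set(key, value)
    for child in elem:
        if namespace_of(child.tag) in known_namespaces:
            clean.append(strip_foreign(child, known_namespaces))
    return clean


if __name__ == "__main__":
    p = make_paragraph("Hello", {"DoItLikeWord95": "true"})
    print(ET.tostring(p, encoding="unicode"))
    print(ET.tostring(strip_foreign(p, {TEXT_NS}), encoding="unicode"))
```

The producing application keeps its hint and can act on it when the file comes back. An application that has never heard of the hint still gets a readable paragraph, which is exactly the interop trade-off discussed further on.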

Blanketed with Unspecified Binary Objects - The dark spots ::

The real problems of converting those billions of binary documents or working as a near native file format within Microsoft Office have nothing to do with either EOOXML or ODF. And everything to do with the secret, enigmatic binary file formats. Microsoft is busily spinning the world to convince us otherwise, but it only takes one demonstration of daVinci to set things straight. We shouldn't give in to blackmail. Especially blackmail designed to leverage the Microsoft desktop monopoly deep into our future of converged and highly interoperable multi platform systems. Control of the file formats, and keeping them bound to proprietary applications and platforms, is control of everyone's information and information processes. So Microsoft Office will do a great conversion of those billions of binaries to IMBR for us. And when triggered, IMBR will set things up for daVinci to intercept an internal conversion structure and map to ODF. Because there are billions of binary documents out there, with years of file level application feature tweaking and enhancements by independent LOB - business process developers and assistive technology add-ons to deal with, there's no telling what kind of unspecified binary objects daVinci will encounter and have to map to ODF. daVinci needs the mapping flexibility in the XML target structure to place these unspecified anomalies, otherwise called "dark objects". The thing is that these binary object anomalies are unspecified on both ends of the conversion equation. They are unspecified with regard to the historical annals of reverse engineering, which itself is based on the cryptic, enigmatic, and often misleading documentation Microsoft has provided for RTF and the MS binaries. And, they are unspecified by the XML structures at the other end of the conversion equation. Like ODF.

The Skinny on daVinci inside::

When a user loads a binary document (or creates a binary document in Microsoft Office applications), the apps themselves convert the binary documents to IMBR (the Microsoft Office apps in-memory-binary representation). The user works the document in IMBR mode. This means all application features, business process adaptations, assistive technology add-ons, whatever, are available and cooking without disruption or change. When an internal MS conversion process of IMBR is triggered, daVinci intercepts the results, and maps to ODF. The ODF version is saved to file. An internal conversion process is triggered whenever functions like save, save as, open, or open most recent are called.
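The trigger, intercept, and map flow can be pictured with a toy Python sketch. To be clear, this is not how daVinci is actually written, and nothing here touches Microsoft Office. Every name in it (InMemoryObject, map_to_target, on_save, the "ole-blob" kind) is hypothetical. It only shows the shape of the idea: the host hands over its in-memory objects at save time, a pluggable mapper sends the ones it understands to native target structures, and anything it does not recognize is carried along as a foreign "dark object" instead of being dropped.

```python
from dataclasses import dataclass


@dataclass
class InMemoryObject:
    """Stand-in for one piece of a document's in-memory representation."""
    kind: str      # e.g. "paragraph", or something the mapper has never seen
    payload: str


# Kinds this toy mapper knows how to express natively (an assumption).
KNOWN_KINDS = frozenset({"paragraph", "heading"})


def map_to_target(objects, known_kinds=KNOWN_KINDS):
    """Map each object to a ('native' | 'foreign', kind, payload) tuple."""
    mapped = []
    for obj in objects:
        if obj.kind in known_kinds:
            mapped.append(("native", obj.kind, obj.payload))
        else:
            # Unknown "dark object": keep it, flagged as foreign.
            mapped.append(("foreign", obj.kind, obj.payload))
    return mapped


def on_save(objects, mapper):
    """Stand-in for the intercept step: a save is triggered and the
    in-memory objects are rerouted to whichever mapper is configured."""
    return mapper(objects)


if __name__ == "__main__":
    document = [
        InMemoryObject("paragraph", "Hi"),
        InMemoryObject("ole-blob", "0xDEAD"),
    ]
    for entry in on_save(document, map_to_target):
        print(entry)
```

Swapping in a different mapper function is the toy equivalent of pointing the same intercepted process at UOF or EOOXML instead of ODF.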

Conversion Fidelity & Interoperability ::

We fully believe that ODF version 1.0 provides daVinci with the flexibility we need to hit the same quality of fidelity of conversion of those billions of binary documents that EOOXML promises. Which is to say that ODF 1.0 has long offered Microsoft the same opportunity to convert everything to ODF and back. There is no technical reason for Microsoft not to have implemented ODF. And there is no technical reason for them to now ask that ISO/IEC consider a second universal file format specification as an International standard. But what's beyond the issue of conversion fidelity? Inter application and cross platform interoperability; the ability to transport and exchange documents across many different kinds of information domains without loss of fidelity or structural compromise. Interop is a tall order. Especially after years of living with application bound file formats that only the application and platform vendors can transport and exchange effectively. In their EOOXML pitch, Microsoft promises something called, "Interoperability by design". Translated this means that all Microsoft applications will be designed to work perfectly with EOOXML. Most likely we will also see MS applications able to handle the binary extensions of EOOXML that showed up in Excel 2007. This includes desktop, server and device systems written to .NET 3.0 and the Vista platform. To make certain this happens, Microsoft has provided us with a new version of VSTO 2005 where they drop support for MSXML and introduce support for EOOXML. They make it easy. The thing is that if you're a non Microsoft application, most likely you won't be able to fully implement EOOXML. Definitely you won't have access to the binary InfoSet extensions of EOOXML. Leastways not without a price, and never if you're a competitor like Oracle, IBM or Sun. ODF Interoperability is open and freely available to anyone wanting to implement ODF. Participation in the OASIS ODF TC specification process is open and affordable.
There are no application or platform specific dependencies, or licensing restrictions, or patent encumbrances - legal risks holding anyone back. Universal file format interoperability is a given with ODF. Application interoperability is another matter. Especially existing applications that might have layout engines developed long before ODF became available. With new ODF applications this won't be a problem since they can develop directly to the specification. This is one of the reasons so much work is going into ODF 1.2, to accommodate the differences of traditional layout engines as they implement ODF. We don't have the power or authority of a Microsoft to rewrite every application to work perfectly with ODF. Nor do we have a similar command of the marketplace to force a user base of over 550 million desktops to upgrade to a Vista platform of Microsoft Office 2007 - VSTO -IE 7.0 -Exchange/SharePoint/Groove - MS SQL Server, MS Active Directory Server, etc. So instead, we have ODF 1.2 waiting in the wings. That's where our solution to universal application interop lies.

About the flexibility of ODF 1.0 - The Interop of ODF 1.2 ::

It is true that ODF has had, since February of 2003, an extremely flexible set of tags in the specification. They are called the <foreign element> and <alien attribute> tags, and were designed exactly to handle the billions of unspecified and conversion defiant binary object anomalies known to comprise years of Microsoft proprietary binary file format use. When you're mapping from an IMBR conversion process to ODF, you have to have something to put the unknowns in. You have to map to an existing tag. Since the very nature of these dark objects is that they are "unspecified" and previously unknown, they are also outside the range of ODF. Using these tags, daVinci can get perfect fidelity between the billions of binary documents and ODF. And get it right every time, with every Microsoft Office version from 1997 to 2007. But this "perfect fidelity" comes at a high cost of interoperability with other ODF ready applications. Simply put, the other ODF applications have no idea what to do with the daVinci <foreign elements>! A Microsoft Office with daVinci knows. But for all other ODF ready applications these dark objects are still a mystery. In many ways the <foreign element> tags are the equivalent of what EOOXML does with the same volumes of unspecified legacy tags. They're there, but no one except Microsoft Office Compatibility Pack enabled installations knows what to do with them. The same is true with daVinci ODF. Only daVinci knows what to do with these dark objects. In other words, from day one ODF has had the exact same means of wrapping in proper XML an unspecified binary object or processing instruction as that which EOOXML is now parading about as something absolutely necessary (and unique to EOOXML) for converting those billions of binary documents to XML. With ODF 1.2, daVinci gains the flexibility to map whatever dark objects it finds in ways that will dramatically improve interop with other ODF 1.2 ready applications.
(see the <interop eXtensions> proposal submitted by Florian Reuter to the ODF Metadata SC). Using the new metadata model, daVinci can then proceed to fully describe everything known and intuited about the dark object. Keep in mind that daVinci has an inside view. What daVinci sees is the IMBR context and conversion structure that is missing from a binary file format as well as EOOXML's cryptic tags. This descriptive model will provide every other ODF 1.2 ready application a much better chance to handle and render the dark objects. With ODF 1.0, we were limited in how we describe for interop purposes these unspecified creatures. With ODF 1.2, daVinci can field these objects on the fly, and give other ODF 1.2 ready applications a fighting chance to properly render them. Sticking a binary object into an XML wrapper is just kicking the can forward. It's passing the problem onto someone else. Yes, it solves the momentary problem of an XML file format plugin running inside Microsoft Office (EOOXML and ODF). No problemo for those users. But it punts the problem of roundtrip interoperability with other ODF ready applications. They are left hanging. With the generic <interop eXtension> approach, and the metadata descriptive model, my guess is that ODF 1.2 ready applications will handle upwards of 98% of these problems instead of having to ignore the entire binary block. Over time of course, we will come to understand, specify properly, and map directly these binary objects. Years of reverse engineering have brought us to upwards of 85% conversion fidelity. Now we need to nail that remaining and highly elusive 15%. Uncompromising demands from Massachusetts and the EU have forced Microsoft to come out in the open with their proprietary XML. They are fighting tooth and nail to keep their application bound binary secrets secret. And with good reason.
If we crack that last 15%, and do it in a way that provides users with a totally non disruptive migration path to ODF, the monopoly will have been cracked open. Sometimes I wonder if the ISO/IEC JTC-1 members realize that they have it in their power to do what no government has thus far been able to do - stop the Microsoft monopoly from illegally leveraging their control into other markets, and restore open competition to technology marketplaces. The daVinci ODF 1.2 and ODF 1.0 plugin demonstrations will be made available to ISO/IEC members as positive and irrefutable proof that "it can be done". Hopefully we can get a video demonstration to walk them through daVinci so they can see for themselves. Hope this helps, ~ge~

Notes:

Microsoft joined the original OASIS Open Office XML effort in November 2002 (now OpenDocument or "ODF"). But they refused to participate or comment, instead quietly observing the work of the ODF Technical Committee for the next four years. Meanwhile, they began work on a proprietary XML file format designed specifically and solely to meet the "XML" needs of their Microsoft Office applications and emerging Vista platform of desktop, server and device systems. In 2004, Microsoft presented their proprietary XML effort to the European Union in response to a famous study known as the "Valoris Report". The report recommended that EU governments and organizations mandate information technology purchase requirements based on a far reaching but uncompromising infrastructure of SOA, Open Standards, and Open XML mandates. Valoris also recommended the development of a universal XML file format that was application and platform independent, able to service the portable document needs of an SOA infrastructure stretching over desktops, servers, devices and across the Internet. The universal file they envisioned was tagged "OpenDocument". When Massachusetts followed the EU with a clear mandate for Open Standards and open XML file formats that were recognized by Open Standards bodies, Microsoft formed the MS Ecma 45 workgroup with the objective of developing an open standard XML file format perfectly compatible with the existing Microsoft Office XML file formats. Meaning the end result was in the hands of the Ecma 45 workgroup before they even began work. All that needed to be done was some massive documentation of what in essence is a Microsoft Office binary dump into XML. Or, if you prefer, an XML encoding of Microsoft Office proprietary binary file formats.