Open Stack: OpenDocument as the perfect Microsoft Office file format

Monday, January 22, 2007

How to:

Add native file support for OpenDocument to Microsoft Office

What is the da Vinci plugin for Microsoft Word? Where did it come from? How does it work? And can it really convert files generated by Microsoft Word versions 97-2007 into fully compliant, OpenDocument format-ready applications? Can it do so without disruption to bound business processes, dependent line of business applications, and assistive technology add-ons?

Let's begin from a simple truth, discussed in more detail below. The OpenDocument format ("ODF") is able to handle anything Microsoft Office can throw at it, and handle it at least as well as Microsoft's new EOOXML file formats. That includes the bound business processes, assistive technology add-ons, line of business dependencies and advanced feature sets unique to various versions of Microsoft Office. The converse, however, cannot be said of Ecma Office Open XML -- the new Microsoft EOOXML file formats -- as evidenced by the difficulties of Microsoft's own ODF Translator Plug-in Project.

In April of 2006, the Commonwealth of Massachusetts Information Technology Division broke ranks with IT tradition and put out an open Request for Information concerning the feasibility of an ODF Plugin for Microsoft Office. It is not that Massachusetts ITD had rejected the many current software offerings of ODF vendors and open source community efforts. But they lacked a critical tool to perfect a migration to ODF -- from years of Microsoft Office business processes and binary document dependencies to ODF-ready solutions -- without impossibly costly disruptions to their day-to-day business activities.

Massachusetts ITD correctly reasoned that ODF plugins for the Microsoft Office major apps would solve this problem, enabling a mixed environment of interoperable desktop solutions, giving their business processes much needed time to be migrated for the right reasons at a manageable pace. The right reasons are productivity gains and efficiency enhancements, as opposed to merely reactive efforts to squelch the costly insanity of file format madness. The manageable pace is variable, depending on human and other resource availability, rather than on a disruptive rip-and-replace of most software across entire networks to achieve interoperability.

Imagine what your own office would be like if you came to work one day and discovered that all of the software staff normally used had been replaced on every machine by different software. Could your office afford to shut down long enough for everyone to learn all of the new software and procedures? Is that kind of trauma something you would like to go through? People need to be able to exchange documents. That means software interoperability is key, and it simply is not workable for people to exchange files in formats that are incompatible with software used by the recipients.

Now imagine how much worse the problem would be if your office is dependent on fully-automated business processes where processing is entirely dependent on and bound to specific applications, and those applications spanning the process pipeline have a secret language only they can speak. Additional workstations must have the right software if they are to participate in the process. Business process change becomes an issue for the application vendors, since they alone can tweak the language-specific applications. That's expensive. So much so that many a business lets much-needed process changes lapse until it is simply no longer competitive or profitable, and the cost of rewriting the applications to re-engineer the business process must be paid, at any cost. Still, a driving and determining factor is that any new adjustments must be compatible with the information that drove prior iterations of that business process. Anything less than full fidelity in converting legacy documents to new formats greatly limits the changes and productivity enhancements that can be gained, and ensures that your applications and service continue to come from the same vendors for years and years to come, no matter what the cost.

Full fidelity is a term used by the file conversion cognoscenti. It refers to the quality of conversions from one file format to another. Full fidelity means the conversion from one format to another cannot result in data loss. If data loss is possible, for example depending on what software features are invoked in a given document, then we refer to the conversion as lossy.
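The distinction can be made concrete with a small round-trip check. What follows is only a toy sketch: the "document" is a plain dictionary of features, and the converter functions (`lossy_export`, `faithful_export`, and so on) are hypothetical stand-ins, not anything da Vinci or Microsoft Word actually exposes.

```python
from typing import Callable, Dict

Document = Dict[str, str]  # toy model: feature name -> content


def is_full_fidelity(doc: Document,
                     to_target: Callable[[Document], Document],
                     from_target: Callable[[Document], Document]) -> bool:
    """Convert to the target format and back; report whether nothing was lost."""
    return from_target(to_target(doc)) == doc


# Features the hypothetical lossy target format can express.
SUPPORTED = {"text", "formula"}


def lossy_export(doc: Document) -> Document:
    # Silently drops any feature the target format cannot represent.
    return {k: v for k, v in doc.items() if k in SUPPORTED}


def faithful_export(doc: Document) -> Document:
    # Carries every feature across, recognized or not.
    return dict(doc)


def identity_import(doc: Document) -> Document:
    return dict(doc)


plain = {"text": "Purchase agreement"}
rich = {"text": "Purchase agreement", "formula": "=SUM(A1:A3)", "macro": "AutoRoute"}

print(is_full_fidelity(plain, lossy_export, identity_import))    # True
print(is_full_fidelity(rich, lossy_export, identity_import))     # False
print(is_full_fidelity(rich, faithful_export, identity_import))  # True
```

Note that the lossy converter passes the check for the plain document and fails only when the document invokes a feature it cannot carry. That is what makes lossy conversions unpredictable in practice: whether data is lost depends on the document, not just on the converter.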

For example, there is the always problematically lossy and always one-way-only conversion of a Microsoft Word 97 binary file format (.doc) to a Microsoft Word 2003 binary file format (.doc). There is also the common file format conversion of a Microsoft Word binary .doc to Rich Text Format ("RTF") and back. For our purposes, we are concerned with the fidelity of file conversions between OpenDocument and binary files bound to Microsoft Word versions from Word 97 to Word 2007. Microsoft is able to perfect a full fidelity conversion of those same binary documents to and from EOOXML, using the Microsoft XML compatibility pack plugin. We argue that the same full fidelity of conversion is possible between Microsoft Word and OpenDocument, using the da Vinci plugin for Microsoft Word. In fact, the process is very similar to that used by the MS XML compatibility pack.

Automated business processes are key to increased productivity and profit, but they are still relatively few and far between, a situation not expected to last long. Automated business process systems are regarded as one of the next Big Things in the software industry. Legacy business process systems often require much end user interaction such as manual management of information, verification of data, tracking of data as it transits the process, etc. Many are dependent on particular legacy software applications' proprietary binary file formats.

Basically, an automated system provides exacting interfaces so that the system handles the data, workflow and intelligent routing of documents, and users concentrate on the decision points and architecture of the process itself. The Workflow Management Coalition defines workflow, in general, as "the automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules," where participant can mean either a human being or another automated process.

To get from the partial automation of legacy systems to the full automation of future systems, information transfer must be accelerated systematically and human participants must be provided with interfaces that enable rapid reorganizing of people, resources, documents, eMail, messaging, wiki collaborative documents, direct access to data bindings within those documents, and the ability to push change into advanced workflows managed by "the system". There is simply no practical way to do this without having a universal file format based on the XML portable document with system-bound data model. The universal XML file formats exist.

The systems-level XML hubs are coming on strong as evidenced by the Exchange/SharePoint/Groove Hub, Workplace/Lotus Notes/Domino, Scalix, and Zimbra. This is where data and various information streams merge with the portable document model, with information streams collapsing into a single easy to schedule and manage aggregate; the merge occurs in the context of end user project and workflow management interfaces.

Full fidelity conversions of the information that drives our current business processes to universal portable XML document models is the only way we can make that productivity leap to fully automated business processes. Every time there is a loss of fidelity in data conversion we basically lose a node in the process. We have to go manual and fix a problem that should have been handled by the system. That impedes any potential productivity gains.

The major problem with anything less than full fidelity conversion is that human intervention to fix the artifacts is required. This is both disruptive and costly. Imagine a situation where an important binary document is worked in MSOffice and NOT converted to ODF at the headpoint by da Vinci. That binary document then flows through the system and is converted to ODF and further processed at an OpenOffice workstation. There is some loss of presentation, but no big deal for the OpenOffice wiz. From there it goes to a WorkPlace-enabled workstation where it is processed and responded to. No problemo: WorkPlace can read and write ODF perfectly, including an ODF-XForms document. From there it goes back to an MSOffice workstation for review and data extraction. The individual sends it back to the WorkPlace desktop and asks for a Word 2000 .doc format. If it is an ODF-XForms document, there is no way to convert it back to Word 2000 binary .doc. If it is a compound document of any complexity, there will be loss in the conversion. No matter what happens, this process simply isn't going to work. In fact, no one in their right mind would ever set up or allow for such a poorly mixed environment in the first place.

Enter da Vinci. The conversion of binary documents occurs at the headpoint of the process, Microsoft Word, and it is full fidelity. No need to waste expensive human attention fixing conversion artifacts. Once the process is in ODF, these documents can be passed through a mixed environment without loss of that original fidelity.

Let's take for example a more extensive and document-centric business process, one that handles a trillion dollars per year in transaction-after-transaction volumes: the real estate transaction industry. For each and every transaction many professional participants must come together, exchange hundreds of documents, work under a strict and unforgiving schedule, and do this with a perfection and data exactness that, if it fails, could entirely wipe out the lifetime savings and asset accumulations of their clients. Today the entire real estate transaction is driven by paper documents, just as it was 50 years ago. The only difference is that today much of the data and document content is provided by back end data processing systems, almost none of which are able to talk to each other.

The documents involved must be continually reviewed and rewritten to make certain the data and information is current. The cost and availability of a mortgage changes by the hour. A bank might pre-approve the buyer and the property to be purchased, and even lock in the rates. But there is always the possibility that secondary mortgage market evaluations, purchases, and constantly changing loan ratios can take a lender out of the market on the very day they are scheduled to deliver a loan product. It's all an incredibly time-sensitive process.

The average real estate transaction brings together many professionals: buyer and seller representative real estate agents, their brokers and transaction coordinators, the buyers and sellers, their attorneys, a lender, a title company, inspection companies, insurance companies, appraisers, contractors filing contingency removal estimates, and key information systems providers such as the Multiple Listing Service, county records, probate courts, and courts of law recording such things as liens and encumbrances. All in all, every real estate transaction has at least twelve professionals, each having to deliver specific documents and reports, and each having a back end data processing system different and unable to speak with any other back end system. It is a mess. It is impossible to automate this process without a universal digital file format that can be produced in and translated back into the many proprietary back end systems involved. And do so with absolutely perfect fidelity of document conversion and data extraction.

So here we have a paper driven process that costs consumers upwards of 12% of the actual transaction. And it's a trillion dollar a year industry once all the before and after sale costs are factored in.

A common situation in the real estate industry is that of the professionals having to exchange documents electronically. Faxing is great for paper, but there is a problem of version control as the volume of documents approaches the final closing. Most of the industry relies on eMail attachments as the primary exchange method. What a nightmare. To exchange documents, participants first have to synchronize the versions of Microsoft Office used to produce these usually compound and data centric transaction instruments. They call it version madness in that there is no way to exchange Microsoft Office documents without first having everyone involved using exactly the same version. Meaning, the quality of fidelity conversion between different versions of MSOffice is a killer. Data loss occurs unpredictably. It is unacceptable.

How do you force everyone who must participate in a real estate transaction to use the exact same versions of software? Well, the same way governments routinely force taxpayers to use the same versions of Microsoft Office. You either get the right versions or you don't get to participate.

There are other problems with eMail attachments of proprietary binary documents. Let's say you are exchanging a sales report spreadsheet with inventory, pricing, availability, special discounts or price cuts, sales, customer name and ID, etc. Every time that spreadsheet changes hands, the data has to be verified and updated. The more rapid and wide the exchange, the more intense the data management - data extraction problem becomes. It is a mess. A dangerous and risky mess.

Those spreadsheets could have important formulas and data bindings. Using the XForms or Jabber P2P data binding models, an ODF spreadsheet document can manage itself. Every time the sales people connect to the Internet, the inventory, discounts, sales totals, etc. fields can be updated by "the system" with exacting precision. The end user no longer has to worry about data verification and integrity. The system takes care of it, no matter how fast and furious the changes occur. It's automated within the document. Bulletproof. The same can be done with a Microsoft EOOXML document.

However, try to convert one of these documents, and risk dropping formulas, data bindings, rules based routing, or workflow scripting, and the whole process will come to a screeching halt. As it should. This is but another way some proprietary vendors lock in their customers. If you can't do a full fidelity conversion, and be confident of the results, the only alternative is to turn to a single vendor and year after year pay the price.

Enter the OpenDocument Foundation. Since early 2003, a loose gaggle of ODF developers had been seeking methods to improve the OpenOffice.org ("OOo") conversion of Microsoft Office binary formats to what later evolved into ODF. By December of 2003, the OOo community actually launched an official project to develop an ODF plugin for Microsoft Office. The difficulties were enormous, and at the time there was no perceived “business case” for the effort that might have justified allocation of sufficient resources from the Sun Microsystems StarOffice group, the primary developers for OpenOffice.org.

However, as Microsoft began its own long march to XML formats, openly discussing important MS binary format to XML conversion issues, these “difficulties” began to be understood. In 2005 there were a series of breakthroughs based partly on released information from Microsoft. The breakthroughs also sprang from our own intuition of what was really behind some of the oddities in the Microsoft approach. Here's a quick list of reference materials Microsoft provided that really helped:

When Massachusetts put out their request for information, as it happened we were only an installation program away from our first ODF 1.0 compatible plugin for Microsoft Word, the most heavily-used application in Microsoft Office. In June Massachusetts sent us our first set of 150 primary importance test documents. We sent them their first plugin install, which was dialed for perfect interoperability with the ODF 1.0-compliant reference implementation, OOo 2.0. In August, we sent Massachusetts the first da Vinci version of the plugin, which was dialed for perfect, 100% round trip conversion fidelity with the legacy of Microsoft Word binaries.

A lot happened in Massachusetts between June and August of 2006. The short form is that as Massachusetts came to understand how our plugin worked, we came to understand the true nature of the challenges they faced. We saw up close and personal the challenges of the Microsoft Office bound business processes and the need for perfect fidelity “round-tripping” of documents between different formats within automated business processes. The Microsoft assistive technologies were never an issue for us because the plugin works natively within Word. Microsoft Word handles the assistive technologies, and all the plugin does is the conversions from Word's in-memory-binary-representation to ODF and back.

The situation with the MSOffice bound business processes was such that everyone agreed we had to have a much higher fidelity converting those binaries than the extraordinary 85 per cent fidelity credited to the OpenOffice.org file conversion engine. We had to have perfect fidelity because there was no reasonable expectation of ever successfully migrating those business processes to a Microsoft Office alternative like ODF-ready OpenOffice.org, StarOffice, WorkPlace or Novell Office. Such a re-engineering of the business processes would be costly and beyond disruptive.

If however we could achieve full fidelity conversions of legacy Microsoft binary documents to ODF, and were able to guarantee the roundtrip process of these newly christened ODF documents in a mixed desktop environment, one comprised of ODF-enabled Microsoft Office, OpenOffice.org, Novell Office, WorkPlace, and KOffice, the existing MSOffice bound business processes could continue being used even as new ODF ready workstations were added to the workflow. Massachusetts could migrate to non-Microsoft Office software in manageable phases, restoring competition -- and sanity -- to the Commonwealth's software procurement program.

If on the other hand, there is no full fidelity conversion to ODF of legacy documents available at the head point of the migration -- Microsoft Office -- then the business process will break under the weight of users having to stop everything to fix and repair the artifacts of lossy file conversions. What Massachusetts discovered is that users will immediately revert to a Microsoft-only process wherever the business process system breaks down due to conversion fidelity problems. It is a productivity killer and a show stopper for migration to ODF-supporting software.

Most people believe that full fidelity file conversion is impossible, largely influenced by the poor fidelity generally experienced in converting documents from one vendor's binary file formats to another vendor's binary file formats. The major reason some people transfer such beliefs to the da Vinci plugin is that they do not realize the plugin works natively inside Microsoft Word. The Word program itself performs the conversions using its internal processes for native support of a file format. The da Vinci plugin triggers the native Microsoft Word conversion process, intercepts the output at precisely the right moment, and maps it to ODF. When opening an ODF document, da Vinci simply reverses the process (although there is nothing simple about what da Vinci does or how da Vinci does this).

Think of it this way. The Microsoft Compatibility Pack is a EOOXML plugin for older versions of Microsoft Office. If these versions of Microsoft Office can use a plugin to perfect a conversion process between legacy binary formats and EOOXML, the same process can be used for ODF. In fact, it's actually easier to perfect a similar conversion process to and from ODF instead of EOOXML, particularly for Microsoft. Easier in that the ODF specification is lean and clean by comparison and very carefully structured. Even easier if you have the blueprints for those binary file formats.

My educated guess is that it would take all of two weeks for Microsoft to write a Microsoft Word ODF plugin. Indeed, Microsoft developers reportedly told Massachusetts that it would be "trivial" for them to implement ODF in Microsoft Office. If they can do it for something as complex and convoluted as EOOXML, ODF will be a snap. The trick is not in XML. It's in having the secret legacy binary format blueprints.

There's a reason we call our plugin “da Vinci”. Yeah, you guessed it. There is in fact a secret code needed to unravel the remaining mysteries of the legacy binaries; the elusive 15% just beyond the reach of the OpenOffice.org conversion engine.

The da Vinci plugin is conformant with version 1.2 of the ODF specification, which is still working its way through the OASIS ODF sub-committees, and is dependent on features of the draft specification that may conceivably change. Massachusetts was well aware of this fact, and knew we were treading on dangerous ground because ODF 1.2 was not yet final. They gave us the go-ahead anyway. Such is the importance of round trip high fidelity conversions with the legacy binaries.

Why ODF 1.2? Microsoft is notorious for frequently changing its file formats. There is even a Microsoft document jarred loose in the antitrust litigation where they brag about maintaining a "moving target" for other vendors' efforts to achieve interoperability with Microsoft software. For example, there is no single "DOC" file format. There is version after version of just the word processing format, all different. And their specifications are still secret. Imagine the volumes of unspecified binary objects, application version-specific or add-on specific processing instructions, and aging system dependencies buried in those billions of legacy binary files. There is no telling when or where they will pop up, and da Vinci will have to somehow handle what is otherwise one unmappable black hole after another.

The easy solution is to simply wrap (enclose) these dark objects in foreign element tags. While this method is perfectly legal ODF because of ODF's strong interoperability features, see e.g., ODF v. 1.1 section 1.5, it comes at the cost of interoperability with other ODF-ready applications. Sure, inside da Vinci-enabled Microsoft Word the conversions and handling are perfect. Absolutely perfect. Send one of these dark object loaded documents to OpenOffice though and the application has no idea what to do with it.
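A minimal sketch of the wrapping approach, using Python's standard `xml.etree.ElementTree`, may help here. The ODF text namespace is the real one; the foreign namespace URI, the `dark-object` element name, and the base64 encoding are assumptions made purely for illustration and are not taken from da Vinci.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"  # ODF text namespace
FOREIGN_NS = "urn:example:legacy-binary:1.0"  # hypothetical foreign namespace

ET.register_namespace("text", TEXT_NS)
ET.register_namespace("legacy", FOREIGN_NS)


def wrap_dark_object(parent: ET.Element, blob: bytes) -> ET.Element:
    """Enclose an unmappable binary object in a foreign element, base64-encoded."""
    wrapper = ET.SubElement(parent, f"{{{FOREIGN_NS}}}dark-object")
    wrapper.text = base64.b64encode(blob).decode("ascii")
    return wrapper


def unwrap_dark_objects(parent: ET.Element) -> List[bytes]:
    """Recover wrapped objects; only an application that knows the wrapper can."""
    return [base64.b64decode(el.text or "")
            for el in parent.findall(f"{{{FOREIGN_NS}}}dark-object")]


def visible_text(parent: ET.Element) -> str:
    """What a consumer that ignores the foreign namespace is left to render."""
    return parent.text or ""


paragraph = ET.Element(f"{{{TEXT_NS}}}p")
paragraph.text = "Quarterly summary."
wrap_dark_object(paragraph, b"\x00\x01legacy-payload")

# Serialize and reload, as if the document had been saved and reopened.
reloaded = ET.fromstring(ET.tostring(paragraph, encoding="unicode"))

print(unwrap_dark_objects(reloaded))  # the producer gets its bytes back intact
print(visible_text(reloaded))         # another application sees only the text
```

The producer recovers its payload byte for byte, which is why the conversion is perfect inside the wrapping application. A consumer with no knowledge of the foreign namespace has nothing to go on but an opaque element.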

ODF 1.2 provides da Vinci with two important advantages. The first is that ODF 1.2 ready applications “must” preserve tags whether they understand them or not. This is important to round-tripping. ODF 1.2 applications become routers of documents, instead of document end points.

The second aspect of ODF 1.2 is its XML/RDF-based metadata flexibility. Instead of wrapping these dark objects in a non-descriptive tag, ODF 1.2 will enable da Vinci to wrap them with a loosely descriptive generic element tag, and then nail into the more flexible attribute model everything da Vinci can describe about the blob. This gives ODF 1.2 ready applications like OpenOffice or KOffice a better than even shot at reading and rendering dark objects that would otherwise have been dropped. Over time, the elusive 15% of lossage during their conversions of Microsoft binary formats will disappear.
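Both advantages can be sketched together. The example below is a deliberate simplification: it uses plain XML attributes rather than the RDF-based metadata model of the ODF 1.2 draft, and the foreign namespace, the `object` element, and the attribute names (`kind`, `origin`, `width-cm`) are all hypothetical. It contrasts an application that acts as a router, preserving what it does not understand, with one that acts as an end point and drops it.

```python
import xml.etree.ElementTree as ET

KNOWN_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"  # ODF text namespace
FOREIGN_NS = "urn:example:legacy-binary:1.0"  # hypothetical foreign namespace

# A paragraph carrying one described dark object (attribute names are made up).
SOURCE = (
    f'<text:p xmlns:text="{KNOWN_NS}" xmlns:legacy="{FOREIGN_NS}">Draft'
    '<legacy:object legacy:kind="chart" legacy:origin="Word 97" '
    'legacy:width-cm="8">AAEC</legacy:object>'
    "</text:p>"
)


def edit_as_router(xml: str, new_text: str) -> str:
    """Change what is understood; pass every unknown element through untouched."""
    root = ET.fromstring(xml)
    root.text = new_text
    return ET.tostring(root, encoding="unicode")


def edit_as_end_point(xml: str, new_text: str) -> str:
    """Change what is understood; discard elements from unknown namespaces."""
    root = ET.fromstring(xml)
    root.text = new_text
    for child in list(root):
        if not child.tag.startswith(f"{{{KNOWN_NS}}}"):
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


def describe(xml: str) -> dict:
    """Collect the descriptive attributes of any surviving dark object."""
    obj = ET.fromstring(xml).find(f"{{{FOREIGN_NS}}}object")
    if obj is None:
        return {}
    # Strip the namespace part of each attribute name for readability.
    return {name.split("}")[1]: value for name, value in obj.attrib.items()}


print(describe(edit_as_router(SOURCE, "Final")))
print(describe(edit_as_end_point(SOURCE, "Final")))
```

After the router edits the paragraph, the object and its description are still there for the next application in the chain, which can use hints such as the kind and dimensions to attempt a rendering. After the end point edits it, the object is gone and no later application can recover it.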

This is a far better way of handling dark objects than what EOOXML tries to do. Under that specification, the implementing application simply wraps the object and passes it along. There is no effort to describe these objects in XML-speak except for the tag name. The specifics needed to render these objects are left out. Which means, only those applications with access to the secret binary blueprint can properly render these dark spots.

As a proof of concept, da Vinci provides a most valuable service. First of all, it proves that ODF can handle anything EOOXML can handle. ODF can handle everything and anything Microsoft Word can throw at it. Second, da Vinci proves that Microsoft developers could at least as easily have written an ODF plugin for Microsoft Word as they did an EOOXML plugin. For Microsoft, knowing the secret binary blueprint as they do, this simply cannot be seen as a technical challenge. It was purely a business decision. Third, da Vinci demonstrates that it is possible to perfect a migration to ODF at minimal cost of disruption to existing business processes and application bound routines. The migration to ODF is possible. The starting point of this migration is exactly where Massachusetts ITD thought it would be – with ODF plugins for Microsoft Office.

We have not yet developed plugins for the other major applications in Microsoft Office: Excel and PowerPoint. However, we have investigated the issues involved enough to have confidence that those tasks are also achievable. The proof of concept and working prototype for Excel are completed. Only a proof of concept for PowerPoint exists.

How to:

Interact with Microsoft business processes using OpenDocument

One of the interoperability challenges the OpenDocument format faces is that of the developers of existing non-Microsoft applications trying to implement a file format that follows after the fact of the design and development of the specific layout engines. The tradition of productivity applications is that the application was written with the file format an afterthought based on the in-memory binary representation needs of the application. Every application had a perfect binary dump of these in-memory-representations, and from there conversions to other file formats began. Early and open cooperation between existing application providers in the specification development process is essential for later interoperability.

One hope for ODF is that a new generation of ODF-ready applications will appear, and that these applications will be geared from the get-go to produce perfect, highly interoperable, and round trip ready ODF.

To that end the da Vinci developers began work on the InfoSet Engine and API. This is designed to be a lightweight conversion (and someday layout) engine useful to mobile devices, server side applications, rapid development desktop productivity applications, and the easy ODF enabling of IDEs. The principle is the same as da Vinci except that the InfoSet Engine must do the MS binary conversions without the native assistance of the Microsoft Office applications. That is a step da Vinci relies on Microsoft Word to perform natively.

So it's safe to say that the more we watch da Vinci do its work, the more we learn about how Microsoft Office perfects its own conversions. This knowledge is then passed to the InfoSet conversion algorithms. Going native gives one a whole new perspective on an enormous problem that spans decades.

The InfoSet API is intended to provide developers with easy-to-call interfaces that do all the conversions for their applications and services. We have not yet begun work on the InfoSet layout engine, but so far this is one of the more promising ODF projects we've worked on, maybe more important and certainly more lasting than da Vinci. We believe that da Vinci will have its day, cracking open Microsoft Office. But that day will only last until the Microsoft Office bound business processes and their related binaries are migrated to ODF-ready information processing chains. After that, (perhaps a three-year process) the day belongs to InfoSet and apps fully supporting ODF such as OpenOffice.org.

InfoSet is important because the world is moving towards Internet-enabled information processing chains that mesh desktops, servers and device-based information systems. We need ODF-ready chains, and the only way to get there is to make it as easy as possible for application developers to provide interop-perfect ODF.

To understand this urgency, one need only look at the Microsoft Vista Stack, an information processing chain where every application speaks perfect EOOXML. The head point of the Vista chain is of course Microsoft Office. But the core of this chain is the Exchange/SharePoint/Groove Hub. The Hub is where information is aggregated, sorted, managed, scheduled, repurposed and bound into intelligent collaborative workflows. The driving force is that of the portable XML document, with data being bound to interactive document objects, extracted and worked as the business process flow demands.

These chains are extremely productive. So much so that I believe every Microsoft Office business process will migrate as soon as possible to either the Vista – EOOXML chain or an ODF ready alternative.

Some interesting thoughts here. The da Vinci plugin and those on the development road map for the other major Microsoft Office apps will be able to convert the Microsoft Office head point to an ODF pump. This opens the way for ODF ready Hubs. Zimbra comes to mind. An InfoSet-enabled version of Zimbra (just an example :). Once the Microsoft Office bound business process moves to the enhanced productivity of an ODF ready Hub, the desktop workgroup space opens up for any ODF enabled alternative to Microsoft Office. That means, for example, any Linux desktop running OpenOffice.org or KOffice can fully participate in these Hub-hosted workflows.

Many will no doubt disagree. But I personally believe the above information processing chain scenario is the secret to Linux desktops running wild and finally replacing the 485 million Microsoft Office bound desktops that dominate business.

Related information

The da Vinci plugin and InfoSet Engine – API have been presented at technology conferences and to interested governments serious about migration to ODF. This includes:

EU IDABC Experts Group Contacts: Barbara Held, Peter Strickx, Elmar Geese
Commonwealth of Massachusetts ITD Contacts: Louis Gutierrez, Timothy Vaverchak, Claudia Bowman
State of California ITD Contacts: Bill Welty, CIO of the Air Resources Board

The Massachusetts Plugin Feature Priority List:

Microsoft Word perfect conversion fidelity plugin (ODF 1.2)
Plugin Accessibility Enhanced Feature Set (ODF 1.1 compatible)
MSExcel perfect conversion fidelity plugin (ODF 1.2 with formulas)
MSPowerPoint plugin
Roundtrip Interoperability with ODF 1.2 version of OpenOffice.org
XForms Interface for da Vinci enabled Microsoft Word
XForms Interface for MSExcel and MSPowerPoint
PDF - ODF Digital Signature Interface for Microsoft Office

OASIS Open Office XML Application Vendors:

The interoperability of any universal file format very much depends on the open cooperation of existing application vendors. New applications written to the universal file format specification will of course have near perfect interoperability, but there are serious application layout and feature impedance difficulties for existing applications trying to implement a universal file format, no matter how open or consensus driven the specification development process happens to be. ODF was most fortunate to have a number of desktop and enterprise application developers participating in the process. Although Microsoft was part of the original group of founding members, and continues to maintain that membership in the OASIS ODF XML TC, they have never actively participated. Their "observer" status did however give Microsoft direct access to all discussions, meetings, conference minutes, eMail exchanges, listservs, documents and proposals. Clearly if Microsoft had participated in the ODF process, the stubbornly defiant interoperability problems we see today between ODF and Microsoft's own proprietary XML could have been resolved.

Resources:

Wired Magazine article, "MS Fights to Own Your Office Docs"
Rob Weir, "The Vast Blue-Wing Conspiracy", "A Foolish Inconsistency", "The Chernobyl Design Pattern", "Calling Captain Kirk", "Guillaume Portes Redux", "How to hire Guillaume Portes"
Open Stack, "Game Time for OpenDocument"
Groklaw: GrokDoc, EOOXML at JTC-1 and EOOXML Objections
Groklaw: Searching for Openness in Microsoft's OOXML and Finding Contradictions

11 comments:

Anonymous said...: You should read
http://ooxmlhoaxes.blogspot.com/
as it shows some of your article to be very one-sided; 10:21 AM
david said...: Is the da Vinci plugin available for download anywhere? Does it work in Office 2007?; 3:15 PM
Anonymous said...: Unfortunately, whilst ODF probably can handle 'anything Microsoft can throw at it', OO can't, which is the problem. OO can't handle embedded Flash in ppt documents, which makes it useless for us (if it did, we would switch 450 desktops tomorrow).

With something like this, 99% compatibility = 0% usability; 3:48 AM
Anonymous said...: I know this is an old post... but seriously, is the da Vinci plugin available anywhere? I have been seeking the holy grail of document conversion between .doc and .odt for some time. If it is not available yet, is there some kind of mailing list for notification when it is available?; 6:41 AM
Unknown said...: Hi Anony,

In August of 2007 we dropped ODF as the da Vinci target conversion format, and moved to the W3C's Compound Document Format (CDF) with an ePUB wrapper.

The reason for this move is that we could not establish a reasonable degree of interoperability with OpenOffice ODF unless Sun supported the five generic eXtensions to ODF needed to hit the high fidelity conversion the da Vinci process is capable of.

Since da Vinci is a clone of the MSOffice OOXML compatibility Kit, we use the same internal conversion process where imbr (in-memory-binary-representation) is converted to another format: imbr <> OOXML or, imbr <> RTF.

While it's entirely compliant to eXtend ODF, without Sun's changes to OpenOffice ODF the application-platform-vendor independent interoperability end users expect would be meaningless.

The problem as we see it is this; it is impossible to do a high fidelity conversion between two application specific XML formats.

It is however quite possible to do a conversion between an application specific format and a generic (application-platform-vendor independent) format.

CDF is that generic format. It's also very special in that CDF components (XHTML, CSS, SVG, XForms, XSL-FO, XSL, RDFa and RDF) are all Web ready. Neither MSOffice-OOXML nor ODF is Web ready.

The file format interchange and interoperability the world seeks is possible at the generic format level, but not at the application specific level. This means the harmonization ISO now seeks between MSOffice-OOXML and OpenOffice-ODF is not possible, a conclusion borne out by the DIN Workgroup's February 2008 preliminary report on harmonization, filed with ISO just prior to the Geneva BRM.

Keep in mind that OpenOffice-ODF began life as an XML encoding of the application's binary dump. Similarly, MSOffice-OOXML began life as an XML encoding of that application's binary dump. This encoding provides the syntax of the specifications. These XML encodings then have to be semantically described so that others can fully implement the specification.

In many ways, the past five and a half years of work on OASIS OpenOffice-ODF has largely been that semantic process.

Once the encodings are fully described, a third stage begins where the syntax is genericized, separating application specific settings from the interoperable baseline of generic and interoperable settings.

OpenOffice-ODF still has a long way to go in this respect. Three areas in particular stick out in ISO 26300 as haplessly underspecified (that is, insufficiently described semantically): lists (especially numbered lists), formulas, and the entire presentation layer (also known as styles).

Interoperability depends on this genericization as well as a strong interoperability framework. An interop framework defines compliance through implementation rules. Neither OpenOffice-ODF nor MSOffice-OOXML has an interop framework worth a damn, meaning application vendors can introduce undocumented eXtensions wherever they want. This is good for vendor stacks, where desktop applications are connected to server side data, media and content services and applications: the Lotus Symphony (OOo 1.1.4) and Lotus Notes-WebSphere-DB2 accelerators, for example, or the MSOffice-SharePoint-MS SQL Server proprietary eXtensions baked into the MSOffice-OOXML spec but as yet undefined.

This kind of compliance where vendor specific but undocumented eXtensions can be freely used is a killer for the interoperability consumers expect from a standard.
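To make the point about implementation rules concrete, here is a minimal sketch of the kind of check an interop framework could require. It is only an illustration: the namespaces, the element names and the `find_undocumented_extensions` helper are all made up for this example and are not taken from ODF, OOXML or CDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the namespaces a made-up interop profile would allow.
ALLOWED_NAMESPACES = {
    "urn:example:generic-doc",  # illustrative namespace, not a real standard
}

def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or '' if it has none."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""

def find_undocumented_extensions(xml_text):
    """List the tags of elements whose namespace is not in ALLOWED_NAMESPACES."""
    root = ET.fromstring(xml_text)
    return [
        el.tag for el in root.iter()
        if namespace_of(el.tag) not in ALLOWED_NAMESPACES
    ]

# A document that mixes shared markup with one vendor specific element.
SAMPLE = """<doc xmlns="urn:example:generic-doc" xmlns:v="urn:example:vendor-x">
  <p>Shared content</p>
  <v:serverBinding target="vendor-server"/>
</doc>"""

if __name__ == "__main__":
    print(find_undocumented_extensions(SAMPLE))
```

A real framework would need far more than a namespace whitelist, but even this much would flag the vendor element instead of letting it pass silently as "compliant".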

Simply stated, CDF has a very different starting point than either OpenOffice-ODF or MSOffice-OOXML. There are no application-platform-vendor specific bindings with CDF. It is wholly generic and Web ready. The only question is, can we convert the complex, business process rich office suite documents to CDF?

We believe the da Vinci internal conversion engine can be repurposed for exactly that.

We also believe though that MSOffice Web ready formats are only part of the story.

The entire legacy of "client/server" computing is giving way to a new model of client/Web-Stack/server computing where SOA, SaaS, Web 2.0 and highly collaborative "Cloud Computing" will dominate.

The thing is, through their MSOffice-Windows monopoly, Microsoft owns the client side of the legacy "client/server" equation. They of course seek to control the transition process as existing "client/server" systems, many of which are entangled with MSOffice bound business processes, migrate to versions of the newly emerging "client/Web-Stack/server" model.

To control this transition, and direct the volumes towards the MS Web-Stack (Exchange, SharePoint, MS-SQL, IIS and the cascade of other services like MS Dynamics, MS Live and MS Communications) Microsoft had to control the interoperability of MSOffice.

The trick to controlling interop is defining protocols, formats and access to component dependencies (APIs).

The true meaning of ISO approval of MSOffice-OOXML is that Microsoft now has near total control over the great transition.

To understand this, one must first realize that the "client/Web-Stack/server" model (with its many variations and mixes of SOA, SaaS, Web 2.0 and Cloud Computing) is fundamentally a Webification of information and information processing systems. The trick is getting existing information systems able to work with highly structured information that is portable and application independent. Logic must be separated from data and content.

The key here turns out to be XML as a means of creating structured languages for specific domains, and then sharing that language with others working the same domain. Domains could be vertical disciplines like organic chemistry, cancer research, law, or interactive and highly collaborative web publication (XHTML). Or it could be vertical industry databases like real estate, finance and computer components. Or, as it turns out, a domain could even be an application specific area like that of desktop office suites.

XML provided us with the means of enabling the separation of data from logic with backend transaction, relational and data server systems. The data and logic were also portable.
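As a small illustration of that separation, the sketch below keeps the data in a portable XML vocabulary and lets two unrelated pieces of logic consume it. The listing vocabulary and the function names are invented for this example; they do not come from any real industry schema.

```python
import xml.etree.ElementTree as ET

# Made-up real estate vocabulary for illustration only; not a real schema.
LISTINGS_XML = """<listings>
  <listing id="a1"><city>Boston</city><price>450000</price></listing>
  <listing id="b2"><city>Sacramento</city><price>380000</price></listing>
</listings>"""

def load_listings(xml_text):
    """Parse the portable XML into plain records, independent of any application."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": el.get("id"),
            "city": el.findtext("city"),
            "price": int(el.findtext("price")),
        }
        for el in root.findall("listing")
    ]

def average_price(records):
    """One consumer's logic: summarize prices."""
    return sum(r["price"] for r in records) / len(records)

def cities(records):
    """A second, unrelated consumer's logic: list the cities alphabetically."""
    return sorted(r["city"] for r in records)

if __name__ == "__main__":
    records = load_listings(LISTINGS_XML)
    print(average_price(records))
    print(cities(records))
```

Neither function knows or cares which application produced the XML, which is the portability the paragraph above is describing.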

The great innovation of OpenOffice-ODF is that this same XML model was applied to client side application bound information systems; the desktop office suite.

Microsoft's key problem with the Webification of MSOffice was that they didn't own or control the web formats and protocols surrounding HTML, XHTML, CSS, PDF, SVG, XForms, PNG, SWF, JPG, MP3, MPEG or QuickTime (to name but a few :).

In a 1998 eMail, discovered during the Comes v. Microsoft antitrust trial, Bill Gates laid out this dilemma:

From: Bill Gates
Sent: Saturday, December 5 1998
To: Bob Muglia, Jon DeVaan, Steven Sinofsky
Subject : Office rendering

"One thing we have got to change in our strategy - allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company. We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities. Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows. I would be glad to explain at a greater length. Likewise this love of DAV in Office/Exchange is a huge problem. I would also like to make sure people understand this as well."

So how did the MSOffice team carry this out?

The answer to the dilemma was provided by OpenOffice-ODF and its novel use of XML. The ISO standardization of OpenOffice-ODF was icing on the cake, if Microsoft could pull that off too. All Microsoft had to do was follow the OpenOffice-ODF lead, and the problem of converting billions of MSOffice documents to open, W3C-controlled standards (HTML, XHTML, CSS, SVG, XForms, CDF and RDF) could be overcome.

Once Microsoft had their own replacements for these W3C technologies, they could by virtue of their desktop monopoly, completely control the great transition of "client/server" to "client/Web-Stack/server" models.

At least Microsoft could control the transition as far as "business" information systems are concerned. Google and Yahoo pretty much own the consumer transition. In large part this is because the real hold MSOffice has is on bound business processes, and because it is the anchor point of most client/server systems. Consumers and related domains such as education and non-automated SMBs are not similarly bound. They were easy pickings for Google and Yahoo. Although I would also add here that a Yahoo acquisition by Microsoft would give the great beast from Redmond a rather unique "consumer-business" convergence story way beyond Google's reach :)

ISO approval of MSOffice-OOXML establishes MSOffice as a standards compliant editor for the Microsoft Web-Stack and Cloud.

Which brings us to the most important aspect driving all of this. The MSOffice SDK beta released in December of 2007 included an MSOffice-OOXML <> XAML conversion component.

With this converter, all MSOffice documents and bound business processes can easily be converted to XAML "fixed/flow", and back again. XAML itself is a proprietary format within the Windows Presentation Foundation (WPF) architecture so important to the .NET framework and the Microsoft owned "client/Web-Stack/server" model.

XAML is joined by Silverlight, WinForms, Smart Tags and LINQ as key replacement technologies for W3C XHTML, CSS, XForms, SVG, RDF, RDFa and SPARQL. WPF technologies are also replacements for Adobe PDF and SWF (Flash). And of course, .NET itself is an alternative to Java.

IMHO, the most important aspects of these developments are XAML, Silverlight and Smart Tags-LINQ.

We can see the impact the MSOffice-OOXML conversion to XAML will have on existing business processes. But we also have the recently released IE-8 beta where we can see a definitive dumbing down of the cross browser web.

IE-8 supports bits of HTML5 and CSS2.1. What it lacks is support for strict XHTML2, CSS3, SVG, XForms, and JavaScript. All of which are needed to webify the complex and business process rich documents produced by MSOffice bound and related "client/server" systems. IE-8 of course fully supports XAML, Silverlight and Smart Tags. Which is to say, IE-8 supports MSOffice-OOXML document conversions. So does the MS Web-Stack.

So here we are. Microsoft has figured out how to break the web using compliance with international standards. The great transition has already begun. Google and Yahoo have been successfully held at bay, relegated to the ad rich but subscriber poor consumer web. Microsoft, through control of the great transition, has reserved for their own profits the emerging business web. Their Web-Stack is ready, with the Exchange, SharePoint, MS-SQL server juggernaut (and its proprietary interop with MSOffice) looking unstoppable.

Our core competency is the da Vinci conversion engine. Repurposing that engine to produce OPEN WEB MSOffice documents is a worthy but tediously involved goal. But surely you can now see the importance of our effort.

There's more. It's not enough to do in-process conversions of MSOffice documents to open web formats. IMHO, we have to add considerable collaborative value to the document and data binding process to compete. MS has an incredible head start on the great transition, with MSOffice-OOXML, IE-8, the SDK XAML conversion, and the MS Web-Stack ready to roll.

Our challenge is twofold: produce open web formats within the MSOffice document editing process of the same fidelity, quality and usefulness as the XAML-Silverlight-Smart Tags triad; and add collaborative computing value in-process, as well as seamlessly connecting to open web Web-Stack alternatives such as Apache, JBoss, and Tomcat centric systems.

IMHO, the failure of OpenOffice-ODF will have lasting impact more so because ODF proved to be a hapless misdirection than anything else. Even if we could effectively implement OpenOffice-ODF in the many situations (95%) where MSOffice dominates the client side of client/server equations, there is still the problem that OpenOffice-ODF is not Web ready.

Sun could of course cut to the chase and provision OpenOffice to edit Web ready advanced XHTML-CSS-SVG-XForms-JavaScript. But they could have done that back in 2000! (When the decision was made to drop the OOo browser work, many OOo engineers believed it was because it was so costly to rewrite the OOo presentation engine for fully compliant XHTML-CSS documents, so they did ODF instead.)

It will be a sad day indeed if Sun OpenOffice chooses to support MSOffice-OOXML instead of improving and providing much needed support for advanced XHTML, CSS, SVG, XForms, RDFa, RDF, CDF, PDF, SWF and JavaScript.

At the end of the day it's all about the transition of existing business processes to emerging Web-Stack models. This requires highly structured, portable and Web ready data, content and media. The Web itself is a portable document environment where the document replaces applications as the interface into data, media and interconnected information systems. MSOffice was the primary document editor of the client/server world, and Microsoft is now in position to make sure MSOffice is the primary editor of business on the Web as defined by the many variations of the emerging client/Web-Stack/server model.

As for ISO? They have done the unthinkable. They have enabled Microsoft to break the web using compliance with international standards as a Trojan horse.

For us the battle for the open web and this emerging client/Web-Stack/server model became clear during the Massachusetts plug-in trials. We tried to accelerate OpenOffice-ODF development and interop, but were met with stiff resistance from the OpenOffice-ODF vendors. Now we're working it from a different angle, trying to come up with a value added but in-process alternative to the MSOffice-OOXML <> XAML great transition nightmare that lies ahead.

Hopefully there are others who similarly see the danger behind what ISO and the OpenOffice ODF vendors have managed to do. Most days I wonder though.

Hope this helps,
~ge~; 12:56 PM
Unknown said...: One point of clarification: The reason why "in-process" alternatives to the MSOffice-OOXML <> XAML conversion need a strong value-added proposition is that ISO approval takes the open format imperative off the table.

Going forward, competitive alternatives to the MS Web-Stack will need to be more than an open format. Building a collaborative interface into MSOffice is one important way of delivering value-added features to the MSOffice - open web equation.

We can see this in the Bill Gates eMail quote cited above, where he points to WebDAV.

Instead of WebDAV, MSOffice 2007 implements a SharePoint "collaboration" protocol. A collaboration protocol owned by Microsoft and exclusive to the MSOffice - MS Web-Stack - MS Cloud.

Having pointed that out though, the da Vinci Group is fully capable of cloning that important aspect of the Microsoft plot to take the Web. By reverse engineering both the internal format conversion process and the MS Web-Stack protocols, we may have a chance to bypass the MS Web-Stack APIs. Which may well be our last stand to keep the Web open.

When trying to run with Microsoft, take to heart the words of Wayne Gretzky when he said, "I skate to where the puck is going to be, not where it has been".

Hope this helps,
~ge~; 2:22 PM
