
More on the Office 2007 doc formats

Continuing our series on the Office 2007 formats.

OFFICE 2007 DOCUMENT FORMAT

We continue our series on the Office 2007 document format with a look at how open the new format will really be. There is also some good news about passwords and encryption in the new format, plus our opinion of the whole shebang.



HOW OPEN WILL IT REALLY BE?

There’s all manner of ‘open’ formats and code. Microsoft says that the formats are ‘open’ but that isn’t enough for many proponents of the truly open source movement.

The technical details of the Office 2007 formats will be publicly available, but that doesn’t mean you can do anything you like with that information. If you write code to read or write documents in the new formats, then that code has to carry an attribution to Microsoft, even though you wrote every line of it. That might seem fairly reasonable, but it doesn’t fit with the popular General Public License (GPL).

As it is usually read, the GPL does not allow code to carry extra conditions such as a mandatory attribution to a particular company. The code is added to the public pool of source code for that program, free to be amended and adopted by others. The Microsoft license for the Office 2007 formats would seem to prevent that. If that’s true, then GPL open source projects would not be able to include any compatibility with the Office 2007 document formats, not for technical reasons but for legal ones.

Microsoft argues that it is their R & D time and money that’s been put into the development of these new formats. Though they are making them freely available, the company feels that it’s reasonable to expect anyone making use of the formats to acknowledge the source of the technology (even though no money is changing hands).

There is some question about whether the Microsoft license for the Office 2007 formats is enforceable, but I doubt anyone in the open source world has the money or stomach for a court battle with Redmond.

In addition, the formats Microsoft proposes are not compatible with the OASIS (Organization for the Advancement of Structured Information Standards) OpenDocument file format. This may make some companies wary of adopting the Office 2007 formats.

PASSWORD & ENCRYPTION

The ZIP format has password and encryption options but thankfully, Microsoft will not be using them in the new Office 2007 document formats.

Any password protection or encryption is applied by Office itself, not by the ZIP layer of a DOCX (or similar) file. The data is encrypted separately and ZIP is only used to compress and store the resulting data.

This is a good choice because ZIP isn’t sufficiently secure for many purposes and there are plenty of ZIP ‘breaking’ tools out there.
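Developers who want to see this for themselves can look at the flags ZIP keeps for each entry. The sketch below assumes Python and its standard zipfile module, neither of which Microsoft has mentioned. The function name zip_encrypted_parts is our own invention. It lists any entries that have the ZIP-level encryption flag set, which on the description above should be none for an Office 2007 document.

```python
import io
import zipfile

# Bit 0 of the ZIP general-purpose flags marks an entry encrypted by ZIP itself.
ENCRYPTED_FLAG = 0x1


def zip_encrypted_parts(package: bytes) -> list[str]:
    """Return the names of entries that use ZIP's own encryption.

    Hypothetical helper for illustration; it only inspects the ZIP
    directory and never tries to decrypt anything.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.flag_bits & ENCRYPTED_FLAG
        ]
```

An empty list means ZIP encryption is not in play. It says nothing about whether Office has encrypted the contents by its own means.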


COMPRESSING A DOCUMENT OUTSIDE OFFICE 2007

For developers there is the intriguing possibility of making an entire Office 2007 document without using Office at all. I’m not saying that it would be easy but with XML components and ZIP compression it would be very possible.

The XML components would have to be created with all the necessary linkages and then compressed using any standard ZIP tool. Microsoft’s testing includes trying the new formats with files compressed by many external compression tools.
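To give a feel for what that involves, here is a minimal sketch in Python. It assumes the part names, namespaces and content types of the format as it was eventually published (a content types part, a root relationships part and word/document.xml); details could differ in pre-release builds. The function name build_docx is ours, and a real document would normally carry more parts, such as styles and properties.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Declares what kind of content each part in the package holds.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# The "linkage": points from the package root to the main document part.
ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

# A document body holding a single paragraph of text.
DOCUMENT_TEMPLATE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>
</w:document>"""


def build_docx(text: str) -> bytes:
    """Return the bytes of a one-paragraph DOCX built without Office."""
    document = DOCUMENT_TEMPLATE.format(text=escape(text))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", document)
    return buf.getvalue()
```

Writing the result to disk is one more line, for example open("hello.docx", "wb").write(build_docx("Hello from outside Office")).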

WHY WORRY ABOUT CORRUPTION?

There’s a lot of talk about the robust nature of the new formats.

Some people have expressed concern that Microsoft’s focus on recovery of data from damaged files is an indication of their ‘secret’ knowledge that Office or Windows are somehow faulty.

We don’t accept that. The mentions of data recovery are an acknowledgement of a computing reality – data does sometimes get corrupted regardless of the software or operating system.

One advantage of the new format is that splitting documents into individual components (albeit transparent to the ordinary user) means that each piece is checked for accuracy separately.

If a piece is ‘bad’ the rest is untouched and can be recovered much more easily than at present.
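Microsoft hasn’t spelled out exactly how Office does its checking, but one plausible ingredient is the CRC-32 checksum that ZIP stores for every entry. The sketch below works on that assumption, using Python’s standard zipfile module. The function name check_parts is hypothetical. It reads each piece in turn and reports which ones come back clean.

```python
import io
import zipfile
import zlib


def check_parts(package: bytes) -> dict[str, bool]:
    """Map each part name to True if it reads back cleanly, else False."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            try:
                # zipfile compares the data against the stored CRC-32
                # while reading and raises if they disagree.
                zf.read(name)
                results[name] = True
            except (zipfile.BadZipFile, zlib.error):
                results[name] = False
    return results
```

A damaged word/document.xml would show up as False here while an intact word/styles.xml still reports True, which is the ‘rest is untouched’ behaviour described above. Damage to the ZIP directory itself is a different matter and would stop the file opening at all in this sketch.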

GOOD OR BAD?

Generally the announcement of the improved XML based formats is a good move. Microsoft is telling their customers well ahead of time what they have in mind. It’s understandable that the company is accentuating all the positives but there’s much to like in the Office 2007 format.

The compressed nature of the new format makes emailing documents much easier – just send the document. For large companies there could be savings in network bandwidth (though any benefits may well be erased by other technology changes).

The separate XML components and integrity checks on each piece should make file recovery more reliable than the current system.

The openly published format, while not meeting standard open source requirements, is a step forward from the current, mostly unfathomable, binary format.

It is important to remember that the format change is optional and won’t make any direct difference to most people. Most Office users save documents without worrying about the details of the document format and that will continue in Office 2007.

Microsoft has promised add-ons for Office 2003 and some previous versions of Office to let you read and save in the new formats with older versions of Office.

As usual with these announcements there’s a sting or two in the tail. Exactly how many stingers there are, where they are and how big will be a story for the next twelvemonth.

Naturally Office Watch will be watching closely.
