What is the 'Custom XML' feature in Word?

About the feature that Microsoft has to remove from Office 2007.

Just what is the ‘Custom XML’ feature that Microsoft has to remove from Office 2007?

The Office 2007 document formats (.docx, .xlsx, .pptx etc.) are stored as XML, which looks a lot like the HTML used in web pages. For example, here’s a bit of a .docx file that describes a single paragraph:

<w:p><w:r><w:t>Office Watch</w:t></w:r></w:p>

Word takes the XML and makes it into the document that you see and print. Normally you’d never have to worry about the innards of a document; suffice to say that the Office 2007 formats are far superior to the old .doc formats for many reasons Office Watch has talked about for years (i.e. the changeover hassles are worth it).

One advantage for larger businesses is that it’s much easier to write programs to edit an Office 2007 format document compared to the older formats. XML is very commonly used (it’s not proprietary to Microsoft) and there are plenty of developer tools available.

In fact it’s quite possible to write a program to make an entire document or spreadsheet without using Microsoft Office at all. A company might do that to take information from a database and compile it into a document to send to staff.
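To give a feel for how little is involved, here is a minimal sketch in Python that builds a one-file Word document with nothing but the standard library. It relies on the fact that a .docx file is a ZIP package of XML parts. The function names (write_docx, document_xml) are our own, and the three parts written here are, as far as we know, the bare minimum Word will open; a real-world generator would add styles, properties and so on.

```python
import zipfile
from xml.sax.saxutils import escape

# Package-level boilerplate: tells the reader what each part contains.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Points from the package root to the main document part.
RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_xml(paragraphs):
    """Return the main document part: one plain paragraph per input string."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )


def write_docx(path, paragraphs):
    """Write a minimal .docx (a ZIP package) without using Office at all."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document_xml(paragraphs))
```

Calling write_docx("notice.docx", ["Staff notice", "The office is closed on Friday."]) would produce a two-paragraph document; in the database scenario above, the list of paragraphs would simply come from a query instead.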

But all that is standard XML, part of the specification for Office 2007 documents that Microsoft has worked on for many years.


‘Custom XML’

Programmers can add extra lines of hidden information to any Office 2007 document. One of the beauties of XML is that anything the reading program (e.g. Office) doesn’t understand, it ignores.

This lets companies add information to documents that Office itself doesn’t need but that an external program can scan for.

For example, here’s a simple line of Custom XML:

Fred Dagg

 

This line identifies the customer that the document relates to, whether it’s a letter, invoice, spreadsheet or presentation.

With that single line, a program can scan thousands of documents across many computers in a company to find all the documents that involve a particular customer.
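A scan like that could be sketched roughly as follows. This is an illustration only: it assumes the customer is stored in an element we have called Customer (a made-up name), sitting in any XML part inside the .docx package. A real deployment would use the company's own schema and would know exactly which part to look in.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def customers_in(docx_path):
    """Collect the text of every (hypothetical) Customer element in a package."""
    found = set()
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            if not name.endswith(".xml"):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            for element in root.iter():
                if local_name(element.tag) == "Customer" and element.text:
                    found.add(element.text.strip())
    return found


def find_documents(folder, customer):
    """Return every .docx under 'folder' tagged with the given customer."""
    hits = []
    for path in sorted(Path(folder).rglob("*.docx")):
        try:
            if customer in customers_in(path):
                hits.append(path)
        except zipfile.BadZipFile:
            continue  # not a real .docx package, ignore it
    return hits
```

Something like find_documents("shared_drive", "Fred Dagg") would then list the matching files, whatever the visible text of each document happens to say.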

That’s just a simple example. Another is invoices – while it might seem simple, it can be hard to find a piece of data like an invoice number inside a document. Just one mistyped character can ruin a standard document search. But if the reference is in a fixed form that computers can understand then the job is a lot easier.

Invoice: 123456 
123456

The first line is read by Word and displayed in the document.

The second line is Custom XML which is the same information but in a structured format which is easy to find.

The two can be combined like this:

Invoice: 123456 
 

That idea can be extended to an entire invoice with all the details, product codes, prices etc all tagged with Custom XML.
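Here is a small sketch of why the tagged version is easier for a program to work with. The markup is deliberately simplified and the element names (doc, text, InvoiceNumber) are invented for the illustration; real Word markup is more verbose. The second document has a typo in the visible label, which defeats the text search but not the tag lookup.

```python
import re
import xml.etree.ElementTree as ET

# Two toy documents: visible text plus a structured invoice number.
# The second one has a mistyped label ("Invioce").
DOCS = [
    "<doc><text>Invoice: 123456</text><InvoiceNumber>123456</InvoiceNumber></doc>",
    "<doc><text>Invioce: 123457</text><InvoiceNumber>123457</InvoiceNumber></doc>",
]


def by_text_search(xml):
    """Look for the number the way a plain document search would."""
    match = re.search(r"Invoice:\s*(\d+)", xml)
    return match.group(1) if match else None


def by_tag(xml):
    """Read the number from the structured element instead."""
    element = ET.fromstring(xml).find("InvoiceNumber")
    return element.text if element is not None else None


print([by_text_search(d) for d in DOCS])  # ['123456', None]
print([by_tag(d) for d in DOCS])          # ['123456', '123457']
```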

You don’t need to program these extra tags.  You can insert them directly into a Word document from the Developer tab.

[Image: Word 2007 - Custom XML example]

We can hear programmers gnashing their teeth already and preparing to fire off an angry email to us … chill out. This is a very simple example of Custom XML that could be duplicated in Office via custom fields. In the real world Custom XML in Office 2007 documents is much more complex.


What removing Custom XML will mean

Microsoft is producing (against its will) a version of Office 2007 without the Custom XML feature.

This means a document with Custom XML code can still be opened by Office, but as soon as you save the document all the Custom XML code will be removed. At present Office 2007 ignores Custom XML when rendering the document but preserves it and saves it back to the document (called ‘round tripping’ by Microsoft).
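The difference between the two behaviours can be shown with a toy model. This is not how Word works internally; it is a sketch using simplified markup and invented element names (doc, text, InvoiceNumber), where the 'program' only understands the tags listed in KNOWN_TAGS.

```python
import xml.etree.ElementTree as ET

# Tags our pretend word processor understands; everything else is 'custom'.
KNOWN_TAGS = {"doc", "text"}

DOC = "<doc><text>Invoice: 123456</text><InvoiceNumber>123456</InvoiceNumber></doc>"


def save_round_trip(xml):
    """Ignore unknown elements while working, but write them back on save."""
    root = ET.fromstring(xml)
    return ET.tostring(root, encoding="unicode")


def save_stripped(xml):
    """Drop every element the program does not understand when saving."""
    root = ET.fromstring(xml)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in KNOWN_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


print(save_round_trip(DOC))  # custom element is still there
print(save_stripped(DOC))    # custom element is gone after saving
```

The visible text survives either way; what is lost in the second case is the structured copy that outside programs depend on.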

The result of the patent decision is that Microsoft is required to remove the Custom XML features from Word 2007. The specific process used by Microsoft for dealing with Custom XML has been found to infringe a patent held by another company. Microsoft says that Office 2010 has the Custom XML feature, implemented in a way that doesn’t infringe the patent.

Custom XML is an important part of Microsoft Office for large organizations that have invested a lot of time and money in systems that rely on it. Despite recent downplaying of the feature by Microsoft (‘little used feature’ has become the standard line in recent weeks), Custom XML is important to large and influential customers. The feature was and is one of the selling points when pushing Office 2007.

We believe it’s better to use the main, unaltered, Office 2007 rather than the unknown of a crippled version, even if that crippling might not seem to affect you.