Hidden information in Microsoft documents

Word hides important, personal information inside its documents. You can get rid of some of the information most of the time – but it’s very, very difficult to delete potentially embarrassing information all of the time.

I’m astounded at how many people just don’t get it.

THE PERSONAL INFO HARVESTING SHTICK

Man, if Microsoft can’t get it right, how can you?

The folks in Redmond continue to post documents with all sorts of internal details on their Web site. While I haven’t found any earth-shattering anti-trust-busting bits of “metadata”, the stuff I have found leaves me wondering if anybody can get it right.

We’re going to show you just how easy it is to publish Word documents with information you might not want others to see. We’ll do that by taking examples from Microsoft itself.

Having shown how even the supposed Word experts can get trapped, in future issues of Office Watch and Office for Mere Mortals we’ll show you and Microsoft how to publish just the document and no more.

In a recent article in Office for Mere Mortals, I talked about two documents with embarrassing embedded data. One contributed to the downfall of one of England’s most influential politicians. The other exposed a Microsoft dirty trick.

An Office for Mere Mortals reader pointed me to an entire collection of documents posted by one state’s Supreme Court. I didn’t see anything particularly damning in the documents, but they’re strewn with names and email addresses of clerks, law firms, and individuals; file locations, server names, and so on – a few hours’ worth of harvesting could lead to a credible blueprint of sections of this Supreme Court’s word processing system.

Worth noting: few (if any) US federal agencies – from all branches of government – post Word documents on the Web any more. Everything from the White House to the CIA to the US Supreme Court appears to be in PDF. Bravo.

AT&T researcher Simon Byers has a report on the hidden data problems facing the Word-using world today – all 400,000,000 of us. You can download it here. One part of his conclusion really hits home:

“…typical behavior patterns of Word users and the default settings of the Word program leads to an uncomfortable state of affairs for Word users concerned about information security.”

This isn’t strictly a voyeuristic exercise. When you leave dribs and drabs of information floating around on the Web, there’s no telling how it can be used. I would guess that a dedicated cretin with a fast Internet connection could come up with a working roadmap to parts of Microsoft’s development and marketing networks, just by looking at the flotsam and jetsam buried in readily available documents – documents posted on Microsoft’s own Web site.

To recap, if you use Word 97 or 2000, Word maintains a detailed log of who has edited the document, and where it was located when it was opened – and there’s nothing you can do about it.

If you use Outlook 2002 (the version in Office XP), and you send a document by attaching it to an email message, Outlook brands the document with the email address, name, and a number that can be traced to the PC that was used to send the file (although you need access to the PC to nail it for sure). It also brands the document with the subject of the email message that carried the file.

If you explicitly tell Word 2002 to remove personally identifiable information (Tools | Options | Security, check the box marked Remove Personal Information From File Properties on Save, and uncheck the box marked Store Random Number to Improve Merge Accuracy), and you send the document with Outlook 2002, Outlook still sticks the number that can be traced to your PC inside the file – the _AdHocReviewCycleID.

I’m very happy to report that Outlook 2003 seems to be doing it right. Finally. Telling Word 2003 to remove personally identifiable information is sufficient, in a default installation of Outlook 2003, to keep any personal info from being “branded” onto a doc when it’s sent attached to a message.

Microsoft’s Knowledge Base talks about the kinds of data that can be squirreled away in Word documents, and gives some tips for removing that data (when it’s possible). But the simple fact is that most people, most of the time, don’t bother.

  • Word 97 discussion
  • Word 2000 discussion
  • Word 2002 (Office XP) discussion

An article from Frank Rice on the Microsoft Web site gives an excellent overview of the problems and some solutions. Microsoft still hasn’t posted a similar article for Word 2003, but I noticed that they updated one of their key personal info articles for Office XP, KB 223396, just last week.

A SIMPLE BIT OF INTERNAL DETAILS

Microsoft posts Word documents all the time, and many of those docs include “metadata” with information that identifies individuals inside the company.

For example, the document describing the Licensing 6.0 changes for the Home Use Program in the “Software Assurance Customer Guide” (posted here) was originated by David Lasky. He sent it out for review in an email message with the subject line “Change to HUP in Customer Guide & Website”, and he was using either Outlook 2002 or Outlook 2003 at the time. Darrell Craig edited the file.

It’s easy to confirm all of that: just download the file, open it in Word, and click File | Properties | Custom.
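If you have a whole folder of documents to check, clicking through File | Properties | Custom gets old fast. Here's a minimal Python sketch (not anything Microsoft ships) that looks for the telltale property names in a file's raw bytes. Only _AdHocReviewCycleID is named in this article; _EmailSubject and _AuthorEmail are assumed names for the other fields Outlook writes, and the function names are made up for illustration. It's a crude byte search, so a hit means "look closer", and a miss doesn't prove the file is clean.

```python
from pathlib import Path

# _AdHocReviewCycleID is named in the article. The other two names are
# assumptions about what Outlook stores alongside it.
# Note: "_AuthorEmail" also matches any longer name that starts with it.
MARKERS = ("_AdHocReviewCycleID", "_EmailSubject", "_AuthorEmail")


def find_markers(data: bytes) -> list[str]:
    """Return the marker names found in data, stored as ASCII or UTF-16LE."""
    found = []
    for name in MARKERS:
        if name.encode("ascii") in data or name.encode("utf-16-le") in data:
            found.append(name)
    return found


def scan_file(path: str) -> list[str]:
    """Read a file from disk and report which marker names it contains."""
    return find_markers(Path(path).read_bytes())
```

Calling scan_file on a downloaded .doc returns the list of marker names it contains; an empty list just means none of these particular names turned up.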

A MORE COMPLEX EXAMPLE

Microsoft’s documents on the Web can contain much more than simple personal information like that detailed above.

For example, there’s a document on Windows 2003 Server Virtualization posted here. It was written and edited by “Judith Bloch (Independent Contractor)”. She sent it out for review in an email message with the subject line “VM release anticipated to today at noon — press release link needed for Microsoft.com product page”.

The template Judith used was located at PonlineomvssShadowWindowsWNETServertemplatesWindowsServer2003Template.dot. (Remember six months ago when we had such a brouhaha about how difficult it would be to discover the names and precise locations of templates and documents?)

Ah, that’s not all. There was a fellow named Alfredo Pizzirani involved in the reviews. Michael Kessler, Jane Dow and someone named seaton also made changes to the document. At least one of the pictures in the document was created in Adobe Photoshop.

I sure hope MS has its Photoshop licenses up to date.

A SMOKING GUN?

Microsoft’s Web site contains a document that says it’s an analysis of Office XP vs. Office 2000, from the “American Institutes For Research” – “AIR Project No. 01674.001” it says (AIR must have a lot of projects, eh?).

As you might expect, the report comes to the conclusion that Office XP is vastly superior to Office 2000, and customers should immediately shell out the bucks and shekels and dinars to make the upgrade.

Funny thing, though. If you look inside the document, you’ll see that it was edited by Nicole von Kaenel. At the time, Nicole was one of the Product Managers for Office XP – a Microsoft employee. It was also edited by someone named Lydia V, who appears to be an AIR researcher. Dennise Heitkemper changed the document, too. I Googled her, and found the title “Microsoft Partner Development Manager”.

It certainly appears as if Microsoft got to make changes to the AIR report before it was published.

The file itself was last sent attached to an email message from Eric Ligman, with a Subject line of “Value of $$$… Strong ROI”.

Hey, draw your own conclusions! Download the file (if Microsoft hasn’t yanked it already), click File | Open, in the Files of Type box, choose Recover Text From Any File, and open it. Look down at the end of the file for all of these details.
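You don't need Word to do this kind of digging. Here's a rough Python sketch of the same idea: pull every run of readable characters out of the raw bytes, whether stored one byte per character or as UTF-16. It is not how Word's Recover Text From Any File converter works internally (that's an assumption-free zone: I don't know), and the six-character minimum and the function name are arbitrary choices for illustration.

```python
import re

MIN_LEN = 6  # shortest run worth reporting; an arbitrary cutoff for this sketch

# Printable ASCII stored one byte per character.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Printable ASCII stored as UTF-16LE: each character followed by a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def recover_text(data: bytes) -> list[str]:
    """Return readable strings in data: single-byte runs first, then UTF-16LE runs."""
    runs = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return runs
```

Feed it the bytes of a downloaded file (for example, open(path, "rb").read()) and skim the tail end of the results, which is where the names and paths tend to show up.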

OTHER EXAMPLES

Lest you think these are isolated examples, they aren’t. In fact, Microsoft has a hell of a time reliably wiping personal information off of Word documents before they get posted to Microsoft.com.

There’s a document on Windows 2003 scalability posted here. It was also written and edited by judithb. The template she used is at C:Documents and SettingsklsMy DocumentsClientsMicrosoftAndiStarkNetProj2002Windows.NETServerTemplate.dot.

There’s a document on migrating Windows apps to Windows 2003 posted here. It appears to be from the same person or company (“KLS”??), although I couldn’t find any direct references to judithb. The template is at C:Documents and SettingsklsMy DocumentsWORDTEMP32xMSWinnetServerWPProject.dot.

Apparently Sara Snyder and Matter of Fact Communications wrote the “Case Study” of a wireless implementation at Calpine Corp (posted here). Although neither Sara’s name nor “Matter of Fact” appear in the report, references to both are hidden inside the file. Marc Strauch wrote or edited the case study about Epicor (posted here). Again, his name doesn’t appear in the report, but it’s buried in the file itself.

I’ve seen many dozens more.

THE LATEST – OFFICE 2003

What, you aren’t impressed?

Let’s take a look at the crop of Office 2003 documents. Download Microsoft’s latest Office 2003 Reviewer’s Guide (also called the “Office 2003 Editions Product Guide”) from here.

Open the document, click File | Properties | Custom, and you’ll find that the doc was last sent by email from Noelle Robertson, at Sakson & Taylor, in a message with the subject “Office 2003 Product Guide — access issue with ftp”. “Access issue with FTP”? Hmmmm…

If you open that doc using Recover Text From Any File, you find that Noelle stored the document at C:My Documentsa_SalesMS_DADBrokenOut_docsOffice2003_ProductGuide_WP.doc, and that she crashed Word at least once while editing the document, retrieving a copy from C:\Documents and Settings\NoelleR\Application Data\Microsoft\Word\AutoRecovery save of Office2003_ProductGuide_WP.asd. The document was also changed by KarenJ, ShannonS, and Jeune Ji (a Product Manager for Office, who apparently edited it on September 18, 2003).
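Once you have text recovered from a file, fishing out the file locations is mechanical. This Python sketch assumes you already hold the recovered text as a list of strings, and it looks for drive-letter paths ending in .doc, .dot or .asd. The extension list, the function name and the sample path are all invented for illustration; real documents will hold paths this pattern misses, or matches too greedily.

```python
import re

# Drive letter, backslash-separated folders (spaces allowed), then a file
# name ending in a Word-related extension. The extension list is an
# assumption for this sketch, not an exhaustive one.
PATH_RE = re.compile(
    r'[A-Za-z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]+\.(?:doc|dot|asd)\b',
    re.IGNORECASE,
)


def find_paths(strings):
    """Return every Windows-style document or template path found in strings."""
    hits = []
    for text in strings:
        hits.extend(PATH_RE.findall(text))
    return hits
```

For instance, find_paths([r"Template: C:\Users\example\Templates\Report Style.dot"]) picks out the full path, spaces and all. User names and server names sit right there in the folder names, which is the whole problem.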

What worries me: these people should be the most knowledgeable – and most cautious – Office 2003 users around. They’re using the latest technology, and they’re certainly sensitive to the fact that their documents can be viewed by anybody. Brad Sills left his name inside the Office 2003 Licensing document posted here. Scottxx edited the Microsoft Office Live Communications Server 2003 Deployment Guide posted here. Ramkumar Dixit and Marc Strauch left their names hidden inside the Outlook 2003 Reviewer’s Guide (posted here), which was based on the template C:\Documents and Settings\Patricia Ruffio\Application Data\Microsoft\Templates\BV_WP_StyleTemplate.dot. And on and on.

(I also note with no small amount of glee that the poor ‘Softies who wrote the Publisher 2003 Reviewer’s Guide got bitten by the Word 2002/2003 style naming bug/inanity, which produced styles called (I kid you not) Body Text, Body Text Char1, Body Text Char Char, Body Text Char1 Char Char, Body Text Char Char Char Char and Body Text Char, Body Text Char1 Char, Body Text Char Char Char, Body Text Char1 Char Char Char, Body Text Char Char Char Char Char among others. The Outlook 2003 Reviewer’s Guide mentioned earlier has a deleted style called F Char Char Char Char Char Char Char Char Char Char Char Char Char Char. I feel your pain, folks!)

Microsoft employees and contractors forget to wipe all the personal information from their files before they get posted on the Internet. How can you expect your employees, co-workers and friends to do any better?

Get a clue: post in PDF.

POST IN PDF

When the time comes for Microsoft to post a document on the Internet, they’ll post it in Word format, right?

Welllll… no, not always.

I was surprised when I downloaded the coupons for the Office 2003 “Technology Guarantee” program – the one that offers you a free copy of Office 2003 if you buy one of the qualifying versions of Office XP that are available now.

The coupons – you guessed it – are all in Adobe PDF format.

A very quick trip through Google brings up thousands of PDF files on microsoft.com: an Office XP flyer for students, an Office XP deployment guide, a Visio brochure. There’s even a new OneNote marketing document that’s in PDF format.

I guess eating your own dog food can only go so far.

Oh. No, I won’t tell you which state’s Supreme Court posts documents laden with personal information on the Web. I’ll leave it to them to figure it out.

EXECUTIVES IN GLASS HOUSES

Before Microsoft execs rush off to pillory the Softies named in this issue, they should remember that these are just random samples of docs all over the Microsoft web site.

It’s an indication of a much wider problem, not the errors of a few isolated staff (I’m anticipating a common Microsoft PR defense here).

A more careful search may well find public documents with the names of more senior Microsoft executives embedded in them. People in glass houses …

To make the point about the lack of privacy in Word documents, we’ve taken the rare step of naming individual Microsoft staff. Note that we have NOT posted the email addresses of those people and others, even though they are usually available in the documents we’ve cited.
