The Case for PDF

If you’re going to send a .doc file to someone, or post it on the Web, seriously consider converting it to PDF before you let it go.

Three years ago, I recommended that you convert your documents to PDF format before handing them out. I’ve mentioned PDF a dozen times since then, extolling its virtues, even for those of us who live and die by Word.

In short, distributing Word .doc files can lead to all sorts of embarrassing situations. If you’re going to send a .doc file to someone, or post it on the Web, seriously consider converting it to PDF before you let it go.

Word stores a lot of junk in .doc files, and you can get bit – bad – if you let a .doc file with a checkered past out of your grasp. Recently we’ve seen an instance where flotsam and jetsam in a Word .doc helped to change the course of politics in the UK (details here). Microsoft was embarrassed by junk in a Word doc when a bogus “Mac to PC convert” was found to be an employee of a PR agency hired by Microsoft to pull the wool over unsuspecting eyes (details here). There are many, many more examples. (Heck, back in February 1997, I was infected by the first Word 97-specific macro virus, W97M/Wazzu.A, when I opened a document that I had downloaded from Microsoft’s Web site. Microsoft had posted an infected marketing .doc file. PDF files don’t contain macro viruses!)

Word isn’t the only culprit: Outlook 2002 can “brand” .docs with personally-identifiable information, too. Outlook doesn’t have the temerity (or, it would seem, the brains) to brand PDF files. But Word documents sent attached to email messages using Outlook 2002 pick up a lot of potentially embarrassing information (details here).

Simon Byers, a security researcher at AT&T, recently downloaded 100,000 Word docs from the Web and found all sorts of hidden information (details here). Byers used automated tools, but if you want to look at the text buried inside a document – or the log of all the people who made edits to a Word 97 or 2000 document – all you have to do is click File | Open, choose Recover Text From Any File in the Files of Type box, then navigate to the file and open it. Pay special attention to the text at the end of the file.
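If you'd rather poke at a file outside of Word, the same idea can be roughed out in a few lines of Python. This is only a sketch of the general technique (scan the raw bytes for runs of readable characters, much like the Unix `strings` tool); it is not how Word's Recover Text converter or Byers's tools actually work. The function name `recover_text`, the `min_len` cutoff, and the sample bytes are all made up for illustration, and the sketch assumes the buried text is stored as plain 8-bit or 16-bit (UTF-16LE) characters.

```python
import re


def recover_text(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of readable text found in raw bytes (hypothetical helper)."""
    found = []
    # 8-bit text: min_len or more printable ASCII bytes in a row.
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # 16-bit text: printable ASCII characters, each followed by a zero byte.
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in ascii_run.finditer(data):
        found.append(match.group().decode("ascii"))
    for match in utf16_run.finditer(data):
        found.append(match.group().decode("utf-16-le"))
    return found


# Invented sample: binary filler around an 8-bit string and a 16-bit string.
sample = (
    b"\x00\x01"
    + b"Author: Jane Doe"
    + b"\xff\xfe\x00"
    + "C:\\Drafts\\memo.doc".encode("utf-16-le")
    + b"\x00\x02"
)

for line in recover_text(sample):
    print(line)
```

To try it on a real document, read the file in binary mode, for example `recover_text(open(path, "rb").read())`, and look closely at whatever turns up near the end of the list.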

WHAT IS PDF?

A PDF file isn’t anything at all like a Word document.

A Word .doc contains all sorts of things: text, formatting, macros, revisions, histories, links to other files, histories of links to other files, and heaven-knows-what-else. A PDF (“Portable Document Format”) file is more like a snapshot of a printout: a representation of what the printed document should look like.

Adobe Systems – the company that truly revolutionized desktop publishing – was founded in 1982 by Chuck Geschke and John Warnock, two engineers at Xerox’s famed Palo Alto Research Center. Adobe first developed and then popularized PostScript, a language that describes how text should appear on a page. PostScript has many virtues, but most of all it’s “device independent”: a PostScript file should print precisely the same way on any printer, no matter which operating system or which computer originated the file.

PDF is based on PostScript, with a couple of important extras. First, a PDF file can contain fonts, so if you create a PDF file with WoodrowGothic 17-point bold, you can be sure that whoever prints the file will see precisely the fonts you intended. Second, PDF compresses everything on the fly, so there’s no need to independently compress, say, picture files that are embedded in the document.

Automatic compression may sound like a nit, but it isn’t. In a previous article, I mentioned a 2.6 MB file that Microsoft posted here. If the Microsoft employees who posted that file had simply used Word’s own “compress” capability, they would’ve turned a 2.6 MB download into a 1.1 MB download. (To compress all the pics in a Word document, right-click on any picture in the doc, choose Format | Picture | Compress, choose the compression you want, and click OK.) That’s a huge savings, both for Microsoft’s servers and for Microsoft’s customers – but the ‘Softies who posted the doc didn’t bother. If they had been using PDF, no separate compression would’ve been necessary. (Thanks to long-time Office Watch reader BM for pointing out that doc to me!)

With PDF, all of that is automatic.
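To put numbers on that, here is a small Python sketch. It does two things: it works out the savings in the 2.6 MB versus 1.1 MB example, and it shows what lossless "deflate" compression does to a block of repetitive bytes. Deflate (called Flate in PDF) is one of the compression methods a PDF file can use for its contents, but the sample data, the function names, and the compression level are assumptions chosen for illustration; real savings depend entirely on what's in the document, and pictures are usually handled by other methods.

```python
import zlib


def savings_percent(original_size: float, compressed_size: float) -> float:
    """Percentage saved when going from original_size to compressed_size."""
    return 100.0 * (1.0 - compressed_size / original_size)


def flate_compress(data: bytes) -> bytes:
    """Compress bytes with zlib's deflate at the highest compression level."""
    return zlib.compress(data, 9)


# The download sizes quoted in the article, in megabytes.
print(f"Word example: {savings_percent(2.6, 1.1):.0f}% smaller")

# Invented, highly repetitive sample; real documents compress less well.
sample = b"The quick brown fox jumps over the lazy dog. " * 200
packed = flate_compress(sample)
print(f"{len(sample)} bytes -> {len(packed)} bytes")

# Deflate is lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(packed) == sample
```

The point of the round trip at the end is that nothing is thrown away: the reader's software restores the original data when it opens the file.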

PDF works. It’s become ubiquitous. Even the U.S. Internal Revenue Service uses PDF for all of its forms.

And yes, even Microsoft uses it. The ‘Technology Guarantee’ forms for people wanting a cheap upgrade to Office 2003 are in PDF form for the various countries around the world.

WORKING WITH PDF FILES

Adobe gives away its PDF viewer/printer. Free. The Adobe Reader, as it’s called, can be downloaded here. The Reader is a remarkably stable piece of software, and I install it on all of my machines, without hesitation. You should, too.

Of course, Adobe isn’t giving away the Reader out of the goodness of its heart. They want to sell you Adobe Acrobat, the program that lets you create, search, and modify PDF files (for about US $450). Acrobat also lets you create fill-in-the-blanks forms, which can be filled in by anybody with the free Reader. More info about Acrobat can be found here.

There are many, many programs that create files in PDF format: PDF is a well-documented file format specification and, according to Adobe, 1,800 companies now make products that use the PDF format.

One of our long-time advertisers, Document Automation Developers, sells a program called MakePDF (about US $40) that lets you save a Word file directly in PDF format – you don’t need Acrobat; it’s as simple as clicking on a Word menu. If you don’t need all of Acrobat’s features, and prefer to work in Word (or Outlook, Notes, Eudora, and more) click here for more information.


PDF IN THE FUTURE

You’re going to hear PDF mentioned more often as an alternative to Word .doc files. There are just too many Word documents with embarrassing hidden information running around. Comparison with Office-generated XML files is inevitable (although the technology is entirely different). At this point, PDF is a simple, cheap, reliable, ubiquitous alternative. Whether XML will ever reach that stage remains to be seen.

 
