Word for Windows has three options for saving a document as a web page: Web Page, Single File Web Page and Web Page, Filtered. Here’s a look at all three and how to choose the best option for you.
What you choose depends on how much similarity you need between the Word document and the web page plus how much editing you want to do with the web code/HTML later.
In theory, Word for Windows can make web pages directly. But in practice the HTML code Word makes is very messy and full of extra formatting code that’s not needed on a public-facing web page. At worst, a Word-created web page is unreadable or doesn’t appear as you’d like.
Microsoft assumes customers need full fidelity, with the web page appearing as close as possible to the Word document. However, that’s not always what people want because a ‘full fidelity’ conversion includes a lot of messy HTML code that makes editing a nightmare.
For example, a simple one-page Word document with one image converts to a web page with over a thousand lines of HTML before the first visible letter of the document. The image alone brings many lines of additional code, all to support one simple picture.
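To get a rough feel for how much Word-specific code is in a saved page, a short script can count a few tell-tale markers. This is only a sketch: the function name `word_markup_stats` and the tiny sample page are made up for illustration, and the counts are a crude gauge, not a full analysis.

```python
import re

def word_markup_stats(html: str) -> dict:
    """Rough counts of Word-specific markup in an HTML string."""
    body = re.search(r"<body\b", html, flags=re.IGNORECASE)
    head_part = html[:body.start()] if body else html
    return {
        # Lines of code before the visible part of the page starts
        "lines_before_body": head_part.count("\n"),
        # Word-only CSS properties such as mso-pagination
        "mso_properties": len(re.findall(r"\bmso-[a-z-]+\s*:", html)),
        # Conditional comments such as <!--[if gte mso 9]>
        "conditional_comments": len(re.findall(r"<!--\[if\b", html)),
        # Opening Office namespace tags such as <o:p>
        "office_tags": len(re.findall(r"<o:p\b", html)),
    }

# Hypothetical, heavily shortened sample in the style of Word's output
sample = """<html xmlns:o='urn:schemas-microsoft-com:office:office'>
<head>
<!--[if gte mso 9]><xml></xml><![endif]-->
<style>
p.MsoNormal {mso-style-unhide:no; margin:0in; mso-pagination:widow-orphan;}
</style>
</head>
<body>
<p class=MsoNormal>Hello<o:p></o:p></p>
</body>
</html>"""

stats = word_markup_stats(sample)
print(stats)
```

Running the same function on a real Word export (read the file into a string first) should show much larger numbers than this toy sample.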
In many cases, users need a conversion which retains the major formatting (headings, bold, italics etc) but not the fine-grained detail.
We have a little sympathy for Microsoft’s dilemma with Word to Web conversion. Some users might want a full fidelity conversion. At the other end of the scale, some want a very basic conversion with only major styles and basic formatting (bold, italic etc) included. Between those two extremes are many different expectations.
That’s why we suggest avoiding Word’s own Save As web page options wherever possible.
Better Word to Web Page conversion
The easier way to convert Word to a web page is to use whatever tools the web editor has for converting a Word document (or pasted content from a Word doc). Check the CMS or software you’re using to see what’s available.
Simple copying from Word and pasting into the web editor works better than any Save As … option from Word itself. The receiving software should be set up to convert Word formatting into something usable in a way that Word’s export cannot handle.
For example, Microsoft SharePoint Designer 2007 (free and still commonly used) has a Tools | Optimize HTML | Word HTML option to remove a lot of the excess Word code. Not all, but most.
WordPress, the popular online publishing system, will remove all the formatting extras when pasting from Word (turn off ‘Paste as Text’). There are also plugins to convert whole documents into WordPress pages or posts.
Save As to web page options
Go to File | Save As then look down the list to see three web page options. They have been in Word for Windows for many years.
Web Page (.htm or .html)
A full fidelity conversion to HTML code. Images and other files are saved to a sub-folder.
Single File Web Page (.mht or .mhtml)
Makes a single page with images bundled into that file. Handy for pages with pictures because there are no sub-folders or other files to deal with. A full-fidelity conversion.
Web Page, Filtered (.htm or .html)
Web Page, Filtered is the choice if you want fairly simple HTML. Even so, there are problems, as we note below.
The different extensions (eg .htm or .html) are a matter of choice. They make no difference to the exported file.
Word for Mac
Word for Mac only has one option: Single File Web Page .mht or .mhtml
Web Page .htm or .html
The whole shebang: a full HTML rendering of the document with a lot of HTML tags and code. It’s very complex HTML that’s hard to edit or simplify.
Any images and some other formatting files are saved to a sub-folder and linked from the main web page. Word saves the .htm file plus a matching sub-folder, normally named after the document with _files added.
In the sub-folder are images and some other related XML files.
IMPORTANT: If you move the .htm/.html file created by Word, make sure you move the matching sub-folder as well.
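If you move these pages around by script, it’s worth moving the page and its sub-folder in one step. Here’s a minimal sketch. It assumes the sub-folder is named after the file with _files added (the usual pattern in English versions of Word, but check your own output), and the function name `move_word_web_page` plus the `report` file names are invented for the demo.

```python
import shutil
import tempfile
from pathlib import Path

def move_word_web_page(htm_path: Path, dest_dir: Path) -> Path:
    """Move a Word-saved .htm/.html file and its companion folder together."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the companion folder is '<file name without extension>_files'
    companion = htm_path.with_name(htm_path.stem + "_files")
    new_htm = dest_dir / htm_path.name
    shutil.move(str(htm_path), str(new_htm))
    if companion.is_dir():
        shutil.move(str(companion), str(dest_dir / companion.name))
    return new_htm

# Demo in a throwaway folder with placeholder files
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.htm").write_text("<html></html>")
    (root / "report_files").mkdir()
    (root / "report_files" / "image001.png").write_bytes(b"")

    moved = move_word_web_page(root / "report.htm", root / "site")
    page_moved = moved.exists()
    folder_moved = (root / "site" / "report_files" / "image001.png").exists()

print(page_moved, folder_moved)
```

Because the page and its sub-folder land side by side in the new location, relative links from the page into the sub-folder keep working.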
Single File Web Page .mht or .mhtml
The annoyance of the sub-folder extras led Microsoft to create a single package file format: .mht or .mhtml.
It’s full fidelity HTML with all the Word tag accoutrements.
Everything, including images, is saved in a single large file. Images are web-encoded in a similar way to how pictures are included in HTML emails.
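To show what ‘web-encoded like an HTML email’ means, here’s a small sketch using Python’s standard email package. It builds a tiny multipart message with an HTML part and a base64-encoded image part, then reads it back. The file name `image001.png` and the placeholder bytes are invented, and a real .mht file from Word carries more headers and parts than this.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# One container message holding the page and its picture
page = MIMEMultipart("related")
page.attach(MIMEText('<html><body><img src="image001.png"></body></html>', "html"))

# Placeholder bytes standing in for a picture (not a valid image)
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
img_part = MIMEImage(fake_png, _subtype="png")  # base64-encoded by default
img_part["Content-Location"] = "image001.png"   # hypothetical name for the demo
page.attach(img_part)

mht_text = page.as_string()

# Reading the package back: list each part and decode the image again
parsed = message_from_string(mht_text)
content_types = [p.get_content_type() for p in parsed.walk() if not p.is_multipart()]
image_part = parsed.get_payload()[1]
image_encoding = image_part["Content-Transfer-Encoding"]
decoded_image = image_part.get_payload(decode=True)

print(content_types, image_encoding)
```

The picture travels as a block of base64 text inside the same file as the HTML, which is why an .mht needs no sub-folder but ends up as one large file.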
MHT/MHTML format is supported by Internet Explorer and partially by other browsers like Google Chrome. Incredibly, .mht will not work in Microsoft Edge… yet another reason to avoid this ‘modern’ browser.
Web Page, Filtered
If you’re looking to convert a document into a standard web page, this is the one to choose.
A lot of the extra Word XML is stripped out leaving the text and some formatting (text size, bold, italics etc).
Like the Web Page option, there’s a subfolder with images and other files.
But there are still annoyances which make further editing a right pain. Web Page, Filtered doesn’t convert bullet points or numbered paragraphs into the HTML <ul> and <ol> tags.
For example, Word bullet points converted by Web Page, Filtered come out as ordinary paragraphs. Each one has specific indents and the bullet is added with <span> tags. Crucially, the <ul> (unordered list) and <li> tags that would make later editing much easier are missing.
A simpler and more effective ‘Filtered’ conversion of the same list would use just those <ul> and <li> tags.
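As a rough illustration, the sketch below takes a two-item list in the general style of Word’s filtered output and rewrites it as a plain <ul> list. The sample markup is a simplified, made-up example (real output varies with the document and Word version), and the function `word_list_to_ul` is hypothetical. It only handles one flat bullet list, not numbered or nested lists.

```python
import re

# Simplified sample in the style of Web Page, Filtered (illustrative only)
WORD_LIST = """\
<p class=MsoListParagraphCxSpFirst style='text-indent:-.25in'><span style='font-family:Symbol'>·<span style='font:7.0pt'>&nbsp;&nbsp;&nbsp; </span></span>Apples</p>
<p class=MsoListParagraphCxSpLast style='text-indent:-.25in'><span style='font-family:Symbol'>·<span style='font:7.0pt'>&nbsp;&nbsp;&nbsp; </span></span>Pears</p>
"""

# A list paragraph, capturing everything between <p ...> and </p>
LIST_PARA = re.compile(r"<p class=MsoListParagraph[^>]*>(.*?)</p>", re.DOTALL)
# The leading bullet: an outer <span> with the symbol and a nested spacer <span>
BULLET_SPAN = re.compile(r"^<span[^>]*>[^<]*<span[^>]*>[^<]*</span></span>")

def word_list_to_ul(html: str) -> str:
    """Turn Word-style bullet paragraphs into one plain <ul> list."""
    items = []
    for inner in LIST_PARA.findall(html):
        text = BULLET_SPAN.sub("", inner.strip()).strip()
        items.append(f"  <li>{text}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

clean = word_list_to_ul(WORD_LIST)
print(clean)
```

The result is four short lines of HTML in place of two long paragraphs, and any web editor will recognise it as a proper list.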
Other Word to HTML conversion options
There are various web sites that offer ‘Word to HTML’ conversion. We’re always wary of online/cloud services because they can save your private documents for later use.
That said, you might want to try one of these sites if the Word Save As … options don’t suit you.