A cool way to check file types including Office docs

There’s another really cool way to check what a file really is, even if the file extension is missing or wrong.  It works for many file types including Word, Excel and PowerPoint documents, both old and new formats as well as password protected docs.

Thanks to Office Watcher Bill G. for reminding us of this useful Linux feature.

You’ll need a Linux computer, either a separate or a virtual machine. Most likely that will be Windows Subsystem for Linux (WSL), which is available in Windows 10 and 11.  We’ve used Ubuntu in WSL2 on Windows 11 for these examples.

The File command in Linux

The File command in Linux checks the contents of a file and reports what type it is, unlike Windows, which relies on the file extension like .docx, .mp3 etc.
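To try this yourself, here is a minimal sketch. It assumes the file command is installed (it normally is on Ubuntu) and uses a made-up file, holiday-photo.jpg, that holds only a PDF header line under a misleading extension.

```shell
# Work in a throwaway folder so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical test file: a PDF header saved under a misleading .jpg name.
printf '%%PDF-1.4\n' > "$workdir/holiday-photo.jpg"

# -b (brief) prints only the detected type, without the file name.
detected=$(file -b "$workdir/holiday-photo.jpg")
echo "$detected"
```

The result should mention PDF, not JPEG, because File goes by the contents and ignores the extension. The exact wording varies between versions of File.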

We made a series of test files, mostly Office documents, without extensions to show how File works on mystery files.

Let’s start with MP3 and PDF files with no extension.  File reports not just the file type but some details like MP3 quality and PDF version.

For a text file, the result shows the encoding type.
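If you only want the encoding, File has a --mime-encoding option. This is a small sketch with two made-up sample files (the names plain and accented are ours), assuming the file command is installed.

```shell
workdir=$(mktemp -d)

# Two hypothetical sample files: plain ASCII, and UTF-8 with an accented letter.
printf 'Hello world\n' > "$workdir/plain"
printf 'Caf\303\251 menu\n' > "$workdir/accented"

# --mime-encoding reports just the character encoding; -b drops the file name.
plain_enc=$(file -b --mime-encoding "$workdir/plain")
accented_enc=$(file -b --mime-encoding "$workdir/accented")
echo "plain: $plain_enc"
echo "accented: $accented_enc"
```

The first file should come back as us-ascii and the second as utf-8.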

Checking Office documents with File

File also handles the ‘modern’ Office document formats in Word (.docx), Excel (.xlsx) and PowerPoint (.pptx).

File looks inside and shows the contents are the formats used in Office 2007 and later.

Password Protected Office documents

File can’t help much with an encrypted Office document without knowing the password.

All it shows is “CDFV2 Encrypted”.

As we explain in Beating Bots, Spies & Cock-ups – Safely & securely send files and documents (chapter “Microsoft Office documents”) there’s more clear text information available in an encrypted Office document.

Inside ‘old’ Office documents

The File command also works on the older Office files (doc, xls and ppt).  It shows a lot more detail about each file including author. Within those details is ‘Name of Creating Application’.

File command syntax

File syntax is simple: just file (in lower case) followed by the file name, or the path and file name.

Use double-quote marks if the path or name has spaces. Note: in Linux, file names are case-sensitive.

file <path to file>/<file name>

For example:

file MyDoc
file /documents/MySheet.xlsx
file "/My Documents/My Presentation.pptx"
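Quoting matters most when the path is held in a variable or comes from a wildcard. Here is a minimal sketch, using a made-up folder and file name in a temporary location, and assuming the file command is installed.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/My Documents"
printf 'Quarterly notes\n' > "$workdir/My Documents/My Presentation"

# Quote the variable so the spaces stay part of one path.
target="$workdir/My Documents/My Presentation"
result=$(file "$target")
echo "$result"

# file accepts several names at once; quote the folder, leave the wildcard outside.
file "$workdir/My Documents"/*
```

Without the double quotes, the shell would split the path at each space and File would look for several files that don't exist.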

For more info on the file command:

file --help

How the Linux File command works

File runs up to three sets of tests, in this order, and reports the first one that succeeds:

  1. File System
    • Uses a system call to check whether the file is empty or a special type, such as a folder or link.
  2. Magic
    • Looks at set positions in the file, mostly near the start, and compares what it finds with ‘magic’ data files which have info for known file types.  Magic data is stored in various locations on the machine. Custom magic data can be added.
    • That means you may not see the same results as we’ve shown above.
  3. Language
    • For text files, checks if it’s ASCII, UTF-8, UTF-16 or some other encoding. Then looks for strings that suggest a particular language, such as a programming or markup language.
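The magic test can be imitated by hand. This is only a rough sketch of the idea, not how File itself is written. It checks two well-known signatures: the ZIP header that modern Office documents start with, and the CDFV2 compound file header used by old and encrypted Office documents. The function names and the two stand-in files are made up for the example.

```shell
# Print the first N bytes of a file as a lowercase hex string.
first_bytes_hex() {
    head -c "$2" "$1" | od -An -tx1 | tr -d ' \n'
}

# Classify a file by two well-known signatures (a tiny stand-in for magic data).
container_kind() {
    case "$(first_bytes_hex "$1" 8)" in
        504b0304*)        echo "ZIP container (used by docx, xlsx, pptx)" ;;
        d0cf11e0a1b11ae1) echo "CDFV2 compound file (old or encrypted Office)" ;;
        *)                echo "unknown" ;;
    esac
}

workdir=$(mktemp -d)
# Hypothetical stand-ins holding only the signature bytes, not real documents.
printf 'PK\003\004rest' > "$workdir/modern"
printf '\320\317\021\340\241\261\032\341' > "$workdir/legacy"

container_kind "$workdir/modern"
container_kind "$workdir/legacy"
```

File's real magic data goes much further. It looks inside the container to tell Word from Excel from PowerPoint, which this sketch does not attempt.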

About this author

Office-Watch.com

Office Watch has been the independent source of Microsoft Office news, tips and help since 1996.