There’s another cool way to check what a file really is, even if the file extension is missing or wrong. It works for many file types, including Word, Excel and PowerPoint documents in both old and new formats, as well as password-protected docs.
Thanks to Office Watcher Bill G. for reminding us of this useful Linux feature.
You’ll need a Linux computer, either a separate machine or a virtual one. Most likely that will be Windows Subsystem for Linux (WSL), which is available in Windows 10 and 11. We’ve used Ubuntu in WSL2 on Windows 11 for these examples.
The File command in Linux
The File command in Linux checks the contents of a file and reports what type it is, unlike Windows, which relies on a file extension such as .docx or .mp3.
We made a series of test files, mostly Office documents, without extensions to show how File works on mystery files.
Let’s start with MP3 and PDF files with no extension. File reports not just the file type but also some details, such as MP3 quality and PDF version.
A text file result looks like this, showing the encoding type:
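To try this yourself, here’s a minimal sketch that creates two small text files without extensions and runs File on them. The directory, file names and contents are made up for illustration, and the exact wording of the output differs between versions of the file utility.

```shell
# Throwaway directory so no real files are touched.
text_dir=$(mktemp -d)

# Plain ASCII text, saved with no extension.
printf 'Just a plain note\n' > "$text_dir/plain_note"

# Similar text with one accented character written as UTF-8 bytes (\303\251 is é).
printf 'Caf\303\251 opening hours\n' > "$text_dir/utf8_note"

# File should describe the first as ASCII text and the second as UTF-8 text.
file "$text_dir/plain_note" "$text_dir/utf8_note"
```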
Checking Office documents with File
Here are the results for ‘modern’ Office document formats in Word (.docx), Excel (.xlsx) and PowerPoint (.pptx).
File looks inside and shows the contents are the latest formats used in Office 2007 and later.
Password Protected Office documents
File can’t help much with an encrypted Office document without knowing the password.
All it shows is “CDFV2 Encrypted”.
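That marker is still handy for picking out encrypted documents from a pile of mystery files. The sketch below assumes the description contains the text “CDFV2 Encrypted” as shown above; the function names looks_encrypted and report_encrypted are our own, not part of File.

```shell
# Succeeds when a description string contains the encrypted-document marker.
looks_encrypted() {
    case "$1" in
        *"CDFV2 Encrypted"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the name of each given file that File describes as encrypted.
report_encrypted() {
    for f in "$@"; do
        # -b (brief) prints the description without the leading file name.
        description=$(file -b -- "$f")
        if looks_encrypted "$description"; then
            echo "$f: encrypted Office document"
        fi
    done
}
```

Running report_encrypted ./* in a folder would then list only the files that carry the marker.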
As we explain in Beating Bots, Spies & Cock-ups – Safely & securely send files and documents (chapter “Microsoft Office documents”) there’s more clear text information available in an encrypted Office document.
Inside ‘old’ Office documents
The File command also works on the older Office files (doc, xls and ppt). It shows a lot more detail about each file including author. Within those details is ‘Name of Creating Application’.
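If you only want that one detail, the output can be filtered. This is a rough sketch that assumes the details are separated by commas and that the field is labelled exactly ‘Name of Creating Application’; a value that itself contains a comma would be cut short. The function name creating_application is our own.

```shell
# Read a File description on standard input and print only the
# "Name of Creating Application" value, if that field is present.
creating_application() {
    # Put each comma-separated detail on its own line, then keep the one we want.
    tr ',' '\n' | sed -n 's/^ *Name of Creating Application: *//p'
}

# Apply the filter to each given file; -b leaves the file name out of the output.
show_creating_application() {
    for f in "$@"; do
        file -b -- "$f" | creating_application
    done
}
```

For example, show_creating_application followed by the name of an old-format document prints just the application name.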
File command syntax
The syntax is simple: type file (in lowercase) followed by the file name, or the path and file name.
Use straight double-quote marks if the path or name has spaces. Note: in Linux, file names and commands are case-sensitive.
file <path to file>/<file name>
file "/My Documents/My Presentation.pptx"
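Here are a few variations, as a sketch. The -b and --mime-type options are part of the standard file utility; the demo directory and the file name ‘My Mystery File’ are made up for illustration, and the stand-in file contains nothing but a PDF header.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)

# A stand-in mystery file: a PDF header only, with spaces in the name and no extension.
printf '%%PDF-1.4\n' > "$demo_dir/My Mystery File"

# Straight double quotes keep a path with spaces together as one argument.
file "$demo_dir/My Mystery File"

# -b (brief) leaves the file name out of the output.
file -b "$demo_dir/My Mystery File"

# --mime-type prints a MIME type instead of a description.
file --mime-type "$demo_dir/My Mystery File"

# Several files, or a wildcard, can be checked in one go.
file "$demo_dir"/*
```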
For more info on the file command, run man file to read its manual page.
How the Linux File command works
File does three separate tests, in this order, to determine the file type. The first test that succeeds decides the result.
- File system test: does a system call to ensure it’s a valid, not empty file.
- Magic test: looks at the start of the file and compares it with ‘magic’ data files, which have info for known file types. Magic data is stored in various locations on the machine and custom magic data can be added. That means you may not see the same results as we’ve shown above.
- Language test: for text files, checks if it’s UTF-8, UTF-16 or some other encoding, then tests for various languages.
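To make the three tests more concrete, here’s a rough sketch in shell. It is not how File is actually implemented: the real tool uses a large magic database, while this checks just three well-known signatures (PDF, ZIP and the OLE2 compound file format used by old and encrypted Office documents). The function names magic_hex and guess_type are our own, and the text check is a crude stand-in for File’s encoding and language tests.

```shell
# Print the first N bytes of a file as lowercase hex with no spaces.
# Usage: magic_hex <file> <N>
magic_hex() {
    head -c "$2" -- "$1" | od -An -tx1 | tr -d ' \n'
}

# Make a rough guess at a file's type using three tests in order.
guess_type() {
    target=$1

    # 1. File system test: cases that need no look at the contents.
    if [ ! -e "$target" ]; then echo "cannot open"; return 1; fi
    if [ -d "$target" ]; then echo "directory"; return 0; fi
    if [ ! -s "$target" ]; then echo "empty"; return 0; fi

    # 2. Magic test: compare the leading bytes with a few known signatures.
    case "$(magic_hex "$target" 8)" in
        25504446*)         echo "PDF document"; return 0 ;;
        504b0304*)         echo "ZIP container (possibly docx, xlsx or pptx)"; return 0 ;;
        d0cf11e0a1b11ae1*) echo "OLE2 compound file (possibly doc, xls, ppt or encrypted Office)"; return 0 ;;
    esac

    # 3. Text test (crude): grep -I treats files it judges to be binary as non-matching.
    if grep -Iq . -- "$target"; then
        echo "text"
    else
        echo "data"
    fi
}
```

Calling guess_type on a .docx with the extension removed would report the ZIP container line, since modern Office documents are ZIP archives underneath.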