Fully indexing web code pages

How to index the code on ASP and ASP.net web pages with Windows Search

A problem with Windows Search is the default treatment of web pages with programming code like PHP, ASP and ASP.net.

The Windows Search indexing filter for these types only indexes the visible parts of the web page using an HTML filter, not the code or any variables that might turn into displayed text on the final page. For PHP, Microsoft’s default setting is to index only the file properties, not the content.

That makes it hard for even occasional programmers to find important parts of their code. For example, if a setting has to be changed across many pages, how can you find them all?

To see which index filter is used for a particular file type, go to Control Panel | Indexing Options | Advanced | File Types.

Image: Windows - Indexing File Types

Scroll to the file extension you’re interested in and look at the Filter Description (for asp and aspx the default is ‘HTML Filter’). Also note that, for these extensions, the indexer is set to read both the file properties and the contents.

Some programming tools have search capabilities but Windows itself should be able to do the job.

The solution is to change the indexing filter for those file extensions from the default to the Plain Text filter. The Plain Text indexing filter will look at the entire file and index it all.

Usual Warning: tinkering with the registry is for advanced users. Don’t do it unless you’re confident and have a backup of the registry before you begin editing.

To make the change, open Regedit, expand HKEY_CLASSES_ROOT and look down the long list to find the file extension you’re interested in (the keys start with a dot, for example .aspx).

Under that file extension there’s a ‘PersistentHandler’ key with a default value which is the internal code for the indexing filter to use.
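If you'd rather check several extensions at once than click through Regedit, the lookup can be sketched in Python. This is only a minimal sketch, assuming Python's standard winreg module on a Windows machine; the function names handler_key_path and read_persistent_handler are made up for illustration.

```python
try:
    import winreg  # standard library, Windows only
except ImportError:
    winreg = None  # the path helper below still works on other systems


def handler_key_path(extension):
    """Build the subkey path (under HKEY_CLASSES_ROOT) for an extension's PersistentHandler."""
    ext = extension.strip().lower()
    if not ext.startswith("."):
        ext = "." + ext
    return ext + "\\PersistentHandler"


def read_persistent_handler(extension):
    """Return the PersistentHandler default value, or None if the key or value is missing."""
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, handler_key_path(extension)) as key:
            # An empty value name asks for the key's default value.
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__" and winreg is not None:
    for ext in ("asp", "aspx", "php", "txt"):
        print(ext, read_persistent_handler(ext))
```

This only reads the registry, so it is safe to run before deciding whether to change anything.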

The Indexing Service default for asp and asp.net is the HTML Filter which has the PersistentHandler code {eec97550-47a9-11cf-b952-00aa0051fe20}

Change that code to the Plain Text filter: {5e941d80-bf96-11cd-b579-08002b30bfeb}. You can confirm that’s the right code by checking the PersistentHandler value for the ‘txt’ file extension.
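If several extensions need the same change, one option is to generate a .reg file and import it rather than editing each key by hand. The Python below is a hedged sketch of that idea, not a tested deployment tool: build_reg_file is a hypothetical helper name, and the only GUID it uses is the Plain Text one quoted above. It just builds the text; look it over before importing anything into the registry.

```python
# GUID quoted in the article for the Plain Text filter.
PLAIN_TEXT_HANDLER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"


def build_reg_file(extensions, handler=PLAIN_TEXT_HANDLER):
    """Return .reg file text that sets the PersistentHandler default value for each extension."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for extension in extensions:
        ext = extension.strip().lower().lstrip(".")
        lines.append("[HKEY_CLASSES_ROOT\\." + ext + "\\PersistentHandler]")
        lines.append('@="' + handler + '"')  # "@" means the key's default value
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Review the output before saving it as a .reg file and importing it.
    print(build_reg_file(["asp", "aspx", "ascx", "php"]))
```

Back up the registry first, exactly as for a manual edit.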

Image: Regedit - changing asp.net to a plain text filter

Check that the change is accepted by returning to the File Types list mentioned above to see that the Filter Description has changed to Plain Text.

After you’ve made the registry changes you’ll have to rebuild your index, which can take some time. To rebuild the index go to Control Panel | Indexing Options | Advanced | Rebuild.

This trick will work for any file that is plain text but uses a different extension. That covers most web code pages: asp, aspx, ascx, php and many more. Fairly obviously, you won’t get any useful results if you try it on file types that aren’t readable as text (i.e. in Notepad), such as .doc, .docx, .jpg etc.
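If you're not sure whether a file type counts as readable text, a rough check is to look for NUL bytes at the start of a sample file. The Python sketch below is only a heuristic under that assumption (it will misjudge UTF-16 text, for example), and both function names are invented for illustration.

```python
def looks_like_plain_text(sample):
    """Heuristic: binary formats usually contain NUL bytes, ordinary text files do not."""
    return b"\x00" not in sample


def file_looks_like_plain_text(path, sample_size=4096):
    """Apply the heuristic to the first sample_size bytes of a file."""
    with open(path, "rb") as handle:
        return looks_like_plain_text(handle.read(sample_size))


if __name__ == "__main__":
    print(looks_like_plain_text(b"<?php echo 'hello'; ?>"))        # PHP source
    print(looks_like_plain_text(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # start of a JPEG
```

Opening the file in Notepad, as suggested above, remains the simplest test.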
