Document Type to IFilter Association in SharePoint

The SharePoint crawler uses an extensible architecture for indexing files of many different types. Through the administrative interface, an administrator specifies the file extensions to index when crawling a content source. Separately, the administrator ensures that appropriate handlers are installed for those file types so the files can be opened and read. These handlers are called IFilters because they implement the IFilter interface.

When scanning a content source, the SharePoint crawler visits each file it finds and checks whether its extension is a recognized file type. If it is, the crawler loads and calls the IFilter associated with that extension. Each recognized file extension must have an IFilter, and a single DLL may implement IFilters for one or more file types.

I became curious about how the file extension was resolved to the DLL implementing the IFilter. This is not specified in the administrative interface along with the recognized file extensions: the IFilter is installed separately, so I assumed the association was somewhere in the registry. A little digging found the answer:

1. Find the file extension (for example, .doc) in the registry under HKCR (HKEY_CLASSES_ROOT). The default value of its PersistentHandler subkey is a GUID.
2. Locate that GUID under HKCR\CLSID. It will have a subkey named PersistentAddinsRegistered, which in turn has a subkey whose name is a GUID. The default value of that GUID-named key is also a GUID.
3. Locate that last GUID under HKCR\CLSID. It will have an InprocServer32 subkey. The default value of that subkey gives the path to the IFilter DLL.
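
The three lookups above can be sketched in Python. This is only an illustration under a few assumptions, not how the crawler itself is implemented. The names `find_ifilter_dll`, `read_default_from_registry`, and `sample_reader` are mine, and the GUIDs and DLL path in the sample data are made-up placeholders. One detail that is not in the steps above: I am assuming the GUID-named subkey under PersistentAddinsRegistered is the interface ID of IFilter, {89BCB740-6119-101A-BCB7-00DD010655AF}, so the sketch opens that key directly instead of enumerating subkeys.

```python
from typing import Callable, Optional

# Assumption: the GUID-named subkey under PersistentAddinsRegistered is the
# interface ID (IID) of IFilter.
IFILTER_IID = "{89BCB740-6119-101A-BCB7-00DD010655AF}"

# A reader takes a key path relative to HKEY_CLASSES_ROOT and returns that
# key's default value, or None if the key or value is missing.
Reader = Callable[[str], Optional[str]]


def read_default_from_registry(path: str) -> Optional[str]:
    """Reader backed by the real registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, path) as key:
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except OSError:
        return None


def find_ifilter_dll(extension: str, read_default: Reader) -> Optional[str]:
    """Resolve a file extension to its IFilter DLL, or None if any step fails."""
    ext = extension if extension.startswith(".") else "." + extension

    # Step 1: HKCR\<ext>\PersistentHandler -> persistent handler GUID
    handler = read_default(ext + "\\PersistentHandler")
    if handler is None:
        return None

    # Step 2: HKCR\CLSID\<handler>\PersistentAddinsRegistered\<IID> -> filter CLSID
    filter_clsid = read_default(
        "CLSID\\" + handler + "\\PersistentAddinsRegistered\\" + IFILTER_IID
    )
    if filter_clsid is None:
        return None

    # Step 3: HKCR\CLSID\<filter CLSID>\InprocServer32 -> DLL path
    return read_default("CLSID\\" + filter_clsid + "\\InprocServer32")


# Hypothetical in-memory stand-in for the registry, so the chain can be
# followed without a Windows machine. None of these values are real.
HANDLER_GUID = "{11111111-1111-1111-1111-111111111111}"
FILTER_CLSID = "{22222222-2222-2222-2222-222222222222}"
SAMPLE_KEYS = {
    ".doc\\PersistentHandler": HANDLER_GUID,
    "CLSID\\" + HANDLER_GUID + "\\PersistentAddinsRegistered\\" + IFILTER_IID: FILTER_CLSID,
    "CLSID\\" + FILTER_CLSID + "\\InprocServer32": "C:\\example\\docfilter.dll",
}
# Registry key names are case-insensitive, so compare lowercased paths.
_SAMPLE_LOWER = {path.lower(): value for path, value in SAMPLE_KEYS.items()}


def sample_reader(path: str) -> Optional[str]:
    return _SAMPLE_LOWER.get(path.lower())


if __name__ == "__main__":
    print(find_ifilter_dll(".doc", sample_reader))
```

On a Windows machine, calling `find_ifilter_dll(".doc", read_default_from_registry)` should follow the same chain against the live registry. Passing the reader in as a parameter is just a convenience that lets the lookup logic be exercised without registry access.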

Here is a graphical depiction of this algorithm, in this case finding the IFilter for .DOC files:
