How to handle noise words with FAST ESP?

Having used SharePoint Search for a while, I got into the habit of updating the noise words list directly on the server without giving it much thought. On a FAST ESP project, dealing with noise words turned out to be more involved, and the results are not always what you expect.
On the positive side, FAST ESP offers more than one way to handle noise words: anti-phrasing, query-side synonyms, or IMS flows. All of these approaches remove noise words at query time, which is an advantage because no re-feeding is needed.
For this project, the anti-phrasing dictionary seemed to be the best approach. Typically, an anti-phrasing dictionary stores irrelevant strings that might cloud the query, such as “I am looking to search information on [relevant query words].” In our case, we wanted to target specific words that the business considered noise words and that might not be part of the out-of-the-box anti-phrasing dictionary.
Our solution had to support multiple languages, so we created one anti-phrasing dictionary per language. Each language would have its own search profile, in which the search view would point to the appropriate dictionary.
To feed the dictionary, I used the dictman tool, which lets you import your list of noise words into an existing or new dictionary:

create fr_antiphrases fr

import AP indep_antiphrases c:\noisewords.txt
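Before importing, it can help to clean up the word list so that duplicates and stray whitespace do not end up in the dictionary. The following is a minimal sketch in Python, not part of FAST ESP: the helper names `normalize_noise_words` and `write_noise_word_file` are hypothetical, and the one-entry-per-line UTF-8 layout is an assumption that should be checked against what dictman expects in your installation.

```python
from pathlib import Path


def normalize_noise_words(words):
    """Lowercase, collapse whitespace, and de-duplicate entries.

    The order in which entries are first seen is preserved.
    """
    seen = set()
    cleaned = []
    for word in words:
        entry = " ".join(word.split()).lower()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return cleaned


def write_noise_word_file(words, path):
    """Write the cleaned entries to `path`, one per line, as UTF-8.

    Assumption: one entry per line is an acceptable import layout;
    this is not confirmed by the original post.
    """
    entries = normalize_noise_words(words)
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-8")
    return entries
```

Keeping one such file per language fits the one-dictionary-per-language setup described above.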

Another tool I found useful was Linguistics Studio, which let me browse a dictionary to verify that my values were inserted properly, or simply to see what values an existing dictionary contained.





One last thing: you need to modify your query to enable spelling and force a rewrite of the query so that the noise words are ignored at query time:

qtf_synonym:querysynonyms=true

spell=on

qtf_didyoumean:rewrite_post=value
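As a rough illustration of how these parameters might be attached to a search request, here is a small Python sketch. The three parameters are copied as given above, including the literal `value`. The helper `build_search_url`, the `query` parameter name, and the base URL in the usage note are assumptions made for the example, not a documented FAST ESP interface.

```python
from urllib.parse import urlencode

# Parameters taken from the post; "value" is kept exactly as written there.
NOISE_WORD_PARAMS = {
    "qtf_synonym:querysynonyms": "true",
    "spell": "on",
    "qtf_didyoumean:rewrite_post": "value",
}


def build_search_url(base_url, query, extra_params=None):
    """Return base_url with the user query and the noise-word parameters.

    Assumption: the query text is passed in a parameter named "query".
    """
    params = {"query": query}
    params.update(NOISE_WORD_PARAMS)
    if extra_params:
        params.update(extra_params)
    # safe=":" keeps the colon in the qtf_* parameter names readable.
    return base_url + "?" + urlencode(params, safe=":")
```

For example, `build_search_url("http://example.invalid/search", "the budget")` yields a URL ending in `?query=the+budget&qtf_synonym:querysynonyms=true&spell=on&qtf_didyoumean:rewrite_post=value`.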

Considerations:
After some investigation and a look at our results, we decided not to remove noise words after all, because doing so altered the results returned to users. If your search ranking is properly defined and set up, the relevant results should show at the top of the search results page and irrelevant results will be dropped. As a best practice, it is preferable to let the user decide whether a result is relevant to the search that was executed. Altering the query can also have negative effects on phrase queries. A phrase query is a query mode in FAST ESP that looks for an exact match to the user’s query. If we remove words we consider noise at query time, a phrase query that contains one of them might never find a match, even though a real match exists and could have been returned.
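The phrase-query problem can be illustrated with a toy model in Python. This is not how FAST ESP evaluates queries; it is only a sketch, with hypothetical helper names and sample text, that treats a phrase match as a contiguous run of tokens and shows what happens when noise words are stripped from the query but not from the document.

```python
def tokenize(text):
    """Split text into lowercase whitespace-separated tokens."""
    return text.lower().split()


def phrase_matches(phrase, document):
    """True if the phrase tokens appear contiguously, in order, in the document."""
    p = tokenize(phrase)
    d = tokenize(document)
    if not p:
        return False
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def strip_noise_words(query, noise_words):
    """Remove every token that is listed as a noise word."""
    return " ".join(t for t in tokenize(query) if t not in noise_words)


document = "The state of the union address was delivered on Tuesday"
query = "state of the union"
noise_words = {"of", "the"}

original_hit = phrase_matches(query, document)       # phrase as the user typed it
rewritten = strip_noise_words(query, noise_words)    # query after noise-word removal
rewritten_hit = phrase_matches(rewritten, document)  # same phrase search, rewritten
```

In this model the original phrase matches the document, while the rewritten phrase `state union` does not, because the two remaining words are no longer adjacent in the indexed text.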
