Fuzzy search. Synonym matching. Escaping special characters in the user input. How does one provide a quality search component in Sitecore that accepts free-text user input? Your site’s users will expect the search experience to be “Google-like”. This blog post consolidates several uses of the Azure Search API to help you, the developer, create a good search experience for your site’s users.
Why Use Azure Search API Over Sitecore ContentSearch API
While the Sitecore ContentSearch API covers most search-related use cases, I found that the Azure Search API provided the functionality I was looking for when building a query from a user-entered string. In my case, I needed to support fuzzy searching, and the Azure Search API provided an easy way to do this.
Preparing The User Input
Before firing off your Azure Search query, you will want to prepare the user input so that it can take advantage of the Azure Search features you need. The code below escapes special characters and formats the user input to allow for synonym matching and fuzzy searching. We will dissect the different pieces of this code throughout this blog post.
    private string PrepareQueryStringForSynonymAndFuzzyMatching(string queryString)
    {
        // Fall back to a wildcard search when there is no usable input.
        var preparedQueryString = "*";

        if (!string.IsNullOrWhiteSpace(queryString))
        {
            var wordsInQueryString = SanitizeSpecialCharacters(queryString)
                .Split(' ')
                .Where(w => !string.IsNullOrWhiteSpace(w))
                .ToArray();

            // Append the fuzzy operator (edit distance 1) to every word.
            var wordsInQueryStringWithFuzzyToken = wordsInQueryString
                .Select(w => string.Format("{0}~1", w))
                .ToArray();

            var fuzzySearchVersionOfWords = string.Join(" ", wordsInQueryStringWithFuzzyToken);
            var synonymSearchVersionOfWords = string.Join(" ", wordsInQueryString);

            preparedQueryString = $"{synonymSearchVersionOfWords} | {fuzzySearchVersionOfWords}";
        }

        return preparedQueryString;
    }

    private string SanitizeSpecialCharacters(string stringToSanitize)
    {
        var sanitizedString = stringToSanitize;

        // The backslash must be escaped first; every later replacement inserts one.
        sanitizedString = sanitizedString.Replace("\\", "\\\\");
        sanitizedString = sanitizedString.Replace("/", "\\/");
        // "|" and "&" are removed rather than escaped.
        sanitizedString = sanitizedString.Replace("|", "");
        sanitizedString = sanitizedString.Replace("+", "\\+");
        sanitizedString = sanitizedString.Replace("-", "\\-");
        sanitizedString = sanitizedString.Replace("&", "");
        sanitizedString = sanitizedString.Replace("!", "\\!");
        sanitizedString = sanitizedString.Replace("(", "\\(");
        sanitizedString = sanitizedString.Replace(")", "\\)");
        sanitizedString = sanitizedString.Replace("{", "\\{");
        sanitizedString = sanitizedString.Replace("}", "\\}");
        sanitizedString = sanitizedString.Replace("[", "\\[");
        sanitizedString = sanitizedString.Replace("]", "\\]");
        sanitizedString = sanitizedString.Replace("^", "\\^");
        sanitizedString = sanitizedString.Replace("\"", "\\\"");
        sanitizedString = sanitizedString.Replace("~", "\\~");
        sanitizedString = sanitizedString.Replace("*", "\\*");
        sanitizedString = sanitizedString.Replace("?", "\\?");
        sanitizedString = sanitizedString.Replace(":", "\\:");

        return sanitizedString;
    }
Sanitizing User Input
Before sending off your query to Azure Search, you will need to escape special characters in the user input with a backslash (“\”). The special characters are as follows:
- +
- -
- &&
- ||
- !
- (
- )
- {
- }
- [
- ]
- ^
- "
- ~
- *
- ?
- :
- \
- /
When sanitizing the input, the order in which you escape the special characters matters. The backslash must be escaped first, because escaping any other character inserts a new backslash into the string. If the backslash replacement ran after the others, it would escape those inserted backslashes a second time, leaving the input “double-escaped”. The method SanitizeSpecialCharacters() in the code snippet above shows how you can perform this sanitization. Note that it removes the “|” and “&” characters rather than escaping them. If the user input is empty or whitespace, the calling method PrepareQueryStringForSynonymAndFuzzyMatching() returns a wildcard (“*”) to indicate a wildcard search.
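One way to sidestep the ordering concern entirely is to walk the input once and decide per character, so that a backslash added by the escaper is never looked at again. The sketch below is an alternative to the chained Replace() calls, not the code used on my project; the class name LuceneEscaper and the method name Escape are made up for illustration, and it assumes the same policy as above (drop “|” and “&”, escape everything else in the list).

```csharp
using System.Text;

public static class LuceneEscaper
{
    // Characters that get a leading backslash. '|' and '&' are not listed
    // because they are removed instead, mirroring SanitizeSpecialCharacters().
    private const string EscapedCharacters = "\\/+-!(){}[]^\"~*?:";

    public static string Escape(string input)
    {
        var builder = new StringBuilder(input.Length * 2);

        foreach (var c in input)
        {
            if (c == '|' || c == '&')
            {
                continue; // removed rather than escaped
            }

            if (EscapedCharacters.IndexOf(c) >= 0)
            {
                builder.Append('\\');
            }

            builder.Append(c);
        }

        return builder.ToString();
    }
}
```

For example, LuceneEscaper.Escape("AC/DC (live)") returns AC\/DC \(live\). Because each input character is visited exactly once, the result does not depend on the order of the characters in EscapedCharacters.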
Fuzzy Searching
To prepare the user input for fuzzy search, you will need to append a “~” after each word. For example, if the user entered “laser pointers”, you will need to prepare it like so: “laser~ pointers~”. If the user accidentally misspelled “laser” as “lasr”, the tilde tells Azure Search to expand its search to include “laser”. The part of the code snippet above that handles fuzzy search is:
    var wordsInQueryStringWithFuzzyToken = wordsInQueryString
        .Select(w => string.Format("{0}~1", w))
        .ToArray();
    var fuzzySearchVersionOfWords = string.Join(" ", wordsInQueryStringWithFuzzyToken);
Take note that I also included an optional parameter after the tilde. You can include a value of 0, 1, or 2 after the tilde to specify the edit distance for correcting a user’s misspelling. If you do not specify a number, 2 is used by default.
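If you would rather not hard-code the edit distance, it can be passed in as a parameter. The following is a minimal sketch under that assumption; the class name FuzzyTerms and the method name AppendFuzzyOperator are hypothetical, and the input is assumed to have been sanitized already.

```csharp
using System;
using System.Linq;

public static class FuzzyTerms
{
    // Appends "~<editDistance>" to every whitespace-separated word.
    // The input is expected to be sanitized before it reaches this method.
    public static string AppendFuzzyOperator(string escapedInput, int editDistance = 1)
    {
        if (editDistance < 0 || editDistance > 2)
        {
            throw new ArgumentOutOfRangeException(
                nameof(editDistance), "Edit distance must be 0, 1, or 2.");
        }

        var words = escapedInput.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" ", words.Select(w => $"{w}~{editDistance}"));
    }
}
```

Calling FuzzyTerms.AppendFuzzyOperator("lasr pointers") returns “lasr~1 pointers~1”, and passing 2 as the second argument returns “lasr~2 pointers~2”. A smaller distance generally keeps the matches closer to what the user typed, which is why I went with 1.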
Synonym Matching
To prepare the user input for synonym matching, you simply provide the user input unaltered in the Azure Search query. For a synonym to match a word the user entered, that word must match the entry in the synonym map character for character. Leaving the user input unaltered presents a problem if you also want to perform a fuzzy search with it, because the fuzzy version of each word carries a “~” suffix. Thus, you will have to combine the two queries using the OR syntax. Using the example from before, if you wanted to perform a synonym match and a fuzzy search on “laser pointers”, you would need to prepare the query to look like this: “laser pointers | laser~ pointers~”. The part of the code snippet above that sets up the query for synonym matching and fuzzy searching is:
    var fuzzySearchVersionOfWords = string.Join(" ", wordsInQueryStringWithFuzzyToken);
    var synonymSearchVersionOfWords = string.Join(" ", wordsInQueryString);
    preparedQueryString = $"{synonymSearchVersionOfWords} | {fuzzySearchVersionOfWords}";
If you want more information on how to set up a content-authorable synonym map in your Azure Search Service to pair with this prepared query string above, I recommend you read through my other blog post here.
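To see what the combined query string looks like for a few inputs, here is a small stand-alone sketch that mirrors the combining step. The class name CombinedQuery and the method name Build are hypothetical, the edit distance of 1 matches the code in this post, and the input is assumed to have been sanitized already.

```csharp
using System;
using System.Linq;

public static class CombinedQuery
{
    // Builds "<unaltered words> | <fuzzy words>" from already-sanitized input,
    // or "*" when there are no words to search for.
    public static string Build(string escapedInput)
    {
        var words = (escapedInput ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        if (words.Length == 0)
        {
            return "*";
        }

        var synonymVersion = string.Join(" ", words);
        var fuzzyVersion = string.Join(" ", words.Select(w => w + "~1"));

        return $"{synonymVersion} | {fuzzyVersion}";
    }
}
```

CombinedQuery.Build("laser pointers") returns “laser pointers | laser~1 pointers~1”, while an empty or whitespace-only input returns “*”. The left-hand side is what the synonym map can match against, and the right-hand side is what catches misspellings.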
Putting It All Together
Where does all this query string preparation come into play in the grand scheme of things? It comes into play in the .Search() API call that goes out to your Azure Search Service. Here is a sample code snippet showing where this prepared query string would be used:
    public IEnumerable<CustomSearchResult> GetAzureSearchResults(string queryString)
    {
        var searchIndexClient = _searchIndexClientProvider.GetSitecoreSearchIndexClient();

        var parameters = new SearchParameters()
        {
            // Fields returned with each result.
            Select = new[]
            {
                "customcomputedfield1_s",
                "customcomputedfield2_s",
            },
            // Fields the query text is matched against.
            SearchFields = new[]
            {
                "customcomputedfield3_s",
            },
            // The full Lucene parser is needed for the fuzzy ("~") operator.
            QueryType = QueryType.Full
        };

        var searchResults = searchIndexClient.Documents.Search<CustomSearchResult>(
            PrepareQueryStringForSynonymAndFuzzyMatching(queryString), parameters);

        return searchResults.Results.Select(r => r.Document).ToList();
    }
Note that the PrepareQueryStringForSynonymAndFuzzyMatching() method is defined near the top of this blog post. To make use of the Lucene parser to allow for fuzzy searching, you will want to set your QueryType search parameter to have a value of Full.
Conclusion
I hope this blog post helps you set up a quality search experience for your site’s users. Whether they know it or not, fuzzy searching and synonym matching are some baseline features that users will be expecting your search experience to have. Stay tuned for another blog post in the near future on how to connect to a Sitecore-managed index found in your Azure Search Service!